[2.7] subprocess.call fails with unicode strings in command line #45239

Closed
mclausch mannequin opened this issue Jul 24, 2007 · 16 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@mclausch
Mannequin

mclausch mannequin commented Jul 24, 2007

BPO 1759845
Nosy @terryjreedy, @amauryfa, @tjguk, @Safihre
Files
  • CreateProcessW.patch
  • Python-2.5.2-subprocess.patch: Alternate Python-only patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/tjguk'
    closed_at = <Date 2010-08-04.02:48:10.483>
    created_at = <Date 2007-07-24.18:24:11.000>
    labels = ['type-feature', 'library']
    title = '[2.7] subprocess.call fails with unicode strings in command line'
    updated_at = <Date 2017-10-04.10:00:39.706>
    user = 'https://bugs.python.org/mclausch'

    bugs.python.org fields:

    activity = <Date 2017-10-04.10:00:39.706>
    actor = 'vstinner'
    assignee = 'tim.golden'
    closed = True
    closed_date = <Date 2010-08-04.02:48:10.483>
    closer = 'terry.reedy'
    components = ['Library (Lib)']
    creation = <Date 2007-07-24.18:24:11.000>
    creator = 'mclausch'
    dependencies = []
    files = ['9580', '11674']
    hgrepos = []
    issue_num = 1759845
    keywords = ['patch']
    message_count = 16.0
    messages = ['32546', '32547', '32548', '63176', '74142', '87566', '87580', '87597', '87605', '112739', '112767', '112825', '112835', '112854', '113288', '303677']
    nosy_count = 12.0
    nosy_names = ['terry.reedy', 'amaury.forgeotdarc', 'gregcouch', 'andersjm', 'ocean-city', 'mclausch', 'brotch', 'tim.golden', 'kcwu', 'jnoller', 'xianyiteng', 'Safihre']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1759845'
    versions = ['Python 2.7']

    @mclausch
    Mannequin Author

    mclausch mannequin commented Jul 24, 2007

    On Windows, subprocess.call() fails with an exception if either the executable or any of the arguments contain non-ASCII characters. See below:

    >>> import subprocess
    >>> cmd = [ u'test_\xc5_exec.bat', u'arg1', u'arg2' ]
    >>> subprocess.call(cmd)

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python25\lib\subprocess.py", line 443, in call
        return Popen(*popenargs, **kwargs).wait()
      File "C:\Python25\lib\subprocess.py", line 593, in __init__
        errread, errwrite)
      File "C:\Python25\lib\subprocess.py", line 815, in _execute_child
        startupinfo)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 5: ordinal not in range(128)
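
The position reported in the traceback can be reproduced without Windows or a subprocess. The following is a minimal sketch in Python 3 syntax; the helper name `ascii_failure_position` is hypothetical, and it only assumes that the failure comes from an implicit ASCII encoding of the executable name, as the error message suggests.

```python
name = u"test_\xc5_exec.bat"


def ascii_failure_position(text):
    """Return the index of the first character ASCII cannot encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc.start
    return None


print(ascii_failure_position(name))     # 5, matching "position 5" in the traceback
print(ascii_failure_position(u"arg1"))  # None: plain ASCII encodes without error
```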

    @mclausch mclausch mannequin added stdlib Python modules in the Lib dir labels Jul 24, 2007
    @brotch
    Mannequin

    brotch mannequin commented Aug 5, 2007

    Python's default encoding is 'ascii', which cannot encode unicode characters above code point 127.

    Forcing each unicode string to be encoded as 'iso-8859-1', e.g.

    subprocess.call([arg.encode('iso-8859-1') for arg in cmd])

    resolves the problem and runs the correct command.

    @mclausch
    Mannequin Author

    mclausch mannequin commented Aug 20, 2007

    Sorry, I should have been more specific. I'm looking for a general solution, not just one for characters in iso-8859-1. For instance, I need to execute a subprocess where the executable or the arguments may contain Japanese characters.

    So another example would be:
    cmd = [ u'test_\u65e5\u672c\u8a9e_exec.bat', u'arg1', u'arg2' ]
    subprocess.call(cmd)
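
Why the iso-8859-1 workaround does not generalize can be seen by encoding both example command lists. This is a sketch in Python 3 syntax; `encode_command` is a hypothetical helper, not part of subprocess, and nothing is actually executed.

```python
def encode_command(cmd, encoding):
    """Encode every element of a command list; raises UnicodeEncodeError on failure."""
    return [part.encode(encoding) for part in cmd]


latin_cmd = [u"test_\xc5_exec.bat", u"arg1", u"arg2"]
japanese_cmd = [u"test_\u65e5\u672c\u8a9e_exec.bat", u"arg1", u"arg2"]

# U+00C5 exists in Latin-1, so the first command encodes without error.
print(encode_command(latin_cmd, "iso-8859-1"))

# The Japanese characters have no Latin-1 representation, so encoding fails.
try:
    encode_command(japanese_cmd, "iso-8859-1")
except UnicodeEncodeError as exc:
    print("cannot encode:", ascii(exc.object[exc.start:exc.end]))
```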

    @ocean-city
    Mannequin

    ocean-city mannequin commented Mar 2, 2008

    I tried to fix this problem using CreateProcessW.
    (Environment variables are still passed as ANSI.)

    I don't know the Python C API well, so I may be doing
    something wrong. (I confirmed that test_subprocess.py
    passes.)

    @gregcouch
    Mannequin

    gregcouch mannequin commented Oct 1, 2008

    We're having the same problem. My quick fix was to patch subprocess.py
    so the command line and executable are converted to the filesystem
    encoding (mbcs).
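
The attached Python-only patch is not reproduced in this thread, so the following is only a rough sketch of the idea described, written in Python 3 syntax. The helper `to_filesystem_encoding` is hypothetical; by default it would use `sys.getfilesystemencoding()`, and the example passes "cp1252" explicitly as an assumed stand-in for a Western European "mbcs" code page.

```python
import sys


def to_filesystem_encoding(args, encoding=None):
    """Encode text arguments with the filesystem encoding; leave bytes untouched."""
    encoding = encoding or sys.getfilesystemencoding()
    return [a.encode(encoding) if isinstance(a, str) else a for a in args]


# "cp1252" is an assumption for illustration, not the encoding of any real system here.
print(to_filesystem_encoding([u"test_\xc5_exec.bat", b"arg1"], "cp1252"))
```

Characters outside the chosen code page still raise UnicodeEncodeError with this approach.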

    @kcwu
    Mannequin

    kcwu mannequin commented May 11, 2009

    ocean-city's patch applied cleanly to trunk and it works for me.
    Could anybody review and commit it? I can help if any refinement is required.

    @amauryfa
    Member

    The first patch will introduce regressions for strings that cannot be
    decoded with the filesystem encoding. It is necessary to provide a
    fallback to the CreateProcessA function.

    I'd prefer the python-only patch, except for the "sys=sys" argument to
    the function. Is it really needed?

    @gregcouch
    Mannequin

    gregcouch mannequin commented May 12, 2009

    I like the C patch better. It only tries to decode non-unicode objects
    with the filesystem (mbcs) encoding. This fits in with Python 3.0
    perfectly where all strings are unicode. In 2.5, strings are assumed to
    be in the mbcs encoding, to match the Windows ANSI API, so decoding
    those with the mbcs encoding shouldn't alter the set of acceptable
    strings (which is what the C patch is doing if I read the code correctly).

    @kcwu
    Mannequin

    kcwu mannequin commented May 12, 2009

    There is a slight difference between the C and Python patches.
    C version: converts mbcs arguments to unicode
    Python version: converts unicode arguments to mbcs

    In fact, the Python patch may not work if the string is unicode and
    cannot be encoded with mbcs. For example, my Windows system is Chinese
    (cp950) and the program I want to execute contains Japanese characters.
    Encoding Japanese characters with mbcs (in this case, cp950) will
    fail. This is also what Matt (mclausch) said.

    As for the C patch, I don't think a fallback is
    necessary. If a string fails to convert from mbcs to unicode, it
    would eventually fail inside CreateProcessA() anyway, because CreateProcessA
    internally (since Windows 2000) converts from mbcs to unicode and
    calls CreateProcessW.
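
The two conversion directions can be compared side by side. This is a sketch in Python 3 syntax, not either of the attached patches: `widen` and `narrow` are hypothetical names, and "cp1252" is assumed as a stand-in for the system "mbcs" code page.

```python
ANSI = "cp1252"  # assumed stand-in for the Windows "mbcs" code page


def widen(args):
    """C-patch direction: decode byte strings, keep text as-is."""
    return [a.decode(ANSI) if isinstance(a, bytes) else a for a in args]


def narrow(args):
    """Python-patch direction: encode text, keep byte strings as-is."""
    return [a.encode(ANSI) if isinstance(a, str) else a for a in args]


cmd = [u"test_\u65e5\u672c\u8a9e_exec.bat", b"caf\xe9"]

# Widening succeeds: the byte string is valid in the code page, the text is untouched.
print([ascii(a) for a in widen(cmd)])

# Narrowing fails: the code page cannot represent the Japanese file name.
try:
    narrow(cmd)
except UnicodeEncodeError:
    print("narrowing failed")
```

Widening can still fail for byte strings that are not valid in the code page, which is the regression concern raised earlier in the thread.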

    @terryjreedy
    Member

    I fail to see why subprocess.call(cmd.encode('whatever')) is not a general solution. Auto-encoding strikes me as wrong. Someone who wants that should write their own wrapper. In any case, 2.7 is out and closed to new features, while 3.x fixes this and numerous other unicode issues.

    @terryjreedy terryjreedy added the type-feature A feature request or enhancement label Aug 4, 2010
    @kcwu
    Mannequin

    kcwu mannequin commented Aug 4, 2010

    > I fail to see why subprocess.call(cmd.encode('whatever')) is not a general solution.

    Because no such 'whatever' encoding exists.

    Assume cmd contains Japanese characters and my system is Chinese Windows. subprocess.call expects the argument to be encoded in mbcs, which here is cp950. However, cp950 doesn't contain Japanese characters.

    subprocess.call(cmd.encode('cp950')) will fail because cp950 doesn't contain Japanese characters.
    subprocess.call(cmd.encode('cp932')) will fail because the cp932 bytes will then be decoded as cp950, which either fails or produces the wrong characters.
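
The second failure mode (bytes written in one code page and read back in another) can be simulated with Python's own codecs. This is a sketch in Python 3 syntax; the helper `misread_as` is hypothetical, and the sample text is an arbitrary katakana word chosen for illustration.

```python
def misread_as(text, written_with, read_with):
    """Encode text in one code page and decode the bytes in another."""
    raw = text.encode(written_with)
    # errors="replace" keeps the demo running where a strict decode would raise.
    return raw.decode(read_with, errors="replace")


text = u"\u30c6\u30b9\u30c8"  # katakana sample text

misread = misread_as(text, "cp932", "cp950")
print(ascii(text))
print(ascii(misread))
print(misread == text)  # False: the receiver does not see the original text
```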

    @terryjreedy
    Member

    Thanks for the simple explanation.

    @gregcouch
    Mannequin

    gregcouch mannequin commented Aug 4, 2010

    So Terry, can you reopen this bug then? It's not out of date.

    @terryjreedy
    Member

    I will not reopen this now for the reasons I already stated after "In any case ...". To expand on that:

    1. 2.7 is in maintenance (bug-fix only) mode and I view this as a feature request. To persuade someone otherwise, quote some doc that clearly says subprocess should behave as requested. I nosy-ed Jesse Noller so he can contradict me if he wishes.

    2. The underlying issue seems to be the use of limited encodings, which was and is being fixed as well as possible in 3.x. Since there has been no mention of this issue being a problem with subprocess in 3.1, I presume there is none. If there is, say so and I will reopen.

    The discussion shows disagreement on both the goal and the approach to the change. I am dubious that there will be an acceptable general solution. Even if this is persuasively seen as a bug and there is a good patch, I am dubious that any of the current developers will want to spend the necessary time to properly review a workaround to an issue that was already fixed the right way in 3.x.

    @tjguk
    Member

    tjguk commented Aug 8, 2010

    To confirm the situation on 3.x: a unicode string containing characters that cannot be encoded as ASCII is fine. The easy test here in the UK is a pound sign:

    <code>
    import subprocess

    FILENAME = "abc£.bat"
    try:
      FILENAME.encode ("ascii")
    except UnicodeEncodeError:
      pass  # expected: "£" cannot be encoded as ASCII
    with open (FILENAME, "w") as f:
      f.write ("echo hello\n")

    subprocess.call ([FILENAME])

    # "hello" output as expected

    </code>

    So no action for 3.x. I'm sympathetic (in principle) to making a change to 2.7 but I haven't looked over the "competing" patches and assessed the ins-and-outs.
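
A platform-neutral variant of the same check replaces the .bat file with a child Python interpreter. This is only a sketch for Python 3; the helper `echo_argument` is hypothetical, and the result depends on the interpreter being able to pass non-ASCII arguments under the current locale.

```python
import subprocess
import sys


def echo_argument(arg):
    """Run a child Python that prints ascii() of its first argument."""
    child_code = "import sys; print(ascii(sys.argv[1]))"
    output = subprocess.check_output([sys.executable, "-c", child_code, arg])
    return output.decode("ascii").strip()


# The child reports the argument it received in escaped form.
print(echo_argument(u"abc\xa3"))
```

Printing the escaped form avoids any dependence on the console encoding when reading the child's output.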

    @Safihre
    Mannequin

    Safihre mannequin commented Oct 4, 2017

    Although this issue is very old, in case anyone else needs this functionality as we do, I created a package that implements the proposed C fix.
    https://pypi.python.org/pypi/subprocessww
    Simply "import subprocessww" and Popen is patched. We tested it and it does the job pretty well; we haven't run into any special situations yet.

    We really want to upgrade our app to Python 3, but currently lack the manpower to go over our app line by line. It's not a simple 2to3 conversion, unfortunately.

    @vstinner vstinner changed the title subprocess.call fails with unicode strings in command line [2.7] subprocess.call fails with unicode strings in command line Oct 4, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022