UnicodeDecodeError on OSError on Windows with undecodable (bytes) filename #59683

Closed
vstinner opened this issue Jul 28, 2012 · 19 comments
Labels
OS-windows topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Member

BPO 15478
Nosy @loewis, @atsuoishimoto, @vstinner, @tjguk, @ezio-melotti, @skrah, @florentx, @serhiy-storchaka
Files
  • oserror_filename.patch
  • oserror_filename_windows.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-11-13.21:17:42.798>
    created_at = <Date 2012-07-28.12:49:04.917>
    labels = ['type-bug', 'expert-unicode', 'OS-windows']
    title = 'UnicodeDecodeError on OSError on Windows with undecodable (bytes) filename'
    updated_at = <Date 2012-11-13.21:17:42.798>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2012-11-13.21:17:42.798>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-11-13.21:17:42.798>
    closer = 'vstinner'
    components = ['Unicode', 'Windows']
    creation = <Date 2012-07-28.12:49:04.917>
    creator = 'vstinner'
    dependencies = []
    files = ['26675', '27789']
    hgrepos = []
    issue_num = 15478
    keywords = ['patch']
    message_count = 19.0
    messages = ['166652', '166654', '166777', '167314', '174172', '174175', '174178', '174192', '174245', '174248', '174373', '174375', '174377', '174378', '174536', '174537', '174850', '174852', '175490']
    nosy_count = 10.0
    nosy_names = ['loewis', 'ishimoto', 'vstinner', 'tim.golden', 'ezio.melotti', 'skrah', 'flox', 'python-dev', 'sbt', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue15478'
    versions = ['Python 3.4']

    @vstinner
    Member Author

    On Windows, if an OS function fails when the filename is a bytes string that cannot be decoded, Python raises a UnicodeDecodeError instead of an OSError. The problem is that Python decodes the filename to fill the OSError.filename field. See issue bpo-15441 for the initial report.

    There are different options to solve this issue:

    • always keep the filename parameter unchanged, so OSError.filename can be a str or a bytes string, depending on the input parameter
    • try to decode the filename from the filesystem encoding, or keep the filename unchanged: OSError.filename is only a bytes string if the filename cannot be decoded
    • don't fill OSError.filename (= None) if the filename cannot be decoded
    • use "surrogateescape", "replace" or "backslashreplace" error handler to decode the filename

    This issue is specific to Windows: on other platforms, the filename is decoded using the "surrogateescape" error handler, so decoding the filename cannot fail.
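
To make the difference between the error handlers listed above concrete, here is a minimal sketch. The byte string is an assumed Latin-1 encoded name chosen for illustration, and UTF-8 stands in for the filesystem encoding; neither comes from the original report.

```python
# Illustrative only: an assumed Latin-1 encoded name that is not valid UTF-8.
raw = b"caf\xe9.txt"

# Strict decoding fails, which is the failure mode behind this issue.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "surrogateescape" maps each undecodable byte to a lone surrogate,
# so the original bytes can be recovered exactly.
escaped = raw.decode("utf-8", "surrogateescape")
round_trip = escaped.encode("utf-8", "surrogateescape")

# "replace" substitutes U+FFFD and loses the original byte value.
replaced = raw.decode("utf-8", "replace")

# "backslashreplace" keeps the byte value visible as text
# (supported for decoding since Python 3.5).
slashed = raw.decode("utf-8", "backslashreplace")

print(strict_failed, ascii(escaped), round_trip == raw)
print(ascii(replaced), slashed)
```

Only "surrogateescape" is reversible here; "replace" and "backslashreplace" yield a printable string, but one that can no longer be mapped back to the original filename.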

    I don't know if OSError.filename is only used to display more information to the user, or if it is used to perform another operation on the file (e.g. os.chmod).

    I like solutions that keep the filename unchanged, because they do not lose information, and the user can decide how to handle the undecodable filename.

    I don't like the option of trying to decode the filename and keeping it unchanged if decoding fails, because applications will work in most cases but "crash" when someone comes along with an unusual code page, a special USB key, or a filename with a non-ASCII character.

    So the best option is maybe to always keep the bytes filename unchanged.
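
As a rough sketch of what "keep the bytes filename unchanged" means for calling code: the handler has to accept OSError.filename as str, bytes, or None. The helper name `describe_filename` and the sample filename are hypothetical, and the exception is built by hand so the example does not depend on the platform.

```python
import errno
import os


def describe_filename(exc):
    """Return a printable form of exc.filename (str, bytes, or None).

    Hypothetical helper, for display purposes only.
    """
    name = exc.filename
    if isinstance(name, bytes):
        # Keep undecodable bytes visible instead of raising UnicodeDecodeError.
        return name.decode("ascii", "backslashreplace")
    return name


# Build the exception directly; the bytes filename is stored as given.
err = OSError(errno.ENOENT, os.strerror(errno.ENOENT), b"missing_\xff.txt")

print(type(err).__name__, type(err.filename).__name__)
print(describe_filename(err))
```

The bytes value stays available in `err.filename` for further file operations, while the decoded form is used only for messages.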

    Such a change can no longer be made in Python 3.3; it is too late to test it properly.

    @vstinner vstinner changed the title UnicodeDecodeError on OSError UnicodeDecodeError on OSError on Windows with undecodable (bytes) filename Jul 28, 2012
    @vstinner
    Member Author

    In Python 2, it looks like open(arg) passes its filename argument unchanged to the OSError constructor (so it can be bytes or unicode). OSError.filename is always bytes for os.chdir() on UNIX, but OSError.filename can be bytes or unicode for os.chdir() on Windows.

    @atsuoishimoto
    Mannequin

    atsuoishimoto mannequin commented Jul 29, 2012

    +1 for keeping the file name unchanged. This solution is not very
    compatible with prior versions, but it is simple and the least surprising.

    I would prefer that platforms other than Windows use the same method to build OSError.

    @vstinner
    Member Author

    vstinner commented Aug 3, 2012

    The attached patch modifies all functions of the os module that take filenames so that the filename is kept unmodified in OSError.filename.

    The patch also changes os.link(), os.rename() and os.replace() to use the source, not the destination, in the error message. This may be a mistake, because these functions can also fail if the directory of the destination does not exist.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 30, 2012

    New changeset 67d69f943b7f by Victor Stinner in branch 'default':
    Issue bpo-15478: Raising an OSError doesn't decode or encode the filename anymore
    http://hg.python.org/cpython/rev/67d69f943b7f

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 30, 2012

    New changeset 27a3b19ee792 by Victor Stinner in branch 'default':
    Issue bpo-15478: Fix compilation on Windows
    http://hg.python.org/cpython/rev/27a3b19ee792

    @vstinner
    Member Author

    The commit is incomplete; some remaining functions still need to be patched. Here is a new (untested) patch for more Windows functions.

    @serhiy-storchaka
    Member

    See also bpo-16074.

    The patch also changes os.link(), os.rename() and os.replace() to use the source, not the destination, in the error message. This may be a mistake, because these functions can also fail if the directory of the destination does not exist.

    Yes, in different cases it can be the source, the destination, both, unknown or none of them.
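
For two-path functions, a small sketch of what a caller can inspect. It assumes Python 3.4 or later, where OSError also has a `filename2` attribute; that attribute is not discussed in this thread. The paths are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "missing_source")  # deliberately does not exist
    dst = os.path.join(tmp, "destination")
    try:
        os.rename(src, dst)
    except OSError as exc:
        # filename holds the first path argument, filename2 the second.
        reported = (exc.filename, exc.filename2)
    else:
        reported = None

print(reported == (src, dst))
```

Even with both paths available, the exception does not say which of the two caused the failure, which matches the point that it can be the source, the destination, both, or neither.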

    @serhiy-storchaka serhiy-storchaka added the type-bug An unexpected behavior, bug, or error label Oct 30, 2012
    @python-dev
    Mannequin

    python-dev mannequin commented Oct 31, 2012

    New changeset 01cc9fb52887 by Victor Stinner in branch 'default':
    Issue bpo-15478: Fix test_os on Windows (os.chown is missing)
    http://hg.python.org/cpython/rev/01cc9fb52887

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 31, 2012

    New changeset ef87bd0797de by Victor Stinner in branch 'default':
    Issue bpo-15478: Fix test_os on FreeBSD
    http://hg.python.org/cpython/rev/ef87bd0797de

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 31, 2012

    New changeset 13ebaa36d87d by Victor Stinner in branch 'default':
    Issue bpo-15478: Use path_error() in more posix functions, especially in Windows
    http://hg.python.org/cpython/rev/13ebaa36d87d

    New changeset 9f696742dbda by Victor Stinner in branch 'default':
    Issue bpo-15478: Fix again to fix test_os on Windows
    http://hg.python.org/cpython/rev/9f696742dbda

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 31, 2012

    New changeset 6903f5214e99 by Victor Stinner in branch 'default':
    Issue bpo-15478: Use source filename in OSError, not destination filename
    http://hg.python.org/cpython/rev/6903f5214e99

    @vstinner
    Member Author

    All issues should now be fixed.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 31, 2012

    New changeset b3434c1ae503 by Victor Stinner in branch 'default':
    Issue bpo-15441, bpo-15478: Reenable test_nonascii_abspath() on Windows
    http://hg.python.org/cpython/rev/b3434c1ae503

    @skrah
    Mannequin

    skrah mannequin commented Nov 2, 2012

    One of 13ebaa36d87d, 9f696742dbda or 6903f5214e99 causes test failures in test_pep277:

    ======================================================================
    FAIL: test_failures (test.test_pep277.UnicodeFileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pep277.py", line 120, in test_failures
        self._apply_failure(os.listdir, name)
      File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pep277.py", line 105, in _apply_failure
        self.assertEqual(wildcard, '*.*')
    AssertionError: '7_\u05d4\u05e9\u05e7\u05e6\u05e5\u05e1' != '*.*'
    - 7_\u05d4\u05e9\u05e7\u05e6\u05e5\u05e1
    + *.*

    @skrah skrah mannequin reopened this Nov 2, 2012
    @skrah
    Mannequin

    skrah mannequin commented Nov 2, 2012

    Additionally, some of the changes cause a failure in test_subprocess:

    ======================================================================
    ERROR: test_no_leaking (test.test_subprocess.ProcessTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "C:\Users\stefan\pydev\cpython\lib\test\test_subprocess.py", line 823, in test_no_leaking
        handles.append(os.open(tmpfile, os.O_WRONLY|os.O_CREAT))
    FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'c:\\users\\stefan\\appdata\\local\\temp\\tmpa41o4x\\@test_2236_tmp'

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 5, 2012

    New changeset 817a90752470 by Victor Stinner in branch 'default':
    Issue bpo-15478: Oops, fix regression in os.open() on Windows
    http://hg.python.org/cpython/rev/817a90752470

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 5, 2012

    New changeset 11ea4eb79e9d by Victor Stinner in branch 'default':
    Issue bpo-15478: Fix test_pep277 on Windows
    http://hg.python.org/cpython/rev/11ea4eb79e9d

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 13, 2012

    New changeset ee7b713fec71 by Victor Stinner in branch 'default':
    Issue bpo-15478: os.lchflags() is not always available when os.chflags() is available
    http://hg.python.org/cpython/rev/ee7b713fec71

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022