Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os.listdir breaks with literal paths #57443

Closed
mandel mannequin opened this issue Oct 20, 2011 · 19 comments
Closed

os.listdir breaks with literal paths #57443

mandel mannequin opened this issue Oct 20, 2011 · 19 comments
Assignees
Labels
OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@mandel
Copy link
Mannequin

mandel mannequin commented Oct 20, 2011

BPO 13234
Nosy @loewis, @pitrou, @vstinner, @tjguk, @briancurtin, @serhiy-storchaka
Files
  • listdir.patch: patch that ensures that the correct os.sep is used when calling FindFirstFile
  • issue13234_py33.patch
  • issue13234_py33_v2.patch
  • issue13234_py33_v3.patch
  • issue13234_py33_v4.patch: Simplified patch
  • issue13234_tip_refresh.patch: tip refresh
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/tjguk'
    closed_at = <Date 2013-10-25.20:46:23.775>
    created_at = <Date 2011-10-20.16:54:45.624>
    labels = ['type-bug', 'library', 'OS-windows']
    title = 'os.listdir breaks with literal paths'
    updated_at = <Date 2013-10-25.20:46:23.774>
    user = 'https://bugs.python.org/mandel'

    bugs.python.org fields:

    activity = <Date 2013-10-25.20:46:23.774>
    actor = 'tim.golden'
    assignee = 'tim.golden'
    closed = True
    closed_date = <Date 2013-10-25.20:46:23.775>
    closer = 'tim.golden'
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2011-10-20.16:54:45.624>
    creator = 'mandel'
    dependencies = []
    files = ['23482', '23512', '23513', '23519', '23520', '32334']
    hgrepos = []
    issue_num = 13234
    keywords = ['patch']
    message_count = 19.0
    messages = ['146031', '146338', '146339', '146341', '146346', '146350', '146353', '146375', '146388', '146389', '146391', '146398', '146400', '181517', '184822', '201140', '201161', '201288', '201290']
    nosy_count = 9.0
    nosy_names = ['loewis', 'pitrou', 'vstinner', 'tim.golden', 'brian.curtin', 'santoso.wijaya', 'python-dev', 'mandel', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue13234'
    versions = ['Python 3.3', 'Python 3.4']

    @mandel
    Copy link
    Mannequin Author

    mandel mannequin commented Oct 20, 2011

    During the development of an application that needed to write paths longer than 260 chars we opted to use \\?\ as per http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#maxpath.

    When working with literal paths the following the os.listdir funtion would return the following trace:

    >>> import os
    >>> test = r'\\?\C:\Python27'
    >>> os.listdir(test)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: '\\\\?\\C:\\Python27/*.*'

    The reason for this is that the implementation of listdir appends '/' at the end of the path if os.path.sep is not present at the end of it which FindFirstFile does not like. This is a inconsistency from the OS but it can be easily fixed (see attached patch).

    @mandel mandel mannequin added the stdlib Python modules in the Lib dir label Oct 20, 2011
    @santosowijaya santosowijaya mannequin added OS-windows type-bug An unexpected behavior, bug, or error labels Oct 22, 2011
    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 24, 2011

    I'd also like to point out that Unicode path is handled correctly in both 2.7.x
    and 3.x:

        Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win
        32
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import os
        >>> os.listdir(u'\\\\?\\D:\\Temp\\tempdir')
        [u'sub1', u'test1.txt']
        >>> os.listdir('\\\\?\\D:\\Temp\\tempdir')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        WindowsError: [Error 123] The filename, directory name, or volume label syntax i
        s incorrect: '\\\\?\\D:\\Temp\\tempdir/*.*'
    
        Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win
        32
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import os
        >>> os.listdir('\\\\?\\D:\\Temp\\tempdir')
        ['sub1', 'test1.txt']
        >>> os.listdir(b'\\\\?\\D:\\Temp\\tempdir')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        WindowsError: [Error 123] The filename, directory name, or volume label syntax i
        s incorrect: '\\\\?\\D:\\Temp\\tempdir/*.*'

    The problem only lies in the code handling narrow string paths. If you look at
    (py33) posixmodule.c:2545-6, the bare path is appended with L"\\.*". Whereas
    in the narrow string case, posixmodule.c:2625-6, the bare path is appended with
    "/
    .*", instead.

    To be consistent, we should use '\\'.

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 24, 2011

    There are also several other edge cases to be taken care of:

        Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win
        32
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import os
        >>> os.listdir(r'\\?\C:\Python27/')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        WindowsError: [Error 123] The filename, directory name, or volume label syntax i
        s incorrect: '\\\\?\\C:\\Python27/*.*'
        >>> os.listdir(r'\\?\C:/Python27\Lib')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        WindowsError: [Error 3] The system cannot find the path specified: '\\\\?\\C:/Py
        thon27\\Lib/*.*'

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 25, 2011

    Additionally, there might be issues in other APIs when handling with extended path lengths:

    D:\Temp\tempdir>dir
    Volume in drive D is Data
    Volume Serial Number is 7E3D-EC81

    Directory of D:\Temp\tempdir

    10/24/2011 04:22 PM <DIR> .
    10/24/2011 04:22 PM <DIR> ..
    10/24/2011 04:28 PM <DIR> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    10/24/2011 01:31 PM <DIR> sub1
    10/24/2011 03:39 PM 262 test1.txt
    1 File(s) 262 bytes
    4 Dir(s) 244,483,321,856 bytes free

    D:\Temp\tempdir>cd AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAA

    D:\Temp\tempdir\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAA>dir
    Volume in drive D is Data
    Volume Serial Number is 7E3D-EC81

    Directory of D:\Temp\tempdir\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAA

    10/24/2011 04:28 PM <DIR> .
    10/24/2011 04:28 PM <DIR> ..
    10/24/2011 04:14 PM 0 1234567.txt
    10/24/2011 04:28 PM <DIR> BBBBBBBBBBBBB
    1 File(s) 0 bytes
    3 Dir(s) 244,483,321,856 bytes free

    Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win
    32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> subdir = 'B'*13
    >>> os.path.isdir(subdir)
    False
    >>> os.getcwd()
    'D:\\Temp\\tempdir\\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAA'
    >>> subdir_abs = os.path.join(os.getcwd(), subdir)
    >>> os.path.isdir(subdir)
    False
    >>> subdir_ext = r'\\?\%s' % subdir_abs
    >>> os.path.isdir(subdir_ext)
    True

    In the above example, perhaps a ValueError('path too long') is better than returning False?

    @mandel
    Copy link
    Mannequin Author

    mandel mannequin commented Oct 25, 2011

    Indeed, in our code we had to write a number of wrappers around the os calls to be able to work with long path on Windows. At the moment working with long paths on windows and python is broken in a number of places and is a PITA to work with.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 25, 2011

    Thanks for the patch. Is there a reason you don't use shutil.rmtree in tearDown()?
    I don't know if it's our business to convert forward slashes passed by the user. Also, general support for extended paths is another can of worms ;)

    @mandel
    Copy link
    Mannequin Author

    mandel mannequin commented Oct 25, 2011

    In case of my patch (I don't know about santa4nt case) I did not use shutil.remove because it was not used in the other tests and I wanted to be consistent and not add a new import. Certainly if there is not an issue with that we should use it.

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 25, 2011

    Even if we decide not to convert any forward slash, listdir() adds u"\\.*" when the input is unicode, but it adds "/.*" when it is not, before passing it off to Windows API. Hence the inconsistency and the problem Manuel saw.

    IMO, his patch shouldn't have differentiated if the path starts with r"\\?\" and just be consistent with adding "\\*.*", unicode or not.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Oct 25, 2011

    This issue is getting messy. I declare that this issue is *only* about the original problem reported in msg146031. When that is fixed, this issue will be closed, and any further issues need to be reported separately.

    As for the original problem, ISTM that the right fix is to replace

    namebuf[len++] = '/';
    

    with

    namebuf[len++] = '\\';
    

    No further change should be necessary.

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 25, 2011

    Addressing patch comments.

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 25, 2011

    Fair enough. Simplifying.

    @vstinner
    Copy link
    Member

    •    if (ch != SEP && ch != ALTSEP && ch != ':')
      

    + if (ch != '\\' && ch != '/' && ch != ':')

    I don't understand this change in issue13234_py33_v4.patch (the change looks to be useless).

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 25, 2011

    I don't understand this change in issue13234_py33_v4.patch (the change looks to be useless).

    It's pedantic correctness on my part. SEP and ALTSEP are defined as wide strings L'\\' and L'/' respectively. Their usage in the unicode conditional branch and the bytes conditional branch seem to have been reversed.

    @serhiy-storchaka
    Copy link
    Member

    I added minor comments in Rietveld.

    Santoso Wijaya, can you please submit a contributor form?

    http://python.org/psf/contrib/contrib-form/
    http://python.org/psf/contrib/

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Mar 21, 2013

    Done.

    @tjguk
    Copy link
    Member

    tjguk commented Oct 24, 2013

    Santoso Wijaya: sorry for the delay. If you'd like to retarget your patch against the tip, I'm happy to apply. At this stage, 3.3 and 3.4 seem the appropriate branches.

    @tjguk tjguk self-assigned this Oct 24, 2013
    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 24, 2013

    Here you go.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 25, 2013

    New changeset 12aaa2943791 by Tim Golden in branch 'default':
    bpo-13234 Allow listdir to handle extended paths on Windows (Patch by Santoso Wijaya)
    http://hg.python.org/cpython/rev/12aaa2943791

    New changeset 5c187d6162c5 by Tim Golden in branch 'default':
    bpo-13234 Credit Santoso for the patch and add NEWS item
    http://hg.python.org/cpython/rev/5c187d6162c5

    @tjguk
    Copy link
    Member

    tjguk commented Oct 25, 2013

    Applied. Thanks for the patch.

    @tjguk tjguk closed this as completed Oct 25, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants