Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os.path.isdir() is slow on windows #55792

Closed
elibendersky mannequin opened this issue Mar 17, 2011 · 14 comments
Closed

os.path.isdir() is slow on windows #55792

elibendersky mannequin opened this issue Mar 17, 2011 · 14 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@elibendersky
Copy link
Mannequin

elibendersky mannequin commented Mar 17, 2011

BPO 11583
Nosy @loewis, @giampaolo, @tjguk, @briancurtin
Files
  • issue11583.diff
  • issue11583_v2.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/briancurtin'
    closed_at = <Date 2011-06-09.00:31:48.883>
    created_at = <Date 2011-03-17.09:38:32.337>
    labels = ['library', 'performance']
    title = 'os.path.isdir() is slow on windows'
    updated_at = <Date 2011-06-09.15:01:24.206>
    user = 'https://bugs.python.org/elibendersky'

    bugs.python.org fields:

    activity = <Date 2011-06-09.15:01:24.206>
    actor = 'python-dev'
    assignee = 'brian.curtin'
    closed = True
    closed_date = <Date 2011-06-09.00:31:48.883>
    closer = 'brian.curtin'
    components = ['Library (Lib)']
    creation = <Date 2011-03-17.09:38:32.337>
    creator = 'eli.bendersky'
    dependencies = []
    files = ['22236', '22239']
    hgrepos = []
    issue_num = 11583
    keywords = ['patch', 'needs review']
    message_count = 14.0
    messages = ['131238', '131239', '131241', '131257', '131260', '137571', '137579', '137581', '137585', '137629', '137931', '137932', '137939', '137982']
    nosy_count = 7.0
    nosy_names = ['loewis', 'giampaolo.rodola', 'tim.golden', 'stutzbach', 'eli.bendersky', 'brian.curtin', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue11583'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Mar 17, 2011

    Report here: http://stackoverflow.com/questions/5316928/python-os-path-isdir-is-slow-on-windows

    It could be a problem with Windows itself, but I'm opening this to keep track of the issue, just to be on the safe side.

    @elibendersky elibendersky mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Mar 17, 2011
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 17, 2011

    How do you expect this to be resolved? Or will it stay open until Microsoft improves the NTFS performance?

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Mar 17, 2011

    On Thu, Mar 17, 2011 at 12:46, Martin v. Löwis <report@bugs.python.org>wrote:

    I opened this in order not to forget to look at the implementation of isdir
    on windows, making sure it's the most optimal thing we can do. If you know
    it is, the issue can be closed as far as I'm concerned.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 17, 2011

    Ok. It's probably not the most optimal thing we can do. It opens the directory with CreateFileW (specifying OPEN_EXISTING and BACKUP_SEMANTICS), then GetFileInformationByHandle. If we only want to find out whether the file is a directory, we could just to FindFirstFile, which wouldn't need to go to the MFT record.

    I would be surprised if this makes a difference in practice, though.

    @briancurtin
    Copy link
    Member

    I made a bunch of the stat changes in 3.2 so I'll assign this to myself and take a look.

    @briancurtin briancurtin self-assigned this Mar 17, 2011
    @briancurtin
    Copy link
    Member

    Attached is a patch that makes this about twice as fast for regular files and about 15 times faster for symbolic links and directories.

    Not that this would be anyone's performance bottleneck, but it does make the time more of a constant due to the recursion and extra overhead when stat'ing symbolic paths. The numbers were tested with the patch on bpo-12084 (new stat impl).

    # The numbers were run before I hooked nt._isdir up through Lib/ntpath.py, so os.path.isdir is the old way.
    # Regular file

    PCbuild\amd64\python.exe -m timeit -s "import nt" "nt._isdir('README')"
    10000 loops, best of 3: 25 usec per loop

    PCbuild\amd64\python.exe -m timeit -s "import os" "os.path.isdir('README')"
    10000 loops, best of 3: 43 usec per loop

    # Regular directory

    PCbuild\amd64\python.exe -m timeit -s "import nt" "nt._isdir('Lib')"
    10000 loops, best of 3: 24.3 usec per loop

    PCbuild\amd64\python.exe -m timeit -s "import os" "os.path.isdir('Lib')"
    10000 loops, best of 3: 41.4 usec per loop

    # testlink is a symbolically linked directory

    PCbuild\amd64\python.exe -m timeit -s "import nt" "nt._isdir('testlink')"
    10000 loops, best of 3: 26.1 usec per loop

    PCbuild\amd64\python.exe -m timeit -s "import os" "os.path.isdir('testlink')"
    1000 loops, best of 3: 415 usec per loop

    # setup.py.link is a symbolic link

    PCbuild\amd64\python.exe -m timeit -s "import nt" "nt._isdir('setup.py.link')"
    10000 loops, best of 3: 25.8 usec per loop

    PCbuild\amd64\python.exe -m timeit -s "import os" "os.path.isdir('setup.py.link')"
    1000 loops, best of 3: 336 usec per loop

    @briancurtin
    Copy link
    Member

    Looks like I didn't test this enough - many other test failures are occurring. Disregard this patch for now.

    @tjguk
    Copy link
    Member

    tjguk commented Jun 3, 2011

    One (presumably unintended) side-effect is that you can now do os.path.isdir on a wildcard and the first returned entry will be tested for directoryness:

    import os
    os.path.isdir ("c:/tem*")
    # => True where c:/temp exists

    @briancurtin
    Copy link
    Member

    Here's a patch that works. All tests are passing.

    I changed from using FindFirstFile to GetFileAttributes, which provides basically the same performance characteristics.

    One change in this implementation is to not raise a WindowsError when the file cannot be opened. The original os.stat would return False in that case, so now this function returns False (before it just passed on the WindowsError, which killed lots of tests).

    @tjguk
    Copy link
    Member

    tjguk commented Jun 4, 2011

    Code looks good and tests pass. Only one thing: the code in the patched lib\ntpath.py still refers to FindFirstFile

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 9, 2011

    New changeset 88e318166eaf by Brian Curtin in branch '3.2':
    Fix bpo-11583. Changed os.path.isdir to use GetFileAttributes instead of os.stat.
    http://hg.python.org/cpython/rev/88e318166eaf

    New changeset 567f30527913 by Brian Curtin in branch 'default':
    Fix bpo-11583. Changed os.path.isdir to use GetFileAttributes instead of os.stat.
    http://hg.python.org/cpython/rev/567f30527913

    @briancurtin
    Copy link
    Member

    This was also pushed to 2.7 in f1509fc75435.

    @elibendersky
    Copy link
    Mannequin Author

    elibendersky mannequin commented Jun 9, 2011

    Brian - great, thanks.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 9, 2011

    New changeset d40609dd01e0 by Brian Curtin in branch '3.2':
    Correction to 88e318166eaf - Issue bpo-11583
    http://hg.python.org/cpython/rev/d40609dd01e0

    New changeset fe813f5711a5 by Brian Curtin in branch '2.7':
    Correction to f1509fc75435 - Issue bpo-11583
    http://hg.python.org/cpython/rev/fe813f5711a5

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants