Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 471 implementation: os.scandir() directory scanning function #66714

Closed
benhoyt mannequin opened this issue Sep 30, 2014 · 56 comments
Closed

PEP 471 implementation: os.scandir() directory scanning function #66714

benhoyt mannequin opened this issue Sep 30, 2014 · 56 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@benhoyt
Copy link
Mannequin

benhoyt mannequin commented Sep 30, 2014

BPO 22524
Nosy @tebeka, @pitrou, @vstinner, @giampaolo, @tjguk, @benhoyt, @4kir4, @socketpair, @MojoVampire
Files
  • scandir-1.patch: Patch to document, implement, and test os.scandir()
  • clear_system_cache.patch
  • do_os_walk_getsize.patch
  • scandir-2.patch
  • get_tree_size_listdir.diff
  • scandir-3.patch
  • scandir-4.patch
  • scandir-5.patch
  • bench_scandir.py
  • bench_scandir2.py
  • scandir-6.patch
  • scandir-7.patch: Updated all-C implementation
  • scandir-8.patch: Updated all-C implementation per Victor's review
  • whatsnew.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-03-09.09:52:18.560>
    created_at = <Date 2014-09-30.14:42:42.209>
    labels = ['type-feature', 'library']
    title = 'PEP 471 implementation: os.scandir() directory scanning function'
    updated_at = <Date 2015-03-10.13:24:41.391>
    user = 'https://github.com/benhoyt'

    bugs.python.org fields:

    activity = <Date 2015-03-10.13:24:41.391>
    actor = 'benhoyt'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-03-09.09:52:18.560>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2014-09-30.14:42:42.209>
    creator = 'benhoyt'
    dependencies = []
    files = ['36831', '36847', '36848', '36963', '37292', '38108', '38111', '38113', '38114', '38120', '38121', '38259', '38374', '38421']
    hgrepos = []
    issue_num = 22524
    keywords = ['patch']
    message_count = 56.0
    messages = ['227933', '228751', '228775', '228780', '228784', '228786', '228818', '228819', '228821', '228854', '228855', '228857', '228858', '228866', '228867', '228872', '228873', '229659', '231673', '231677', '231682', '231684', '231703', '231745', '235458', '235806', '235833', '235835', '235848', '235849', '235859', '235860', '235865', '235873', '235874', '235883', '235884', '235888', '235920', '235940', '236205', '236738', '237328', '237446', '237489', '237490', '237495', '237496', '237497', '237505', '237624', '237627', '237753', '237758', '237761', '237765']
    nosy_count = 12.0
    nosy_names = ['tebeka', 'pitrou', 'vstinner', 'giampaolo.rodola', 'tim.golden', 'benhoyt', 'abacabadabacaba', 'akira', 'socketpair', 'python-dev', 'josh.r', 'alexei.romanov']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue22524'
    versions = ['Python 3.5']

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Sep 30, 2014

    Opening this to track the implementation of PEP-471: os.scandir() [1]. This supercedes Issue bpo-11406 (and possibly others)?

    The implementation is most of the way there, but not yet done as a CPythono 3.5 patch. Before I have a proper patch, it's available on GitHub [2] -- see posixmodule_scandir*.c, test/test_scandir.py, and os.rst.

    [1] http://legacy.python.org/dev/peps/pep-0471/
    [2] https://github.com/benhoyt/scandir

    @benhoyt benhoyt mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Sep 30, 2014
    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Oct 7, 2014

    Attaching my first patch here. It includes docs, the implementation of scandir and DirEntry in posixmodule.c, and tests in test_scandir.py. It does not yet include the changes for os.walk() to use scandir; I'm hoping to do that in a separate patch for easier code review.

    I'd love some code and documentation review, as well as some thoughts on these open questions:

    1. posixmodule.c is getting kinda huge and unwieldy; it'd be nice to implement this in a separate C file, but that comes with its own problems, as several functions would need to be made available externally.

    2. While writing the tests, I was getting issues with Windows (apparently) not deleting the symlinks immediately in my teardown function, so for the moment I just don't call the teardown function at all. Not great; I think we'll want a better solution.

    3. Partly due to Rename README to README.rst and enhance formatting #2, I haven't yet added tests for the file-not-found condition if a file is found by the directory scanning but then deleted before a DirEntry.is_dir() or DirEntry.is_file() call. I intend to add these tests, and there's a TODO comment in test_scandir.py so I remember.

    4. test_scandir.py will need some further cleanup for inclusion in CPython; it's currently taken straight from my scandir module, which supports Python down to Python 2.6.

    @vstinner
    Copy link
    Member

    vstinner commented Oct 7, 2014

    Hi Ben,

    I started to review your patch, sorry for the delay :-( It's convinient to have the patch on Rietveld to comment it inline.

    I didn't expect so much C code. The posixmodule.c is really huge. Well, it's just the longest C file of Python 3.5. Your patch adds 781 more lines to it. It's maybe time to split the file into smaller files.

    At the same time, reading C code is boring and it's too easy to "implement bugs". I need to be convinced that the speedup announced in the PEP requires so much C code. In the PEP and https://github.com/benhoyt/scandir#benchmarks it's not clear if the speedup comes from the C code (instead of implementing scandir in pure Python, for example using ctypes) or the lower number of syscalls.

    To have a fair benchmark, the best would be to have a C part only implementing opendir/readir and FirstFindFile/FindFileXXX (I'm not sure that ctypes is as fast as C code written with the Python C API), and a Python part which implements the nice Python API.

    If the speedup is 10% or lower, I would prefer to not add so much C code and take the compromise of a thin C wrapper and implement scandir in Python.

    If we implement os.scandir() in Python, should we still try to share code with os.listdir()? I see that importlib uses os.listdir(), so we need a listdir() function implemented in C at least for importlib.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 8, 2014

    I agree that having a high-level wrapper around a low-level C primitive would be cool, but someone has to experiment on that to find out how much performance it would cost.

    You may want to have the C primitive return results in batches (of e.g. 10 or 100 entries) to limit the overhead, btw.

    @vstinner
    Copy link
    Member

    vstinner commented Oct 8, 2014

    You may want to have the C primitive return results in batches (of e.g. 10 or 100 entries) to limit the overhead, btw.

    Ah yes, good idea. I read that internally readdir() also fetchs many
    entries in a single syscall (prefetch" / readahead, I don't know how
    to call that).

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Oct 8, 2014

    Thanks for the initial response and code review, Victor. I'll take a look and respond further in the next few days.

    In the meantime, however, I'm definitely open to splitting scandir out into its own C file. This will mean a little refactoring (making some functions public/non-static).

    Based on the numbers so far, I'm not so keen on implementing just the sys calls in C and the rest in Python. I already do basically this with ctypes in the scandir module, and it's slowish. I'll send proper numbers through soon, but here's what I remember from running benchmark.py on my Windows laptop with SSD drive:

    ctypes version: os.walk() 9x faster with scandir
    CPython 3.5 C version (debug): os.walk() 24x faster with scandir
    CPython 3.5 C version (release): os.walk() 55x faster with scandir

    So you do get a lot of speedup from just the ctypes version, but you get a lot more (55/9 = 6x more here) by using the all-C version. Again, these numbers are from memory -- I'll send exact ones later.

    One of the problems is that creating the DirEntry objects and calling their methods is fairly expensive, and if this is all done in Python, you lose a lot. I believe scandir() would end up being slower than listdir() in many cases.

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Oct 8, 2014

    Here are the actual numbers (instead of just from memory) running on my Windows laptop, which happens to have an SSD drive, so os.walk() with scandir is particularly good here.

    python 3.4: benchmark.py -c python (all implemented in Python using ctypes):

    os.walk took 1.011s, scandir.walk took 0.057s -- 17.8x as fast

    python 3.4: benchmark.py -c c (using _scandir.c, so the iteration implemented in C, the DirEntry object in Python):

    os.walk took 1.014s, scandir.walk took 0.039s -- 25.8x as fast

    python 3.5: benchmark.py -c os (using the new all-C version in posixmodule.c):

    os.walk took 0.983s, scandir.walk took 0.019s -- 52.3x as fast

    So as you can see, there's still another 2x speedup to be gained from having everything in C. I know it's a bit more to review, but my thinking is if we're going to speed this baby up, let's speed it right up!

    I haven't done these tests on Linux yet.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 8, 2014

    Ben, how can I run the benchmark on my own machine?

    Thanks!

    @pitrou
    Copy link
    Member

    pitrou commented Oct 8, 2014

    Ok, I've found it. The numbers don't look that good here (Linux, SSD):

    $ ~/cpython/opt/python benchmark.py -c python /usr/share/ 
    [...]
    os.walk took 1.266s, scandir.walk took 1.852s -- 0.7x as fast
    
    $ ~/cpython/opt/python benchmark.py -c c /usr/share/ 
    [...]
    os.walk took 1.276s, scandir.walk took 1.266s -- 1.0x as fast
    
    $ ~/cpython/opt/python benchmark.py -c os /usr/share/ 
    [...]
    os.walk took 1.278s, scandir.walk took 0.585s -- 2.2x as fast

    @vstinner
    Copy link
    Member

    vstinner commented Oct 9, 2014

    I cloned https://github.com/benhoyt/scandir. I understand that the --scandir command line option of benchmark.py are these choices:

    • generic = call listdir() and then use "yield GenericDirEntry" which caches os.stat() and os.lstat() results
    • python = ctypes implemented calling opendir/readdir and yields PosixDirEntry objects which uses d_type field from readdir() in is_dir(), is_file() and is_symlink(). Cache the result of os.stat() and os.lstat()
    • c = "scandir_helper" (iterator) implemented in C (Python C API) yielding PosixDirEntry objects (same class than the "python" benchmark)

    I checked with an assertion: d_type of readdir() is never DT_UNKNOWN on my Linux Fedora 20. Statistics of PosixDirEntry on my /usr/share tree:

    • 155544 PosixDirEntry instances
    • fast-path (use d_type) taken 466632 times in is_dir/is_symlink
    • slow-path (need to call os.stat or os.lstat) taken 7828 times in is_dir/is_symlink
    • os.stat called 7832 times
    • os.stat called 0 times

    7832 is the number of symbolic links in my /usr/share tree. 95% of entries don't need stat() in scandir.walk() when using readdir().

    So is_dir() and is_symlink() are approximatively called 3 times per entry: scandir.walk() calls is_dir() and is_symlink() on each entry, but is_dir() also calls is_symlink() by default (because the default value of the follow_symlinks parameter is True).

    I ran benchmark.py on my Linux Fedora 20 (Linux kernel 3.14). I have two HDD configured as RAID0. I don't think that my disk config is revelant: I also have 12 GB of memory, I hope that /usr/share tree is fully cached. For example, "free -m" tells me that 8.8 GB are cached.

    The generic implementation looks inefficient: it is 2 times slower. Is there a bug? GenericDirEntry caches os.stat() and os.lstat() result, it should be as fast or faster than os.walk(), no? Or is it the cost of a generator?

    The "c" implementation is 35% faster than the "python" implementation (python=1.170 sec, c=0.762 sec).

    Result of benchmark:

    haypo@smithers$ python3 setup.py build && for scandir in generic python c; do echo; echo "=== $scandir ==="; PYTHONPATH=build/lib.linux-x86_64-3.3/ python3 benchmark.py /usr/share -c $scandir || break; done
    running build
    running build_py
    running build_ext

    === generic ===
    Using very slow generic version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 1.340s, scandir.walk took 2.471s -- 0.5x as fast

    === python ===
    Using slower ctypes version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 1.318s, scandir.walk took 1.170s -- 1.1x as fast

    === c ===
    Using fast C version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 1.317s, scandir.walk took 0.762s -- 1.7x as fast

    @vstinner
    Copy link
    Member

    vstinner commented Oct 9, 2014

    I also ran the benchmark without cache on disk. It looks like my hard drive is too slow to see a real speedup of scandir(). The maximum speedup is 2.7 seconds lesser when testing the "c" scandir (24.089 => 21.374 seconds): "1.1x as fast".

    I modified the benchmark to flush all caches, I added the following line into do_scandir_walk():

       os.system("sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'")

    I also commented the first call to do_scandir_walk() ("Priming the system's cache..."), just to make the benchmark faster.

    (See attached clear_system_cache.patch for all changes.)

    Result of the modified benchmark without cache:

    haypo@smithers$ python3 setup.py build && for scandir in generic python c; do echo; echo "=== $scandir ==="; PYTHONPATH=build/lib.linux-x86_64-3.3/ python3 benchmark.py /usr/share -c $scandir || break; done
    running build
    running build_py
    running build_ext

    === generic ===
    Using very slow generic version of scandir
    Comparing against builtin version of os.walk()
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 24.324s, scandir.walk took 24.215s -- 1.0x as fast

    === python ===
    Using slower ctypes version of scandir
    Comparing against builtin version of os.walk()
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 24.089s, scandir.walk took 21.374s -- 1.1x as fast

    === c ===
    Using fast C version of scandir
    Comparing against builtin version of os.walk()
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 24.225s, scandir.walk took 21.390s -- 1.1x as fast

    @vstinner
    Copy link
    Member

    vstinner commented Oct 9, 2014

    My previous benchmark runs were with the system Python 3.3.

    New benchmark with a Python 3.5 patched with scandir-1.patch (compiled in release mode: ./configure && make).

    "os" (os.scandir) is 2 times faster than "c" (_scandir.scandir_helper): c=0.533 sec, os=0.268 sec.

    On all implementations, scandir.walk() is only much faster than os.walk() when "os" (os.scandir) is used: "3.2x as fast" (860 ms => 268 ms).

    I guess that on Linux the speedup highly depends on the number of symbolic links.

    It would help if benchmark.py created the tree to have more reliable numbers, and being able to compare trees without symlinks, with a few symlinks (ex: 10%), and with a lot of symlinks (ex: 99%).

    Benchmark results:

    haypo@smithers$ ~/prog/python/default/python setup.py build && for scandir in generic python c os; do echo; echo "=== $scandir ==="; PYTHONPATH=build/lib.linux-x86_64-3.5/ ~/prog/python/default/python benchmark.py /usr/share -c $scandir || break; done
    running build
    running build_py
    copying scandir.py -> build/lib.linux-x86_64-3.5
    running build_ext

    === generic ===
    Using very slow generic version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 0.857s, scandir.walk took 1.627s -- 0.5x as fast

    === python ===
    Using slower ctypes version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 0.856s, scandir.walk took 0.915s -- 0.9x as fast

    === c ===
    Using fast C version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 0.857s, scandir.walk took 0.533s -- 1.6x as fast

    === os ===
    Using Python 3.5's builtin os.scandir()
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk took 0.860s, scandir.walk took 0.268s -- 3.2x as fast

    @vstinner
    Copy link
    Member

    vstinner commented Oct 9, 2014

    On Windows, I guess that "benchmark.py --size" is faster with scandir() than with os.walk(), because os.stat() is never called.

    benchmark.py has a bug in do_os_walk() when the --size option is used: attached do_os_walk_getsize.patch is needed.

    Sizes returned by os.walk() and scandir.walk() are different. I guess that the behaviour of symbolic links to directory is different. Because of that, I'm not sure that benchmark timings are reliable, but well, it should give us an idea of performances.

    To compute the size of a tree, scandir() is twice faster (2.1x as fast) than os.walk(): os.walk=1.435 sec, scandir.walk=0.675 sec.

    "os" is 41% faster than "c": c=1150 ms, os=675 ms.

    Results of "benchmark.py --size" on my Linux Fedora 20:

    haypo@smithers$ ~/prog/python/default/python setup.py build && for scandir in generic python c os; do echo; echo "=== $scandir ==="; PYTHONPATH=build/lib.linux-x86_64-3.5/ ~/prog/python/default/python benchmark.py -s /usr/share -c $scandir || break; done
    running build
    running build_py
    running build_ext

    === generic ===
    Using very slow generic version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
    os.walk took 1.425s, scandir.walk took 1.147s -- 1.2x as fast

    === python ===
    Using slower ctypes version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
    os.walk took 1.421s, scandir.walk took 1.651s -- 0.9x as fast

    === c ===
    Using fast C version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
    os.walk took 1.426s, scandir.walk took 1.150s -- 1.2x as fast

    === os ===
    Using Python 3.5's builtin os.scandir()
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/share, repeat 1/3...
    Benchmarking walks on /usr/share, repeat 2/3...
    Benchmarking walks on /usr/share, repeat 3/3...
    os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
    os.walk took 1.435s, scandir.walk took 0.675s -- 2.1x as fast

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Oct 9, 2014

    Thanks, Victor and Antone. I'm somewhat surprised at the 2-3x numbers you're seeing, as I was consistently getting 4-5x in the Linux tests I did. But it does depend quite a bit on what file system you're running, what hardware, whether you're running in a VM, etc. Still, 2-3x faster is a good speedup!

    The numbers are significantly better on Windows, as you can see. Even the smallest numbers I've seen with "--scandir os" are around 12x range on Windows.

    In any case, Victor's last tests are "right" -- I presume we'll have *some* C, so what we want to be comparing is "benchmark.py --scandir c" versus "benchmark.py --scandir os": the some C version versus the all C version in the attached CPython 3.5 patch.

    BTW, Victor, "Generic" isn't really useful. I just used it as a test case that calls listdir() and os.stat() to implement the scandir/DirEntry interface. So it's going to be strictly slower than listdir + stat due to using listdir and creating all those DirEntry objects.

    Anyway, where to from here? Are we agreed given the numbers that -- especially on Linux -- it makes good performance sense to use an all-C approach?

    @pitrou
    Copy link
    Member

    pitrou commented Oct 9, 2014

    Le 09/10/2014 14:35, Ben Hoyt a écrit :

    Anyway, where to from here? Are we agreed given the numbers that --
    especially on Linux -- it makes good performance sense to use an all-C
    approach?

    I think so, yes.

    @vstinner
    Copy link
    Member

    vstinner commented Oct 9, 2014

    I'm somewhat surprised at the 2-3x numbers you're seeing, as I was consistently getting 4-5x in the Linux tests I did. But it does depend quite a bit on what file system you're running, what hardware, whether you're running in a VM, etc. Still, 2-3x faster is a good speedup!

    I don't think that hardware matters. As I wrote, I expect the whole /usr/share tree to fit in memory. It's sounds more like optimizations in the Linux kernel. I ran benchmarks on Fedora 20 with the Linux kernel 3.14.

    Anyway, where to from here? Are we agreed given the numbers that -- especially on Linux -- it makes good performance sense to use an all-C approach?

    We didn't try yet to call readdir() multiple times in the C iterator and use a small cache (ex: between 10 and 1000 items, I don't know which size is the best yet) to also limit the number of readdir() calls. The cache would be an array of dirent on Linux.

    scandir_helper() can return an array of items instead of a single item for example.

    I can try to implement it if you want.

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Oct 9, 2014

    I dunno, I'd be happy if you implement this, but it does mean *more* C code, not less. :-) I feel this would be a nice-to-have, but we should get the thing working first, and then do the multiple-OS-calls-in-one optimization.

    In any case, if you implement this, I think you should do so against the CPython 3.5 patch, not the _scandir.c/scandir_helper code.

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Oct 18, 2014

    Attaching updated patch (scandir-2.patch) per Victor's code review here: http://bugs.python.org/review/22524/diff/13005/Modules/posixmodule.c

    Note that I haven't included test_scandir.py this time, as I haven't made any changes there, and it's not really ready for Python 3.5 yet.

    @tebeka
    Copy link
    Mannequin

    tebeka mannequin commented Nov 25, 2014

    Can we also update iglob [1] as well?

    [1] https://docs.python.org/2/library/glob.html#glob.iglob

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Nov 25, 2014

    scandir is slower on my machine:

      $ git clone https://github.com/benhoyt/scandir 
      $ cd scandir/
      $ ../cpython/python benchmark.py /usr/
      Using slower ctypes version of scandir
      Comparing against builtin version of os.walk()
      Priming the system's cache...
      Benchmarking walks on /usr/, repeat 1/3...
      Benchmarking walks on /usr/, repeat 2/3...
      Benchmarking walks on /usr/, repeat 3/3...
      os.walk took 7.761s, scandir.walk took 10.420s -- 0.7x as fast

    What commands should I run to benchmark the attached patch (scandir-2.patch)?

    @vstinner
    Copy link
    Member

    scandir is slower on my machine:

    Please share more information about your config: OS, disk type (hard
    drive, SSD, something else), filesystem, etc.

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Nov 25, 2014

    Akira, note the "Using slower ctypes version of scandir" -- this is the older, ctypes implementation of scandir. On Linux, depending on file system etc, that can often be slower. You need to at least be using the "fast C version" in _scandir.c, which is then half C, half Python. But ideally you'd use the all-C version that I've provided as a CPython 3.5 patch.

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Nov 26, 2014

    STINNER Victor added the comment:

    > scandir is slower on my machine:

    Please share more information about your config: OS, disk type (hard
    drive, SSD, something else), filesystem, etc.

    Ubuntu 14.04, SSD, ext4 filesystem. Results for different python
    versions are the same (scandir on pypy is even slower due to ctype
    usage).

    What other information could be useful?

    Ben Hoyt added the comment:

    Akira, note the "Using slower ctypes version of scandir" -- this is
    the older, ctypes implementation of scandir. On Linux, depending on
    file system etc, that can often be slower. You need to at least be
    using the "fast C version" in _scandir.c, which is then half C, half
    Python. But ideally you'd use the all-C version that I've provided as
    a CPython 3.5 patch.

    Yes. I've noticed it, that is why I asked :) *"What commands should I
    run to benchmark the attached patch (scandir-2.patch)?"*

    I've read benchmark.py's source; It happens that it already supports
    benchmarking of os.scandir. In my python checkout:

    cpython$ hg import --no-commit https://bugs.python.org/file36963/scandir-2.patch
    cpython$ make && cd ../scandir/
    scandir$ ../cpython/python benchmark.py -s /usr/
    Using Python 3.5's builtin os.scandir()
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Benchmarking walks on /usr/, repeat 1/3...
    Benchmarking walks on /usr/, repeat 2/3...
    Benchmarking walks on /usr/, repeat 3/3...
    os.walk size 7925376343, scandir.walk size 5534939617 -- NOT EQUAL!
    os.walk took 13.552s, scandir.walk took 6.032s -- 2.2x as fast

    It seems os.walk and scandir.walk do a different work here.

    I've written get_tree_size_listdir_fd() that mimics get_tree_size()
    (see get_tree_size_listdir.diff patch) to get the same size:

    scandir$ ../cpython/python benchmark.py -s /usr
    Using Python 3.5's builtin os.scandir()
    Comparing against builtin version of os.walk()
    Comparing against get_tree_size_listdir_fd
    Priming the system's cache...
    Benchmarking walks on /usr, repeat 1/3...
    Benchmarking walks on /usr, repeat 2/3...
    Benchmarking walks on /usr, repeat 3/3...
    os.walk size 5534939617, scandir.walk size 5534939617 -- equal
    os.walk took 5.697s, scandir.walk took 5.621s -- 1.0x as fast

    scandir is not noticeably faster but scandir-based code is much nicer here.

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Nov 27, 2014

    To see what happens at syscall level, I've run various implementations
    of get_tree_size() functions (see get_tree_size_listdir.diff) with
    strace:

    get_tree_size_listdir_fd -- os.listdir(fd) + os.lstat
    get_tree_size -- os.scandir(path) + entry.lstat
    get_tree_size_listdir_stat -- os.listdir(path) + os.lstat
    get_tree_size_listdir -- os.listdir(path) + os.path.isdir + os.lstat

    Summary:

    • os.listdir() and os.scandir()-based variants use the same number of getdents() syscalls
    • and the number of openat, lstat (newfstatat) syscalls is also comparable
      (except for the variant that uses os.path.isdir)

    Log:

    scandir$ /usr/bin/time strace -fco py.strace ../cpython/python -c 'from benchmark import get_tree_size_listdir_fd as f; import os; print(f(os.open("/usr/", os.O_RDONLY)))' && head -7 py.strace
    5535240217
    11.29user 8.14system 0:17.78elapsed 109%CPU (0avgtext+0avgdata 13460maxresident)k
    0inputs+8outputs (0major+6781minor)pagefaults 0swaps
    % time seconds usecs/call calls errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    46.51 0.075252 0 264839 newfstatat
    16.88 0.027315 1 50460 getdents
    11.53 0.018660 0 75621 fcntl
    9.74 0.015758 0 50531 close
    6.87 0.011116 0 25214 openat
    scandir$ /usr/bin/time strace -fco py.strace ../cpython/python -c 'from benchmark import get_tree_size as f; print(f("/usr/"))' && head -7 py.strace
    5535240217
    22.56user 8.47system 0:29.77elapsed 104%CPU (0avgtext+0avgdata 13280maxresident)k
    0inputs+8outputs (0major+6306minor)pagefaults 0swaps
    % time seconds usecs/call calls errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    62.00 0.032822 0 239644 lstat
    19.97 0.010570 0 50460 getdents
    8.52 0.004510 0 25215 openat
    6.09 0.003224 0 25326 close
    0.55 0.000292 3 95 mmap
    scandir$ /usr/bin/time strace -fco py.strace ../cpython/python -c 'from benchmark import get_tree_size_listdir_stat as f; print(f("/usr/"))' && head -7 py.strace
    5535240217
    13.70user 6.30system 0:18.84elapsed 106%CPU (0avgtext+0avgdata 13456maxresident)k
    0inputs+8outputs (0major+6769minor)pagefaults 0swaps
    % time seconds usecs/call calls errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    64.79 0.050217 0 264849 lstat
    19.37 0.015011 0 50460 getdents
    8.22 0.006367 0 25215 openat
    5.76 0.004465 0 25326 close
    0.32 0.000247 2 114 mmap
    scandir$ /usr/bin/time strace -fco py.strace ../cpython/python -c 'from benchmark import get_tree_size_listdir as f; print(f("/usr/"))' && head -7 py.strace
    5535240217
    19.53user 10.61system 0:28.16elapsed 107%CPU (0avgtext+0avgdata 13452maxresident)k
    0inputs+8outputs (0major+6733minor)pagefaults 0swaps
    % time seconds usecs/call calls errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    42.04 0.063050 0 265195 37259 stat
    37.40 0.056086 0 265381 lstat
    11.46 0.017187 0 50460 getdents
    4.82 0.007232 0 25215 openat
    3.43 0.005139 0 25326 close

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Feb 6, 2015

    Victor, I'd love to push forward with this (I doubt it's going to make alpha 1 on February 8, but hopefully alpha 2 on March 8).

    It seems pretty settled that we need to use the all-C version I've created, especially for Linux.

    Can you please review scandir-2.patch? It's on Rietveld here: http://bugs.python.org/review/22524/#ps13104

    You can see what changed between scandir-1.patch and scandir-2.patch more easily here: benhoyt/scandir@d4e2d70

    Also, here is just the scandir C code in its own file: https://github.com/benhoyt/scandir/blob/d4e2d7083319ed8988aef6ed02d670fb36a00a33/posixmodule_scandir_main.c

    One other open question is: should it go into CPython's posixmodule.c as I have it in the patch, or should I try to separate it out?

    @vstinner
    Copy link
    Member

    scandir-3.patch: New implementation based on scandir-2.patch on Ben's github repository.

    Main changes with scandir-2.patch:

    • new DirEntry.inode() method

    • os.scandir() doesn't support bytes on Windows anymore: it's deprecated since python 3.3 and using bytes gives unexpected results on Windows

    As discussed with Ben Hoyt, I added a inode() method to solve the use case described here:
    https://www.reddit.com/r/Python/comments/2synry/so_8_peps_are_currently_being_proposed_for_python/cnvnz1w

    I will update the PEP-471 to add the inode() method.

    Notes:

    • os.scandir() doesn't accept a file descriptor, as decided in the PEP-471.

    • It may be nice to modify Scandir.close() to close the handle/directory; Scandir is the iterator returned by os._scandir()

    • My patch doesn't modify os.walk(): it prefer to do that in a new issue

    • DirEntry methods have no docstring

    Changes with scandir-2.patch:

    • C code is added to posixmodule.c, not into a new _scandir.c file, to avoid code duplication (all Windows code to handle attributes)

    • C code is restricted to the minimum: it's now only a wrapper to opendir+readdir and FindFirstFileW/FindNextFileW

    • os._scandir() directly calls opendir(), it's no more delayed to the first call to next(), to report errors earlier. In practice, I don't think that anymore will notify :-p

    • don't allocate a buffer to store a HANDLE: use directly a HANDLE

    • C code: use #ifdef inside functions, not outside

    • On Windows, os._scandir() appends "*.*" instead of "*" to the path, to mimic os.listdir()

    • put information about cache and syscall directly in the doc of DirEntry methods

    • remove promise of performances from scandir doc: be more factual, explain when syscalls are required or not

    • expose DT_UNKOWN, DT_DIR, DT_REG, DT_LNK constants in the posix module; but I prefer to not document them: use directly scandir!

    • rewrite completly unit test:

      • reuse test.support
      • compare DirEntry attributes with the result of functions (ex: os.stat() or os.path.islink())
    • add tests on removed directory, removed file and broken symbolic link

    • remove ":" from repr(DirEntry) result, it's now "<DirEntry 'xxx'>"; drop __str__ method (by default, __str__ calls __repr__)

    • use new OSError subclasses (FileNotFoundError)

    • DirEntry methods: use stat.S_ISxxx() methods instead of "st.st_mode & 0o170000 == S_IFxxx"

    @vstinner
    Copy link
    Member

    Updated patch. Thanks for the review Serhiy.

    @vstinner
    Copy link
    Member

    (I regenerated scandir-4.patch, I had a local private changeset. I removed it.)

    @vstinner
    Copy link
    Member

    Patch version 5:

    • Use None value for the d_type instead of DT_UNKNOW to *prepare* support for platforms where the dirent structure has no d_type field. Serhiy guess that such platform have no DT_xxx constant neither. DirEntry doesn't DT_xxx constants if d_type is None

    • Fix test_removed_file() and test_removed_dir() tests if d_type is unknown (I tested manually by forcing d_type to None)

    • Use os.lstat() instead of os.stat(follow_symlinks=False) in DirEntry.inode(), it's a little bit faster even if it's the same syscall. The cost probably comes from the code parsing Python parameters, especially the keyword

    @vstinner
    Copy link
    Member

    bench_scandir.py: dummy benchmark to compare listdir+stat vs scandir+is_dir.

    os.scandir() is always slower than os.listdir() on tmpfs and ext4 partitions of a local hard driver.

    I will try with NFS.

    Results with scandir-5.patch on Fedora 21 (Linux).

    --- Using /home/haypo (ext4, hard drive) ---

    Test listdir+stat vs scandir+is_dir
    Temporary directory: /home/haypo/tmpji8uviyl
    Create 100000 files+symlinks...
    Create 10000 directories...
    # entries: 210000
    Benchmark...
    listdir: 2187.3 ms
    scandir: 1047.2 ms
    listdir: 494.4 ms
    scandir: 1048.1 ms
    listdir: 493.0 ms
    scandir: 1042.6 ms

    Result:
    listdir: min=493.0 ms (2.3 us per file), max=2187.3 ms (10.4 us per file)
    scandir: min=1042.6 ms (5.0 us per file), max=1048.1 ms (5.0 us per file)
    scandir is between 0.5x and 2.1x faster

    --- Using /tmp (tmpfs, full in memory) ---

    Test listdir+stat vs scandir+is_dir
    Temporary directory: /tmp/tmp6_zk3mqo
    Create 100000 files+symlinks...
    Create 10000 directories...
    # entries: 210000
    Benchmark...
    listdir: 405.4 ms
    scandir: 1001.3 ms
    listdir: 403.3 ms
    scandir: 1024.2 ms
    listdir: 408.1 ms
    scandir: 1013.5 ms

    Remove the temporary directory...

    Result:
    listdir: min=403.3 ms (1.9 us per file), max=408.1 ms (1.9 us per file)
    scandir: min=1001.3 ms (4.8 us per file), max=1024.2 ms (4.9 us per file)
    scandir is between 0.4x and 0.4x faster

    @vstinner
    Copy link
    Member

    Similar benchmark result on my laptop which has a SSD (ext4 filesystem tool, but I guess that the directory is small and fits into the memory).

    Note: I'm not sure that the "between ...x and ...x faster" are revelant, I'm not sure that my computation is correct.

    Test listdir+stat vs scandir+is_dir
    Temporary directory: /home/haypo/prog/python/default/tmpbrn4r2tv
    Create 100000 files+symlinks...
    Create 10000 directories...
    # entries: 210000
    Benchmark...
    listdir: 1730.7 ms
    scandir: 1029.4 ms
    listdir: 476.8 ms
    scandir: 1058.3 ms
    listdir: 485.4 ms
    scandir: 1041.1 ms

    Remove the temporary directory...

    Result:
    listdir: min=476.8 ms (2.3 us per file), max=1730.7 ms (8.2 us per file)
    scandir: min=1029.4 ms (4.9 us per file), max=1058.3 ms (5.0 us per file)
    scandir is between 0.5x and 1.7x faster

    @vstinner
    Copy link
    Member

    Benchmark on NFS. Client: my laptop, connected to the LAN by wifi. Server: desktop, connected to the LAN by PLC. For an unknown reason, the creation of files, symlinks and directories is very slow (more than 30 seconds while I reduced the number of files & directories).

    Test listdir+stat vs scandir+is_dir
    Temporary directory: /home/haypo/mnt/tmp5aee0eic
    Create 1000 files+symlinks...
    Create 1000 directories...
    # entries: 3000
    Benchmark...
    listdir: 14478.0 ms
    scandir: 732.1 ms
    listdir: 9.9 ms
    scandir: 14.9 ms
    listdir: 7.5 ms
    scandir: 12.9 ms

    Remove the temporary directory...

    Result:
    listdir: min=7.5 ms (2.5 us per file), max=14478.0 ms (4826.0 us per file)
    scandir: min=12.9 ms (4.3 us per file), max=732.1 ms (244.0 us per file)
    scandir is between 0.0x and 1119.6x faster

    @vstinner
    Copy link
    Member

    I enhanced bench_scandir2.py to have one command to create a directory or a different command to run the benchmark.

    All commands:

    • create: create the directory for tests (you don't need this command, you can also use an existing directory)
    • bench: compare scandir+is_dir to listdir+stat, cached
    • bench_nocache: compare scandir+is_dir to listdir+stat, flush disk caches
    • bench_nostat: compare scandir to listdir, cached
    • bench_nostat_nocache: compare scandir to listdir, flush disk caches

    --

    New patch version 6 written for performances, changes:

    • On POSIX, decode the filename in C
    • _scandir() iterator now yields list of items, instead of an single item

    With my benchmarks, I see that yielding 10 items reduces the overhead of scandir on Linux (creating DirEntry objects). On Windows, the number of items has no effect. I prefer to also fetch entries 10 per 10 to mimic POSIX. Later, on POSIX, we may use directly getdents() and yield the full getdents() result at once. according to strace, it's currently around 800 entries per getdents() syscall.

    Results of bench_scandir2.py on my laptop using SSD and ext4 filesystem:

    • 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
    • bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)
    • bench_nostat: 0.4x faster (scandir: 104.0 ms, listdir: 38.5 ms)
    • bench_nocache: 2.1x faster (scandir: 460.2 ms, listdir: 983.2 ms)
    • bench_nostat_nocache: 2.2x faster (scandir: 480.4 ms, listdir: 1055.6 ms)

    Results of bench_scandir2.py on my laptop using NFS share (server: ext4 filesystem) and slow wifi:

    • 11,100 entries (1,0000 files, 100 symlinks, 1000 directories)
    • bench: 1.3x faster (scandir: 22.5 ms, listdir: 28.9 ms)
    • bench_nostat: 0.2x faster (scandir: 14.3 ms, listdir: 3.2 ms)

    *** Timings with NFS are not reliable. Sometimes, a directory listing takes more than 30 seconds, but then it takes less than 100 ms. ***

    Results of bench_scandir2.py on a Windows 7 VM using NTFS:

    • 11,100 entries (10,000 files, 1,000 directories, 100 symlinks)
    • bench: 9.9x faster (scandir: 58.3 ms, listdir: 578.5 ms)
    • bench_nostat: 0.3x faster (scandir: 28.5 ms, listdir: 7.6 ms)

    Results of bench_scandir2.py on my desktop PC using tmpfs (/tmp):

    • 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
    • bench: 1.3x faster (scandir: 149.2 ms, listdir: 189.2 ms)
    • bench_nostat: 0.3x faster (scandir: 91.9 ms, listdir: 27.1 ms)

    Results of bench_scandir2.py on my desktop PC using HDD and ext4:

    • 110,100 entries (100000 files, 100 symlinks, 10000 directories)
    • bench: 1.4x faster (scandir: 168.5 ms, listdir: 238.9 ms)
    • bench_nostat: 0.4x faster (scandir: 107.5 ms, listdir: 41.9 ms)

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Feb 13, 2015

    Hi Victor, I thank you for your efforts here, especially your addition of DirEntry.inode() and your work on the tests.

    However, I'm a bit frustrated that you just re-implemented the whole thing without discussion: I've been behind scandir and written the first couple of implementations, and I asked if you could review my code, but instead of reviewing it or interacting with my fairly clear desire for the all-C version, you simply turn up and say "I've rewritten it, can you please review?"

    You did express concern off list about an all-C vs part-Python implementation, but you said "it's still unclear to me which implementation should be added to Python" and "I don't feel able today to take the right decision. We may use python-dev to discuss that". But I didn't see that discussion at all, and I had expressed fairly clearly (both here and off list), with benchmarks, why I thought it should be the all-C version.

    So it was a bit of a let down for a first-time Python contributor. Even a simple question would have been helpful here: "Ben, I really think this much C code for a first version is bug-prone; what do you think about me re-implementing it and comparing?"

    TLDR: I'd like to see os.scandir() in Python even if it means the slower implementation, but some discussion would have been nice. :-)

    So if it's not too late, I'll continue that discussion in my next message.

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Feb 13, 2015

    To continue the actual "which implementation" discussion: as I mentioned last week in http://bugs.python.org/msg235458, I think the benchmarks above show pretty clearly we should use the all-C version.

    For background: PEP-471 doesn't add any new functionality, and especially with the new pathlib module, it doesn't make directory iteration syntax nicer either: os.scandir() is all about letting the OS give you whatever info it can *for performance*. Most of the Rationale for adding scandir given in PEP-471 is because it can be so so much faster than listdir + stat.

    My original all-C implementation is definitely more code to review (roughly 800 lines of C vs scandir-6.patch's 400), but it's also more than twice as fast. On my Windows 7 SSD just now, running benchmark.py:

    Original scandir-2.patch version:
    os.walk took 0.509s, scandir.walk took 0.020s -- 25.4x as fast
    
    New scandir-6.patch version:
    os.walk took 0.455s, scandir.walk took 0.046s -- 10.0x as fast
    

    So the all-C implementation is literally 2.5x as fast on Windows. (After both tests, just for a sanity check, I ran the ctypes version as well, and it said about 8x as fast for both runs.)

    Then on Linux, not a perfect comparison (different benchmarks) but shows the same kind of trend:

    Original scandir-2.patch benchmark (http://bugs.python.org/msg228857):
    os.walk took 0.860s, scandir.walk took 0.268s -- 3.2x as fast
    
    New scandir-6.patch benchmark (http://bugs.python.org/msg235865) -- note that "1.3x faster" should actually read "1.3x as fast" here:
    bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)
    

    So again, the all-C implementation is 2.5x as fast on Linux too.

    And on Linux, the incremental improvement provided by scandir-6 over listdir is hardly worth it -- I'd use a new directory listing API for 3.2x as fast, but not for 1.3x as fast.

    Admittedly a 10x speed gain (!) on Windows is still very much worth going for, so I'm positive about scandir even with a half-Python implementation, but hopefully the above shows fairly clearly why the all-C implementation is important, especially on Linux.

    Also, if the consensus is in favour of slow but less C code, I think there are further tweaks we can make to the Python part of the code to improve things a bit more.

    @vstinner
    Copy link
    Member

    Note: bench_scandir2.py is a micro-benchmark. Ben's benchmark using walk() is more realistic, but I'm interested by micro-benchmark results.

    scandir-2.patch is faster than scandir-6.patch, much fast on Windows.

    Result of bench (cached): scandir-6.patch => scandir-2.patch

    • Windows 7 VM using NTFS: 14.0x faster => 44.6x faster
    • laptop using NFS share: 1.3x faster => 5.2x faster *** warning: unstable results ***
    • desktop PC using /tmp: 1.3x faster => 3.8x faster
    • laptop using SSD and ext4: 1.3x faster => 2.8x faster
    • desktop PC using HDD and ext4: 1.4x faster => 1.4x faster

    Benchmark using scandir-2.patch
    -------------------------------

    Benchmark results with the full C implementation, scandir-2.patch.

    [ C implementation ] Results of bench_scandir2.py on my desktop PC using HDD and ext4:

    • 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
    • bench: 3.5x faster than listdir (scandir: 63.6 ms, listdir: 219.9 ms)
    • bench_nostat: 0.8x faster than listdir (scandir: 52.8 ms, listdir: 42.4 ms)
    • bench_nocache: 1.4x faster than listdir (scandir: 3745.2 ms, listdir: 5217.6 ms)
    • bench_nostat_nocache: 1.4x faster than listdir (scandir: 3834.1 ms, listdir: 5380.7 ms)

    [ C implementation ] Results of bench_scandir2.py on my desktop PC using /tmp (tmpfs):

    • 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
    • bench: 3.8x faster than listdir (scandir: 46.7 ms, listdir: 176.4 ms)
    • bench_nostat: 0.7x faster than listdir (scandir: 38.6 ms, listdir: 28.6v)

    [ C implementation ] Results of bench_scandir2.py on my Windows 7 VM using NTFS:

    • 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
    • bench: 44.6x faster than listdir (scandir: 125.0 ms, listdir: 5574.9 ms)
    • bench_nostat: 0.8x faster than listdir (scandir: 92.4 ms, listdir: 74.7 ms)

    [ C implementation ] Results of bench_scandir2.py on my laptop using SSD and ext4:

    • 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
    • bench: 3.6x faster (scandir: 59.4 ms, listdir: 213.3 ms)
    • bench_nostat: 0.8x faster than listdir (scandir: 50.0 ms, listdir: 38.6)
    • bench_nocache: 2.8x faster than listdir (scandir: 377.5 ms, listdir: 1073.1)
    • bench_nostat_nocache: 2.8x faster than listdir (scandir: 370.9 ms, listdir: 1055.0)

    [ C implementation ] Results of bench_scandir2.py on my laptop using tmpfs:

    • 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
    • bench: 4.0x faster than listdir (scandir: 43.7 ms, listdir: 174.1)
    • bench_nostat: 0.7x faster than listdir (scandir: 35.2 ms, listdir: 24.5)

    [ C implementation ] Results of bench_scandir2.py on my laptop using NFS share and slow wifi:

    • 11,010 entries (10,000 files, 10 symlinks, 1,000 directories)
    • bench: 5.2x faster than listdir (scandir: 4.2 ms, listdir: 21.7 ms)
    • bench_nostat: 0.6x faster than listdir (scandir: 3.3 ms, listdir: 1.9 ms)

    *** Again, results with NFS are not reliable. Sometimes listing a directory conten takes 40 seconds. It's maybe a network issue. ***

    It looks like d_type can be DT_UNKNOWN on NFS.

    Benchmark using scandir-6.patch
    -------------------------------

    I rerun benchmark with scandir-6.patch with more files to compare the two benchmarks.

    [ C implementation ] Results of bench_scandir2.py on my Windows 7 VM using NTFS:

    • 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
    • bench: 14.0x faster than listdir (scandir: 399.0 ms, listdir: 5578.7 ms)
    • bench_nostat: 0.3x faster than listdir (scandir: 279.2 ms, listdir: 76.1 ms)

    [ C implementation ] Results of bench_scandir2.py on my laptop using NFS share and slow wifi:

    • 11,010 entries (10,000 files, 10 symlinks, 1,000 directories)
    • bench: 1.5x faster than listdir (scandir: 14.8 ms, listdir: 21.4 ms)
    • bench_nostat: 0.2x faster than listdir (scandir: 10.6 ms, listdir: 2.2 ms)

    @vstinner
    Copy link
    Member

    Hi Ben,

    2015-02-13 4:51 GMT+01:00 Ben Hoyt <report@bugs.python.org>:

    Hi Victor, I thank you for your efforts here, especially your addition of DirEntry.inode() and your work on the tests.

    The addition of inode() should still be discussed on python-dev. The
    remaining question is if the inode number (st_ino) is useful even if
    st_dev is missing. I didn't added the inode() method yet to the PEP
    because of that.

    However, I'm a bit frustrated that you just re-implemented the whole thing without discussion:
    (...)
    So it was a bit of a let down for a first-time Python contributor. Even a simple question would have been helpful here: "Ben, I really think this much C code for a first version is bug-prone; what do you think about me re-implementing it and comparing?"

    I'm really sorry, I didn't want to frustrate you :-( Sorry for not
    being available, I also expected someone else to review your code, but
    it didn't happen. And I *want* your opinon on design choices. Sorry if
    I wasn't explicit about that in my last emails.

    Sorry, I was very busy last months on other projects (like asyncio). I
    want scandir() to be included in Python 3.5! The3.5 alpha 1 release
    was a reminder for me.

    I didn't reimplement everything. Almost all code comes from your work.
    I just adapted it to Python 3.5.

    I've been behind scandir and written the first couple of implementations, and I asked if you could review my code, but instead of reviewing it or interacting with my fairly clear desire for the all-C version, you simply turn up and say "I've rewritten it, can you please review?"

    I'm not happy of the benchmark results of the C+Python implementation
    (scandir-6.patch is supposed to be the fasted implementation of
    C+Python).

    I tried to produce as much benchmark results as possible to be able to
    take a decision. I used:

    • hardware: SSD, HDD and tmpfs (RAM)
    • file system: ext4, tmpfs, NFS/ext4
    • operating systems: Linux (Fedora 21), Windows 7

    I will try to summarize benchmark results to write an email to
    python-dev, to make a choice between the full C implementation vs
    C+Python implementation. (ctypes is not an option, it's not reliable,
    portability matters.)

    To continue the actual "which implementation" discussion: as I mentioned last week in http://bugs.python.org/msg235458, I think the benchmarks above show pretty clearly we should use the all-C version.

    It's quite clear that the C implementation is always faster than C+Python.

    My point is that we have to make a choice, C+Python is a nice
    compromise because it uses less C code, and C code is more expensive
    to *maintain*.

    For background: PEP-471 doesn't add any new functionality, and especially with the new pathlib module, it doesn't make directory iteration syntax nicer either: os.scandir() is all about letting the OS give you whatever info it can *for performance*. Most of the Rationale for adding scandir given in PEP-471 is because it can be so so much faster than listdir + stat.

    Good point :-)

    In many setup (hardware, operating system, file system), I see a low
    speedup (less than 2x faster). The whole purpose of the PEP-471
    becomes unclear if the speedup is not at least 2x.

    Quicky summary of benchmarks:

    • using C+Python (scandir-6.patch), the benchmark ("bench") is only
      faster than 2x on Windows, in the other cases it's between 1.3x and
      1.4x faster
    • using C (scandir-2.patch), the benchmark ("bench") is at least 3.5x
      faster, it's between 3.5x and 44.6x faster, the most interesting
      speedup is on Windows (44.6x faster!)

    @vstinner
    Copy link
    Member

    Result of bench (cached): scandir-6.patch => scandir-2.patch
    (...)
    laptop using SSD and ext4: 1.3x faster => 2.8x faster
    desktop PC using HDD and ext4: 1.4x faster => 1.4x faster

    Oops, I copied the wrong numbers. scandir-2.patch is faster than that!

    • laptop using SSD and ext4: 1.3x faster => 3.6x faster
    • desktop PC using HDD and ext4: 1.4x faster => 3.5x faster

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Feb 13, 2015

    As I've mentioned in http://bugs.python.org/issue22524#msg231703

    os.walk size 7925376343, scandir.walk size 5534939617 -- NOT EQUAL!

    os.walk and scandir.walk do a different work here.

    I don't see that it is acknowledged so I assume the benchmark is not fixed.

    If the fixed (same work) benchmark is used then there is no performance difference on Linux. See the attached file http://bugs.python.org/file37292/get_tree_size_listdir.diff

    Note: the performance improvements on Windows along should be enough to accept os.scandir() even if Linux version won't be improved (if the behavior is documented).

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Feb 14, 2015

    Akari: yes, I'm aware of this, and it's definitely a concern to me -- it may be that scandir has a bug with counting the size of certain non-file directory entries, or my implementation of os.walk() via scandir is not quite correct. I'll look at it before too long!

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Feb 19, 2015

    BTW, I replied to Victor privately, but for the thread: no worries, and apology accepted! :-) I'm working on updating my C patch with his feedback, and will update this issue hopefully in the next few days.

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Feb 27, 2015

    Okay, here's the next version of the patch. I've updated scandir-2.patch with a number of modifications and several fixes to the C code.

    This includes Victor's scandir-6.patch test suite (with minor mods) and my original documentation. Note that I haven't looked at either the documentation or the tests too closely, but I'm uploading the patch in any case so Victor (or others) can start reviewing the C code.

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Mar 6, 2015

    Updated version of the patch after Victor's code review of scandir-7 and a couple of his comments offline. Full commit log is available on my GitHub project in posixmodule_scandir_main.c if anyone's interested. Again, I haven't looked closely at the docs or tests yet, but hopefully the C code is getting pretty close now!

    KNOWN ISSUE: There's a reference leak in the POSIX version (found with "python -m test -R 3:2 test_os"). I'm a Windows guy and this is my first serious Python C extension code, so I'm not sure the best way to track this down. Would someone like to either point out the bug :-) or help me with how to track this down?

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Mar 7, 2015

    Oops, I'm sorry re previous comment -- looks like I forgot to attach scandir-8.patch. Now attached. Please re-read my previous comment and review. :-)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 8, 2015

    New changeset d04d5b9c29f6 by Victor Stinner in branch 'default':
    Issue bpo-22524: New os.scandir() function, part of the PEP-471: "os.scandir()
    https://hg.python.org/cpython/rev/d04d5b9c29f6

    @vstinner
    Copy link
    Member

    vstinner commented Mar 8, 2015

    KNOWN ISSUE: There's a reference leak in the POSIX version (found with "python -m test -R 3:2 test_os").

    Don't worry, it was a simple refleak, I fixed it:

    diff -r 392d3214fc23 Modules/posixmodule.c
    --- a/Modules/posixmodule.c     Sun Mar 08 01:58:04 2015 +0100
    +++ b/Modules/posixmodule.c     Sun Mar 08 02:08:05 2015 +0100
    @@ -16442,6 +16442,7 @@ DirEntry_fetch_stat(DirEntry *self, int 
             result = STAT(path, &st);
         else
             result = LSTAT(path, &st);
    +    Py_DECREF(bytes);
     
         if (result != 0)
             return PyErr_SetFromErrnoWithFilenameObject(PyExc_OSError, self->path);

    @vstinner
    Copy link
    Member

    vstinner commented Mar 8, 2015

    I commited scandir-8.patch with the documentation of scandir-5.patch. I wanted os.scandir() to be part of Python 3.5 alpha 2.

    Thanks for your patience Ben ;-)

    We should now watch buildbots to check how they appreciate os.scandir().
    http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable

    Right now, test_os fails on Windows buildbots because of a regression of bpo-23524. test_os.TestScandir succeeded on AMD64 Windows7 SP1 3.x.

    We can maybe keep the issue open if someone still wants to enhance the C code, tests or the doc.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 8, 2015

    New changeset 60e5c34ec53a by Victor Stinner in branch 'default':
    Issue bpo-22524: Fix os.scandir() for platforms which don't have a d_type field in
    https://hg.python.org/cpython/rev/60e5c34ec53a

    @vstinner
    Copy link
    Member

    vstinner commented Mar 8, 2015

    Oh oh, OpenIndiana doesn't support d_type: the dirent structure has no d_type field and DT_xxx constants like DT_UNKNOWN are not defined.

    gcc -Wsign-compare -g -O0 -Wall -Wstrict-prototypes -I/usr/local/include/ncursesw -m64 -Werror=declaration-after-statement -I. -IInclude -I./Include -DPy_BUILD_CORE -c ./Modules/_collectionsmodule.c -o Modules/_collectionsmodule.o
    ./Modules/posixmodule.c: In function 'DirEntry_is_symlink':
    ./Modules/posixmodule.c:16393: error: 'DT_UNKNOWN' undeclared (first use in this function)
    ./Modules/posixmodule.c:16393: error: (Each undeclared identifier is reported only once
    ./Modules/posixmodule.c:16393: error: for each function it appears in.)
    ./Modules/posixmodule.c:16394: error: 'DT_LNK' undeclared (first use in this function)
    ./Modules/posixmodule.c:16398: warning: control reaches end of non-void function
    ./Modules/posixmodule.c: In function 'DirEntry_test_mode':
    ./Modules/posixmodule.c:16519: error: 'DT_LNK' undeclared (first use in this function)
    ./Modules/posixmodule.c:16520: error: 'DT_UNKNOWN' undeclared (first use in this function)
    ./Modules/posixmodule.c:16559: error: 'DT_DIR' undeclared (first use in this function)
    ./Modules/posixmodule.c:16561: error: 'DT_REG' undeclared (first use in this function)
    ./Modules/posixmodule.c: In function 'ScandirIterator_iternext':
    ./Modules/posixmodule.c:16973: error: 'struct dirent' has no member named 'd_type'
    make: *** [Modules/posixmodule.o] Error 1
    make: *** Waiting for unfinished jobs....

    @vstinner
    Copy link
    Member

    vstinner commented Mar 8, 2015

    I opened the issue bpo-23605: "Use the new os.scandir() function in os.walk()".

    @vstinner
    Copy link
    Member

    vstinner commented Mar 9, 2015

    It looks like test_os now pass on all buildbots, including OpenIndiana. I close the issue.

    @vstinner vstinner closed this as completed Mar 9, 2015
    @vstinner
    Copy link
    Member

    vstinner commented Mar 9, 2015

    FYI os.scandir() is part of Python 3.5 alpha 2 which is now available
    (including installers for Windows):
    https://www.python.org/downloads/release/python-350a2/

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Mar 10, 2015

    Hi Victor. I'm attaching a patch to the What's New doc. I know it's a minor thing, but I really read the What's New in Python x.y page whenever it comes out, so making this as specific as possible is good. The changes are:

    1. Be more specific and flesh it out slightly: for example, instead of just saying "PEP-471 adds a new directory iteration function", say very briefly why it was added (to speed up os.walk) and what it achieves.
    2. Move "PEP and implementation by..." under correct correct PEP -- it's currently under PEP-475.
    3. Be more specific (as per point 1) under os.scandir(), and link to issue
    4. Add a note about the os.walk() performance improvement under Optimizations. I realize the changes to os.walk() haven't been committed yet, but hopefully they will be quite soon.

    Any feedback? Are you happy to commit these changes?

    I intend to do some review of the scandir/DirEntry docs as well. I'll send those in the next few days.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 10, 2015

    New changeset 8d76dabd40f6 by Victor Stinner in branch 'default':
    Issue bpo-22524: Rephrase scandir addition in What's New in Python 3.5
    https://hg.python.org/cpython/rev/8d76dabd40f6

    @vstinner
    Copy link
    Member

    2015-03-10 12:36 GMT+01:00 Ben Hoyt <report@bugs.python.org>:

    I intend to do some review of the scandir/DirEntry docs as well. I'll send those in the next few days.

    Open maybe a new issue if you want to enhance the doc.

    @benhoyt
    Copy link
    Mannequin Author

    benhoyt mannequin commented Mar 10, 2015

    Thanks. Will do!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants