Add support for file descriptors in os.scandir() #70184
Comments
For now, os.scandir() on Unix is implemented using opendir()/readdir()/closedir(). It accepts bytes and str pathnames. But most functions in the os module that accept a pathname also accept an open file descriptor. This feature can be implemented in scandir() by using fdopendir() instead of opendir(). That would make it possible to support the dir_fd parameter in scandir(), which in turn would allow implementing os.fwalk() with scandir() and making os.walk() more efficient (because it would no longer need to resolve long paths for deeply nested directories; see bpo-15200).
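As a rough illustration of what the proposal enables, here is a minimal sketch assuming a Unix interpreter where os.scandir() accepts a directory file descriptor (CPython 3.7 and later; `os.scandir in os.supports_fd` should be true). The sketch does not pass dir_fd to scandir(); instead it opens a subdirectory relative to its parent's descriptor with os.open(..., dir_fd=...) and scans the resulting fd, which covers the same use case. The helper name `scan_child` is hypothetical.

```python
import os

def scan_child(parent_fd, name):
    """Return sorted (name, is_dir) pairs for subdirectory `name` of parent_fd.

    Hypothetical helper: the child is opened relative to the parent's fd,
    so only one path component is resolved, however deep the tree is.
    """
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)  # O_DIRECTORY is POSIX-only
    child_fd = os.open(name, flags, dir_fd=parent_fd)
    try:
        with os.scandir(child_fd) as it:
            # With an fd argument, entry.path is the same as entry.name.
            return sorted((e.name, e.is_dir(follow_symlinks=False)) for e in it)
    finally:
        os.close(child_fd)  # we opened this fd, so we close it
```

The caller opens the parent once (for example `os.open("/usr", os.O_RDONLY)`), passes it to `scan_child(parent_fd, "lib")`, and closes it afterwards.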
Yeah, this was discussed when PEP 471 was designed, but os.scandir() was already hard to design without accepting a file descriptor as a parameter. Supporting one is more complex because we have to manage the lifetime of the file descriptor, especially if it's exposed as a public attribute.
Supporting file descriptors was also discussed when pathlib.Path was designed, but there were similar questions about the lifetime of the file descriptor. (Who may close it? When? Is it OK to close it with os.close()? etc.)
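One possible answer to the lifetime question, as far as we understand the way CPython 3.7+ implements os.scandir(fd) on Unix: scandir() works on its own duplicate of the descriptor and rewinds the directory stream before closing it, so the caller keeps ownership of the original fd, closes it with os.close(), and can scan it again. The sketch below only checks that observable behavior; the helper names `names_via_fd` and `demo` are hypothetical and the code is not taken from the patch in this thread.

```python
import os

def names_via_fd(fd):
    """List a directory through an fd that the *caller* owns (Unix only)."""
    # Closing the scandir iterator does not close the caller's descriptor.
    with os.scandir(fd) as it:
        return sorted(entry.name for entry in it)

def demo(path):
    fd = os.open(path, os.O_RDONLY)
    try:
        first = names_via_fd(fd)
        second = names_via_fd(fd)  # same fd, scanned a second time
        os.fstat(fd)               # would raise OSError if fd had been closed
        return first, second
    finally:
        os.close(fd)               # the caller closes it, with os.close()
```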
The proposed patch adds support for file descriptors in os.scandir() and reimplements os.fwalk() on top of os.scandir(). Effect of using os.scandir() in os.fwalk(), with os.walk() timings for comparison:

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib"))'
1 loop, best of 5: 934 msec per loop
$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib", topdown=False))'
1 loop, best of 5: 718 msec per loop
$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib"))'
Unpatched: 1 loops, best of 5: 1.78 sec per loop
Patched: 1 loop, best of 5: 934 msec per loop
$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib", topdown=False))'
Unpatched: 1 loops, best of 5: 1.76 sec per loop
Patched: 1 loop, best of 5: 947 msec per loop
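To illustrate why basing os.fwalk() on os.scandir() can help, here is a simplified, hypothetical fwalk-style generator (not the patch itself) built on os.scandir(fd) and os.open(..., dir_fd=...). Each directory is listed through its own descriptor and each child is opened relative to its parent, so paths are never re-resolved from the top. It assumes a Unix platform with scandir fd support, lists symlinks to directories among the files and never follows them, and omits the race-condition checks that the real os.fwalk() performs. The names `simple_fwalk` and `_walk_fd` are made up for this sketch.

```python
import os

def simple_fwalk(top):
    """Yield (dirpath, dirnames, filenames, dirfd) top-down, like os.fwalk().

    Simplified sketch: no symlink following and no race checks.
    """
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    topfd = os.open(top, flags)
    try:
        yield from _walk_fd(topfd, top, flags)
    finally:
        os.close(topfd)

def _walk_fd(dirfd, dirpath, flags):
    dirnames, filenames = [], []
    # scandir(fd) reads the directory without resolving dirpath again.
    with os.scandir(dirfd) as it:
        for entry in it:
            # is_dir() uses d_type when available; otherwise it stats the
            # entry relative to dirfd rather than via the full path.
            if entry.is_dir(follow_symlinks=False):
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    yield dirpath, dirnames, filenames, dirfd
    for name in dirnames:
        # Open the child relative to its parent's fd: only one path
        # component has to be resolved.
        childfd = os.open(name, flags, dir_fd=dirfd)
        try:
            yield from _walk_fd(childfd, os.path.join(dirpath, name), flags)
        finally:
            os.close(childfd)
```

As with os.fwalk(), each yielded dirfd is only valid until the generator is resumed, and removing names from dirnames in place prunes the walk.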
Thank you for the review, Josh. The updated patch addresses your comments and adds a few more micro-optimizations.
Resolved conflicts in the documentation.
I'm wondering: is it possible to implement this feature on Windows?
On Windows, scandir() is implemented with FindFirstFile(), which takes a path string. It returns a search handle that is then passed to FindNextFile(). There is no equivalent function that takes a directory handle, so it's not possible to implement os.scandir(fd) on Windows. It seems that gnulib emulates fdopendir() on Windows, and its documentation contains warnings:
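Because fd support is platform-dependent (it is unavailable where scandir() is built on FindFirstFile()), portable code can check for it at runtime. A minimal sketch, assuming a Python version in which os.supports_fd contains os.scandir when fdopendir() is available; the helper name `list_names` is hypothetical.

```python
import os

def list_names(path):
    """Return sorted entry names, scanning via an fd when the platform allows it."""
    if os.scandir in os.supports_fd:
        # Unix with fdopendir(): open the directory once and scan the fd.
        fd = os.open(path, os.O_RDONLY)
        try:
            with os.scandir(fd) as it:
                return sorted(e.name for e in it)
        finally:
            os.close(fd)
    # No fd support (e.g. the FindFirstFile-based Windows build): scan by path.
    with os.scandir(path) as it:
        return sorted(e.name for e in it)
```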
In 3.5+ the CRT has O_OBTAIN_DIR (0x2000) for opening a directory, i.e. calling CreateFile with backup semantics. A directory can be read via GetFileInformationByHandleEx [1] using the information classes FileIdBothDirectoryRestartInfo and FileIdBothDirectoryInfo. These info classes are just a simplified wrapper around the more powerful system call NtQueryDirectoryFile [2]. The implementation details could be hidden behind _Py_opendir, _Py_fdopendir, _Py_readdir, and _Py_closedir, allowing a common implementation of the high-level listdir() and scandir() functions. I wrote a ctypes prototype of listdir() along these lines. One feature that's lost by using GetFileInformationByHandleEx to list a directory is wildcard filtering. However, Python's listdir() and scandir() never use wildcard filtering, so it's no real loss. FindFirstFile implements this feature via the FileName parameter of NtQueryDirectoryFile. It first translates DOS wildcards into NT's set of five wildcards: the native NT '*' and '?', plus '<', '>', and '"' (DOS_STAR, DOS_QM, and DOS_DOT), which emulate the quirky MS-DOS semantics. See FsRtlIsNameInExpression [3] for a description of these wildcard characters.
Thank you for the investigation, Eryk. Helpful as always. Since I have no access to Windows, I have left this feature Unix-only.