gh-117587: Add C implementation of os.path.abspath #117855

Draft
wants to merge 66 commits into
base: main

nineteendo
Contributor

@nineteendo nineteendo commented Apr 13, 2024

Benchmark

posixpath.py by @eryksun:

absolute, with "/a" repeated
len   speedup
100   1.521x
500   1.157x
1000  1.128x
2000  1.068x
4000  1.019x

relative 'a', with cwd length
len     speedup
93+2    1.861x
548+2   1.729x
1003+2  1.881x
2043+2  1.556x
4058+2  1.548x

ntpath.py

script
::speedup-posixpath.abspath.bat
@echo off
echo 10 chars && call main\python -m timeit -s "import os" "os.path.abspath('a/' * 5)" && call speedup-posixpath.abspath\python -m timeit -s "import os" "os.path.abspath('a/' * 5)"
echo 100 chars && call main\python -m timeit -s "import os" "os.path.abspath('a/' * 50)" && call speedup-posixpath.abspath\python -m timeit -s "import os" "os.path.abspath('a/' * 50)"
echo 1000 chars && call main\python -m timeit -s "import os" "os.path.abspath('a/' * 500)" && call speedup-posixpath.abspath\python -m timeit -s "import os" "os.path.abspath('a/' * 500)"
10 chars
500000 loops, best of 5: 635 nsec per loop # before
500000 loops, best of 5: 517 nsec per loop # after
# -> 1.23x faster
100 chars
200000 loops, best of 5: 1.53 usec per loop # before
200000 loops, best of 5: 1.23 usec per loop # after
# -> 1.24x faster
1000 chars
50000 loops, best of 5: 9.87 usec per loop # before
50000 loops, best of 5: 7.73 usec per loop # after
# -> 1.28x faster
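For a shell-independent reproduction, the same measurement can be sketched with the timeit module. The helper name `best_per_call` and the loop counts are assumptions made for illustration; the figures above came from the batch script, not from this snippet, and it would have to be run once under each build to compare them.

```python
import os
import timeit


def best_per_call(path, number=2_000, repeat=5):
    """Return the best observed time, in seconds, of one os.path.abspath(path) call."""
    timer = timeit.Timer(lambda: os.path.abspath(path))
    # timer.repeat() returns the total time of `number` calls for each repetition.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Same inputs as the batch script: 'a/' repeated 5, 50 and 500 times.
for count in (5, 50, 500):
    path = "a/" * count
    print(f"{len(path)} chars: {best_per_call(path) * 1e6:.2f} usec per loop")
```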

@nineteendo nineteendo marked this pull request as ready for review April 13, 2024 20:03
@erlend-aasland erlend-aasland changed the title gh-117587: Speedup posixpath.abspath gh-117587: Add C implementation of posixpath.abspath Apr 13, 2024
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
nineteendo and others added 2 commits April 13, 2024 23:43
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
@nineteendo
Contributor Author

The same approach doesn't work on Windows, because ntpath.join() is much slower than nt._getfullpathname():

PS C:\Users\wanne\cpython> python -m timeit -s "import nt" "nt._getfullpathname('.')"; python -m timeit -s "import os" "os.path.join(r'C:\Users\wanne\cpython', '')"
1000000 loops, best of 5: 286 nsec per loop # _getfullpathname
200000 loops, best of 5: 1.41 usec per loop # join

@eryksun
Contributor

eryksun commented Apr 14, 2024

The same approach doesn't work on Windows, because ntpath.join() is much slower than nt._getfullpathname()

Preferably, abspath() should not be naively implemented by simply joining a relative path with the working directory. It doesn't work for drive-relative paths. That's a long-standing bug in the generic implementation ntpath._abspath_fallback().
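The drive-relative problem can be reproduced on any platform with ntpath alone. The sketch below uses a hypothetical helper, `naive_abspath`, that takes the working directory as an explicit argument and follows the join-then-normalize approach; it is an illustration, not the code in ntpath itself.

```python
import ntpath


def naive_abspath(path, cwd):
    """Hypothetical helper: join a relative path with cwd, then normalize."""
    if not ntpath.isabs(path):
        path = ntpath.join(cwd, path)
    return ntpath.normpath(path)


cwd = r"C:\Users\wanne\cpython"

# An ordinary relative path is resolved against the working directory.
print(naive_abspath(r"spam\eggs", cwd))

# A drive-relative path on another drive makes join() discard cwd entirely,
# so the result is still relative to Z:'s own working directory.
result = naive_abspath(r"Z:spam\eggs", cwd)
print(result, ntpath.isabs(result))
```

The second call returns the input unchanged, so the "absolute" path still depends on state the function never consulted.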

A process has a working directory on each A-Z drive. This gets used to resolve drive-relative paths such as "Z:spam\eggs" -> "Z:\path\to\working_directory\spam\eggs". On NT based Windows systems, a process can optionally store the working directories on drives in special "=<letter>:" environment variables that are inherited by child processes, such as "=Z:". If the environment variable isn't set for a drive, or it refers to a path that doesn't exist, the drive's working directory defaults to the root directory. If the overall process working directory is on the same drive, it supersedes the environment variable. Python opts into this scheme on Windows by setting these environment variables in the implementation of os.chdir(). However, for resolving drive-relative paths, Python relies on WinAPI GetFullPathNameW() rather than trying to implement the scheme on its own.
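The lookup order described above can be modeled in a few lines. This is only a sketch of the scheme as stated in this comment: the function names, the plain dict standing in for the process environment, and the `exists` callback are all assumptions, and real code should keep relying on GetFullPathNameW().

```python
import ntpath


def drive_working_directory(drive, process_cwd, env, exists=lambda p: True):
    """Model the per-drive working directory lookup (hypothetical helper)."""
    drive = drive.upper()
    # The overall process working directory wins if it is on the same drive.
    if ntpath.splitdrive(process_cwd)[0].upper() == drive:
        return process_cwd
    # Otherwise consult the special "=<letter>:" environment variable.
    stored = env.get("=" + drive)
    if stored and exists(stored):
        return stored
    # Unset, or pointing at a missing path: default to the drive's root.
    return drive + "\\"


def resolve_drive_relative(path, process_cwd, env, exists=lambda p: True):
    """Resolve a drive-relative path such as 'Z:spam\\eggs' (hypothetical helper)."""
    drive, rest = ntpath.splitdrive(path)
    base = drive_working_directory(drive, process_cwd, env, exists)
    return ntpath.normpath(ntpath.join(base, rest))


env = {"=Z:": r"Z:\path\to\working_directory"}
print(resolve_drive_relative(r"Z:spam\eggs", r"C:\Users\wanne\cpython", env))
print(resolve_drive_relative(r"Z:spam\eggs", r"C:\Users\wanne\cpython", {}))
```

With the variable set, the first call reproduces the "Z:spam\eggs" example from the paragraph above; with an empty environment the path resolves against the root of Z:.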

Python also relies on GetFullPathNameW() to implement some Windows-specific path rules to ensure that the result of abspath() is as explicit as possible, but without having to hard code rules that may change. This includes special casing the removal of trailing dots and spaces from the final path component, and also reserving DOS device names such as "con". For the latter, it's always "con" -> "\\.\con". But edge cases can vary across Windows versions, such as for qualified paths (e.g. ".\con" or "C:\con") and file extensions (e.g. "con.txt"), and for the set of device names (e.g. supporting "conin$" and "conout$").
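One way to observe these rules is to print what ntpath.abspath() returns for a handful of inputs. The sample list and the `show_abspath` name below are assumptions chosen for illustration. No expected output is given, because the results depend on the Windows version, and on other platforms ntpath falls back to the generic implementation, which applies none of these rules.

```python
import ntpath
import sys

# Inputs taken from the cases mentioned above: a bare device name, qualified
# device names, a device name with an extension, and trailing dots and spaces.
SAMPLES = ["con", r".\con", r"C:\con", "con.txt", "spam. . ."]


def show_abspath(samples=SAMPLES):
    """Map each sample path to whatever ntpath.abspath() returns for it."""
    return {path: ntpath.abspath(path) for path in samples}


if sys.platform != "win32":
    print("Not on Windows: results come from the generic fallback.")
for path, resolved in show_abspath().items():
    print(f"{path!r:12} -> {resolved!r}")
```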

In C, abspath() on Windows would be implemented via _Py_normpath_and_size() followed by _PyOS_getfullpathname(), with the empty string as a special case. However, it has to fall back on a generic implementation instead of _PyOS_getfullpathname() if the path contains embedded null characters, since WinAPI GetFullPathNameW() requires the lpFileName parameter to be a null-terminated string.
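The control flow described above can be sketched in Python before committing to C. `abspath_sketch` is a hypothetical name, the sketch handles str paths only, and it assumes the empty string should resolve like "." (the current directory); the comment above does not spell out what the special case returns. On non-Windows platforms nt is unavailable, so the sketch always takes the generic branch.

```python
import ntpath
import os

try:
    from nt import _getfullpathname  # only importable on Windows
except ImportError:
    _getfullpathname = None


def abspath_sketch(path):
    """Sketch of the proposed flow: normalize, then ask the OS when possible."""
    path = os.fspath(path)
    if not path:
        # Assumption: treat the empty string as the current directory.
        path = "."
    if _getfullpathname is None or "\0" in path:
        # Generic fallback: GetFullPathNameW() needs a null-terminated string,
        # so paths with embedded nulls cannot be passed to it.
        if not ntpath.isabs(path):
            path = ntpath.join(os.getcwd(), path)
        return ntpath.normpath(path)
    return _getfullpathname(ntpath.normpath(path))


print(repr(abspath_sketch("")))
print(repr(abspath_sketch("a\0b")))
```

Note that the fallback branch inherits the drive-relative limitation discussed earlier, which is why it is reserved for inputs the WinAPI call cannot accept.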

@nineteendo
Contributor Author

Preferably, abspath() should not be naively implemented by simply joining a relative path with the working directory. It doesn't work for drive-relative paths. That's a long-standing bug in the generic implementation ntpath._abspath_fallback().

I didn't even get that far in the testing process. I already knew it wasn't going to work.
I did notice this while trying to eliminate normpath() from the current implementation.

@nineteendo
Contributor Author

cc @barneygale, @zooba, @serhiy-storchaka

@nineteendo nineteendo marked this pull request as draft April 27, 2024 07:23
@nineteendo
Contributor Author

@eryksun, are there still edge cases you would like to fix? Or did I cover everything?

@nineteendo
Contributor Author

nineteendo commented May 4, 2024

Let's leave optimising ntpath.isdevdrive() until #118355 is merged.

@nineteendo nineteendo marked this pull request as draft May 6, 2024 15:31


5 participants