Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize pathlib.PurePath.__fspath__() #102783

Closed
barneygale opened this issue Mar 17, 2023 · 5 comments
Closed

Optimize pathlib.PurePath.__fspath__() #102783

barneygale opened this issue Mar 17, 2023 · 5 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir topic-pathlib type-feature A feature request or enhancement

Comments

@barneygale
Copy link
Contributor

barneygale commented Mar 17, 2023

Feature or enhancement

Return an unnormalized path from pathlib.PurePath.__fspath__()

Pitch

Code like open(Path('./README.txt')) or Path('/home//barney').iterdir() shouldn't require us to normalize the path in pathlib (e.g. remove . segments, doubled slashes, etc), as OS APIs are perfectly happy with unnormalized paths.

We can improve the performance of most Path methods, and the effective performance of passing a Path object to any API that accepts os.PathLike, by skipping normalization in __fspath__().

Prerequisites

Pathlib must not normalize paths on construction:

Pathlib's normalization must not change the meaning of paths:

Previous discussion

Linked PRs

@barneygale barneygale added the type-feature A feature request or enhancement label Mar 17, 2023
@AlexWaygood AlexWaygood added performance Performance or resource usage topic-pathlib labels Mar 17, 2023
@eryksun
Copy link
Contributor

eryksun commented Mar 17, 2023

On Windows, an exception should be made to normalize extended paths in __fspath__(). If an extended path isn't normalized, the open will fail. One needs to replace slashes with backslashes; collapse repeated slashes; and resolve "." and ".." components.

Scripts have to be prepared to handle extended paths coming from os.path.realpath(), __file__, and attributes of the sys module such as executable, prefix, argv, and path. Extended paths can also come from extension modules that use API functions such as GetModuleFileNameW() (if an EXE/DLL was loaded using an extended path), GetFinalPathNameByHandleW(), and any of the PathCch*() functions that implements the flag PATHCCH_ALLOW_LONG_PATHS.

For path construction, there's an open issue that proposes to modify ntpath.join() to fully normalize any drive path or UNC path via normpath(). For paths without a drive, it's proposed to replace slashes with backslashes and collapse repeated slashes. For example, this change would support getting a useful result from something like ntpath.join(filename, "../spam//eggs.py") given filename is an extended path such as "\\?\C:\foo\bar.py".

@barneygale
Copy link
Contributor Author

On Windows, an exception should be made to normalize extended paths in __fspath__(). If an extended path isn't normalized, the open will fail. One needs to replace slashes with backslashes; collapse repeated slashes; and resolve "." and ".." components.

I don't think pathlib's normalization should change the meaning of paths. It sounds like we should adjust the normalization routine to preserve forward slashes, repeated slashes, and "." components, for these kinds of path. ".." components are already be preserved I think.

@eryksun
Copy link
Contributor

eryksun commented Mar 17, 2023

I don't think pathlib's normalization should change the meaning of paths. It sounds like we should adjust the normalization routine to preserve forward slashes, repeated slashes, and "." components, for these kinds of path. ".." components are already be preserved I think.

In almost all cases, extended paths are used solely to access a long path. The need to get a literal path is rare, especially in regard to slashes and "." and ".." component names. Microsoft's filesystems will fail an open that has forward slashes or repeated backslashes. NTFS will fail an open that has "." or ".." component names. FAT filesystems allow creating "." and ".." component names as a file's long name, with some other random short name, but it's very dysfunctional.

People assume that open(filename / "../spam//eggs.txt") will just work. But if filename is an extended path -- one that's injected completely outside of the control of the script -- then this open will fail if the path isn't normalized. IMO, no purity in regards to literal paths is worth this headache.

@pitrou
Copy link
Member

pitrou commented Apr 20, 2023

OS APIs are perfectly happy with unnormalized paths

What happens if a path is mounted on e.g. a FUSE filesystem or a Samba share? Does the OS unnormalize before handing the path to the filesystem?

@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 23, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Nov 25, 2023
…g raw path

Return an unnormalized path from `pathlib.PurePath.__fspath__()`. This is
equivalent to a normalized path (because pathlib's normalization doesn't
change the meaning of paths), but considerably cheaper to generate.
@barneygale
Copy link
Contributor Author

Resolving as "can't do", because #65238 is resolved as "won't fix"

@barneygale barneygale closed this as not planned Won't fix, can't repro, duplicate, stale Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir topic-pathlib type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants