Speed up pathlib.Path.glob() by working with strings internally #117586

Closed
barneygale opened this issue Apr 6, 2024 · 0 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir topic-pathlib

barneygale commented Apr 6, 2024

pathlib.Path.glob() currently generates Path objects for intermediate paths that might never be yielded to the user, which is slow and unnecessary. For example, a pattern like **/*.mp3 is evaluated by creating a Path object for every directory visited.

There are already a few tricks employed to avoid instantiation, but it would be better if only real results were converted to path objects.
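
To make the idea concrete, here is a minimal sketch of string-based recursive matching, where `Path` objects are created only for results that are actually yielded. The names `iter_matching_strings` and `glob_recursive` are hypothetical and are not part of pathlib. The sketch ignores most of the real globbing semantics (multi-segment patterns, symlink handling, case sensitivity rules), so treat it as an illustration rather than the implementation.

```python
import fnmatch
import os
from pathlib import Path


def iter_matching_strings(root, pattern):
    """Yield string paths of entries under `root` whose names match `pattern`.

    Hypothetical helper: intermediate directories are tracked as plain
    strings, so no path objects are created while walking the tree.
    """
    stack = [root]
    while stack:
        dirpath = stack.pop()
        try:
            with os.scandir(dirpath) as entries:
                entries = list(entries)
        except OSError:
            # Unreadable directories are skipped in this sketch.
            continue
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.path


def glob_recursive(root, pattern):
    """Roughly mimic `Path(root).glob('**/' + pattern)` (assumption: simplified)."""
    for path_str in iter_matching_strings(os.fspath(root), pattern):
        # Only real results are converted to path objects.
        yield Path(path_str)
```

With this shape, a call such as `glob_recursive("music", "*.mp3")` builds one `Path` per match instead of one per directory visited.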

@barneygale barneygale added performance Performance or resource usage stdlib Python modules in the Lib dir topic-pathlib labels Apr 6, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Apr 6, 2024
Move pathlib globbing implementation to a new module and class:
`pathlib._glob.Globber`. This class implements fast string-based globbing.
It's called by `pathlib.Path.glob()`, which then converts strings back to
path objects.

In the private pathlib ABCs, add a `pathlib._abc.Globber` subclass that
works with `PathBase` objects rather than strings, and calls user-defined
path methods like `PathBase.stat()` rather than `os.stat()`.

This sets the stage for two more improvements:

- pythonGH-115060: Query non-wildcard segments with `lstat()`
- pythonGH-116380: Move `pathlib._glob` to `glob` (unify implementations).
barneygale added a commit that referenced this issue Apr 10, 2024
…17589)

Move pathlib globbing implementation into a new private class: `glob._Globber`. This class implements fast string-based globbing. It's called by `pathlib.Path.glob()`, which then converts strings back to path objects.

In the private pathlib ABCs, add a `pathlib._abc.Globber` subclass that works with `PathBase` objects rather than strings, and calls user-defined path methods like `PathBase.stat()` rather than `os.stat()`.

This sets the stage for two more improvements:

- GH-115060: Query non-wildcard segments with `lstat()`
- GH-116380: Unify `pathlib` and `glob` implementations of globbing.

No change to the implementations of `glob.glob()` and `glob.iglob()`.
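
The split described in this commit message (a string-based globber plus a subclass that works with path objects) can be sketched roughly as below. The class names `StringGlobber` and `PathObjectGlobber` and the `list_children` hook are hypothetical; the real `glob._Globber` and `pathlib._abc.Globber` are private and their interfaces are not reproduced here. `pathlib.Path` stands in for a user-defined `PathBase` subclass.

```python
import fnmatch
import os
from pathlib import Path


class StringGlobber:
    """Hypothetical base class: walks a tree using strings and os calls."""

    @staticmethod
    def list_children(directory):
        # Return (name, child, is_dir) triples using os.scandir().
        with os.scandir(directory) as entries:
            return [(e.name, e.path, e.is_dir()) for e in entries]

    def select(self, root, pattern):
        """Yield every entry below `root` whose name matches `pattern`."""
        stack = [root]
        while stack:
            directory = stack.pop()
            for name, child, is_dir in self.list_children(directory):
                if is_dir:
                    stack.append(child)
                if fnmatch.fnmatch(name, pattern):
                    yield child


class PathObjectGlobber(StringGlobber):
    """Hypothetical subclass: same algorithm, but every filesystem query
    goes through methods of the path object itself."""

    @staticmethod
    def list_children(directory):
        # Calls the path object's own iterdir()/is_dir() rather than os.*.
        return [(p.name, p, p.is_dir()) for p in directory.iterdir()]
```

Only the hook differs between the two classes, so the matching logic lives in one place while the fast path avoids path objects entirely.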
barneygale added a commit to barneygale/cpython that referenced this issue Apr 10, 2024
Move `pathlib.Path.walk()` implementation into `glob._Globber`. The new
`glob._Globber.walk()` classmethod works with strings internally, which is
a little faster than generating `Path` objects and keeping them normalized.
The `pathlib.Path.walk()` method converts the strings back to path objects.

In the private pathlib ABCs, our existing subclass of `_Globber` ensures
that `PathBase` instances are used throughout.

Follow-up to python#117589.
barneygale added a commit that referenced this issue Apr 11, 2024
…17726)

Move `pathlib.Path.walk()` implementation into `glob._Globber`. The new
`glob._Globber.walk()` classmethod works with strings internally, which is
a little faster than generating `Path` objects and keeping them normalized.
The `pathlib.Path.walk()` method converts the strings back to path objects.

In the private pathlib ABCs, our existing subclass of `_Globber` ensures
that `PathBase` instances are used throughout.

Follow-up to #117589.
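
As a rough illustration of the walk change described above, the sketch below walks a tree top-down using strings only, then converts each directory path to a `Path` at the boundary. `walk_strings` and `walk_paths` are hypothetical names. This is not the code of `glob._Globber.walk()`, and it omits options such as bottom-up order, error callbacks, and following symlinks.

```python
import os
from pathlib import Path


def walk_strings(top):
    """Yield (dirpath, dirnames, filenames) with dirpath as a plain string."""
    stack = [top]
    while stack:
        dirpath = stack.pop()
        dirnames, filenames = [], []
        try:
            with os.scandir(dirpath) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        dirnames.append(entry.name)
                    else:
                        filenames.append(entry.name)
        except OSError:
            # Unreadable directories are skipped in this sketch.
            continue
        yield dirpath, dirnames, filenames
        # Push children in reverse so they are visited in listing order.
        for name in reversed(dirnames):
            stack.append(os.path.join(dirpath, name))


def walk_paths(top):
    """Convert the string results back to path objects, one per directory."""
    for dirpath, dirnames, filenames in walk_strings(os.fspath(top)):
        yield Path(dirpath), dirnames, filenames
```

The conversion cost is paid once per yielded directory, and joining child paths is a plain string operation.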
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…gs (python#117589)
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…gs (python#117726)