Inconsistent behaviour between pathlib.PurePath.match and glob.glob #118701

Closed
zhukoff-pavel opened this issue May 7, 2024 · 7 comments
Labels
topic-pathlib, type-bug (An unexpected behavior, bug, or error)


@zhukoff-pavel

zhukoff-pavel commented May 7, 2024

Bug report

Bug description:

Prerequisites

Hello! I have the following directory structure:

% tree
.
└── a
    └── b
        └── c
            └── d
                └── e

Problem

Here, pathlib.PurePath("a/b/c/d/e") doesn't match **/b/c/**, but it does match **/c/d/**:

Python 3.12.3 (main, Apr  9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> path = pathlib.PurePath("a/b/c/d/e")
>>> path.match("**/b/c/**")
False
>>> path.match("**/c/d/**")
True
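A minimal sketch of why the first pattern fails, assuming the match() semantics described later in this thread (a relative pattern is compared component by component from the right, and ** behaves like *). The helper match_from_right is hypothetical and is not pathlib's actual implementation; it only approximates it for POSIX-style strings using fnmatch.fnmatchcase.

```python
from fnmatch import fnmatchcase


def match_from_right(path: str, pattern: str) -> bool:
    """Hypothetical approximation of PurePath.match() for relative POSIX patterns."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    # A relative pattern can't have more components than the path.
    if len(pattern_parts) > len(path_parts):
        return False
    # Pair components from the right. fnmatch treats "**" like "*",
    # so here it matches exactly one component.
    return all(
        fnmatchcase(part, pat)
        for part, pat in zip(reversed(path_parts), reversed(pattern_parts))
    )


# "e" vs "**" passes, then "d" vs "c" fails.
print(match_from_right("a/b/c/d/e", "**/b/c/**"))  # False
# "e" vs "**", "d" vs "d", "c" vs "c", "b" vs "**": all pass.
print(match_from_right("a/b/c/d/e", "**/c/d/**"))  # True
```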

However, glob.glob() on this structure yields different results:

>>> import glob
>>> glob.glob("**/b/c/**")
['a/b/c/d']
>>> glob.glob("**/c/d/**")
[]
>>> glob.glob("**/b/c/**", recursive=True)
['a/b/c/', 'a/b/c/d', 'a/b/c/d/e']
>>> glob.glob("**/c/d/**", recursive=True)
['a/b/c/d/', 'a/b/c/d/e']

That is, glob.glob() returns a/b/c/d/e for **/b/c/** and **/c/d/** only in recursive mode.

Ideas

I'd expect PurePath.match() to behave more like a recursive glob.glob() (as in the previous code block) or pathlib.Path.glob():

>>> sorted(pathlib.Path(".").glob("**/b/c/**"))
[PosixPath('a/b/c'), PosixPath('a/b/c/d'), PosixPath('a/b/c/d/e')]
>>> sorted(pathlib.Path(".").glob("**/c/d/**"))
[PosixPath('a/b/c/d'), PosixPath('a/b/c/d/e')]
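The same Path.glob() comparison can be run in a throwaway directory. This is a sketch, assuming only that the a/b/c/d/e tree consists of directories as in the report. relative_to() plus as_posix() gives platform-independent strings, and collecting into a set() guards against any duplicate yields.

```python
import tempfile
from pathlib import Path

matches = {}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Same directory tree as in the report: a/b/c/d/e.
    (root / "a" / "b" / "c" / "d" / "e").mkdir(parents=True)

    for pattern in ("**/b/c/**", "**/c/d/**"):
        found = {p.relative_to(root).as_posix() for p in root.glob(pattern)}
        matches[pattern] = sorted(found)

for pattern, paths in matches.items():
    print(pattern, paths)
```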

CPython versions tested on:

3.10, 3.12

Operating systems tested on:

Linux, macOS


zhukoff-pavel added the type-bug label (An unexpected behavior, bug, or error) on May 7, 2024
@zhukoff-pavel
Author

Follow-up: this may be connected with #106747.

@sobolevn
Member

sobolevn commented May 7, 2024

cc @barneygale

@barneygale
Contributor

barneygale commented May 7, 2024

PurePath.match() has a couple of important differences from Path.glob() and glob.glob():

  1. Recursive expansion isn't supported, and so ** wildcards work exactly like *
  2. If a relative pattern is given, matching is performed from the right
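Both differences can be checked directly with PurePosixPath.match() (PurePosixPath is used here only so the result doesn't depend on the host OS's case rules). A short sketch on the path from the report:

```python
from pathlib import PurePosixPath

path = PurePosixPath("a/b/c/d/e")

# Difference 1: "**" is not recursive in match(), so it behaves exactly like "*".
star_c_d = path.match("*/c/d/*")
star_b_c = path.match("*/b/c/*")
same_as_star = (
    path.match("**/c/d/**") == star_c_d,
    path.match("**/b/c/**") == star_b_c,
)

# Difference 2: a relative pattern is matched from the right, so only the last
# len(pattern) components are compared; "a/b" at the start is never considered.
from_right = (path.match("d/e"), path.match("a/b"))

print(star_c_d, star_b_c, same_as_star, from_right)  # True False (True, True) (True, False)
```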

Could you try using Path.full_match() in an alpha or (upcoming) beta of 3.13? I think it does what you want:

>>> import pathlib
>>> path = pathlib.PurePath("a/b/c/d/e")
>>> path.full_match("**/b/c/**")
True
>>> path.full_match("**/c/d/**")
True
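PurePath.full_match() only exists on Python 3.13 and later. For older versions, here is a minimal sketch of the same idea, assuming POSIX-style string paths and only fnmatch-style wildcards. The helper full_match_sketch is hypothetical and is not pathlib's implementation (it ignores, for example, Windows case folding): the pattern is anchored at both ends, and ** may consume zero or more whole components.

```python
from fnmatch import fnmatchcase
from functools import lru_cache


def full_match_sketch(path: str, pattern: str) -> bool:
    """Hypothetical stand-in for PurePath.full_match() on POSIX-style paths."""
    parts = path.split("/")
    pats = pattern.split("/")

    @lru_cache(maxsize=None)
    def match_from(i: int, j: int) -> bool:
        # Pattern exhausted: succeed only if every path component was consumed.
        if j == len(pats):
            return i == len(parts)
        if pats[j] == "**":
            # Recursive wildcard: try consuming 0, 1, 2, ... components.
            return any(match_from(k, j + 1) for k in range(i, len(parts) + 1))
        # Ordinary component: must match exactly one path component.
        return i < len(parts) and fnmatchcase(parts[i], pats[j]) and match_from(i + 1, j + 1)

    return match_from(0, 0)


print(full_match_sketch("a/b/c/d/e", "**/b/c/**"))  # True
print(full_match_sketch("a/b/c/d/e", "**/c/d/**"))  # True
```

The memoisation via lru_cache keeps patterns with several ** components from re-exploring the same (component, pattern) positions.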

@zhukoff-pavel
Author

> Recursive expansion isn't supported, and so ** wildcards work exactly like *

Oh, I see, so this is intended. Thank you for the quick reply!

Then is it possible to add this note to the documentation for earlier Python versions (e.g. Python 3.12)?

By the way, side-question regarding naming: Doesn't full_match sound more strict than just match? :)
full_match seems more permissive than the regular match, because it allows recursive patterns and therefore matches more paths when a recursive glob is used.

@barneygale
Contributor

> By the way, side-question regarding naming: Doesn't full_match sound more strict than just match? :)

full_match() matches against the entire path, whereas match() matches from the right, so:

>>> from pathlib import PurePath
>>> PurePath('foo/bar.py').match('*.py')
True
>>> PurePath('foo/bar.py').full_match('*.py')
False

It's analogous to re.match() vs re.fullmatch().
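The analogy is about partial versus whole-string matching. One caveat: re.match() anchors at the start of the string, whereas PurePath.match() anchors at the end. A small illustration with regexes written by hand for this example (they are not what pathlib generates internally):

```python
import re

path = "foo/bar.py"

# Like PurePath.match("*.py"): only the last component has to end in ".py".
from_right = re.search(r"(?:^|/)[^/]*\.py\Z", path)
# Like PurePath.full_match("*.py"): a single component must cover the whole path.
whole = re.fullmatch(r"[^/]*\.py", path)
# re.match() itself anchors at the *start* and only needs a prefix to match.
prefix = re.match(r"foo", path)

print(from_right is not None, whole is None, prefix is not None)  # True True True
```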

Support for recursive ** wildcards only really makes sense once you eliminate the right-hand-side matching, otherwise all your patterns have an implicit **/ prefix, which makes it much less useful.
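One way to see the implicit **/ prefix: for relative patterns that contain no **, PurePath.match(pattern) should agree with a whole-path match of "**/" + pattern. The spot check below compares the real PurePosixPath.match() with a hypothetical recursive whole-path matcher (redefined here so the block is self-contained). It covers a few hand-picked cases and is not a proof. The last comparison shows why ** is excluded: match() treats it as *, so the equivalence breaks.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def full_match_sketch(path: str, pattern: str) -> bool:
    """Hypothetical whole-path matcher where "**" spans zero or more components."""
    parts, pats = path.split("/"), pattern.split("/")

    def go(i: int, j: int) -> bool:
        if j == len(pats):
            return i == len(parts)
        if pats[j] == "**":
            return any(go(k, j + 1) for k in range(i, len(parts) + 1))
        return i < len(parts) and fnmatchcase(parts[i], pats[j]) and go(i + 1, j + 1)

    return go(0, 0)


# (path, relative pattern without "**")
cases = [
    ("foo/bar.py", "*.py"),
    ("foo/bar.py", "foo/*.py"),
    ("a/b/c/d/e", "c/d/*"),
    ("a/b/c/d/e", "b/c"),
    ("bar.py", "foo/*.py"),
]

# Right-hand matching should equal whole-path matching with a "**/" prefix.
agreement = [
    PurePosixPath(p).match(pat) == full_match_sketch(p, "**/" + pat) for p, pat in cases
]

# With "**" in the pattern the two disagree (False vs True).
disagreement = PurePosixPath("a/b/c/d/e").match("**/b/c/**") == full_match_sketch(
    "a/b/c/d/e", "**/**/b/c/**"
)

print(all(agreement), disagreement)  # True False
```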

@zhukoff-pavel
Author

I see now.

Thank you for the detailed explanations and for documenting this!

@barneygale
Contributor

No worries, thank you very much for the report :)


3 participants