Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-81079: Add case_sensitive argument to pathlib.Path.glob() #102710

Merged
merged 13 commits into from
May 4, 2023

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Mar 15, 2023

This argument allows case-sensitive matching to be enabled on Windows, and case-insensitive matching to be enabled on Posix.

This PR removes _PreciseSelector and uses _WildcardSelector to select non-wildcard patterns. This allows case sensitivity rules to be varied, and ensures the paths returned from glob() use filesystem casing.

This argument allows case-sensitive matching to be enabled on Windows, and
case-insensitive matching to be enabled on Posix.
@barneygale barneygale changed the title GH-81079: Add case_sensitive argument to pathlib.Path.glob() GH-81079: Add case_sensitive argument to pathlib.Path.glob() Mar 15, 2023
@barneygale barneygale marked this pull request as ready for review March 15, 2023 01:08
@barneygale
Copy link
Contributor Author

Hey @zooba, would you be available to review this? I think you already looked at a similar patch in #102616, so you may already have some useful context. No worries if not. Thanks!

@zooba
Copy link
Member

zooba commented May 2, 2023

It might be nice to describe the =None value as meaning "using the native case comparison rules for the filesystem". Windows allows marking directories as case-sensitive, and one day we might actually respect that. Similarly, pretty sure you can use FAT from any OS, and we ought to treat that as case insensitive by default.

We don't have to implement those yet, but let's not lock ourselves out by documenting the specific rules. We can also file a feature request now to implement proper case support - it might help find someone motivated to make it happen.

Also, like with the symlink change, I'd like to see clearly stated which parts of the pattern are affected, as I assume it only applies to segments from the first wildcard (that is, foo/bar*/baz with case_sensitive=False on a case sensitive drive matches foo/BARF/BAZ but not FOO/barf/baz).

I trust you (and the tests) on the implementation. It takes me a good few hours to get this code into my head in a way I can judge it, and I don't have time for that today, but also don't want to block on it when all the tests look good :)

@barneygale
Copy link
Contributor Author

Thanks for taking a look!

It might be nice to describe the =None value as meaning "using the native case comparison rules for the filesystem". Windows allows marking directories as case-sensitive, and one day we might actually respect that. Similarly, pretty sure you can use FAT from any OS, and we ought to treat that as case insensitive by default.

We don't have to implement those yet, but let's not lock ourselves out by documenting the specific rules. We can also file a feature request now to implement proper case support - it might help find someone motivated to make it happen.

Good shout, will do!

Also, like with the symlink change, I'd like to see clearly stated which parts of the pattern are affected, as I assume it only applies to segments from the first wildcard (that is, foo/bar*/baz with case_sensitive=False on a case sensitive drive matches foo/BARF/BAZ but not FOO/barf/baz).

It affects all parts of the pattern! So it would match both your examples. I think this is the behaviour users expect when passing case_sensitive=False.

(The same thing will apply when we add support for follow_symlinks=True and follow_symlinks=False -- it will affect all parts of the pattern uniformly)

I trust you (and the tests) on the implementation. It takes me a good few hours to get this code into my head in a way I can judge it, and I don't have time for that today, but also don't want to block on it when all the tests look good :)

I'll expand the PR description to (hopefully) make it easier for you (or anyone else) to get into the patch.

@zooba
Copy link
Member

zooba commented May 2, 2023

It affects all parts of the pattern! So it would match both your examples. I think this is the behaviour users expect when passing case_sensitive=False.

(The same thing will apply when we add support for follow_symlinks=True and follow_symlinks=False -- it will affect all parts of the pattern uniformly)

I was thinking of the =None case, especially since for case_sensitive that's a legitimate and arguably more useful value (whereas for symlinks it's deprecated/legacy). But applying to the whole pattern, even if the first few parts are constants, seems fine.

@barneygale
Copy link
Contributor Author

While thinking about how to explain this PR, I realised it could be broken down further. So here's two new smallish PRs:

I'll mark this one as a draft for now.

@barneygale barneygale marked this pull request as draft May 2, 2023 20:11
@barneygale
Copy link
Contributor Author

One more for good luck:

@barneygale barneygale marked this pull request as draft May 3, 2023 02:22
@barneygale barneygale marked this pull request as ready for review May 3, 2023 16:33
@barneygale
Copy link
Contributor Author

If we'd like to add a case_sensitive argument to PurePath.match() too, then this PR ought to land first:

Doc/library/pathlib.rst Outdated Show resolved Hide resolved
Lib/pathlib.py Show resolved Hide resolved
barneygale and others added 2 commits May 3, 2023 18:56
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :shipit:

@barneygale barneygale enabled auto-merge (squash) May 4, 2023 16:37
@barneygale barneygale merged commit 8100be5 into python:main May 4, 2023
14 checks passed
carljm added a commit to carljm/cpython that referenced this pull request May 5, 2023
* main: (61 commits)
  pythongh-64595: Argument Clinic: Touch source file if any output file changed (python#104152)
  pythongh-64631: Test exception messages in cloned Argument Clinic funcs (python#104167)
  pythongh-68395: Avoid naming conflicts by mangling variable names in Argument Clinic (python#104065)
  pythongh-64658: Expand Argument Clinic return converter docs (python#104175)
  pythonGH-103092: port `_asyncio` freelist to module state (python#104196)
  pythongh-104051: fix crash in test_xxtestfuzz with -We (python#104052)
  pythongh-104190: fix ubsan crash (python#104191)
  pythongh-104106: Add gcc fallback of mkfifoat/mknodat for macOS (pythongh-104129)
  pythonGH-104142: Fix _Py_RefcntAdd to respect immortality (pythonGH-104143)
  pythongh-104112: link from cached_property docs to method-caching FAQ (python#104113)
  pythongh-68968: Correcting message display issue with assertEqual (python#103937)
  pythonGH-103899: Provide a hint when accidentally calling a module (pythonGH-103900)
  pythongh-103963: fix 'make regen-opcode' in out-of-tree builds (python#104177)
  pythongh-102500: Add PEP 688 and 698 to the 3.12 release highlights (python#104174)
  pythonGH-81079: Add case_sensitive argument to `pathlib.Path.glob()` (pythonGH-102710)
  pythongh-91896: Deprecate collections.abc.ByteString (python#102096)
  pythongh-99593: Add tests for Unicode C API (part 2) (python#99868)
  pythongh-102500: Document PEP 688 (python#102571)
  pythongh-102500: Implement PEP 688 (python#102521)
  pythongh-96534: socketmodule: support FreeBSD divert(4) socket (python#96536)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants