Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-78079: Fix UNC device path root normalization in pathlib #102003

Merged
merged 10 commits into from
Apr 14, 2023

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Feb 17, 2023

We no longer add a root to device paths such as //./PhysicalDrive0, //?/BootPartition and //./c: while normalizing. We also avoid adding a root to incomplete UNC share paths, like //, //a and //a/.

This isn't easy to backport as the the surrounded code has changed a lot since 3.11.

We no longer add a root to device paths such as `//./PhysicalDrive0`,
`//?/BootPartition` and `//./c:` while normalizing. We also avoid adding a
root to incomplete UNC share paths, like `//`, `//a` and `//a/`.
@barneygale
Copy link
Contributor Author

Hey @eryksun, does this PR look OK to you?

As I mentioned on the issue, it does not fix //./Global and //?/Global; those are somewhat complicated and I'd like to tackle them in a follow-up. Hope that makes sense?

@eryksun
Copy link
Contributor

eryksun commented Mar 10, 2023

As I mentioned on the issue, it does not fix //./Global and //?/Global; those are somewhat complicated and I'd like to tackle them in a follow-up. Hope that makes sense?

Handling "Global" is a low priority. It turns out that handling "GLOBALROOT" is a more pressing concern, but it's a small enough niche and such a complicated problem that I think leaving the drive as "\\?\GLOBALROOT" is good enough.

os.readlink() has to be updated to support returning "\\?\GLOBALROOT" paths because custom symlinks can target arbitrary NT object paths. For example, if SYMLINK_FLAG_RELATIVE isn't set in the symlink reparse buffer, then the target path "\ContainerMappedDirectories" has to be returned as "\\?\GLOBALROOT\ContainerMappedDirectories". Refer to the discussion in issue #102440 and the sample code I posted in this comment.

Lib/pathlib.py Outdated Show resolved Hide resolved
Co-authored-by: Eryk Sun <eryksun@gmail.com>
@barneygale barneygale marked this pull request as draft March 10, 2023 20:57
@barneygale
Copy link
Contributor Author

This is presently blocked on #102789.

@barneygale
Copy link
Contributor Author

#102789 has now landed, but it turns out we're blocked on another bug! PR here: #103221

@@ -330,8 +330,7 @@ def _parse_path(cls, path):
elif len(drv_parts) == 6:
# e.g. //?/unc/server/share
root = sep
unfiltered_parsed = [drv + root] + rel.split(sep)
parsed = [sys.intern(x) for x in unfiltered_parsed if x and x != '.']
parsed = [sys.intern(str(x)) for x in rel.split(sep) if x and x != '.']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, we sys.intern everything? That seems like a painful memory leak. Would we be better off with a class level lru_cache'd function1?

Don't we already know that rel will be str here? If it's bytes, str(x) is the wrong conversion anyway. Maybe the intern function should be cls._flavour.str_and_intern?

Footnotes

  1. @lru_cache(max_size=REASONABLE_VALUE) def intern(x): return x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interning of the path parts is longstanding pathlib behaviour. Honestly I don't know enough about the benefits/drawbacks of interning to say whether it's reasonable.

Don't we already know that rel will be str here?

We know that it's an instance of str (and not bytes), but we don't know whether it's a true str object (required by sys.intern()).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, we sys.intern everything? That seems like a painful memory leak.

Is it really all that bad? sys.intern() calls PyUnicode_InternInPlace(), not PyUnicode_InternImmortal(). An interned string gets deallocated based on its reference count, like any other object. For example:

>>> s = sys.intern('spam and eggs')
>>> t = sys.intern('spam and eggs')
>>> s is t
True
>>> id(s)
139835455199152
>>> del s, t
>>> s = sys.intern('spam and eggs')
>>> id(s)
139835455198912

@barneygale barneygale marked this pull request as ready for review April 11, 2023 16:52
@barneygale
Copy link
Contributor Author

Hey @eryksun, are you happy for this to land? Thanks

@eryksun
Copy link
Contributor

eryksun commented Apr 14, 2023

LGTM. The implementation of pathlib has changed a lot since I opened #78079 almost 5 years ago.

@barneygale barneygale merged commit 8af8f52 into python:main Apr 14, 2023
13 checks passed
@barneygale
Copy link
Contributor Author

Thank you both for the reviews + advice!

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot ARM Raspbian 3.x has failed when building commit 8af8f52.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/424/builds/3700) and take a look at the build logs.
  4. Check if the failure is related to this commit (8af8f52) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/424/builds/3700

Summary of the results of the build (if available):

== Tests result: ENV CHANGED ==

413 tests OK.

10 slowest tests:

  • test_venv: 7 min 28 sec
  • test_asyncio: 6 min 41 sec
  • test_largefile: 5 min 30 sec
  • test_math: 4 min 26 sec
  • test_multiprocessing_spawn: 4 min 24 sec
  • test_concurrent_futures: 3 min 8 sec
  • test___all__: 2 min 27 sec
  • test_cppext: 2 min 13 sec
  • test_tokenize: 2 min 7 sec
  • test_multiprocessing_forkserver: 2 min 7 sec

1 test altered the execution environment:
test_concurrent_futures

20 tests skipped:
test_devpoll test_idle test_ioctl test_kqueue test_launcher
test_msilib test_peg_generator test_perf_profiler test_startfile
test_tcl test_tix test_tkinter test_ttk test_ttk_textonly
test_turtle test_winconsoleio test_winreg test_winsound test_wmi
test_zipfile64

Total duration: 30 min 33 sec

Click to see traceback logs
remote: Enumerating objects: 22, done.        
remote: Counting objects:   4% (1/22)        
remote: Counting objects:   9% (2/22)        
remote: Counting objects:  13% (3/22)        
remote: Counting objects:  18% (4/22)        
remote: Counting objects:  22% (5/22)        
remote: Counting objects:  27% (6/22)        
remote: Counting objects:  31% (7/22)        
remote: Counting objects:  36% (8/22)        
remote: Counting objects:  40% (9/22)        
remote: Counting objects:  45% (10/22)        
remote: Counting objects:  50% (11/22)        
remote: Counting objects:  54% (12/22)        
remote: Counting objects:  59% (13/22)        
remote: Counting objects:  63% (14/22)        
remote: Counting objects:  68% (15/22)        
remote: Counting objects:  72% (16/22)        
remote: Counting objects:  77% (17/22)        
remote: Counting objects:  81% (18/22)        
remote: Counting objects:  86% (19/22)        
remote: Counting objects:  90% (20/22)        
remote: Counting objects:  95% (21/22)        
remote: Counting objects: 100% (22/22)        
remote: Counting objects: 100% (22/22), done.        
remote: Compressing objects:   9% (1/11)        
remote: Compressing objects:  18% (2/11)        
remote: Compressing objects:  27% (3/11)        
remote: Compressing objects:  36% (4/11)        
remote: Compressing objects:  45% (5/11)        
remote: Compressing objects:  54% (6/11)        
remote: Compressing objects:  63% (7/11)        
remote: Compressing objects:  72% (8/11)        
remote: Compressing objects:  81% (9/11)        
remote: Compressing objects:  90% (10/11)        
remote: Compressing objects: 100% (11/11)        
remote: Compressing objects: 100% (11/11), done.        
remote: Total 12 (delta 10), reused 2 (delta 1), pack-reused 0        
From https://github.com/python/cpython
 * branch                  main       -> FETCH_HEAD
Note: switching to '8af8f52d175959f03cf4a2786b6a81ec09e15e95'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 8af8f52d17 GH-78079: Fix UNC device path root normalization in pathlib (GH-102003)
Switched to and reset branch 'main'

Objects/obmalloc.c:775:1: warning: ‘always_inline’ function might not be inlinable [-Wattributes]
  775 | arena_map_get(pymem_block *p, int create)
      | ^~~~~~~~~~~~~

make: *** [Makefile:1950: buildbottest] Error 3

carljm added a commit to carljm/cpython that referenced this pull request Apr 17, 2023
* main:
  Remove `expert-*` from `project-updater` GH workflow (python#103579)
  pythongh-103583: Add codecs and maps to _codecs_* module state (python#103540)
  pythongh-48330: address review comments to PR-12271 (python#103209)
  pythongh-103527: Add multibytecodec.h as make dep for _codecs_* (python#103567)
  pythongh-103553: Improve `test_inspect`: add more assertions, remove unused (python#103554)
  pythonGH-103517: Improve tests for `pathlib.Path.walk()` (pythonGH-103518)
  pythongh-102114: Make dis print more concise tracebacks for syntax errors in str inputs (python#102115)
  pythonGH-78079: Fix UNC device path root normalization in pathlib (pythonGH-102003)
  pythongh-101517: Add regression test for a lineno bug in try/except* impacting pdb (python#103547)
  pythongh-103527: Add make deps for _codecs_* and _multibytecodec (python#103528)
  pythongh-103532: Fix reST syntax in NEWS entry (pythonGH-103544)
  pythongh-103532: Add NEWS entry (python#103542)
carljm added a commit to carljm/cpython that referenced this pull request Apr 17, 2023
* superopt:
  update generated cases with new comment
  review comments
  Remove `expert-*` from `project-updater` GH workflow (python#103579)
  pythongh-103583: Add codecs and maps to _codecs_* module state (python#103540)
  pythongh-48330: address review comments to PR-12271 (python#103209)
  pythongh-103527: Add multibytecodec.h as make dep for _codecs_* (python#103567)
  pythongh-103553: Improve `test_inspect`: add more assertions, remove unused (python#103554)
  pythonGH-103517: Improve tests for `pathlib.Path.walk()` (pythonGH-103518)
  pythongh-102114: Make dis print more concise tracebacks for syntax errors in str inputs (python#102115)
  pythonGH-78079: Fix UNC device path root normalization in pathlib (pythonGH-102003)
  pythongh-101517: Add regression test for a lineno bug in try/except* impacting pdb (python#103547)
  pythongh-103527: Add make deps for _codecs_* and _multibytecodec (python#103528)
  pythongh-103532: Fix reST syntax in NEWS entry (pythonGH-103544)
  pythongh-103532: Add NEWS entry (python#103542)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants