PurePosixPath no longer correctly parses PureWindowsPath #103631

Closed
domdfcoding opened this issue Apr 19, 2023 · 10 comments
Labels: 3.12, 3.13, topic-pathlib, type-bug

Comments

@domdfcoding
Contributor

domdfcoding commented Apr 19, 2023

Bug report

In pathlib prior to Python 3.12, passing a PureWindowsPath to a PurePosixPath resulted in the Windows separator (\) being converted to the POSIX separator (/). However, in the current main branch the backslashes are preserved in the PurePosixPath object.

Here is an example which illustrates this:

Python 3.12.0a7 (tags/v3.12.0a7:b861ba4, Apr  6 2023, 16:09:18) [Clang 10.0.0 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> print(pathlib.PurePosixPath(pathlib.PureWindowsPath(r"a\b\c")))
a\b\c
>>> print(pathlib.PurePosixPath(pathlib.PureWindowsPath(r"a\b\c")).as_posix())
a\b\c

Python 3.11.2 (tags/v3.11.2:878ead1, Mar  9 2023, 16:26:59) [Clang 10.0.0 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> print(pathlib.PurePosixPath(pathlib.PureWindowsPath(r"a\b\c")))
a/b/c
>>> print(pathlib.PurePosixPath(pathlib.PureWindowsPath(r"a\b\c")).as_posix())
a/b/c 

(The behaviour is the same if using PosixPath or WindowsPath on the relevant platform; it's not specific to the "pure" variants.)


Before the recent refactoring of the module, passing one Path or PurePath object to another resulted in the _parts attribute being inspected. This was a list of the individual path elements (e.g. ['a', 'b', 'c'] for the path a\b\c). The _parts attribute was removed in GH-102476 and replaced with _tail, but with slightly different semantics.

The current code replaces any occurrence of the flavour's altsep with its sep. For WindowsPath this replaces / with \, but for PosixPath it does nothing, because POSIX has no alternative separator. Setting altsep by hand, however, produces the expected result:

Python 3.12.0a7+ (heads/main:bd2ed06, Apr 19 2023, 15:27:47) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> pathlib.PurePosixPath._flavour.altsep = "\\"
>>> pathlib.PurePosixPath(pathlib.PureWindowsPath(r"a\b\c")).as_posix()
'a/b/c'

Thus I think the problem can be isolated to these lines here:

cpython/Lib/pathlib.py

Lines 316 to 324 in da2273f

    @classmethod
    def _parse_path(cls, path):
        if not path:
            return '', '', []
        sep = cls._flavour.sep
        altsep = cls._flavour.altsep
        if altsep:
            path = path.replace(altsep, sep)
        drv, root, rel = cls._flavour.splitroot(path)
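
The altsep step in the snippet above can be reproduced in isolation with the public `ntpath` and `posixpath` modules. This is only a sketch: the helper name `normalize_separators` is hypothetical and is not part of pathlib.

```python
import ntpath
import posixpath


def normalize_separators(path, flavour):
    """Replace the flavour's alternative separator with its main one.

    Hypothetical helper that mirrors the altsep handling quoted above;
    it is not part of pathlib.
    """
    if flavour.altsep:
        path = path.replace(flavour.altsep, flavour.sep)
    return path


# ntpath has an alternative separator, posixpath does not.
print(repr(ntpath.sep), repr(ntpath.altsep))        # '\\' '/'
print(repr(posixpath.sep), repr(posixpath.altsep))  # '/' None

# Forward slashes are converted for the Windows flavour...
print(normalize_separators("a/b/c", ntpath))        # a\b\c
# ...but backslashes pass through untouched for the POSIX flavour.
print(normalize_separators(r"a\b\c", posixpath))    # a\b\c
```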

Your environment

  • CPython versions tested on: 3.8, 3.11 and 3.12 (alpha 7 and HEAD)
  • Operating system and architecture: Ubuntu 20.04 and Windows 10

Linked PRs

@domdfcoding domdfcoding added the type-bug label Apr 19, 2023
@eryksun eryksun added the 3.12, topic-pathlib and 3.13 labels Apr 19, 2023
@eryksun
Contributor

eryksun commented Apr 19, 2023

If constructing a POSIX path directly from a Windows path is supported, it won't be implemented by adding backslash as an alternate path separator for POSIX paths. Backslash is a normal name character in POSIX, not a reserved path separator.

As a workaround, you can get a POSIX path before passing the path to the PurePosixPath constructor.

>>> pathlib.PurePosixPath(pathlib.PureWindowsPath(r"a\b\c").as_posix())
PurePosixPath('a/b/c')

@barneygale
Contributor

In pathlib prior to Python 3.12, passing a PureWindowsPath to a PurePosixPath resulted in the Windows separator (\) being converted to the POSIX separator (/)

Only for relative paths. If you try to pass an absolute path, things quickly go south:

barney@acorn ~ $ python3.8
Python 3.8.10 (default, Mar 13 2023, 10:26:41) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> pathlib.PurePosixPath(pathlib.PureWindowsPath(r"c:\a\b\c"))
PurePosixPath('c:\\/a/b/c')
>>> pathlib.PurePosixPath(pathlib.PureWindowsPath(r"\\server\share\a\b\c"))
PurePosixPath('\\\\server\\share\\/a/b/c')
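
For comparison, here is a small sketch of what the as_posix() workaround yields for an absolute Windows path. The separators are converted cleanly, but on the POSIX side the result is a relative path whose first component is the drive name, which may or may not be what the caller wants.

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"c:\a\b\c")

# as_posix() only swaps the separators; the drive letter stays in the string.
text = win.as_posix()
print(text)  # c:/a/b/c

# On the POSIX side "c:" is an ordinary name, so the path is relative.
posix = PurePosixPath(text)
print(posix.parts)          # ('c:', 'a', 'b', 'c')
print(posix.is_absolute())  # False
```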

@barneygale
Contributor

FYI, the behaviour was changed in #101667 and #102454 in order to fix this bug:

As a result, these all do the same thing:

>>> from pathlib import PurePosixPath, PureWindowsPath
>>> from os import fspath
>>> PurePosixPath(r"c:\a\b\c")
PurePosixPath('c:\\a\\b\\c')
>>> PurePosixPath(PureWindowsPath(r"c:\a\b\c"))
PurePosixPath('c:\\a\\b\\c')
>>> PurePosixPath(fspath(PureWindowsPath(r"c:\a\b\c")))
PurePosixPath('c:\\a\\b\\c')

We tweaked the docs slightly to make clear that all os.PathLike objects are treated similarly:

diff --git a/Doc/library/pathlib.rst b/Doc/library/pathlib.rst
index c8a734ecad8e..8e91936680fa 100644
--- a/Doc/library/pathlib.rst
+++ b/Doc/library/pathlib.rst
@@ -105,8 +105,9 @@ we also call *flavours*:
       PurePosixPath('setup.py')
 
    Each element of *pathsegments* can be either a string representing a
-   path segment, an object implementing the :class:`os.PathLike` interface
-   which returns a string, or another path object::
+   path segment, or an object implementing the :class:`os.PathLike` interface
+   where the :meth:`~os.PathLike.__fspath__` method returns a string,
+   such as another path object::
 
       >>> PurePath('foo', 'some/path', 'bar')
       PurePosixPath('foo/some/path/bar')

Eryk's suggestion looks ideal to me, and clearly signals intentions.
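
To illustrate the os.PathLike point with something other than a path object, here is a sketch using a made-up class. `RawName` is hypothetical and exists only for this example.

```python
import os
from pathlib import PurePosixPath


class RawName:
    """Hypothetical os.PathLike wrapper around a plain string."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text


# The constructor uses the string returned by __fspath__() as-is, so the
# backslashes are kept as ordinary characters in a single component.
p = PurePosixPath(RawName(r"a\b\c"))
print(os.fspath(RawName(r"a\b\c")))  # a\b\c
print(p.parts)                       # ('a\\b\\c',)
print(p == PurePosixPath(r"a\b\c"))  # True
```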

@mwichmann

As a workaround, you can get a POSIX path before passing the path to the PurePosixPath constructor.

>>> pathlib.PurePosixPath(pathlib.PureWindowsPath(r"a\b\c").as_posix())
PurePosixPath('a/b/c')

The problem with this is that your code has to be system-aware. The case where I ran across this (I only just figured out it was broken) uses a PureWindowsPath when processing a file of pathnames, so that the file can be read in a platform-agnostic way (that is, either forward or backward slashes in the file work no matter what the platform). It then converts the result to concrete paths by calling Path() on it, which used to produce suitable "native" paths. Now the code is going to have to LBYL to figure out whether to call as_posix() or not.

@barneygale
Contributor

Hm! Backslashes can appear in Posix filenames. How would I specify such a path in your file of pathnames? Or is that not a problem because you create all these paths from within your program?

@barneygale
Contributor

barneygale commented May 22, 2023

For your use case, would this work?

pathlib.Path(pathlib.PureWindowsPath(yourpath).as_posix())

as_posix() returns a path with forward slashes as separators, which is valid for both WindowsPath and PosixPath.

Alternatively:

pathlib.Path(yourpath.replace('\\', '/'))
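
Wrapped in a function, the first suggestion might look like the sketch below. The helper name `to_native` and the sample names are hypothetical, and the sketch assumes that backslashes never appear inside a file name.

```python
from pathlib import Path, PureWindowsPath


def to_native(pathname):
    """Return a concrete Path for a name written with either separator.

    Hypothetical helper; assumes a backslash is always meant as a
    separator and never as part of a file name.
    """
    return Path(PureWindowsPath(pathname).as_posix())


# Sample names, one per separator style.
names = ["tests/unit/test_a.py", r"tests\unit\test_b.py"]
for name in names:
    # Printed with the native separator of the running platform.
    print(to_native(name))
```

No platform check is needed at the call site: the same call is made on Windows and on POSIX.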

@mwichmann

For your use case, would this work?

pathlib.Path(pathlib.PureWindowsPath(r"a\b\c").as_posix())

as_posix() returns a path with forward slashes as separators, which is valid for both WindowsPath and PosixPath.

The use case is more specifically a list of tests to run (from the command line or from a file), and we know we're not creating tests with particularly "odd" names, so it's a (somewhat) controlled namespace. We've wanted to display the names the way they were entered on Windows, so if someone actually typed foo\bar it should look like foo\bar is being run, but that's just a nicety; normalizing all paths to "POSIX style" isn't unreasonable, and somebody has already tried to push that idea.

@barneygale
Contributor

Gotcha.

Path() will switch the forward slashes back to backslashes on Windows, which sounds like what you want?
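
That round trip can be checked on any platform by using the pure classes in place of Path. A small sketch:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Normalize whatever was typed to forward slashes first.
posix_text = PureWindowsPath(r"foo\bar").as_posix()
print(posix_text)  # foo/bar

# What Path(posix_text) would display on Windows and on POSIX respectively.
print(PureWindowsPath(posix_text))  # foo\bar
print(PurePosixPath(posix_text))    # foo/bar
```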

barneygale added a commit to barneygale/cpython that referenced this issue May 25, 2023
…handling

For backwards compatibility, accept backslashes as path separators in
`PurePosixPath` if an instance of `PureWindowsPath` is supplied.
@barneygale
Contributor

Upon further consideration I don't think this is worth the backwards-compatibility break. PR up: #104949
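
The idea in the commit message above, treating backslashes as separators only when the argument is a PureWindowsPath instance, can be sketched outside pathlib as follows. This is not the code from the pull request; `posix_from` is a hypothetical stand-alone helper.

```python
from pathlib import PurePosixPath, PureWindowsPath


def posix_from(arg):
    """Build a PurePosixPath, converting separators only for Windows paths.

    Hypothetical helper illustrating the idea; it is not the code from
    the pull request.
    """
    if isinstance(arg, PureWindowsPath):
        return PurePosixPath(arg.as_posix())
    return PurePosixPath(arg)


# A Windows path object has its separators converted...
print(posix_from(PureWindowsPath(r"a\b\c")))  # a/b/c
# ...while a plain string keeps its backslashes as name characters.
print(posix_from(r"a\b\c"))                   # a\b\c
```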

barneygale added a commit that referenced this issue May 26, 2023
…ng (GH-104949)

For backwards compatibility, accept backslashes as path separators in
`PurePosixPath` if an instance of `PureWindowsPath` is supplied.
This restores behaviour from Python 3.11.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 26, 2023
…handling (pythonGH-104949)

For backwards compatibility, accept backslashes as path separators in
`PurePosixPath` if an instance of `PureWindowsPath` is supplied.
This restores behaviour from Python 3.11.

(cherry picked from commit 328422c)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
barneygale added a commit that referenced this issue May 26, 2023
… handling (GH-104949) (GH-104991)

For backwards compatibility, accept backslashes as path separators in
`PurePosixPath` if an instance of `PureWindowsPath` is supplied.
This restores behaviour from Python 3.11.

(cherry picked from commit 328422c)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>