Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-88569: add ntpath.isreserved() #95486

Merged
merged 38 commits into from Jan 26, 2024
Merged

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Jul 31, 2022

Add ntpath.isreserved(), which identifies reserved pathnames such as "NUL", "AUX" and "CON".

Deprecate pathlib.PurePath.is_reserved().

@eryksun
Copy link
Contributor

eryksun commented Jul 31, 2022

In ntpath.isreserved(), let's remove the exception for UNC paths. GetFullPathNameW() doesn't reserve DOS device names in UNC paths, but the SMB provider disallows creating files with DOS device names. It's the most common and most important UNC provider. For example, on the "//localhost/C$" SMB share:

>>> for fn in ('nul', 'con', 'aux', 'prn', 'com1', 'lpt1'):
...     try:
...         open(f'//localhost/C$/Temp/{fn}', 'w')
...     except PermissionError:
...         print(f'{fn} disallowed')
...
nul disallowed
con disallowed
aux disallowed
prn disallowed
com1 disallowed
lpt1 disallowed

Also, a name that ends with a space or dot should be reserved. The system's path normalization strips trailing spaces and dots from the final path component (e.g. "spam. . ." -> "spam"). Path normalization applies to all cases except opening "\\?\" extended paths.

Also, the characters :<>"?*| are reserved in file names, as are as well the ASCII control characters (ordinals 0-31). (The characters : and | aren't reserved by all file systems, but generally they are.)

In file systems that support file streams, : delimits the stream name and type (e.g. "filename:streamname:$DATA"). Stream names can include the characters <>"?*| as well as control characters other than null (ordinals 1-31). I think it's simplest to just reserve a name that includes :, so names that specify a file stream are marked as reserved for general use. Code that needs to use file streams should be aware of this and separately validate the file name and stream name.

@ChrisDenton
Copy link

Not sure if it's relevant to this function but Windows 11 has greatly simplified how reserved names work. Now C:\path\COM1 is not reserved and neither is aux.c. But aux on its own still is (and aux.. .. because trailing spaces and dots are always stripped).

@eryksun
Copy link
Contributor

eryksun commented Jul 31, 2022

Not sure if it's relevant to this function but Windows 11 has greatly simplified how reserved names work. Now C:\path\COM1 is not reserved and neither is aux.c. But aux on its own still is (and aux.. .. because trailing spaces and dots are always stripped).

In Windows 11, path normalization no longer special cases a DOS device name if it has an extension (e.g. "con.txt"). Also a DOS device name isn't special cased if it's the leaf component of a path -- except for the "NUL" device. For example:

>>> nt._getfullpathname('nul.txt')
'C:\\Temp\\nul.txt'
>>> nt._getfullpathname('c:/temp/nul') # NUL is still reserved
'\\\\.\\nul'

However, since DOS device names are still special cased as unqualified names and still reserved by the SMB server, they should still be avoided as file names. For example, creating a file named "con" in the current working directory would have to use the path "./con".

@barneygale
Copy link
Contributor Author

@eryksun what should ntpath.isreserved('.') return? Thanks

@eryksun
Copy link
Contributor

eryksun commented Jul 31, 2022

@eryksun what should ntpath.isreserved('.') return? Thanks

Isn't that a cross-platform question? The names "." and ".." are reserved in POSIX and Windows. For ".." it always has to be exact. Otherwise trailing dots and spaces are stripped in Windows. For example:

>>> nt._getfullpathname('z:/spam/..')
'z:\\'
>>> nt._getfullpathname('z:/spam/...')
'z:\\spam\\'
>>> nt._getfullpathname('z:/spam/.. .')
'z:\\spam\\'

@barneygale
Copy link
Contributor Author

That would make pathlib.Path().is_reserved() always true. I'm not convinced tbh. I can't see any existing CPython bug or feature request that covers the behaviour change you're requesting.

@eryksun
Copy link
Contributor

eryksun commented Jul 31, 2022

That would make pathlib.Path().is_reserved() always true. I'm not convinced tbh. I can't see any existing CPython bug or feature request that covers the behaviour change you're requesting.

I'm referring to the base name, i.e. os.fsdecode(basename(path)), in terms of answering the question, can one create a file or directory with this exact name? One cannot create/overwrite a file named "." or "..". The names are reserved as implicit hard links or aliases to the current directory and parent directory. It's no different from something like "nul" in Windows. For example, open('c:/Temp/nul', 'w') succeeds because it is a valid path. But the name "nul" implicitly resolves to "\\.\nul" (even in Windows 11).

Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, but I think there's an extraneous function call to take out. I would also like @eryksun to make sure we are copying over the right implementation details, or make any tweaks as necessary now if possible.

Lib/posixpath.py Outdated Show resolved Hide resolved
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@barneygale
Copy link
Contributor Author

Thinking out loud:

There's probably a distinction between reserved names and invalid names, analogous to the distinction between reserved Python keywords (for, break, etc) and invalid Python identifiers (my-var, 3dpoints)

pathlib.PurePath.is_reserved() consciously concerns itself only with the former case (at least for the moment!)

@eryksun
Copy link
Contributor

eryksun commented Aug 6, 2022

There's probably a distinction between reserved names and invalid names,

Does this mean that you don't want to reserve names that contain the wildcard characters *, ?, " (i.e. DOS_DOT), < (i.e. DOS_STAR), and > (i.e. DOS_QM)? Wildcard characters are illegal in the name of a file or directory because they're used in system calls such as NtQueryDirectoryFile() (its FileName parameter can contain wildcards to filter a directory listing) and ultimately by filesystem device drivers via runtime library functions such as FsRtlIsNameInExpression().

What about names that are illegal to create because they're reserved in other contexts? That's the case when creating a file with a DOS device name on an SMB share. SMB disallows creating a file named "nul", for example, because it causes problems with accessing the file directly on the server (e.g. as "C:\share\nul").

What about base names that contain colons? For example, in an NTFS filesystem, "spam:00" creates a file named "spam" that contains a data stream named "00". That can surprise even experienced developers, particularly if they come from a POSIX background. In a FAT filesystem, "spam:00" is an invalid name. In a VboxSharedFolderFS (VirtualBox) filesystem with a Linux host, "spam:00" is allowed as the literal name.

What about a name that changes when accessed because trailing dots and spaces are stripped? For example, "spam ." -> "spam".

What about "." and ".." components in "\\?\" extended paths? In Windows, "." and ".." components are handled in the user-mode API. However, normalization is skipped for a "\\?\" extended path, so the filesystem is passed a path that contains "." and ".." components. The results can be dysfunctional and surprising. For example, FAT filesystems (volume "E:" in this example) allow creating regular files named "." and "..":

>>> open(r'\\?\E:\Temp\.', 'w').close()
>>> open(r'\\?\E:\Temp\..', 'w').close()
>>> print(*[(e[-2], e[-1], e[0]) for e in win32file.FindFilesW('*')], sep='\n')
('.', '', 16)
('..', '', 16)
('.', 'F93F~1', 32)
('..', 'C59D~1', 32)  

The first two are the required "." and ".." entries, which are directories (16) and have no short name. The last two are regular files with the archive attribute set (32) and legacy short names.

Otherwise, FAT filesystems don't support opening paths with literal "." and ".." entries. The open fails with ERROR_PATH_NOT_FOUND (i.e. C ENOENT). For example:

>>> open(r'\\?\E:\Temp\..\spam.txt', 'w').close()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '\\\\?\\E:\\Temp\\..\\spam.txt' 

NTFS fails all three of the above cases with ERROR_INVALID_NAME (i.e. C EINVAL).

@barneygale
Copy link
Contributor Author

@eryksun whats your view on the existing pathlib.PurePath.is_reserved() method? It doesn't perform any of the validity checks you mention. Is it broken without them? If so, perhaps we should consider deprecating the method, rather than expanding it and chasing a long tail of pathname validity bugs?

@eryksun
Copy link
Contributor

eryksun commented Aug 7, 2022

whats your view on the existing pathlib.PurePath.is_reserved() method? It doesn't perform any of the validity checks you mention. Is it broken without them? If so, perhaps we should consider deprecating the method, rather than expanding it and chasing a long tail of pathname validity bugs?

The is_reserved() method is useful in theory, but it isn't used much. A sanitizepath() function would be more useful. Regarding the design, it was a mistake to exclude UNC paths. Just because one can create "//?/C:/Temp/nul", that doesn't mean it's not a troublesome file name that should never be used. Also, file names that end with trailing spaces and dots have to be reserved.

As to reserved characters and "." and ".." components in extended paths, I'm just putting it out there for discussion. I'm less concerned about the five wildcard characters. They're always disallowed in base file names (but not stream names). A filesystem that didn't exclude them would be broken. Creating a file named "spam?.txt" isn't going to magically succeed in a surprising way. But a filesystem might allow :, |, and ASCII control characters in base file names, even though they should be illegal. VboxSharedFolderFS does. If a name like that doesn't get sanitized, it can cause problems later on, e.g. when trying to copy the file to another filesystem. The potentially surprising behavior with names that specify a data stream is the worst offender (e.g. "spam:.txt" has a stream named ".txt"). If I could only choose one character to flag as reserved in base file names, it would be colon.

Lib/ntpath.py Outdated Show resolved Hide resolved
Lib/test/test_ntpath.py Outdated Show resolved Hide resolved
Lib/test/test_ntpath.py Outdated Show resolved Hide resolved
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Lib/ntpath.py Outdated Show resolved Hide resolved
@eryksun
Copy link
Contributor

eryksun commented Aug 9, 2022

The following check would need to be removed from PureWindowsPathTest.test_is_reserved() in "Lib/test/test_pathlib.py":

        # UNC paths are never reserved.
        self.assertIs(False, P('//my/share/nul/con/aux').is_reserved()) 

Though this entire pathlib test is redundant now. As you consolidate the implementations, you might want to remove redundant tests, and only keep tests that are specific to how pathlib uses the os.path flavour interfaces.

@zooba
Copy link
Member

zooba commented Jan 8, 2024

Considering the rules about which paths are reserved have changed in recent Windows releases (e.g. '.\\aux' is no longer reserved - I believe only nul is still reserved anywhere in a path, and the rest need to be the entire path), I'm even more cautious about the whole idea of a reserved path.

With a suitable warning in the docs (e.g. "this is an approximation of Windows rules as of this Python release; actual paths may differ, and this function may be updated as changed rules become more widely used"), I think it's okay to have in ntpath, and would deprecated PurePath.

Doc/library/os.path.rst Outdated Show resolved Hide resolved
Doc/library/pathlib.rst Show resolved Hide resolved
Lib/pathlib/__init__.py Outdated Show resolved Hide resolved
@eryksun
Copy link
Contributor

eryksun commented Jan 9, 2024

Considering the rules about which paths are reserved have changed in recent Windows releases (e.g. '.\\aux' is no longer reserved - I believe only nul is still reserved anywhere in a path, and the rest need to be the entire path), I'm even more cautious about the whole idea of a reserved path.

Even in Windows 11, the SMB redirector still reserves names of legacy DOS devices in the filename part of an open/create (e.g. NUL, CON, PRN, AUX, LPT1-LPT9, COM1-COM9). At least SMB has never reserved DOS device names with a file extension (e.g. "con.txt"). And at least it just denies access instead of trying to access a remote device. It would be nice if the next version of the SMB protocol allowed legacy DOS device names in the filename part of an open/create.

@zooba
Copy link
Member

zooba commented Jan 15, 2024

I don't have any other concerns with this. However, I think we should refer to os.path.isreserved in the NEWS entry (and clarify that it's only on Windows), and refer to ntpath in the commit message/PR title. Right now these are the opposite way around.

@zooba
Copy link
Member

zooba commented Jan 15, 2024

@brettcannon Your objection was a while ago and I believe has been addressed. You don't need to re-review unless you want to, but I think we'd like an ack that you aren't blocking the PR anymore

Lib/ntpath.py Outdated Show resolved Hide resolved
@brettcannon brettcannon self-requested a review January 15, 2024 23:14
@brettcannon
Copy link
Member

@barneygale I refreshed my review and I'm not blocking this. 🙂

@barneygale barneygale changed the title gh-88569: add os.path.isreserved() gh-88569: add ntpath.isreserved() Jan 16, 2024
@brettcannon brettcannon removed their request for review January 16, 2024 19:28
@barneygale
Copy link
Contributor Author

@zooba just checking you're happy for me to merge?

@zooba
Copy link
Member

zooba commented Jan 26, 2024

Yeah, go ahead

@barneygale
Copy link
Contributor Author

Thanks everyone for the help with this

@barneygale barneygale merged commit 7e31d6d into python:main Jan 26, 2024
29 checks passed
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
Add `ntpath.isreserved()`, which identifies reserved pathnames such as "NUL", "AUX" and "CON".

Deprecate `pathlib.PurePath.is_reserved()`.

---------

Co-authored-by: Eryk Sun <eryksun@gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants