pathlib is_reserved fails for some reserved paths on Windows #72014
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = None
closed_at = <Date 2021-07-28.15:17:51.384>
created_at = <Date 2016-08-22.09:33:47.875>
labels = ['type-bug', '3.9', '3.10', '3.11', 'library', 'OS-windows']
title = 'pathlib is_reserved fails for some reserved paths on Windows'
updated_at = <Date 2021-07-28.15:17:51.382>
user = 'https://github.com/eryksun'

activity = <Date 2021-07-28.15:17:51.382>
actor = 'lukasz.langa'
assignee = 'none'
closed = True
closed_date = <Date 2021-07-28.15:17:51.384>
closer = 'lukasz.langa'
components = ['Library (Lib)', 'Windows']
creation = <Date 2016-08-22.09:33:47.875>
creator = 'eryksun'
dependencies =
files = ['44588']
hgrepos =
issue_num = 27827
keywords = ['patch']
message_count = 13.0
messages = ['273344', '273761', '275964', '290515', '290526', '290530', '378151', '378153', '395708', '398390', '398394', '398396', '398397']
nosy_count = 9.0
nosy_names = ['paul.moore', 'tim.golden', 'lukasz.langa', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'miss-islington', 'barneygale']
pr_nums = ['26698', '27421', '27422']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue27827'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']
pathlib._WindowsFlavour.is_reserved assumes Windows uses an exact match up to the file extension for reserved DOS device names. However, this misses cases involving trailing spaces and colons, such as the following examples:
    >>> pathlib.Path('C:/foo/NUL:').is_reserved()
    False
    >>> print(os.path._getfullpathname('C:/foo/NUL:'))
    \\.\NUL

    >>> pathlib.Path('C:/foo/NUL ').is_reserved()
    False
    >>> print(os.path._getfullpathname('C:/foo/NUL '))
    \\.\NUL
Trailing spaces followed by a file extension:
    >>> pathlib.Path('C:/foo/NUL .txt').is_reserved()
    False
    >>> print(os.path._getfullpathname('C:/foo/NUL .txt'))
    \\.\NUL
Windows calls RtlIsDosDeviceName_Ustr to check whether a path represents a DOS device name. Here's a link to the reverse-engineered implementation of this function in ReactOS 0.4.1:
The ReactOS implementation performs the following steps:
It seems that ":" and "." are effectively equivalent for the purposes of is_reserved. Given this is the case, it could return whether parts[-1].partition('.')[0].partition(':')[0].rstrip(' ').upper() is in self.reserved_names. Or maybe use a regex for the entire check.
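A minimal sketch of the partition-based check suggested above. The `RESERVED_NAMES` set and the helper name `is_reserved_sketch` are written for this illustration and are not pathlib's actual code. Since `str.partition` returns a 3-tuple, the sketch takes element `[0]` after each call.

```python
from pathlib import PureWindowsPath

# Illustrative set of DOS device names (an assumption for this sketch,
# not copied from pathlib's source).
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{n}" for n in range(1, 10)}
    | {f"LPT{n}" for n in range(1, 10)}
)


def is_reserved_sketch(path):
    """Return True if the last component looks like a DOS device name."""
    parts = PureWindowsPath(path).parts
    if not parts:
        return False
    # Keep only what precedes the first '.' and the first ':',
    # then drop trailing spaces before comparing case-insensitively.
    name = parts[-1].partition(".")[0].partition(":")[0].rstrip(" ").upper()
    return name in RESERVED_NAMES


for candidate in ("C:/foo/NUL:", "C:/foo/NUL ", "C:/foo/NUL .txt", "C:/foo/bar.txt"):
    print(repr(candidate), is_reserved_sketch(candidate))
```

Using `PureWindowsPath` keeps the sketch runnable on any platform, which matches the cross-platform intent of is_reserved.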
If a script is running on Windows, I think the best approach is to call os.path.abspath, which calls _getfullpathname. This lets Windows itself determine if the path maps to the \\.\ device namespace. However, I realize that is_reserved is intended to be cross-platform.
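A rough sketch of that Windows-only approach, assuming a hypothetical helper named `maps_to_device_namespace`. It returns `None` on other platforms, where `os.path.abspath` does not consult Windows path normalization and so cannot answer the question.

```python
import os


def maps_to_device_namespace(path):
    r"""Ask Windows whether `path` normalizes into the \\.\ namespace.

    Returns True or False on Windows, and None elsewhere because the
    answer depends on the running system's own path normalization.
    """
    if os.name != "nt":
        return None
    # On Windows, os.path.abspath goes through _getfullpathname.
    return os.path.abspath(path).startswith("\\\\.\\")


print(maps_to_device_namespace("C:/foo/NUL .txt"))
```

The result can differ between Windows versions, so the sketch deliberately states no expected output.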
By the way, the comment for this method says that r"foo\NUL" isn't reserved, but it is. Maybe the author checked by trying to open NUL in a non-existing foo directory. DOS device names are only reserved in practice when opening and creating files in existing directories (as opposed to reserved in principle with GetFullPathName, which doesn't check for a valid path). NT can thus return an error that's consistent with how DOS behaved in the 1980s -- because that's really important, you know.
Also, "CONIN$" and "CONOUT$" need to be added to the list of reserved names. Prior to Windows 8 these two names are reserved only for the current directory, which for the most part also applies to "CON".
For Windows 8+, the redesign to use a real console device means that these three console devices are handled in exactly the same way as the other reserved DOS device names. For example:
    >>> print(os.path.abspath('C:/Temp/conout$ : spam . eggs'))
    \\.\conout$

Prior to Windows 8, the same call leaves the path unchanged:

    >>> print(os.path.abspath('C:/Temp/conout$ : spam . eggs'))
    C:\Temp\conout$ : spam . eggs
The attached patch adds tests and the suggested enhancement to _WindowsFlavour.is_reserved.
Shouldn't it also return True if the name contains ASCII control characters? They're only valid in NTFS stream names. Also, I think a name containing a colon that's not part of a DOS drive letter spec should be considered reserved. Otherwise it could designate an NTFS named stream (e.g. "path\filename:streamname:$DATA"), which is rarely desired and not universally supported, e.g. FAT32 doesn't support file streams. I'm thinking of a program that calls this method to ensure that a path is reasonably 'safe' for use on Windows -- i.e. isn't inherently invalid and won't do something surprising like open NUL or write to a named stream.
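To make the proposal above concrete, here is a small sketch of such a 'safety' check. The helper name `looks_unsafe` is hypothetical, and treating "ASCII control characters" as code points below 32 is an assumption of this sketch, not something pathlib defines.

```python
from pathlib import PureWindowsPath


def looks_unsafe(path):
    """Flag components with control characters or a non-drive colon."""
    p = PureWindowsPath(path)
    # The anchor (drive and root, e.g. 'C:\\') legitimately contains a
    # colon, so only the components after it are inspected.
    components = p.parts[1:] if p.anchor else p.parts
    for part in components:
        if ":" in part:
            return True  # could designate an NTFS named stream
        if any(ord(ch) < 32 for ch in part):
            return True  # control characters (assumed range: below 32)
    return False


print(looks_unsafe("C:/foo/file.txt:stream:$DATA"))
print(looks_unsafe("C:/foo/bar.txt"))
```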
For COM[n] and LPT[n], only ASCII 1-9 and superscript 1-3 (U+00b9, U+00b2, and U+00b3) are handled as decimal digits. For example:
    >>> from nt import _getfullpathname
    >>> print(*(ascii(chr(c)) for c in range(1, 65536)
    ...         if _getfullpathname('COM%s' % chr(c))[0] == '\\'), sep=', ')
    '1', '2', '3', '4', '5', '6', '7', '8', '9', '\xb2', '\xb3', '\xb9'
The implementation uses iswdigit in ntdll.dll. (ntdll.dll is the system DLL that has the user-mode runtime library and syscall stubs -- except the Win32k syscall stubs are in win32u.dll.) ntdll's private CRT uses the C locale (Latin-1, not just ASCII), and it classifies these superscript digits as decimal digits:
    >>> ntdll = ctypes.WinDLL('ntdll')
    >>> print(*(chr(c) for c in range(1, 65536) if ntdll.iswdigit(c)))
    0 1 2 3 4 5 6 7 8 9 ² ³ ¹
Unicode, and thus Python, does not classify these superscript digits as decimal digits, so I just hard-coded the list.
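As an illustration of why a hard-coded list is the simpler route, the sketch below builds the COM/LPT names from an explicit digit string. The names `DEVICE_DIGITS`, `SERIAL_PARALLEL_NAMES`, and `is_com_or_lpt` are made up for this example and are not taken from the patch.

```python
# Digits reported above as accepted after COM/LPT:
# ASCII 1-9 plus superscript one, two, and three.
DEVICE_DIGITS = "123456789\xb9\xb2\xb3"

SERIAL_PARALLEL_NAMES = {
    prefix + digit for prefix in ("COM", "LPT") for digit in DEVICE_DIGITS
}


def is_com_or_lpt(name):
    """Case-insensitive membership test against the hard-coded names."""
    return name.upper() in SERIAL_PARALLEL_NAMES


# str.isdecimal() rejects the superscripts, while str.isdigit() also
# accepts U+2074, so neither matches the Windows behavior on its own.
print("\xb2".isdecimal(), "\xb2".isdigit(), "\u2074".isdigit())
print(is_com_or_lpt("COM\xb2"), is_com_or_lpt("COM\u2074"), is_com_or_lpt("COM0"))
```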
Here's an example with an attached debugger to show the runtime library calling iswdigit:
The argument is in register rcx:
Skip to the ret instruction, and check the result in register rax:
Since U+2074 isn't considered a decimal digit, 'COM⁴' is not a reserved DOS device name. The system handles it as a regular filename: