os.stat()’s follow_symlinks is a bit ambiguously described #96402

Open
calestyo opened this issue Aug 29, 2022 · 4 comments
Labels
docs Documentation in the Doc dir

Comments

@calestyo
Contributor

Documentation

There are numerous functions which take a pathname and an argument like follow_symlinks.

For most of these, the follow_symlinks argument is not explained further in the function's own description; readers have to consult https://docs.python.org/3/library/os.html#files-and-directories, where it is described fairly precisely.

However, the description of os.stat() has:

This function normally follows symlinks; to stat a symlink add the argument follow_symlinks=False, or use lstat().

This is only half correct, because what it actually means is:

When the last component of the path is a symbolic link, the function normally follows it. Symbolic links in the path that are not the last component, are always followed.
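The distinction can be checked with a small experiment. The following is only a sketch for a POSIX-like system where the current user may create symlinks; the helper names `classify` and `demo` and the file names are made up for illustration.

```python
import os
import stat
import tempfile


def classify(path, follow_symlinks):
    """Return 'symlink', 'directory' or 'file' for what os.stat() reports."""
    mode = os.stat(path, follow_symlinks=follow_symlinks).st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    return "file"


def demo():
    """Build a tiny tree with two symlinks and stat paths through them."""
    results = {}
    with tempfile.TemporaryDirectory() as base:
        real_dir = os.path.join(base, "real_dir")
        os.mkdir(real_dir)
        target = os.path.join(real_dir, "data.txt")
        with open(target, "w") as f:
            f.write("x")

        # dir_link -> real_dir (will be used as a non-final path component)
        dir_link = os.path.join(base, "dir_link")
        os.symlink(real_dir, dir_link, target_is_directory=True)
        # file_link -> data.txt (will be used as the final path component)
        file_link = os.path.join(real_dir, "file_link")
        os.symlink(target, file_link)

        # Symlink only in a non-final component: followed even with False.
        via_dir_link = os.path.join(dir_link, "data.txt")
        results["intermediate, follow=False"] = classify(via_dir_link, False)

        # Symlink as the last component: the argument decides.
        results["last, follow=True"] = classify(file_link, True)
        results["last, follow=False"] = classify(file_link, False)

        # Both: dir_link is followed, file_link itself is reported.
        both = os.path.join(dir_link, "file_link")
        results["both, follow=False"] = classify(both, False)
    return results


if __name__ == "__main__":
    for case, kind in demo().items():
        print(f"{case}: {kind}")
```

With follow_symlinks=False, only the paths whose last component is a symlink are reported as symlinks; the path that merely passes through dir_link is reported as a regular file.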

Similarly, the paragraph below it for Windows uses wording that implies any name-surrogate reparse point in the path is affected, i.e. not only one in the last pathname component.
No idea what Windows does, but if that's also wrong, it should be corrected accordingly, including in the "Changed in" entry for it.

AFAICS, the other functions of os have it correctly described (by simply not describing it).

Thanks,
Chris.

@calestyo calestyo added the docs Documentation in the Doc dir label Aug 29, 2022
@eryksun
Contributor

eryksun commented Aug 30, 2022

Usually it's taken for granted that symbolic links in the parent path are followed. Otherwise, path parsing would stop on the first symbolic link in the path, but how would one know where it stopped without manually parsing the path?

@eryksun
Contributor

eryksun commented Aug 30, 2022

> No idea what Windows does

FYI, Windows uses filesystem reparse points, which integrate with generalized support for path reparsing in the kernel object namespace. A filesystem reparse point contains reparse data that's identified by a 32-bit tag such as IO_REPARSE_TAG_MOUNT_POINT (0xA000_0003) or IO_REPARSE_TAG_SYMLINK (0xA000_000C). The upper 16 bits of the tag are reserved as attributes, including whether it's a Microsoft type (0x8000_0000) and whether it's a name surrogate type (0x2000_0000), i.e. whether it targets another named object in the system. Non-Microsoft tags have to be registered with Microsoft.
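As an illustration of the tag layout described above, here is a minimal sketch that decodes the two attribute bits. The tag values and bit masks are the ones quoted in this comment; the constant names for the masks and the function names are made up for this example and are not part of any Python API.

```python
# Tag values as quoted in the comment above.
IO_REPARSE_TAG_MOUNT_POINT = 0xA000_0003
IO_REPARSE_TAG_SYMLINK = 0xA000_000C

# Attribute bits in the upper 16 bits of the tag (mask names are hypothetical).
MICROSOFT_BIT = 0x8000_0000
NAME_SURROGATE_BIT = 0x2000_0000


def is_microsoft_tag(tag):
    """True if the tag is flagged as a Microsoft-defined type."""
    return bool(tag & MICROSOFT_BIT)


def is_name_surrogate(tag):
    """True if the tag is flagged as targeting another named object."""
    return bool(tag & NAME_SURROGATE_BIT)


def describe(tag):
    """Split a 32-bit reparse tag into its attribute half and value half."""
    return {
        "attributes": (tag >> 16) & 0xFFFF,
        "value": tag & 0xFFFF,
        "microsoft": is_microsoft_tag(tag),
        "name_surrogate": is_name_surrogate(tag),
    }


if __name__ == "__main__":
    for name, tag in [
        ("IO_REPARSE_TAG_MOUNT_POINT", IO_REPARSE_TAG_MOUNT_POINT),
        ("IO_REPARSE_TAG_SYMLINK", IO_REPARSE_TAG_SYMLINK),
    ]:
        print(name, describe(tag))
```

Both quoted tags have the Microsoft bit and the name-surrogate bit set, which is why both are treated as symlink-ish for follow_symlinks=False.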

For follow_symlinks=False, Python 3.8+ handles name-surrogate reparse points as symlink-ish. In os.stat(), this is implemented by calling CreateFileW() with the flag FILE_FLAG_OPEN_REPARSE_POINT and querying whether it's a reparse point and, if so, whether it's a name surrogate. If it's a reparse point but not a name surrogate (e.g. tiered storage) [1], CreateFileW() is called again without FILE_FLAG_OPEN_REPARSE_POINT.
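The decision described in this paragraph can be modelled as a pure function. This is only a simplified model of the logic as stated here, not CPython's actual code: it makes no Windows API calls, and the names `Handle`, `handle_to_stat` and `EXAMPLE_NON_SURROGATE_TAG` are hypothetical.

```python
from enum import Enum

NAME_SURROGATE_BIT = 0x2000_0000
IO_REPARSE_TAG_SYMLINK = 0xA000_000C
# Made-up tag value, for illustration only: no name-surrogate bit set.
EXAMPLE_NON_SURROGATE_TAG = 0x8000_0001


class Handle(Enum):
    """Which of the two opens ends up being used for the stat result."""
    FIRST_OPEN = "opened with FILE_FLAG_OPEN_REPARSE_POINT"
    REOPENED = "opened again without FILE_FLAG_OPEN_REPARSE_POINT"


def handle_to_stat(is_reparse_point, reparse_tag=0):
    """Model the follow_symlinks=False case for the final path component.

    A reparse point that is not a name surrogate is not treated as
    symlink-ish, so the path is opened a second time and traversed.
    """
    if is_reparse_point and not (reparse_tag & NAME_SURROGATE_BIT):
        return Handle.REOPENED
    return Handle.FIRST_OPEN


if __name__ == "__main__":
    print(handle_to_stat(False))
    print(handle_to_stat(True, IO_REPARSE_TAG_SYMLINK))
    print(handle_to_stat(True, EXAMPLE_NON_SURROGATE_TAG))
```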

That said, the S_IFLNK POSIX file type, and thus os.path.islink() support, is limited to IO_REPARSE_TAG_SYMLINK. This limit will be in place until any type of name-surrogate reparse point can be copied exactly via os.readlink() and os.symlink(). That's technically possible, but it hasn't been implemented.
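To see how a given path is reported, one can look at the lstat() result directly. The sketch below assumes Python 3.8+; `link_info` is a made-up helper name. The st_reparse_tag attribute only exists on Windows, so the code falls back to None elsewhere.

```python
import os
import stat


def link_info(path):
    """Report the POSIX-style link type and, on Windows, the reparse tag."""
    st = os.lstat(path)  # same as os.stat(path, follow_symlinks=False)
    return {
        # True only when the file type is S_IFLNK.
        "islink": stat.S_ISLNK(st.st_mode),
        # Present on Windows only; None on other platforms.
        "reparse_tag": getattr(st, "st_reparse_tag", None),
    }


if __name__ == "__main__":
    print(link_info(os.curdir))
```

Per the comment above, on Windows a mount point (junction) would be expected to show a non-zero reparse tag while "islink" stays False; that part is not exercised here.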

At a low level, Windows supports an OBJ_DONT_REPARSE flag in OBJECT_ATTRIBUTES, which disables all path reparsing. It can be used in direct NT system calls such as NtOpenFile() and NtCreateFile(). However, it's intended for security scenarios and otherwise not useful. On the other hand, in the kernel API for drivers, IoCreateFileEx() supports a useful IO_STOP_ON_SYMLINK option. Instead of reparsing a symlink, the call fails with the warning STATUS_STOPPED_ON_SYMLINK and the reparse buffer is returned, including the number of unparsed bytes in the opened path. For example, symlinks in a remote path have to be reparsed and opened locally on the client side, so the SMB server on the remote side uses IO_STOP_ON_SYMLINK when opening a path. If the open fails with STATUS_STOPPED_ON_SYMLINK, the client gets sent a symbolic link error response.

Footnotes

  1. Note that placeholder reparse points (e.g. OneDrive and Projected FS) may be disguised as regular files and directories by default (e.g. see Expose placeholder reparse points in Windows #83493). The placeholder behavior was introduced because many applications naively handle all reparse points as symlink-ish. This used to include Python.

@calestyo
Contributor Author

> Usually it's taken for granted that symbolic links in the parent path are followed. Otherwise, path parsing would stop on the first symbolic link in the path, but how would one know where it stopped without manually parsing the path?

Well in principle yes, but still, I think it wouldn't harm to have it described as precisely as possible.

IMO, the plural in follow_symlinks, and also in “symlinks” in the description, is a bit unfortunate. The description uses the singular for the path but then the plural in “This function normally follows symlinks”. Per invocation, though, there is at most one symlink (the one in the last component) whose handling the argument controls, so a reader could be led into false assumptions about other symlinks in the path.

Also one could e.g. imagine some weird behaviour to happen like: /foo/symlink/../bar/baz not being followed in the usual sense but resulting in /foo/bar/baz (which is something different).

Now you may ask why anyone would ever have written it to do that, but it seems that os.path.realpath() does something just like that:

Assume /root/ is inaccessible to normal users and /root/test is a symlink to /. Then, as a normal user:

>>> os.path.realpath("/root/test/..")
'/root'

whereas as root:

>>> os.path.realpath("/root/test/..")
'/'

The (POSIX) realpath(1) tool, run as a normal user, gives:

$ realpath "/root/test/.."
realpath: /root/test/..: Permission denied
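The difference between collapsing ".." lexically and resolving it through a symlink can be reproduced without root privileges. This is a sketch for a system where symlinks can be created; `compare_dotdot` and the directory names are made up for illustration.

```python
import os
import tempfile


def compare_dotdot():
    """Return (lexical, resolved) results for '<base>/link/..'.

    'link' points at '<base>/real/sub', so resolving the symlink first and
    then applying '..' lands in '<base>/real', not in '<base>'.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Resolve the temp dir itself, in case it sits behind a symlink.
        base = os.path.realpath(tmp)
        nested = os.path.join(base, "real", "sub")
        os.makedirs(nested)
        link = os.path.join(base, "link")
        os.symlink(nested, link, target_is_directory=True)

        path = os.path.join(link, "..")
        # normpath() only edits the string: 'link/..' cancels out.
        lexical = os.path.relpath(os.path.normpath(path), base)
        # realpath() follows 'link' and only then applies '..'.
        resolved = os.path.relpath(os.path.realpath(path), base)
    return lexical, resolved


if __name__ == "__main__":
    print(compare_dotdot())
```

In the /root/test example above, realpath() cannot inspect the link as a normal user and so ends up with the lexical answer. Python 3.10 added a strict parameter to os.path.realpath(); with strict=True an OSError is raised instead when a path component cannot be accessed.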

@calestyo
Contributor Author

Oh, and thanks for the elaborate description of the situation on Windows. I personally don't use it, but maybe it would be interesting for others to have that in the documentation?
