-
-
Notifications
You must be signed in to change notification settings - Fork 33k
gh-83714: Implement os.statx #139178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-83714: Implement os.statx #139178
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Here is a first review.
Co-authored-by: Victor Stinner <vstinner@python.org>
Modules/posixmodule.c
Outdated
/* Future bits may refer to members beyond the current size of struct | ||
statx, so we need to mask them off to prevent memory corruption. */ | ||
mask &= _Py_STATX_KNOWN; | ||
int flags = AT_NO_AUTOMOUNT | (follow_symlinks ? 0 : AT_SYMLINK_NOFOLLOW); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that os.statx() has a flags parameter rather than only supporting sync option:
- I would prefer to not pass AT_NO_AUTOMOUNT by default, it's the responsibility of the caller to pass it.
- This API doesn't let using AT_STATX_SYNC_AS_STAT mode.
- We don't support future flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done as you directed, but I have some comments.
- I would prefer to not pass AT_NO_AUTOMOUNT by default, it's the responsibility of the caller to pass it.
Well, I would prefer to add an automount=False
option so that the default matches os.stat
, rather than being opposite. It's a trap for people modifying code that previously used os.stat
that probably won't be noticed until it becomes a problem for someone. But I don't feel that strongly.
- This API doesn't let using AT_STATX_SYNC_AS_STAT mode.
In this API, sync=None
is AT_STATX_SYNC_AS_STAT mode. It's noted in the docstring and in the documentation I wrote:
sync=None
expresses no preference, in which case the kernel
will return information as fresh as :func:~os.stat
does.
This is a direct translation of the C interface, where AT_STATX_FORCE_SYNC and AT_STATX_DONT_SYNC are real flags, but AT_STATX_SYNC_AS_STAT, which is defined as 0, is merely a marker that can be used to explicitly accept the default of whatever stat does. But if this is not clear to you, it probably won't be clear to users, so I guess you made your point anyway. I've been more explicit in the new documentation for the os.AT_STATX_SYNC_AS_STAT
constant.
- We don't support future flags.
On the other hand, we now have to reject flags relating to follow_symlinks
or dir_fd
, because the flags argument is just an int and the user can pass AT_SYMLINK_NOFOLLOW
or AT_EMPTY_PATH
despite there not being constants for them in the os module.
Co-authored-by: Victor Stinner <vstinner@python.org>
In addition to making the changes you directed, when modifying the
and |
automount point instead of performing the automount. (On Linux, | ||
:func:`os.stat`, :func:`os.fstat` and :func:`os.lstat` always behave this | ||
way.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
automount point instead of performing the automount. (On Linux, | |
:func:`os.stat`, :func:`os.fstat` and :func:`os.lstat` always behave this | |
way.) | |
automount point instead of performing the automount. On Linux, | |
:func:`os.stat`, :func:`os.fstat` and :func:`os.lstat` always behave this | |
way. |
Added the :attr:`st_birthtime` member on Windows. | ||
|
||
|
||
.. function:: statx(path, mask, flags=0, *, dir_fd=None, follow_symlinks=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of marking mask optional? Add a default value of 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's possible to just call os.statx('.')
.
I'm not fully comfortable with having some os.statx() tests in test_posix and some others in test_os. But it seems like you reused existing code in test_posix and test_os, so I would say that I'm fine with it. We might merge test_posix and test_os into a single test, but that's a different topic. |
I created #139322 for that :-) |
This PR implements
os.statx
, an interface to the Linuxstatx(2)
system call introduced in kernel version 4.11 (April 2017) and first available in glibc version 2.28 (August 2018). This is derived from my earlier PR #136334 with the changes toos.stat
removed, plus other changes (see below).This PR implements the return value of
os.statx
with a custom C typestatx_result
, which contains astruct statx
and uses member and getset descriptors to lazily create Python objects on attribute access. In a comment on the previous PR, @vstinner suggested usingtypes.SimpleNamespace
as the return value. I implemented that on the statx-simplenamespace branch in my fork. Using the benchmarks from this script (note they are not all fair comparisons):It's slower than this PR when only size and mtime are requested, because it creates objects for the unconditionally valid members and both the float-seconds and int-nanoseconds timestamps. We could get some of this back by defining our own mask bits (bit 32 and above). As more bits are set in the mask and attributes are accessed, the gap widens. I'm not sure if that's due to dict resizing or slower attribute access or both, and I don't see how to improve either. The advantage of the namespace implementation is its simplicity, and any speed hacks would dilute that.
(Also, because it doesn't create unrequested members, it's not a perfect wrapper around the system call. For "real" use it doesn't matter, but for testing the syscall, you'll miss some quirks. For example, btrfs seems to always return atime and btime, but returns mtime and ctime (always together) only if mtime and/or ctime were requested.)
In this PR, most attributes are implemented with member descriptors pointing into the
struct statx
or a member instatx_result
, so each attribute access onstatx_result
creates a new int or float object. My previous PR cached the created objects in thestatx_result
for the commonly-used attributes to avoid creating them more than once, which requires getset descriptors. Obviously, creating objects has a cost, but getset descriptors are slower than member descriptors. The crossover point turns out to be about two accesses (per attribute). If an attribute is only used once, this PR is faster; twice, slightly faster or even; three or more times, the previous PR is faster. I decided the complexity of the cache wasn't worth it. There's a cleaned-up version of the caching implementation on my fork if you want to take your own measurements.(I also didn't test the cache in the free-threaded build; I think the atomics/locking is correct, but I don't really understand how critical sections that can be "suspended" can possibly be safe.)
statx(2)
system call on Linux for extendedos.stat
information #83714📚 Documentation preview 📚: https://cpython-previews--139178.org.readthedocs.build/