Contributor

@jbosboom jbosboom commented Sep 20, 2025

This PR implements os.statx, an interface to the Linux statx(2) system call introduced in kernel version 4.11 (April 2017) and first available in glibc version 2.28 (August 2018). This is derived from my earlier PR #136334 with the changes to os.stat removed, plus other changes (see below).


This PR implements the return value of os.statx with a custom C type statx_result, which contains a struct statx and uses member and getset descriptors to lazily create Python objects on attribute access. In a comment on the previous PR, @vstinner suggested using types.SimpleNamespace as the return value. I implemented that on the statx-simplenamespace branch in my fork. Using the benchmarks from this script (note they are not all fair comparisons):

| Benchmark | statx-nocache | namespace |
|---|---|---|
| statx-sizemtime-mask-size-mtimens | 1.86 us | 3.07 us: 1.65x slower |
| statx-basic-all | 2.29 us | 4.57 us: 1.99x slower |
| statx-everything-all | 2.61 us | 6.34 us: 2.43x slower |

The namespace version is slower than this PR even when only size and mtime are requested, because it creates objects for the unconditionally valid members and for both the float-seconds and int-nanoseconds timestamps. We could get some of this back by defining our own mask bits (bit 32 and above). As more bits are set in the mask and more attributes are accessed, the gap widens. I'm not sure if that's due to dict resizing or slower attribute access or both, and I don't see how to improve either. The advantage of the namespace implementation is its simplicity, and any speed hacks would dilute that.

(Also, because it doesn't create unrequested members, it's not a perfect wrapper around the system call. For "real" use it doesn't matter, but for testing the syscall, you'll miss some quirks. For example, btrfs seems to always return atime and btime, but returns mtime and ctime (always together) only if mtime and/or ctime were requested.)
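
To make the trade-off concrete, here is a rough Python sketch of the two strategies. It is not the C code in this PR: the packed record layout, the attribute names (`size`, `mtime`, `mtime_ns`), and both helper names are hypothetical stand-ins. The eager version builds every object up front, the way the namespace approach does; the lazy version keeps the raw bytes and only builds an object when an attribute is read.

```python
import struct
from types import SimpleNamespace

# Hypothetical packed record standing in for struct statx:
# size (u64), mtime seconds (s64), mtime nanoseconds (u32).
_RECORD = struct.Struct("<QqI")


def eager_namespace(raw: bytes) -> SimpleNamespace:
    """Create every Python object up front, whether or not it is used."""
    size, sec, nsec = _RECORD.unpack(raw)
    return SimpleNamespace(
        size=size,
        mtime=sec + nsec * 1e-9,
        mtime_ns=sec * 1_000_000_000 + nsec,
    )


class LazyResult:
    """Hold the raw record and create objects only on attribute access."""

    __slots__ = ("_raw",)

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    @property
    def size(self) -> int:
        return _RECORD.unpack(self._raw)[0]

    @property
    def mtime(self) -> float:
        _, sec, nsec = _RECORD.unpack(self._raw)
        return sec + nsec * 1e-9

    @property
    def mtime_ns(self) -> int:
        _, sec, nsec = _RECORD.unpack(self._raw)
        return sec * 1_000_000_000 + nsec


raw = _RECORD.pack(4096, 1_700_000_000, 500)
eager = eager_namespace(raw)
lazy = LazyResult(raw)
```

A caller that reads only `lazy.size` never pays for the two timestamp objects, which is the effect the first benchmark row is measuring.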


In this PR, most attributes are implemented with member descriptors pointing into the struct statx or a member in statx_result, so each attribute access on statx_result creates a new int or float object. My previous PR cached the created objects in the statx_result for the commonly-used attributes to avoid creating them more than once, which requires getset descriptors. Obviously, creating objects has a cost, but getset descriptors are slower than member descriptors. The crossover point turns out to be about two accesses (per attribute). If an attribute is only used once, this PR is faster; twice, slightly faster or even; three or more times, the previous PR is faster. I decided the complexity of the cache wasn't worth it. There's a cleaned-up version of the caching implementation on my fork if you want to take your own measurements.

(I also didn't test the cache in the free-threaded build; I think the atomics/locking is correct, but I don't really understand how critical sections that can be "suspended" can possibly be safe.)
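
For readers who want to see the caching trade-off in miniature, the following Python sketch counts how many times the conversion runs under each strategy. It is only an analogy: the class names are hypothetical, `functools.cached_property` stands in for the cached getset descriptors, and a plain `property` stands in for the member descriptors. It does not reproduce the timings or the crossover point measured above.

```python
from functools import cached_property


class Uncached:
    """Create a new float on every access, like the approach in this PR."""

    def __init__(self, mtime_ns: int) -> None:
        self._mtime_ns = mtime_ns
        self.conversions = 0

    @property
    def mtime(self) -> float:
        self.conversions += 1
        return self._mtime_ns / 1e9


class Cached:
    """Create the float once and reuse it, like the earlier caching design."""

    def __init__(self, mtime_ns: int) -> None:
        self._mtime_ns = mtime_ns
        self.conversions = 0

    @cached_property
    def mtime(self) -> float:
        self.conversions += 1
        return self._mtime_ns / 1e9


uncached = Uncached(1_500_000_000)
cached = Cached(1_500_000_000)
uncached_values = [uncached.mtime for _ in range(3)]
cached_values = [cached.mtime for _ in range(3)]
```

After three reads, `uncached.conversions` is 3 and `cached.conversions` is 1. Whether the saved conversions outweigh the slower lookup path is the question the measurements above answer for the C implementation.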


📚 Documentation preview 📚: https://cpython-previews--139178.org.readthedocs.build/

Member

@vstinner vstinner left a comment

Thanks. Here is a first review.

Co-authored-by: Victor Stinner <vstinner@python.org>
/* Future bits may refer to members beyond the current size of struct
statx, so we need to mask them off to prevent memory corruption. */
mask &= _Py_STATX_KNOWN;
int flags = AT_NO_AUTOMOUNT | (follow_symlinks ? 0 : AT_SYMLINK_NOFOLLOW);
Member

I would prefer that os.statx() has a flags parameter rather than only supporting a sync option:

  • I would prefer to not pass AT_NO_AUTOMOUNT by default, it's the responsibility of the caller to pass it.
  • This API doesn't allow using AT_STATX_SYNC_AS_STAT mode.
  • We don't support future flags.

Contributor Author

I've done as you directed, but I have some comments.

  • I would prefer to not pass AT_NO_AUTOMOUNT by default, it's the responsibility of the caller to pass it.

Well, I would prefer to add an automount=False option so that the default matches os.stat, rather than being opposite. It's a trap for people modifying code that previously used os.stat that probably won't be noticed until it becomes a problem for someone. But I don't feel that strongly.

  • This API doesn't allow using AT_STATX_SYNC_AS_STAT mode.

In this API, sync=None is AT_STATX_SYNC_AS_STAT mode. It's noted in the docstring and in the documentation I wrote:

sync=None expresses no preference, in which case the kernel
will return information as fresh as :func:`~os.stat` does.

This is a direct translation of the C interface, where AT_STATX_FORCE_SYNC and AT_STATX_DONT_SYNC are real flags, but AT_STATX_SYNC_AS_STAT, which is defined as 0, is merely a marker that can be used to explicitly accept the default of whatever stat does. But if this is not clear to you, it probably won't be clear to users, so I guess you made your point anyway. I've been more explicit in the new documentation for the os.AT_STATX_SYNC_AS_STAT constant.
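
As an illustration of that mapping, here is a small Python sketch of how a three-state `sync` argument could translate to the C flags. The constant values are the ones in the Linux `<linux/fcntl.h>` header. The helper name is hypothetical, and the `True`/`False` branches are my assumption about the obvious mapping; only the `sync=None` case is stated above.

```python
from typing import Optional

# Values from the Linux UAPI header <linux/fcntl.h>.
AT_STATX_SYNC_AS_STAT = 0x0000  # a marker, not a real bit: "do what stat does"
AT_STATX_FORCE_SYNC = 0x2000
AT_STATX_DONT_SYNC = 0x4000


def sync_to_flags(sync: Optional[bool]) -> int:
    """Translate a three-state sync argument into statx flag bits."""
    if sync is None:
        # No preference: contributes no bits at all.
        return AT_STATX_SYNC_AS_STAT
    # Assumed mapping for the two explicit choices.
    return AT_STATX_FORCE_SYNC if sync else AT_STATX_DONT_SYNC
```

Because `AT_STATX_SYNC_AS_STAT` is 0, OR-ing it into any flags value changes nothing, which is why it works as a marker rather than as a flag.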

  • We don't support future flags.

On the other hand, we now have to reject flags relating to follow_symlinks or dir_fd, because the flags argument is just an int and the user can pass AT_SYMLINK_NOFOLLOW or AT_EMPTY_PATH despite there not being constants for them in the os module.
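
A minimal Python sketch of that rejection logic, assuming the check happens before the `follow_symlinks` bit is merged in. The constant values come from the Linux `<linux/fcntl.h>` header; the helper name, the choice of `ValueError`, and the error message are hypothetical and may not match what the C code does.

```python
# Values from the Linux UAPI header <linux/fcntl.h>.
AT_SYMLINK_NOFOLLOW = 0x100
AT_NO_AUTOMOUNT = 0x800
AT_EMPTY_PATH = 0x1000

# Bits that are controlled by follow_symlinks and dir_fd, not by the caller.
_RESERVED_FLAGS = AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH


def build_statx_flags(flags: int = 0, *, follow_symlinks: bool = True) -> int:
    """Validate caller-supplied flags, then add the follow_symlinks bit."""
    if flags & _RESERVED_FLAGS:
        # Assumed behaviour: refuse rather than silently override.
        raise ValueError("flags must not include bits managed by other parameters")
    if not follow_symlinks:
        flags |= AT_SYMLINK_NOFOLLOW
    return flags
```

For example, `build_statx_flags(AT_NO_AUTOMOUNT)` passes the caller's flag through unchanged, while `build_statx_flags(AT_EMPTY_PATH)` raises.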

@jbosboom
Contributor Author

In addition to making the changes you directed, when modifying the sync= kwarg test to test the flags parameter, I moved that test to test_posix.py because test_os.py's header comment says it is

for a few functions which have been determined to be more
portable than they had been thought to be.

and os.statx is not portable. The os.statx_result tests are still in test_os.py because they reuse some of the code for testing os.stat_result in that file. The split seems quite arbitrary to me, but I'm sure there are historical reasons.

Comment on lines +3547 to +3549
automount point instead of performing the automount. (On Linux,
:func:`os.stat`, :func:`os.fstat` and :func:`os.lstat` always behave this
way.)
Member

Suggested change
automount point instead of performing the automount. (On Linux,
:func:`os.stat`, :func:`os.fstat` and :func:`os.lstat` always behave this
way.)
automount point instead of performing the automount. On Linux,
:func:`os.stat`, :func:`os.fstat` and :func:`os.lstat` always behave this
way.

Added the :attr:`st_birthtime` member on Windows.


.. function:: statx(path, mask, flags=0, *, dir_fd=None, follow_symlinks=True)
Member

What do you think of making mask optional, with a default value of 0?

Member

So it's possible to just call os.statx('.').
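
Until the default exists, the same convenience can be approximated from the caller's side. This is a sketch only: `statx_default` is a hypothetical helper, it assumes the `statx(path, mask, ...)` signature shown in the diff above, and it returns `None` on interpreters or platforms where `os.statx` is not available.

```python
import os


def statx_default(path, mask=0, **kwargs):
    """Call os.statx with mask defaulting to 0; return None if unavailable."""
    statx = getattr(os, "statx", None)
    if statx is None:
        # Older Python, or a platform without statx(2).
        return None
    return statx(path, mask, **kwargs)


result = statx_default(".")
```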

@vstinner
Member

I'm not fully comfortable with having some os.statx() tests in test_posix and some others in test_os. But it seems like you reused existing code in test_posix and test_os, so I would say that I'm fine with it. We might merge test_posix and test_os into a single test, but that's a different topic.

@vstinner
Member

We might merge test_posix and test_os into a single test, but that's a different topic.

I created #139322 for that :-)
