gh-83714: Implement os.statx #139178

jbosboom · 2025-09-20T09:13:49Z

This PR implements os.statx, an interface to the Linux statx(2) system call introduced in kernel version 4.11 (April 2017) and first available in glibc version 2.28 (August 2018). This is derived from my earlier PR #136334 with the changes to os.stat removed, plus other changes (see below).

This PR implements the return value of os.statx with a custom C type statx_result, which contains a struct statx and uses member and getset descriptors to lazily create Python objects on attribute access. In a comment on the previous PR, @vstinner suggested using types.SimpleNamespace as the return value. I implemented that on the statx-simplenamespace branch in my fork. The benchmarks from this script (note they are not all fair comparisons) compare this PR (statx-nocache) against that branch (namespace):

| Benchmark | statx-nocache | namespace |
|---|---|---|
| statx-sizemtime-mask-size-mtimens | 1.86 us | 3.07 us: 1.65x slower |
| statx-basic-all | 2.29 us | 4.57 us: 1.99x slower |
| statx-everything-all | 2.61 us | 6.34 us: 2.43x slower |

The namespace implementation is slower than this PR when only size and mtime are requested, because it creates objects for the unconditionally valid members and for both the float-seconds and int-nanoseconds timestamps. We could get some of this back by defining our own mask bits (bit 32 and above). As more bits are set in the mask and more attributes are accessed, the gap widens. I'm not sure whether that's due to dict resizing, slower attribute access, or both, and I don't see how to improve either. The advantage of the namespace implementation is its simplicity, and any speed hacks would dilute that.

(Also, because the namespace implementation doesn't create unrequested members, it's not a perfect wrapper around the system call. For "real" use it doesn't matter, but for testing the syscall, you'll miss some quirks. For example, btrfs seems to always return atime and btime, but returns mtime and ctime (always together) only if mtime and/or ctime were requested.)
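A rough Python sketch of the namespace approach discussed above, to make the eager object creation concrete. The `build_namespace` helper, the dict-shaped `raw` input and the attribute names are hypothetical stand-ins for the C code on the branch; the two mask values are the Linux `STATX_MTIME` and `STATX_SIZE` bits.

```python
from types import SimpleNamespace

# Mask bits as defined in <linux/stat.h>.
STATX_MTIME = 0x40
STATX_SIZE = 0x200


def build_namespace(raw, result_mask):
    """Eagerly build a namespace from a dict of raw statx fields.

    `raw` is a hypothetical stand-in for struct statx. Only members whose
    bit is set in `result_mask` become attributes.
    """
    ns = SimpleNamespace(result_mask=result_mask)
    if result_mask & STATX_SIZE:
        ns.st_size = raw["size"]
    if result_mask & STATX_MTIME:
        sec, nsec = raw["mtime"]
        # Both timestamp flavours are created up front, used or not.
        ns.st_mtime = sec + nsec * 1e-9
        ns.st_mtime_ns = sec * 1_000_000_000 + nsec
    return ns


raw = {"size": 4096, "mtime": (1_700_000_000, 250)}
only_size = build_namespace(raw, STATX_SIZE)
both = build_namespace(raw, STATX_SIZE | STATX_MTIME)
```

Every attribute that is present costs an object plus a dict slot at construction time, whether or not the caller ever reads it, which is the overhead the table above measures.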

In this PR, most attributes are implemented with member descriptors pointing into the struct statx or a member in statx_result, so each attribute access on statx_result creates a new int or float object. My previous PR cached the created objects in the statx_result for the commonly-used attributes to avoid creating them more than once, which requires getset descriptors. Obviously, creating objects has a cost, but getset descriptors are slower than member descriptors. The crossover point turns out to be about two accesses (per attribute). If an attribute is only used once, this PR is faster; twice, slightly faster or even; three or more times, the previous PR is faster. I decided the complexity of the cache wasn't worth it. There's a cleaned-up version of the caching implementation on my fork if you want to take your own measurements.
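The trade-off between the two strategies can be sketched in pure Python. This is only an analogy: both classes below use `property`, so it shows how many objects get created, not the member-versus-getset descriptor speed difference measured in C. The class names and the single-field buffer layout are made up for illustration.

```python
import struct


class NoCacheResult:
    """Decode the field from the raw buffer on every access."""

    def __init__(self, raw):
        self._raw = raw
        self.decodes = 0  # how many times an int object was built

    def _decode_size(self):
        self.decodes += 1
        return struct.unpack_from("<Q", self._raw, 0)[0]

    @property
    def st_size(self):
        return self._decode_size()


class CachedResult(NoCacheResult):
    """Decode once, then serve the stored object on later accesses."""

    def __init__(self, raw):
        super().__init__(raw)
        self._size = None

    @property
    def st_size(self):
        if self._size is None:
            self._size = self._decode_size()
        return self._size


raw = struct.pack("<Q", 123_456_789)
plain, cached = NoCacheResult(raw), CachedResult(raw)
for _ in range(3):
    plain.st_size
    cached.st_size
```

After three accesses the non-caching object has built three ints and the caching one has built a single int, at the price of an extra slot and a branch on every access.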

(I also didn't test the cache in the free-threaded build; I think the atomics/locking is correct, but I don't really understand how critical sections that can be "suspended" can possibly be safe.)

Issue: Use statx(2) system call on Linux for extended os.stat information #83714

📚 Documentation preview 📚: https://cpython-previews--139178.org.readthedocs.build/

vstinner

Thanks. Here is a first review.

Modules/posixmodule.c

Doc/library/os.rst

Lib/test/test_os.py

Lib/test/test_posix.py

Co-authored-by: Victor Stinner <vstinner@python.org>

vstinner · 2025-09-23T10:07:02Z

Modules/posixmodule.c

+    /* Future bits may refer to members beyond the current size of struct
+       statx, so we need to mask them off to prevent memory corruption. */
+    mask &= _Py_STATX_KNOWN;
+    int flags = AT_NO_AUTOMOUNT | (follow_symlinks ? 0 : AT_SYMLINK_NOFOLLOW);


I would prefer that os.statx() has a flags parameter rather than only supporting the sync option:

- I would prefer to not pass AT_NO_AUTOMOUNT by default, it's the responsibility of the caller to pass it.
- This API doesn't let using AT_STATX_SYNC_AS_STAT mode.
- We don't support future flags.

I've done as you directed, but I have some comments.

> I would prefer to not pass AT_NO_AUTOMOUNT by default, it's the responsibility of the caller to pass it.

Well, I would prefer to add an automount=False option so that the default matches os.stat, rather than being opposite. It's a trap for people modifying code that previously used os.stat that probably won't be noticed until it becomes a problem for someone. But I don't feel that strongly.

> This API doesn't let using AT_STATX_SYNC_AS_STAT mode.

In this API, sync=None is AT_STATX_SYNC_AS_STAT mode. It's noted in the docstring and in the documentation I wrote:

sync=None expresses no preference, in which case the kernel
will return information as fresh as :func:`~os.stat` does.

This is a direct translation of the C interface, where AT_STATX_FORCE_SYNC and AT_STATX_DONT_SYNC are real flags, but AT_STATX_SYNC_AS_STAT, which is defined as 0, is merely a marker that can be used to explicitly accept the default of whatever stat does. But if this is not clear to you, it probably won't be clear to users, so I guess you made your point anyway. I've been more explicit in the new documentation for the os.AT_STATX_SYNC_AS_STAT constant.
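A small sketch of the translation described here, with the flag values taken from `<linux/fcntl.h>`. The helper name `sync_to_flags` is hypothetical, and the mapping of `True` to `AT_STATX_FORCE_SYNC` and `False` to `AT_STATX_DONT_SYNC` is an assumption about the earlier sync= design; only the `None` case is stated above.

```python
# Values from <linux/fcntl.h>.
AT_STATX_SYNC_AS_STAT = 0x0000  # a marker, not a real bit
AT_STATX_FORCE_SYNC = 0x2000
AT_STATX_DONT_SYNC = 0x4000


def sync_to_flags(sync=None):
    """Map a sync= tri-state onto statx(2) flag bits (hypothetical helper)."""
    if sync is None:
        # No preference: accept whatever stat() would do.
        return AT_STATX_SYNC_AS_STAT
    # Assumed mapping for the two explicit choices.
    return AT_STATX_FORCE_SYNC if sync else AT_STATX_DONT_SYNC
```

Because `AT_STATX_SYNC_AS_STAT` is 0, OR-ing it into a flags value changes nothing, which is why it works as a marker rather than as a selectable mode.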

> We don't support future flags.

On the other hand, we now have to reject flags relating to follow_symlinks or dir_fd, because the flags argument is just an int and the user can pass AT_SYMLINK_NOFOLLOW or AT_EMPTY_PATH despite there not being constants for them in the os module.
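The kind of check this implies might look like the following sketch. The constant values are from `<linux/fcntl.h>`; the helper name `build_flags`, the error message and the choice of `ValueError` are assumptions for illustration, not what the C code in the PR necessarily does.

```python
# Values from <linux/fcntl.h>.
AT_SYMLINK_NOFOLLOW = 0x100
AT_NO_AUTOMOUNT = 0x800
AT_EMPTY_PATH = 0x1000

# Bits that are controlled by follow_symlinks= and dir_fd=, not by flags.
_RESERVED = AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH


def build_flags(flags=0, *, follow_symlinks=True):
    """Validate user-supplied flags and add the symlink bit (hypothetical)."""
    if flags & _RESERVED:
        # Assumed error type; the point is that these bits are rejected.
        raise ValueError("use follow_symlinks or dir_fd instead of raw flags")
    if not follow_symlinks:
        flags |= AT_SYMLINK_NOFOLLOW
    return flags
```

For example, `build_flags(AT_NO_AUTOMOUNT)` passes the flag through unchanged, while `build_flags(AT_EMPTY_PATH)` is rejected even though the os module exposes no constant for it.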

Co-authored-by: Victor Stinner <vstinner@python.org>

jbosboom · 2025-09-24T05:08:01Z

In addition to making the changes you directed, when modifying the sync= kwarg test to test the flags parameter, I moved that test to test_posix.py because test_os.py's header comment says it is

for a few functions which have been determined to be more
portable than they had been thought to be.

and os.statx is not portable. The os.statx_result tests are still in test_os.py because they reuse some of the code for testing os.stat_result in that file. The split seems quite arbitrary to me, but I'm sure there are historical reasons.

vstinner · 2025-09-24T10:59:38Z

Doc/library/os.rst

+   automount point instead of performing the automount.  (On Linux,
+   :func:`os.stat`, :func:`os.fstat` and :func:`os.lstat` always behave this
+   way.)


Suggested change

-   automount point instead of performing the automount.  (On Linux,
-   :func:`os.stat`, :func:`os.fstat` and :func:`os.lstat` always behave this
-   way.)
+   automount point instead of performing the automount. On Linux,
+   :func:`os.stat`, :func:`os.fstat` and :func:`os.lstat` always behave this
+   way.

vstinner · 2025-09-24T11:13:19Z

Doc/library/os.rst

      Added the :attr:`st_birthtime` member on Windows.


+.. function:: statx(path, mask, flags=0, *, dir_fd=None, follow_symlinks=True)


What do you think of marking mask optional? Add a default value of 0.

So it's possible to just call os.statx('.').
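Until the signature settles, a caller can get the same convenience with a thin wrapper. This is a sketch under assumptions: `stat_extended` is a hypothetical name, `os.statx` only exists on Python builds that include this PR, and the mask is passed positionally because that matches the signature shown in the diff above.

```python
import os


def stat_extended(path, mask=0):
    """Call os.statx when this Python provides it, else fall back to os.stat."""
    statx = getattr(os, "statx", None)
    if statx is None:
        # Older Python or a non-Linux platform: plain stat is all we have.
        return os.stat(path)
    return statx(path, mask)


info = stat_extended(".")
```

The two return types are not interchangeable, so real code would still need to branch on which one it received before reading attributes.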

vstinner · 2025-09-25T11:07:02Z

I'm not fully comfortable with having some os.statx() tests in test_posix and some others in test_os. But it seems like you reused existing code in test_posix and test_os, so I would say that I'm fine with it. We might merge test_posix and test_os into a single test, but that's a different topic.

vstinner · 2025-09-25T11:59:36Z

> We might merge test_posix and test_os into a single test, but that's a different topic.

I created #139322 for that :-)

Implement os.statx

c2e6f81

jbosboom requested review from erlend-aasland, corona10, AA-Turner and emmatyping as code owners September 20, 2025 09:13

bedevere-app bot added the awaiting review label Sep 20, 2025

bedevere-app bot mentioned this pull request Sep 20, 2025

Use statx(2) system call on Linux for extended os.stat information #83714

Open

jbosboom mentioned this pull request Sep 20, 2025

gh-83714: Use statx on Linux 4.11 and later in os.stat #136334

Closed

vstinner reviewed Sep 22, 2025


Apply suggestions from code review

a1110f3

Co-authored-by: Victor Stinner <vstinner@python.org>

vstinner reviewed Sep 23, 2025


Apply suggestions from code review

51ef6cc

Co-authored-by: Victor Stinner <vstinner@python.org>

vstinner reviewed Sep 24, 2025
