Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-89188: replace bitfield with struct fields in PyASCIIObject #102589

Closed

Conversation

davidhewitt
Copy link
Contributor

@davidhewitt davidhewitt commented Mar 10, 2023

Closes #89188

I noticed that PyASCIIObject.state is a 32-bit bitfield with 4 members, so I replaced it with 4 separate uint8_t fields. The advantage of doing this is that C bitfield layout is implementation-defined, which makes it hard to interact with them from non-C languages (e.g. Rust).

cc @encukou @vstinner as you participated in the discussion on the original issue, I'd be interested to hear if you think this looks like a suitable solution.

Comment on lines -136 to -137
/* Padding to ensure that PyUnicode_DATA() is always aligned to
4 bytes (see issue #19537 on m68k). */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment on alignment seemed important, so I created a test in _testinternalcapi.c which I believe codifies this.

@davidhewitt
Copy link
Contributor Author

Looks like I may not have got the gdb lib changes exactly right, I can take another look at those on a Linux system once I've had some input on whether this direction is feasible, i.e. I know it's worth me spending the time to debug.

@encukou
Copy link
Member

encukou commented Mar 21, 2023

I don't find the Rust argument convincing -- we should expose good API functions, rather than make internal structs easy to use. Or is there a performance-sensitive use case? (If there is we might want to add some higher-level API for it, rather than making you play with bits.)

I do think this should change. Exposing compiler-implementation-defined behavior in the API is a ticking bomb.

IMO, if it changes it should change to bit masks. With this PR, there's no way to add an additional unrelated flag. (And one might be needed soon for static/immortal objects).

Both a version with bit masks and the PR as is are an API change. Per PEP 387, it needs a deprecation warning period or a SC exception.
(If you take the current PR and wrap the fields in struct state, it might not be an API change -- but as above, four u8s are painting ourselves in a corner.)
IMO, since this isn't time-sensitive, we should deprecate the state struct (for use outside CPython), and after a deprecation period, make it an 32-bit int and use it with use bit masks. And make the bit masks themselves private, or at least unstable, so that users preferably go through the public accessor functions.

Meanwhile, for Rust, you can expose the accessors as actual API functions, while keeping the static inline versions. There's a bit of a dance to do that, see e.g. (1, 2, 3) for PyVectorcall_NARGS.

@davidhewitt
Copy link
Contributor Author

Thanks @encukou, I wasn't aware that this struct would need the deprecation cycle applied.

I'll replace this PR with one proposing API functions. To check I understand, as per the PyVectorcall_NARGS dance, I'll need to add these accessors to the stable API?

@encukou
Copy link
Member

encukou commented Apr 3, 2023

I'll need to add these accessors to the stable API?

No, unlike vectorcall, they don't need to be in the limited API/stable ABI. (They can be added later if needed.)
The dance is to avoid having only inline functions (which can be faster but are only usable from C/C++).

@davidhewitt
Copy link
Contributor Author

Superseded by #115211 (which may itself yet be superseded by something else, who knows!)

@davidhewitt davidhewitt closed this Feb 9, 2024
@davidhewitt davidhewitt deleted the asciiobject-state-fields branch February 9, 2024 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reliance on C bit fields in C API is undefined behavior
4 participants