Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we / can we make MaybeUninit<T> always preserve all bytes of T? #518

Open
RalfJung opened this issue Jul 26, 2024 · 19 comments
Open

Comments

@RalfJung
Copy link
Member

It is a frequent source of confusion that MaybeUninit<T> is not just preserving all the underlying bytes of storage, but actually if T has padding then those bytes are lost on copies/moves of MaybeUninit<T>.

This is currently pretty much a necessary consequence of the promise that MaybeUninit<T> is ABI-compatible with T: some ABIs don't preserve the padding of T when it is passed to a function. However, this was not part of the intention with MaybeUninit at all, it is something we discovered later.

Maybe we should try to take this back, and make the guarantee only for types without padding?

I am not even sure why we made this a guarantee. We made the type repr(transparent) because for performance it is quite important that MaybeUninit<$int> becomes just an iN in LLVM. But that doesn't require a stable guarantee. And in fact it seems like it would almost always be a bug if the caller and callee disagree about whether the value has to be initialized. So I would be curious about real-world examples where this guarantee is needed.

@elichai
Copy link

elichai commented Jul 26, 2024

I'll add that a typed copy of an uninitialized variable is UB in C, so there's no need to promise any ABI for FFI compatibility,
So this leaves us with the "Rust" ABI which isn't stable anyway.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 26, 2024 via email

@Diggsey

This comment was marked as off-topic.

@RalfJung

This comment was marked as off-topic.

@carbotaniuman
Copy link

The PR that introduced the guarantees does not talk about padding, and it seems like that wasn't really understood back then. The t-lang minutes discussing this are lost to time and reorganizations, but it seems doubtful that such a consideration was raised. Discussions from 2018 raise a lack of real-world use cases for ABI compatibility, and I agree with such a sentiment in the present.

I don't this this would be approved nowadays, but I am incredibly apprehensive about removing it. There are few places in the Rust documentation that use always for guarantees like this, and the use cases for some weird FFI thunks or bindings would be nigh-impossible to properly test with crater or similar...

@Diggsey

This comment was marked as off-topic.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 26, 2024

@carbotaniuman I think we should consider removing it. If we can't come up with any legitimate usecase, I think we should definitely remove it. I don't like going back on a promise like this, but if we don't have a usecase that could be broken by taking back this promise, then the chances that someone is affected should be very slim.

@Diggsey thanks for explaining why you think this belongs in this thread. But I disagree. "MaybeUninit preserves provenance" is not relevant here. You will note that provenance does not appear in the issue description. Furthermore, provenance on CHERI works like it does everywhere else, so even if provenance were relevant, CHERI wouldn't change anything. It is true that you can write code with MaybeUninit that will work everywhere but not on CHERI; discussing that is off-topic here as the reasons are completely different from what this thread is about and also different from what you mentioned -- it is caused not by padding and not by provenance, but by capabilities. So please take this elsewhere, e.g. Zulip or a new issue where you explain why you think MaybeUninit's provenance behavior is incompatible with CHERI, but I'd prefer not to see yet another thread derailed by CHERI. I like CHERI and want to see it work in Rust, but our main task here is to figure out Rust for the existing targets we already support. CHERI support is a nice extra that I'll happily discuss, as long as it doesn't distract from our core task.

@carbotaniuman
Copy link

If we do agree to remove the guarantee, I expect it to break 0 uses in practice. My only other concern would be the performance impact of having to copy more bytes. It probably won't affect SIMD or buffers though, so I don't really think that's it's really an issue.

@bjorn3
Copy link
Member

bjorn3 commented Jul 26, 2024

I think we should preserve the memory layout compatibility, but drop the calling convention compatibility. That could be done using repr(C) instead of repr(transparent) I think.

@chorman0773
Copy link
Contributor

FTR, I use MaybeUninit<T> in the signatures of lccc's libatomic and libsfp ABI-level routines. This is because they get called from xlang's codegen, and xlang allows uninit for those operations. Though in these cases, they are types without padding in the signature.

However, for compatibility with gcc/clang, they have to expose an ABI equal to the rountines using primitives.

@chorman0773
Copy link
Contributor

(And in general, I agree with @carbotaniuman - unless crater is testing all kinds of targets, I'm betting it primarily tests x86_64, where aggregate-of-one-field will get passed the same way as that one field*, so without using miri-crater, the ABI checks won't be found by crater. If the code is used on something like arm32 though, it's going to be very visibility broken)

@RalfJung
Copy link
Member Author

Yes, this is super hard to test for. I wonder if it's worth having a blog post asking people whether they need this guarantee...

I think we should preserve the memory layout compatibility, but drop the calling convention compatibility. That could be done using repr(C) instead of repr(transparent) I think.

Yes, concretely the proposal would be:

  • T and MaybeUninit<T> always have the same size and alignment
  • they have the same ABI if T has no padding

Or maybe "no padding" should be restricted a bit further, like "if T is a primitive integer/float/pointer type" or so. Note that some non-power-of-2 SIMD types have padding so we have to be careful if we want to talk about those types.

@Ddystopia
Copy link

Is the only motivation to backing up on that promise is the fact that this is a frequent source of confusion? Which benefits except clarity can Rust gain?

@RalfJung
Copy link
Member Author

We never intended MaybeUninit<(u8, u16)> to have a padding byte that would be lost on copies. This is a complete accident. It's not just a source of confusion, it's not the semantics we want. It came up in #517 where it means that returning a MaybeUninit<T> from an atomic compare-exchange actually doesn't work since padding bytes still get lost so we won't end up having the same bit pattern as what is stored in the atomic location.

@jamesmunns
Copy link
Member

jamesmunns commented Jul 26, 2024

I'm pretty sure this is still the case, but it might be worth it to enumerate things that ARE still allowed for this wrt FFI/ABI concerns. My primary use of MaybeUninit<T> in FFI is for "outptr" usages (edit: specifically &mut MaybeUninit<T> or *mut MaybeUninit<T>), which seems to be still good (because we never copy/pass by value - the part that is discussed by this issue), but for folks like myself might be worth spelling out clearly/contrasting what is no longer allowed.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 26, 2024

Yes, ABI compatibility is about the "by-value" part of a function argument or return type. That's how we've consistently been using this term for a while now, also see our glossary and the documentation on ABI compatibility.

In public communication we'll obviously spell out the details more than in internal discussion. ("Internal" not as in "private" but as in "among the team members and anyone else who's willing to participate".)

@chorman0773
Copy link
Contributor

Or maybe "no padding" should be restricted a bit further, like "if T is a primitive integer/float/pointer type" or so. Note that some non-power-of-2 SIMD types have padding so we have to be careful if we want to talk about those types.

I at the very least need target simd types as well - for floating-point types that aren't directly supported by rust (e.g. f2x64_t), I wrap them in a target-specific ABI type, which on x86_64, is mostly __m128.

@RalfJung
Copy link
Member Author

Since that is a compiler-internal concern, you could also do this by providing more ABI guarantees than what Rust provides in general.

But that case would be covered by "types without padding", or we could explicitly mention the stdarch SIMD types (since they are all powers of 2).

@chorman0773
Copy link
Contributor

Since that is a compiler-internal concern, you could also do this by providing more ABI guarantees than what Rust provides in general.

Not fully - you don't necessarily need to compile the rtlibs with lccc themselves, they're written in mostly portable rust, and quite deliberately. I'd like to be able to continue providing that guarantee.

Yes, ABI compatibility is about the "by-value" part of a function argument or return type. That's how we've consistently been using this term for a while now, also see our glossary and the documentation on ABI compatibility.

You can also now see a formalization in reference#1545, as a note.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants