Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When does copying an uninitialized byte poison the destination? #301

Closed
mcy opened this issue Aug 9, 2021 · 14 comments
Closed

When does copying an uninitialized byte poison the destination? #301

mcy opened this issue Aug 9, 2021 · 14 comments

Comments

@mcy
Copy link

mcy commented Aug 9, 2021

In other words, is this program UB?

struct HasPadding(u8, u16);
let mut bytes = [0; size_of::<HasPadding>()];
(&mut bytes as *mut _ as *mut HasPadding).write(HasPadding(42, 42));   // Assume well-aligned.

println!("{:?}", bytes);  // Using `bytes` shouldn't affect if this is UB.

If I have initialized memory somewhere, and I do a type-specific copy (not a generic memcpy) of a type with uninitialized bytes (be it padding bytes or a MayebeUninit<T>) does the uninitialized-ness get copied too? (The answer need not be the same for each case.)

Given that we seem to believe that assigning a struct does not copy padding bits (can't find the issue # for this atm), I feel like this code should not be UB (by way of making one of bytes' bytes uninitialized.

@scottmcm
Copy link
Member

scottmcm commented Aug 9, 2021

(nit: "poison" has a specific meaning in LLVM, so it'd probably be better to avoid that world unless that's meant specifically.)

"uninit" is basically another possible value in addition to the "normal" ones. So yes, copying uninit around absolutely copies the uninit-ness -- even writing the uninit to memory and reading it back keeps the uninit-ness.

(This is important for optimization. For example, you can only optimize *p = x; y = *p; to *p = x; y = x; if the uninit-ness gets written-to and read-from memory as well.)

And MIRI says it's currently UB: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=36f80529ae08fa1ae686d891c125f05d

error: Undefined Behavior: using uninitialized data, but this operation requires initialized memory
   --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs:462:5
    |
462 | /     impl_Display!(
463 | |         i8, u8, i16, u16, i32, u32, i64, u64, usize, isize
464 | |             as u64 via to_u64 named fmt_u64
465 | |     );
    | |______^ using uninitialized data, but this operation requires initialized memory

@digama0
Copy link

digama0 commented Aug 9, 2021

The part about this example that's not clear to me is: In a typed pointer write of a HasPadding into *mut HasPadding, exactly which bytes are written to? AFAICT we have a free choice between whether padding bytes have uninit written into them, or not (preserving the destination bytes). I don't know any UCG documentation about this.

I believe internal padding bytes should probably be copied, in order to allow for a memcpy implementation of the write. Trailing padding bytes can possibly avoid being copied, but copying all padding bytes seems to lead to the simplest specification. If stride != size then you also have to worry about stride padding bytes, but I don't know of any plans to change that.

@scottmcm
Copy link
Member

scottmcm commented Aug 9, 2021

In the abstract machine, size_of::<T> bytes are written. And that's the level at which UB is analyzed.

In implementation, the compiler might not write padding bytes, because it would be UB to depend on their value anyway so it's inobservable whether they were written or not.

@digama0
Copy link

digama0 commented Aug 9, 2021

In the abstract machine, size_of::<T> bytes are written. And that's the level at which UB is analyzed.

Is that documented / guaranteed anywhere? (I agree that, if true, this will resolve the question.)

@RalfJung
Copy link
Member

For padding it's actually even more interesting: after writing a HasPadding somewhere, the padding in the destination will always be "uninit" -- no matter the previous state of the padding in source or destination. (Miri does not currently properly implement this.)

So for the original question, this becomes UB when you print the bytes, as that's the first time that the unit padding byte is "used" at integer type.

Is that documented / guaranteed anywhere?

Not that I know of. This is the most conservative model I can imagine and based on what I think C does.

@mcy
Copy link
Author

mcy commented Aug 10, 2021

(nit: "poison" has a specific meaning in LLVM, so it'd probably be better to avoid that world unless that's meant specifically.)

Hehehe, I think in LLVM terms too much sometimes.

"uninit" is basically another possible value in addition to the "normal" ones. So yes, copying uninit around absolutely copies the uninit-ness -- even writing the uninit to memory and reading it back keeps the uninit-ness.

This is what I expect, but ISTR we didn't want padding to get copied when doing typed pointer writes (or any assignment) because that can step on niches. Mind, the abstract machine might not care about niches and all of this is moot, but it would be nice if typed writes of structs with padding didn't have this problem...

@RalfJung
Copy link
Member

RalfJung commented Aug 10, 2021

it would be nice if typed writes of structs with padding didn't have this problem...

I don't think that is reasonably possible. We have to allow the compiler to turn a typed write into a byte-wise memcpy, so we have to allow "overwriting" previously initialized padding bytes in the destination.

@Diggsey
Copy link

Diggsey commented Aug 10, 2021

I don't think that is reasonably possible.

It would be possible if padding was always zero, since then memcpy will set the correct niche, but obviously there are other issues with that.

@RalfJung
Copy link
Member

I am also not entirely sure which guarantees LLVM makes for padding.

@RalfJung
Copy link
Member

RalfJung commented Aug 2, 2023

I think this question has a pretty clear answer: after a typed copy, all padding bytes in the destination are Uninit. Preserving source bytes does not work since we want to allow field-by-field copy, and preserving target bytes does not work since we want to allow doing a plain memcpy of the entire type. We could say "non-deterministically use source or target byte value", but I'd rather avoid non-determinism without a good motivation (non-determinism is subtle to reason about in the compiler since it needs to make consistent choices), and also the decode/encode way that MiniRust handles typed copies fundamentally cannot represent this -- it makes the spec of assignment a lot more complicated than currently, which I don't think we have sufficient motivation for.

The validity invariant for padding imposes no requirements at all, so we also can't rely on it always being 0 or something like that.

In other words, even this is UB:

#[repr(C)]
struct HasPadding(u8, u16);
assert_eq!(size_of::<HasPadding>(), 4);

let src = 0u32;
let mut dst = 0u32;
let srcptr = addr_of!(src).cast::<HasPadding>();
let dstptr = addr_of_mut!(dst).cast::<HasPadding>();
*dstptr = *srcptr; // copy at type `HasPadding`
assert!(dst == 0); // UB! `dst` has an uninit byte, validity invariant violated

In the hope that we have consensus on this question...
@rfcbot close

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 2, 2023

Team member @RalfJung has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 7, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 17, 2023

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

@RalfJung
Copy link
Member

I added this to https://github.com/rust-lang/opsem-team/blob/main/fcps.md, the issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants