Make sure our handling of partially initialized values is compatible with LLVM / We cannot use LLVM poison semantics #94428

RalfJung · 2022-02-27T17:19:51Z

The intent is for types like MaybeUninit<u64> to support dealing with partially initialized data: e.g., if we have a (u32, u16) (and assuming for a second we could rely on its layout), it should be sound to transmute that to MaybeUninit<u64> and back even though the padding between the two tuple fields might be uninitialized. Code like #94212 relies on this.
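As a minimal sketch of the round trip described above: the `Pair` struct and the `round_trip` function are hypothetical names introduced only for illustration. `Pair` is marked `repr(C)` so that the example does not depend on unspecified tuple layout. With that layout the two padding bytes sit after the second field, and they may be uninitialized.

```rust
use std::mem::{self, MaybeUninit};

// Hypothetical stand-in for `(u32, u16)`. `repr(C)` pins the layout:
// 4 bytes for `a`, 2 bytes for `b`, then 2 bytes of padding (8 bytes total).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pair {
    a: u32,
    b: u16,
}

// Moves a `Pair` through `MaybeUninit<u64>` and back.
fn round_trip(p: Pair) -> Pair {
    // Both types are 8 bytes, so `transmute` accepts the conversion.
    let raw: MaybeUninit<u64> = unsafe { mem::transmute(p) };
    // The partially initialized value is moved like ordinary data.
    let moved = raw;
    // We never call `assume_init`, so the padding bytes are never
    // interpreted as part of an initialized `u64`.
    unsafe { mem::transmute(moved) }
}
```

The intent stated above is that `round_trip` returns a `Pair` equal to its input, regardless of what the padding bytes contain.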

The thing is, we are compiling MaybeUninit<u64> to i64 for LLVM -- MaybeUninit is repr(transparent). This was required to avoid codegen regressions when MaybeUninit started to be used in some hot data copying loops inside libcore. So, for this all to work out, we better be sure that i64 correctly preserves partially initialized data.

LLVM has two kinds of "uninit" data, undef and poison.

undef is per-bit and precisely preserved in all iN types, so we should be fine here.
poison, however, is per-value: when loading an i64 and any of its bytes is poison, the entire result is poison. That is exactly what we do not want for MaybeUninit<u64>. However, at least in current LLVM, poison is only created in very few situations (such as "nowrap" arithmetic that overflows), and AFAIK none of them can happen in a UB-free Rust program -- so, basically "uninit" in Rust only ever corresponds to undef in LLVM, never to poison. (But I might have missed places where LLVM generates poison.)
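The difference between the two behaviors can be sketched with a toy model. This is not how LLVM is implemented; `AbstractByte`, `load_undef_style`, and `load_poison_style` are hypothetical names, and the model tracks initialization per byte for simplicity, whereas undef in LLVM is tracked per bit.

```rust
/// One byte of abstract memory in this toy model:
/// either a concrete value or uninitialized.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractByte {
    Init(u8),
    Uninit,
}

/// undef-style load: the initialization state of every byte is preserved,
/// so a partially initialized value survives the load unchanged.
fn load_undef_style(mem: [AbstractByte; 8]) -> [AbstractByte; 8] {
    mem
}

/// poison-style load: a single uninitialized byte taints the whole value.
/// `None` stands for "the entire i64 is poison".
fn load_poison_style(mem: [AbstractByte; 8]) -> Option<u64> {
    let mut bytes = [0u8; 8];
    for (dst, src) in bytes.iter_mut().zip(mem.iter()) {
        match src {
            AbstractByte::Init(b) => *dst = *b,
            AbstractByte::Uninit => return None,
        }
    }
    Some(u64::from_le_bytes(bytes))
}
```

In this model, a value with six initialized bytes and two uninitialized ones keeps its six bytes under the undef-style load but is lost entirely under the poison-style load, which is the behavior MaybeUninit<u64> cannot tolerate.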

So I think right now we are good. However, LLVM is slowly moving away from undef and towards poison, since undef is seriously ill-behaved in many ways. And if that ever means that "uninit" in Rust could correspond to LLVM poison, then we have a problem here -- we have to keep monitoring this situation, and it might be good for us to be involved in the relevant LLVM discussions here as well to make sure they are aware of this problem.

Similarly, as we evolve the MIR semantics we have to make sure that no UB-free program can generate poison after compilation to LLVM.

A very elegant solution to this issue would be for LLVM to adopt the "byte type" proposal. However, so far my impression is that the LLVM community is not convinced they need such a type. With a byte type, MaybeUninit<u64> could easily be compiled to b64 in LLVM, and a byte type would preserve poison precisely, so we'd be all good.

I am mostly opening this so we have some place to track the current situation, and to make sure everyone agrees on what the main concerns are here -- and to get input from folks with more LLVM experience in case I got some of this wrong.
Cc @rust-lang/wg-unsafe-code-guidelines @rust-lang/wg-llvm

RalfJung · 2022-07-14T00:01:06Z

Cc llvm/llvm-project#52930

Also see this LLVM thread.

RalfJung · 2024-02-19T09:48:01Z

There was another LLVM discussion early last year, re-affirming this point:

"Similarly, as we evolve the MIR semantics we have to make sure that no UB-free program can generate poison after compilation to LLVM."

That is not great since it means we can't have "delayed UB (via poison)" in Rust at all even when LLVM has it.

RalfJung · 2024-07-04T10:12:52Z

See llvm/llvm-project#96631, specifically this comment by @nikic

"However, I believe that all frontend-generated arguments/returns/loads can be nopoison, because Rust does not expose poison in its semantics (and I think this holds for pretty much all other frontends as well)."

So right now it looks like LLVM is increasingly homing in on "frontends can't use poison for their own semantics if they want to have good codegen", which is unfortunate. (But still better than unsound codegen!)

See here for one example of a discussion where people asked for having poison semantics in Rust. The usual suspects that want this are sound wrappers around fast-math operations, and SIMD wrappers where some lanes may become poison.

To be clear, "let's first get LLVM into a sound state, remove undef, and then figure out how to let frontends use poison" seems like a viable strategy to me. I'd just hate for us to end up in a corner where letting frontends use poison is near impossible. After all, frontends can use undef today, so if undef is gone and poison can't be used, that's quite the regression.

RalfJung mentioned this issue Feb 27, 2022

const-eval: load of partially initialized scalar produces entirely uninitialized result #69488

Closed

RalfJung mentioned this issue Jul 14, 2022

Tracking issue for RFC 2645, "Transparent Unions" (formerly: and Enums) #60405

Open

3 tasks

RalfJung mentioned this issue Apr 5, 2024

RFC: Add freeze intrinsic and related library functions rust-lang/rfcs#3605

Open

RalfJung mentioned this issue Jul 4, 2024

[InstSimplify] Fix incorrect poison propagation when folding phi llvm/llvm-project#96631

Open

RalfJung changed the title ~~Make sure our handling of partially initialized values is compatible with LLVM~~ Make sure our handling of partially initialized values is compatible with LLVM / We cannot use LLVM poison semantics Jul 4, 2024
