Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading uninitialized memory must be poison, not undef #52930

Open
1 of 11 tasks
nunoplopes opened this issue Dec 30, 2021 · 10 comments
Open
1 of 11 tasks

Reading uninitialized memory must be poison, not undef #52930

nunoplopes opened this issue Dec 30, 2021 · 10 comments
Labels
bug clang:codegen llvm:core miscompilation

Comments

@nunoplopes
Copy link
Member

nunoplopes commented Dec 30, 2021

Currently we consider loads of uninitialized memory to be undef. This is both sub-optimal and a source of miscompilations.
For example, instsimplify & newgvn convert phi(X, undef) -> X. This is wrong as X might be poison. We had the same issue with select but it has already been fixed.
Fixing this bug is related with uninit memory because sroa/mem2reg create phis with undef. And we need to be able to fold those away.

We have a patch for langref, sroa, mem2reg here: https://reviews.llvm.org/D104648
I have patches locally for InstSimplify & NewGVN to fix the phi issue mentioned above.

Changing the semantics of load of uninit memory breaks compatibility. Although most users won't notice, we would need to interface with the major frontends.
For clang, we would need to fix codegen of bit-fields as at the least the first field store loads uninit memory before masking it. The easy way is to just freeze the loaded value. Spill-over of poison from different fields is not an issue as if one field becomes poison is because some UB already happened in the program so we can spill over poison to neighboring fields.
Bitfields aren't very common, so this shouldn't be a major issue.

On LLVM's side, I believe the only issue is load widening. This must be either be removed at IR level, or we need to use vectors (or the byte type) for the widened type.

In summary, todo:

  • Change clang's codegen of bit-field loads to do a freeze of the loaded value
  • Investigate if we need new instcombine patterns to remove freezes from bitfields
  • Announce breaking changes to IR semantics on the mailing list for the following LLVM version
  • Change semantics: commit LangRef patch
  • Add auto-upgrade for old IR: load -> load + freeze
  • Change load widening to use vectors
  • Commit SROA + mem2reg patches
  • Commit InstSimplify patch
  • Commit NewGVN patch
  • Revert 1881711
  • Celebrate? One step closer to removing undef 🥂

cc @aqjune @regehr @nikic @rotateright @LebedevRI @alinas

@nunoplopes nunoplopes added bug clang:codegen llvm:core miscompilation and removed new issue labels Dec 30, 2021
@nikic
Copy link
Contributor

nikic commented Dec 30, 2021

Generally sounds good, but I believe we also need to auto-upgrade existing bitcode by replacing all loads with freeze(load), which will significantly pessimize the ability to optimize old bitcode.

@nunoplopes
Copy link
Member Author

nunoplopes commented Dec 30, 2021

Generally sounds good, but I believe we also need to auto-upgrade existing bitcode by replacing all loads with freeze(load), which will significantly pessimize the ability to optimize old bitcode.

Right, forgot that. Added to the todo list, thanks!

@aqjune
Copy link
Contributor

aqjune commented Dec 31, 2021

Sounds like a great plan!
AFAIR load widening already exists in SelDag and/or MIR, and alias analysis should still work well without load widening, so it won't greatly affect performance (my hope).
The old GVN has the load widening; should we remove it?

@nunoplopes
Copy link
Member Author

nunoplopes commented Jan 29, 2022

NewGVN patch fixed in f1c18ac
Started with this one as is low risk for perf regressions as it's not enabled in the default pipeline.

@nunoplopes
Copy link
Member Author

nunoplopes commented Mar 10, 2022

See also: https://reviews.llvm.org/D89050
Introduced !noundef on loads. Is that sufficient? Is that a good stepping stone on the way to change semantics?

(just for documentation so we don't forget)

@aqjune
Copy link
Contributor

aqjune commented Mar 11, 2022

Clang frontend must be able to use the !noundef to annotate lvalues that must be initialized.
This will reduce the cost of using freeze, and be helpful for poison-to-undef switch because loading undef anyway becomes UB.

@RalfJung
Copy link

RalfJung commented Apr 21, 2022

On LLVM's side, I believe the only issue is load widening. This must be either be removed at IR level, or we need to use vectors (or the byte type) for the widened type.

Or poison might be changed to work per-bit, more like undef does, which would make load widening legal?

@nunoplopes
Copy link
Member Author

nunoplopes commented Apr 21, 2022

On LLVM's side, I believe the only issue is load widening. This must be either be removed at IR level, or we need to use vectors (or the byte type) for the widened type.

Or poison might be changed to work per-bit, more like undef does, which would make load widening legal?

That's a huge can of worms. It's hard to define the semantics of arithmetic operations with bitwise poison semantics. Several people have tried in the past and failed.
So I have to consider bitwise poison to be impossible until there's a concrete proposal.

@regehr
Copy link
Contributor

regehr commented Apr 21, 2022

That's a huge can of worms. It's hard to define the semantics of arithmetic operations with bitwise poison semantics. Several people have tried in the past and failed. So I have to consider bitwise poison to be impossible until there's a concrete proposal.

yeah I don't think we've ever seen a proposal along these lines that resulted in rules that were comprehensible in practice to the target audience, people trying to reason about optimizations. my take on what I saw was that it was so complicated that one would simply need to blindly trust alive or whatever other automated checking tool people write

nikic added a commit that referenced this issue Apr 29, 2022
This removes memset with undef char. We already do this for stores
of undef value.

This comes with the caveat that this optimization is not, strictly
speaking, legal for undef values, because we might be overwriting
a poison value. However, our entire load/store model currently still
operates on undef values, so we need to support undef here as well
for internal consistency.

Once #52930 is resolved,
these and related folds can be limited to poison -- I've added
FIXMEs to that effect.

Differential Revision: https://reviews.llvm.org/D124173
@RalfJung
Copy link

RalfJung commented Apr 30, 2022

That's a huge can of worms. It's hard to define the semantics of arithmetic operations with bitwise poison semantics. Several people have tried in the past and failed.

Are these previously considered proposals and their issues summarized somewhere?

So I have to consider bitwise poison to be impossible until there's a concrete proposal.

FWIW https://discourse.llvm.org/t/a-memory-model-for-llvm-ir-supporting-limited-type-punning/61948 proposes bitwise poison where for arithmetic, if any input bit is poison, the full output is poison.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug clang:codegen llvm:core miscompilation
Projects
None yet
Development

No branches or pull requests

5 participants