LLVM loop optimization can make safe programs crash #28728

Open
RalfJung opened this issue Sep 29, 2015 · 79 comments

Comments

Member

@RalfJung RalfJung commented Sep 29, 2015

The following snippet crashes when compiled in release mode on current stable, beta and nightly:

enum Null {}

fn foo() -> Null { loop { } }

fn create_null() -> Null {
    let n = foo();

    let mut i = 0;
    while i < 100 { i += 1; }
    return n;
}

fn use_null(n: Null) -> ! {
    match n { }
}


fn main() {
    use_null(create_null());
}

https://play.rust-lang.org/?gist=1f99432e4f2dccdf7d7e&version=stable

This is based on the following example of LLVM removing a loop that I was made aware of: https://github.com/simnalamburt/snippets/blob/12e73f45f3/rust/infinite.rs.
What seems to happen is that, since C allows LLVM to remove endless loops that have no side effects, we end up executing a match that has no arms.

@steveklabnik steveklabnik added the A-LLVM label Sep 29, 2015
Contributor

@ranma42 ranma42 commented Sep 29, 2015

The LLVM IR of the optimised code is

; Function Attrs: noreturn nounwind readnone uwtable
define internal void @_ZN4main20h5ec738167109b800UaaE() unnamed_addr #0 {
entry-block:
  unreachable
}

This kind of optimisation breaks the main assumption that should normally hold on uninhabited types: it should be impossible to have a value of that type.
rust-lang/rfcs#1216 proposes to explicitly handle such types in Rust. It might be effective in ensuring that LLVM never has to handle them and in injecting the appropriate code to ensure divergence when needed (IIUIC this could be achieved with appropriate attributes or intrinsic calls).
This topic has also been recently discussed in the LLVM mailing list: http://lists.llvm.org/pipermail/llvm-dev/2015-July/088095.html

Member

@alexcrichton alexcrichton commented Sep 29, 2015

triage: I-nominated

Seems bad! If LLVM doesn't have a way to say "yes, this loop really is infinite" though then we may just have to sit-and-wait for the upstream discussion to settle.

Contributor

@ranma42 ranma42 commented Sep 29, 2015

A way to prevent infinite loops from being optimised away is to add unsafe {asm!("" :::: "volatile")} inside of them. This is similar to the llvm.noop.sideeffect intrinsic that has been proposed in the LLVM mailing list, but it might prevent some optimisations.
In order to avoid the performance loss and to still guarantee that diverging functions/loops are not optimised away, I believe that it should be sufficient to insert an empty non-optimisable loop (i.e. loop { unsafe { asm!("" :::: "volatile") } }) if uninhabited values are in scope.
If LLVM optimises the code which should diverge to the point that it does not diverge anymore, such loops will ensure that the control flow is still unable to proceed.
In the "lucky" case in which LLVM is unable to optimise the diverging code, such a loop will be removed by DCE.
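For readers on a current toolchain, the same idea can be sketched with the stabilized `asm!` macro, where an assembly block is treated as having side effects unless it is marked `pure`. This is only an illustration of the workaround described above, not what rustc emits; the helper names `opaque_side_effect`, `diverge` and `spin` are made up for the example, and it assumes a target on which inline assembly is supported.

```rust
use std::arch::asm;

/// Emits an empty assembly statement (hypothetical helper).
/// Because the block is not marked `pure`, the optimizer has to treat it
/// as a side effect and cannot delete a loop that contains it.
#[inline(always)]
fn opaque_side_effect() {
    // SAFETY: the template is empty, so no memory, stack or flags are touched.
    unsafe { asm!("", options(nomem, nostack, preserves_flags)) };
}

/// Diverging function whose loop is expected to survive optimization.
pub fn diverge() -> ! {
    loop {
        opaque_side_effect();
    }
}

/// Bounded variant, useful for checking that the barrier itself is harmless.
pub fn spin(iterations: u32) -> u32 {
    let mut done = 0;
    while done < iterations {
        opaque_side_effect();
        done += 1;
    }
    done
}
```

As noted above, such a barrier may also block optimisations that would otherwise be legal, so it is a workaround rather than a fix.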

Contributor

@geofft geofft commented Sep 29, 2015

Is this related to #18785? That one's about infinite recursion being UB, but it sounds like the fundamental cause might be similar: LLVM doesn't consider not halting to be a side effect, so if a function has no side effects other than not halting, it's happy to optimize it away.

Contributor

@arielb1 arielb1 commented Sep 29, 2015

@geofft

It's the same issue.

Member Author

@RalfJung RalfJung commented Sep 29, 2015

Yes, looks like it's the same. Further down that issue, they show how to get undef, from which I assume it's not hard to make a (seemingly safe) program crash.

Contributor

@simnalamburt simnalamburt commented Sep 29, 2015

👍

Contributor

@ranma42 ranma42 commented Sep 29, 2015

@bluss bluss added the I-wrong label Sep 29, 2015
Contributor

@nikomatsakis nikomatsakis commented Oct 1, 2015

So I've been wondering how long until somebody reports this. :) In my opinion, the best solution would of course be if we could tell LLVM not to be so aggressive about potentially infinite loops. Otherwise, the only thing I think we can do is to do a conservative analysis in Rust itself that determines whether:

  1. the loop will terminate OR
  2. the loop will have side-effects (I/O operations etc, I forget precisely how this is defined in C)

Either of these should be enough to avoid undefined behavior.
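The decision rule in the two points above can be written down as a toy sketch. Everything here is hypothetical: `LoopSummary` and `needs_guard` are names invented for illustration, and a real analysis inside the compiler would have to compute these facts conservatively from the loop body.

```rust
/// Hypothetical summary of what a conservative analysis learned about a loop.
#[derive(Debug, Clone, Copy)]
pub struct LoopSummary {
    /// True only if the analysis proved that the loop exits.
    pub proven_to_terminate: bool,
    /// True only if the body is known to perform a side effect
    /// (I/O, volatile access, atomic or synchronization operation).
    pub has_side_effect: bool,
}

/// Returns true when neither condition holds, i.e. when the loop would
/// need extra treatment before being handed to LLVM.
pub fn needs_guard(summary: LoopSummary) -> bool {
    !(summary.proven_to_terminate || summary.has_side_effect)
}
```

Under this rule an empty `loop {}` needs a guard, while a counting loop that provably ends does not.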

Contributor

@nikomatsakis nikomatsakis commented Oct 1, 2015

triage: P-medium

We'd like to see what LLVM will do before we invest a lot of effort on our side, and this seems relatively unlikely to cause problems in practice (though I have personally hit this while developing the compiler as well). There are no backwards incompatibility issues to be concerned about.

@rust-highfive rust-highfive added P-medium and removed I-nominated labels Oct 1, 2015
Contributor

@dotdash dotdash commented Oct 1, 2015

Quoting from the LLVM mailing list discussion:

 The implementation may assume that any thread will eventually do one of the following:
   - terminate
   - make a call to a library I/O function
   - access or modify a volatile object, or
   - perform a synchronization operation or an atomic operation

 [Note: This is intended to allow compiler transformations such as removal of empty loops, even
  when termination cannot be proven. — end note ]
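As a rough Rust illustration of the last bullet, a wait loop that performs an atomic load on every iteration already does one of the listed things, so it is not an "empty" loop in the sense of the note. The function name `wait_until_set` is an assumption made for this sketch.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Spins until another thread sets `flag` (hypothetical helper).
/// Each iteration performs an atomic operation, which is one of the
/// actions listed in the quoted rule.
pub fn wait_until_set(flag: &AtomicBool) {
    while !flag.load(Ordering::Acquire) {
        // Hint to the CPU that this is a busy-wait loop.
        std::hint::spin_loop();
    }
}
```

If the flag is never set, this loop is expected to keep spinning rather than be removed.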
Contributor

@ranma42 ranma42 commented Oct 2, 2015

@dotdash The excerpt you are quoting comes from the C++ specification; it is basically the answer to "how it [having side effects] is defined in C" (also confirmed by the standard committee: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1528.htm ).

Regarding what is the expected behaviour of the LLVM IR there is some confusion. https://llvm.org/bugs/show_bug.cgi?id=24078 shows that there seems to be no accurate & explicit specification of the semantics of infinite loops in LLVM IR. It aligns with the semantics of C++, most likely for historical reasons and for convenience (I only managed to track down https://groups.google.com/forum/#!topic/llvm-dev/j2vlIECKkdE which apparently refers to a time when infinite loops were not optimised away, some time before the C/C++ specs were updated to allow it).

From the thread it is clear that there is the desire to optimise C++ code as effectively as possible (i.e. also taking into account the opportunity to remove infinite loops), but in the same thread several developers (including some that actively contribute to LLVM) have shown interest in the ability to preserve infinite loops, as they are needed for other languages.

Contributor

@dotdash dotdash commented Oct 2, 2015

@ranma42 I'm aware of that, I just quoted that for reference, because one possibility to work around this would be to detect such loops in Rust and add one of the above to them to stop LLVM from performing this optimization.
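Done by hand, that workaround might look like the following sketch, which puts a volatile read into the loop body. The names `count_with_volatile_tick` and `diverge_volatile` are invented for the example, and this shows the source-level idea only, not what a compiler pass would generate.

```rust
use std::ptr;

/// Counting loop with a volatile read in its body (hypothetical helper).
/// The volatile access is a side effect, so the loop cannot be dropped
/// as "empty".
pub fn count_with_volatile_tick(iterations: u32) -> u32 {
    let tick: u8 = 0;
    let mut i = 0;
    while i < iterations {
        // SAFETY: `tick` is a live, properly aligned local variable.
        let _ = unsafe { ptr::read_volatile(&tick) };
        i += 1;
    }
    i
}

/// Diverging variant: the volatile read keeps the infinite loop observable.
pub fn diverge_volatile() -> ! {
    let tick: u8 = 0;
    loop {
        // SAFETY: same as above.
        let _ = unsafe { ptr::read_volatile(&tick) };
    }
}
```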

Contributor

@bstrie bstrie commented Nov 30, 2015

Is this a soundness issue? If so, we should tag it as such.

Member

@bluss bluss commented Nov 30, 2015

Yes, following @ranma42's example, this way shows how it readily defeats array bounds checks. playground link

@bluss bluss added I-unsound 💥 and removed I-wrong labels Nov 30, 2015
@arielb1 arielb1 added I-wrong and removed I-unsound 💥 labels Dec 2, 2015
Contributor

@arielb1 arielb1 commented Dec 2, 2015

@bluss

The policy is that wrong-code issues that are also soundness issues (i.e. most of them) should be tagged I-wrong.

@brson brson added I-unsound 💥 E-hard and removed E-hard labels Aug 4, 2016
sfanxiang added a commit to sfanxiang/rust that referenced this issue Mar 31, 2019
LLVM assumes that a thread will eventually cause side effect. This is
not true in Rust if a loop or recursion does nothing in its body,
causing undefined behavior even in common cases like `loop {}`.
Inserting llvm.sideeffect fixes the undefined behavior.

As a micro-optimization, only insert llvm.sideeffect when jumping back
in blocks or calling a function.

A patch for LLVM is expected to allow empty non-terminate code by
default and fix this issue from LLVM side.

rust-lang#28728
bors added a commit that referenced this issue Apr 1, 2019
Add llvm.sideeffect to potential infinite loops and recursions

sfanxiang added a commit to sfanxiang/rust that referenced this issue Apr 1, 2019

@Vurich Vurich commented May 8, 2019

Would it make sense to have an unsafe intrinsic that states that a given loop, recursion, ... always terminates?

std::hint::reachable_unchecked?

sfanxiang added a commit to sfanxiang/rust that referenced this issue Jun 4, 2019
sfanxiang added a commit to sfanxiang/rust that referenced this issue Jun 19, 2019

@rsimmonsjr rsimmonsjr commented Aug 26, 2019

Incidentally I ran into this writing real code for a TCP message system. I had an infinite loop as a stopgap until I put in a real mechanism for stopping but the thread exited immediately.
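One possible stopgap for this kind of situation, offered only as a sketch and not as what the commenter ended up doing, is to block on a channel instead of spinning in an empty loop. The blocking `recv` call is a real operation, so the thread stays alive until it is told to stop. The name `spawn_worker` is hypothetical.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Spawns a worker that stays alive until it receives a stop message
/// (hypothetical helper).
pub fn spawn_worker() -> (Sender<()>, thread::JoinHandle<&'static str>) {
    let (tx, rx) = channel::<()>();
    let handle = thread::spawn(move || {
        // Blocks without busy-waiting until a message arrives
        // or every sender has been dropped.
        let _ = rx.recv();
        "stopped"
    });
    (tx, handle)
}
```

Sending `()` on the returned `Sender` and then joining the handle shuts the worker down cleanly.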

mati865 added a commit to mati865/rust that referenced this issue Sep 7, 2019
Contributor

@ExpHP ExpHP commented Sep 17, 2019

In case anybody wanted to play test case code golf:

fn main() {
    (|| loop {})()
}

$ cargo run --release
Illegal instruction (core dumped)
bors pushed a commit that referenced this issue Oct 10, 2019
bors added a commit that referenced this issue Oct 10, 2019

**UPDATE:** [Mentoring instructions here](#59546 (comment)) to unstall this PR
XiangQingW added a commit to XiangQingW/rust that referenced this issue Oct 13, 2019
yvt added a commit to yvt/Stella2 that referenced this issue Oct 15, 2019
The old implementation ended up in infinite recursion, which was somehow
optimized into no-op by the compiler (probably an instance of
<rust-lang/rust#28728>). This problematic code
was discovered by a compiler warning.
Contributor

@memoryruins memoryruins commented Oct 15, 2019

In case anybody wanted to play test case code golf:

pub fn main() {
   (|| loop {})()
}

With the -Z insert-sideeffect rustc flag, added by @sfanxiang in #59546, it keeps on looping :)

before:

main:
  ud2

after:

main:
.LBB0_1:
  jmp .LBB0_1
choller added a commit to choller/rust that referenced this issue Oct 17, 2019
Contributor

@bstrie bstrie commented Oct 21, 2019

By the way, the LLVM bug tracking this is https://bugs.llvm.org/show_bug.cgi?id=965 , which I haven't seen posted yet in this thread.

Member

@shepmaster shepmaster commented Oct 21, 2019

which I haven't seen posted yet in this thread.

#28728 (comment) and #28728 (comment)

Contributor

@simnalamburt simnalamburt commented Nov 8, 2019

@RalfJung Can you update the hyperlink https://github.com/simnalamburt/snippets/blob/master/rust/src/bin/infinite.rs in the issue description to https://github.com/simnalamburt/snippets/blob/12e73f45f3/rust/infinite.rs? The former hyperlink has been broken for a long time since it was not a permalink. Thanks! 😛

Member Author

@RalfJung RalfJung commented Nov 8, 2019

@simnalamburt done, thanks!
