Add llvm.sideeffect to potential infinite loops and recursions #59546

Open
wants to merge 1 commit into
base: master
from

@sfanxiang
commented Mar 30, 2019

LLVM assumes that a thread will eventually cause a side effect. This is
not true in Rust if a loop or recursion does nothing in its body,
causing undefined behavior even in common cases like loop {}.
Inserting llvm.sideeffect fixes the undefined behavior.

As a micro-optimization, only insert llvm.sideeffect when jumping back
in blocks or calling a function.

A patch for LLVM is expected to allow empty non-terminating code by
default and fix this issue from the LLVM side.
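
For readers unfamiliar with the issue, here is a minimal sketch of the kind of code the description refers to. The function names (`spin_forever`, `recurse_forever`, `checked_div`) are hypothetical and do not come from this PR; they only illustrate divergence with no side effect in the body.

```rust
/// Diverges without any I/O, volatile access, or atomic operation.
pub fn spin_forever() -> ! {
    loop {}
}

/// The same problem expressed as recursion instead of a loop.
#[allow(unconditional_recursion)]
pub fn recurse_forever() -> ! {
    recurse_forever()
}

/// A caller that diverges on one path only (hypothetical example).
/// If the optimizer assumes the side-effect-free loop must terminate,
/// it may treat the `b == 0` branch as unreachable.
pub fn checked_div(a: u32, b: u32) -> u32 {
    if b == 0 {
        spin_forever();
    }
    a / b
}
```

`checked_div(6, 3)` returns normally; the concern is only the `b == 0` path, where the program is supposed to hang rather than fall through to the division.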

#28728

UPDATE: Mentoring instructions here to unstall this PR

@rust-highfive

Collaborator

commented Mar 30, 2019

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @michaelwoerister (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@sfanxiang sfanxiang force-pushed the sfanxiang:interminable-ub branch from 1d1e574 to b3848ce Mar 30, 2019

@nagisa
Contributor

left a comment

Thanks for the PR!

The outcome here is almost exactly what I expected when it occurred to me that llvm.sideeffect is not a great workaround. I do not see any particularly strong issues with the implementation itself, so r=me on that.

The question is whether we are fine with regressing (at times seriously) the code quality for just about everything else in exchange for solving our most notable longstanding codegen issue, @rust-lang/compiler?

@@ -6,8 +6,8 @@
#[no_mangle]
pub fn issue_34947(x: i32) -> i32 {
// CHECK: mul
// CHECK-NEXT: mul
// CHECK-NEXT: mul
// CHECK-NEXT: ret

nagisa Mar 30, 2019

Contributor

This is a regression test to ensure that pow(<constant>) would unroll the loop properly. This change regresses that, and the test has been adjusted in a way that ignores its original intent.

nagisa Mar 30, 2019

Contributor

See #34947

sfanxiang Mar 31, 2019

Author

I don't think pow(<constant>) regresses in this case. The codegen simply inserts a bunch of @llvm.sideeffect in between mul so we can't do CHECK-NEXT. The resulting code should be no different.

nagisa Mar 31, 2019

Contributor

Can we then CHECK-NOT for branches between the multiply instructions?
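
One way the test could express that, as a rough sketch only: the function body `x.pow(5)` is an assumption (the diff above does not show it), and whether `br` is the right pattern to exclude depends on the IR rustc actually emits.

```rust
// Sketch of a FileCheck-style codegen test. The `// CHECK` lines are read
// by FileCheck, not by rustc, so to the compiler this is an ordinary function.
// The real test also carries `#[no_mangle]`, as in the diff above.
pub fn issue_34947(x: i32) -> i32 {
    // CHECK: mul
    // CHECK-NOT: br
    // CHECK: mul
    // CHECK-NOT: br
    // CHECK: mul
    // CHECK-NOT: br
    // CHECK: ret
    x.pow(5) // assumed body; not shown in the diff
}
```

Unlike `CHECK-NEXT`, this pattern tolerates `@llvm.sideeffect` calls between the multiplies, while it should still fail if a branch shows up between them.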

@@ -14,6 +14,6 @@ pub fn helper(_: usize) {
// CHECK-LABEL: @repeat_take_collect
#[no_mangle]
pub fn repeat_take_collect() -> Vec<u8> {
// CHECK: call void @llvm.memset.p0i8.[[USIZE]](i8* {{(nonnull )?}}align 1 %{{[0-9]+}}, i8 42, [[USIZE]] 100000, i1 false)

nagisa Mar 30, 2019

Contributor

Is this a regression where the buffer size passed to the intrinsic is no longer constant?

sfanxiang Mar 31, 2019

Author

Not exactly. IDK why but we get a single store and then memset with size = 99999 here.

nagisa Mar 31, 2019

Contributor

Fair, can the test check for this specific pattern please? With {{.*}} it is possible to regress to, say, 100000 memsets all with 1 byte size without noticing.

@@ -1,3 +1,5 @@
// ignore-test LLVM can't prove that these loops terminate.

nagisa Mar 30, 2019

Contributor

This is a regression as well, right? Not great that even trivial examples like these regress...

nagisa Mar 30, 2019

Contributor

See #45222

src/test/run-pass/non-terminate/infinite-loop.rs Outdated
src/test/run-pass/non-terminate/infinite-recursion.rs Outdated
@nagisa

Contributor

commented Mar 30, 2019

@bors try

Let's do a perf run.

@bors

Contributor

commented Mar 30, 2019

⌛️ Trying commit b3848ce with merge 2d35e03...

bors added a commit that referenced this pull request Mar 30, 2019

Auto merge of #59546 - sfanxiang:interminable-ub, r=<try>
Add llvm.sideeffect to potential infinite loops and recursions

@bors

Contributor

commented Mar 30, 2019

☀️ Try build successful - checks-travis
Build commit: 2d35e03

@nagisa

Contributor

commented Mar 30, 2019

@rust-timer

commented Mar 30, 2019

Success: Queued 2d35e03 with parent 709b72e, comparison URL.

@rust-timer

commented Mar 31, 2019

Finished benchmarking try commit 2d35e03

@sfanxiang sfanxiang force-pushed the sfanxiang:interminable-ub branch 4 times, most recently from 39e58a6 to 0ea0e06 Mar 31, 2019

@sfanxiang

Author

commented Mar 31, 2019

@nagisa Looking closer at perf, some benchmarks (e.g. keccak) spend quite some time in is_predecessor_of. So I replaced it with a simpler comparison of block indices, assuming MIR blocks are always generated in sequential order. If the assumption is false, the codegen would still be correct but the generated code would be less performant.

@nagisa

Contributor

commented Mar 31, 2019

Alas, the blocks are not required to be sequential by the time codegen happens.

> If the assumption is false, the codegen would still be correct but the generated code would be less performant.

I’m not sure I see how: if we have start -> bb10 (loop head) -> bb1 (loop body) -> bb10 then the sideeffect would not get generated at all, would it?

@sfanxiang

Author

commented Apr 1, 2019

@nagisa
By sequential I mean that, as long as there isn't a loop, the blocks always execute in strictly increasing index order (bb1 -> bb2, etc.). We generate @llvm.sideeffect whenever we see an equal or decreasing target index (e.g. bb2 -> bb1).

Suppose @llvm.sideeffect is not generated, which means every branch goes to a strictly greater index. Because the index is strictly increasing, it's impossible to return to a visited block, so no loop can form. In other words, if @llvm.sideeffect isn't generated, there's no loop, regardless of the sequential assumption.

Now what if the assumption is false? That means the index can decrease where there isn't a loop. In that case, an extra @llvm.sideeffect is generated even though it's not needed, which hurts performance but not correctness.

And if the assumption is true, @llvm.sideeffect is generated if and only if a loop exists. This applies only to blocks, though. Functions always get a sideeffect.

> Alas, the blocks are not required to be sequential by the time codegen happens.

I realize it's not required, but I couldn't find a case where rustc doesn't follow this assumption. Could you give me an example where sequential code isn't generated sequentially when converted to MIR?

> I’m not sure I see how: if we have start -> bb10 (loop head) -> bb1 (loop body) -> bb10 then the sideeffect would not get generated at all, would it?

It will be generated in bb10 before branching to bb1.
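
The argument above can be sketched in a few lines. This is an illustration only, not the PR's code: `Block`, `needs_sideeffect`, `marked_edges`, and the two example graphs are hypothetical names, and `start` is assumed to be bb0.

```rust
/// Hypothetical stand-in for a MIR basic-block index (bb0, bb1, ...).
pub type Block = usize;

/// The rule described above: mark a jump whose target index is equal to
/// or lower than the index of the block it leaves.
pub fn needs_sideeffect(from: Block, to: Block) -> bool {
    to <= from
}

/// Returns the edges that would receive a marker under that rule.
pub fn marked_edges(edges: &[(Block, Block)]) -> Vec<(Block, Block)> {
    edges
        .iter()
        .copied()
        .filter(|&(from, to)| needs_sideeffect(from, to))
        .collect()
}

/// The loop from the question: start (assumed bb0) -> bb10 -> bb1 -> bb10.
pub fn out_of_order_loop() -> Vec<(Block, Block)> {
    vec![(0, 10), (10, 1), (1, 10)]
}

/// Straight-line code whose blocks happen to be numbered out of order.
pub fn out_of_order_straight_line() -> Vec<(Block, Block)> {
    vec![(0, 5), (5, 2), (2, 7)]
}
```

For the loop, only the bb10 -> bb1 edge is marked, which is enough because every trip around the cycle crosses it. For the straight-line case, bb5 -> bb2 is marked even though no loop exists: a wasted marker, but not a correctness problem.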

@nagisa

Contributor

commented Apr 1, 2019

I see. Well, I guess we do another perf run to see how it fares this time around and perhaps it would be good to collect some benchmarks as well, although I’m not sure of what. Then we can just wait for the decision from the team meeting.

@bors try

@bors

Contributor

commented Apr 1, 2019

⌛️ Trying commit 0ea0e06 with merge ef94533...

bors added a commit that referenced this pull request Apr 1, 2019

Auto merge of #59546 - sfanxiang:interminable-ub, r=<try>
Add llvm.sideeffect to potential infinite loops and recursions

@bors

Contributor

commented Apr 1, 2019

☀️ Try build successful - checks-travis
Build commit: ef94533

@oli-obk

This comment was marked as outdated.

Contributor

commented Apr 1, 2019

@bors rust-timer build ef94533

@oli-obk

This comment was marked as outdated.

Contributor

commented Apr 1, 2019

@rust-timer

This comment was marked as outdated.

commented Apr 1, 2019

Please provide the full 40 character commit hash.

@oli-obk

Contributor

commented Apr 1, 2019

@pnkfelix

Member

commented May 29, 2019

I can work on gathering benchmark data, but not until Monday.

@pnkfelix

Member

commented Jun 3, 2019

(fyi, the codegen tests fail on my linux box for this PR...)

e.g.:

---- [codegen] codegen/vec-clear.rs stdout ----

error: verification with 'FileCheck' failed
status: exit code: 1
command: "/home/pnkfelix/Dev/Mozilla/issue59546/rust-59546-pull/objdir-opt/build/x86_64-unknown-linux-gnu/llvm/build/bin/FileCheck" "--input-file" "/home/pnkfelix/Dev/Mozilla/issue59546/rust-59546-pull/objdir-opt/build/x86_64-unknown-linux-gnu/test/codegen/vec-clear/vec-clear.ll" "/home/pnkfelix/Dev/Mozilla/issue59546/rust-59546-pull/src/test/codegen/vec-clear.rs"
stdout:
------------------------------------------

------------------------------------------
stderr:
------------------------------------------
/home/pnkfelix/Dev/Mozilla/issue59546/rust-59546-pull/src/test/codegen/vec-clear.rs:9:16: error: CHECK-NOT: excluded string found in input
 // CHECK-NOT: load
               ^
/home/pnkfelix/Dev/Mozilla/issue59546/rust-59546-pull/objdir-opt/build/x86_64-unknown-linux-gnu/test/codegen/vec-clear/vec-clear.ll:16:7: note: found here
 %1 = load i64, i64* %0, align 8
      ^~~~

------------------------------------------

@sfanxiang sfanxiang force-pushed the sfanxiang:interminable-ub branch from 88b0130 to 8d00154 Jun 4, 2019

@sfanxiang

Author

commented Jun 4, 2019

@pnkfelix Thanks, I somehow missed these. Should be fixed now.

@rust-highfive

Collaborator

commented Jun 4, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:02769acb:start=1559644181080313347,finish=1559644269695394326,duration=88615080979
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:04:16] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:16] tidy error: /checkout/src/test/codegen/naked-functions.rs: ignoring line length unnecessarily
[00:04:23] some tidy checks failed
[00:04:23] 
[00:04:23] 
[00:04:23] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:23] 
[00:04:23] 
[00:04:23] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:23] Build completed unsuccessfully in 0:01:20
---
travis_time:end:1c24c512:start=1559644543354462294,finish=1559644543359667202,duration=5204908
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0db61978
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:01571c77
travis_time:start:01571c77
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:028447d7
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@sfanxiang sfanxiang force-pushed the sfanxiang:interminable-ub branch from 8d00154 to b0ce620 Jun 4, 2019

@rust-highfive

Collaborator

commented Jun 4, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:10eaf3ec:start=1559651554285793499,finish=1559651641038788450,duration=86752994951
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@sfanxiang sfanxiang force-pushed the sfanxiang:interminable-ub branch from b0ce620 to 1b6b480 Jun 6, 2019

@bors

Contributor

commented Jun 19, 2019

☔️ The latest upstream changes (presumably #59625) made this pull request unmergeable. Please resolve the merge conflicts.

Add llvm.sideeffect to potential infinite loops and recursions

@sfanxiang sfanxiang force-pushed the sfanxiang:interminable-ub branch from 1b6b480 to caf0943 Jun 19, 2019

@pnkfelix

Member

commented Jun 25, 2019

(still haven't gathered benchmark data; other matters, largely related to upcoming release, have been more pressing)

@pnkfelix

Member

commented Jul 10, 2019

(I'll be going on leave for two months, so I'm going to re-assign this to @nagisa with the hopes that they will find someone else to take charge on gathering benchmark data.)

@pnkfelix pnkfelix assigned nagisa and unassigned pnkfelix Jul 10, 2019

@nikomatsakis

Contributor

commented Jul 25, 2019

Dear world,

This PR is an attempt to close a long-standing soundness problem, where LLVM mis-optimizes infinite loops. The problem is that we need to measure its effect on the performance of generated code. There are some good instructions for how to do this right here -- all it takes is someone to do it and post the results! Perhaps that someone could be you?

@mati865

Contributor

commented Jul 25, 2019

I'll see if it finishes in a reasonable time on a 2700X box; ping me if I don't respond by Monday.

@nikic

Contributor

commented Jul 25, 2019

Because building rust with & without this PR is probably a huge blocker for anyone who wants to do some quick testing, it's possible to get prebuilt binaries:

rustup-toolchain-install-master 4cb14446465d8d3cabfb3706f424a2938628b9f6 -n with-sideeffect
rustup-toolchain-install-master eab3eb38df8dca99110b6149b3a15deeb4ef0413 -n without-sideeffect
@mati865

Contributor

commented Jul 26, 2019

> Because building rust with & without this PR is probably a huge blocker for anyone who wants to do some quick testing

I don't think it's that big a deal; a clean build of this PR rebased on master took less than 30 minutes. I'm more worried about the time all the benchmarks are going to take.

The blocker right now is broken lolbench: anp/lolbench#69

@nagisa

Contributor

commented Jul 26, 2019

I have seen that issue in the past, it is caused by an outdated dependency. Please do a cargo update and try again.

@mati865

Contributor

commented Jul 26, 2019

With nagisa's tip I was able to finish the molly runner for both toolchains (it took 2 hours for each), but the instructions are incomplete. It only gave me raw data stored in .json files.
The only generated metric is nanoseconds (guessing by the name, it will be noisy). I tried to generate a webpage for it after patching out other metrics like instructions and cpu-cycles, but the generated graphs are empty.

I can upload this raw data but I won't spend any more time on this unless proper instructions are added.

@sfanxiang

Author

commented Jul 27, 2019

@mati865 echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid before benchmarking should give you perf metrics. (https://github.com/anp/perf_events)

Does anyone know how to properly generate the website? I can write a custom one if someone tells me which data are of interest.

@@ -340,6 +362,7 @@ impl<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>> FunctionCx<'a, 'tcx, Bx> {
FnType::of_instance(&bx, &drop_fn))
}
};
bx.sideeffect();

nikic Aug 4, 2019

Contributor

Would it be possible to insert this on function entry, rather than on every call?

@nikomatsakis

Contributor

commented Aug 15, 2019

Check-in from compiler triage meeting:

Hey @mati865! Many thanks for gathering that data. Is the raw data still available?

@sfanxiang, I don't know how to properly generate the website, but I'd certainly be happy with a one-off measurement at this point!

@mati865

Contributor

commented Aug 16, 2019

@nikomatsakis the only gathered metric is nanoseconds, and it's very noisy.
I'll try to find time this or next weekend to get more useful data.
