Prevent compiler stack overflow for deeply recursive code #55617

Open, wants to merge 9 commits into master

@oli-obk
Contributor

commented Nov 2, 2018

I was unable to write a test that

  1. runs in under 1s
  2. overflows on my machine without this patch

The following reproduces the issue, but I don't think it's sensible to include a test that takes 30s to compile. We can now easily squash newly appearing overflows by the strategic insertion of calls to `ensure_sufficient_stack`.

```rust
// compile-pass

#![recursion_limit="1000000"]

macro_rules! chain {
    (EE $e:expr) => {$e.sin()};
    (RECURSE $i:ident $e:expr) => {chain!($i chain!($i chain!($i chain!($i $e))))};
    (Z $e:expr) => {chain!(RECURSE EE $e)};
    (Y $e:expr) => {chain!(RECURSE Z $e)};
    (X $e:expr) => {chain!(RECURSE Y $e)};
    (A $e:expr) => {chain!(RECURSE X $e)};
    (B $e:expr) => {chain!(RECURSE A $e)};
    (C $e:expr) => {chain!(RECURSE B $e)};
    // causes overflow on x86_64 linux
    // less than 1 second until overflow on test machine
    // after overflow has been fixed, takes 30s to compile :/
    (D $e:expr) => {chain!(RECURSE C $e)};
    (E $e:expr) => {chain!(RECURSE D $e)};
    (F $e:expr) => {chain!(RECURSE E $e)};
    // more than 10 seconds
    (G $e:expr) => {chain!(RECURSE F $e)};
    (H $e:expr) => {chain!(RECURSE G $e)};
    (I $e:expr) => {chain!(RECURSE H $e)};
    (J $e:expr) => {chain!(RECURSE I $e)};
    (K $e:expr) => {chain!(RECURSE J $e)};
    (L $e:expr) => {chain!(RECURSE K $e)};
}

fn main() {
    let x = chain!(D 42.0_f32);
}
```
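The reproduction relies on exponential macro nesting: every letter arm expands to `RECURSE` over the previous letter, and `RECURSE` applies that letter four times, so each step multiplies the number of nested `.sin()` calls by 4. A quick sketch of that arithmetic, not part of the PR (the helper `sin_calls` is hypothetical and deliberately leaves out the final `L` arm):

```rust
/// Number of nested `.sin()` calls produced by `chain!(<arm> ...)`.
/// `EE` emits exactly one call; every other arm goes through `RECURSE`,
/// which nests the previous arm four times, multiplying the count by 4.
fn sin_calls(arm: &str) -> Option<u64> {
    // Arms in delegation order, starting from the base case `EE`.
    const ARMS: [&str; 15] = [
        "EE", "Z", "Y", "X", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K",
    ];
    let level = ARMS.iter().position(|&a| a == arm)?;
    Some(4u64.pow(level as u32))
}

fn main() {
    for arm in ["C", "D", "E", "G"] {
        println!("chain!({} ...) -> {} nested .sin() calls", arm, sin_calls(arm).unwrap());
    }
}
```

By this count `chain!(D 42.0_f32)` is a single method-call chain 16,384 calls deep and `G` is over a million deep, which roughly matches the comments about where the overflow and the long compile times start.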

fixes #55471
fixes #41884
fixes #40161
fixes #34844
fixes #32594

cc @alexcrichton @rust-lang/compiler

I looked at all code that checks the recursion limit and inserted stack growth calls where appropriate.
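The description doesn't show what `ensure_sufficient_stack` does internally. stacker's `maybe_grow` checks whether the remaining stack has dropped below a red zone and, if so, runs the callback on a newly allocated stack. The sketch below approximates that control flow with std only and is not the PR's code: it assumes a per-thread recursion counter is an adequate proxy for remaining stack (stacker measures the real stack pointer), and it continues on a fresh scoped thread instead of switching stacks in place. `ensure_sufficient_stack`, `grow`, `sum_to` and both constants are hypothetical.

```rust
use std::cell::Cell;
use std::panic;
use std::thread;

// Hypothetical tuning knobs for this sketch, not the PR's values.
const FRAMES_PER_SEGMENT: usize = 1_000; // guarded calls allowed per stack
const SEGMENT_STACK_SIZE: usize = 2 * 1024 * 1024; // 2 MiB per fresh stack

thread_local! {
    // Guarded calls currently active on this thread's stack.
    static DEPTH: Cell<usize> = Cell::new(0);
}

// Restores the depth counter even if the callback panics.
struct ResetDepth(usize);

impl Drop for ResetDepth {
    fn drop(&mut self) {
        DEPTH.with(|d| d.set(self.0));
    }
}

/// Hypothetical stand-in: run `f` here while the budget lasts,
/// otherwise on a new thread whose stack starts out empty.
fn ensure_sufficient_stack<R: Send, F: FnOnce() -> R + Send>(f: F) -> R {
    let depth = DEPTH.with(|d| d.get());
    if depth < FRAMES_PER_SEGMENT {
        // Fast path: one thread-local read and one comparison.
        DEPTH.with(|d| d.set(depth + 1));
        let _reset = ResetDepth(depth);
        f()
    } else {
        grow(f)
    }
}

#[cold]
fn grow<R: Send, F: FnOnce() -> R + Send>(f: F) -> R {
    // The new thread has its own DEPTH (starting at 0) and its own stack.
    thread::scope(|s| {
        thread::Builder::new()
            .stack_size(SEGMENT_STACK_SIZE)
            .spawn_scoped(s, f)
            .expect("failed to spawn a new stack segment")
            .join()
            .unwrap_or_else(|payload| panic::resume_unwind(payload))
    })
}

// Deliberately deep, non-tail recursion: sums 1..=n.
fn sum_to(n: u64) -> u64 {
    if n == 0 { 0 } else { n + ensure_sufficient_stack(|| sum_to(n - 1)) }
}

fn main() {
    // 50,000 nested calls, moving to a fresh stack every 1,000 levels.
    println!("{}", sum_to(50_000));
}
```

The fast path is a thread-local read plus a comparison, and the `#[cold]` slow path only runs at segment boundaries. Spawning a thread per segment costs far more than stacker's in-place switch, so treat this as an illustration of where the check sits, not as a performance model.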

@rust-highfive

Collaborator

commented Nov 2, 2018

r? @michaelwoerister

(rust_highfive has picked a reviewer for you, use r? to override)

@nagisa

Contributor

commented Nov 2, 2018

With this, let's drop the default stack size from 16MB to something more reasonable (8MB or even 4MB).

https://github.com/rust-lang/rust/blob/master/src/librustc_driver/driver.rs#L94

https://github.com/rust-lang/rust/blob/master/src/librustc_driver/lib.rs#L1473

With that change and a reduced growth factor (i.e. growing the stack by something like 1MB rather than 16MB), crater should be well placed to tell us whether we put `guarantee_one_mb_stack_left` calls in all the necessary places. After such a crater run we can bump the growth factor back to a larger value.

@rust-highfive

Collaborator

commented Nov 2, 2018

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:04179d74:start=1541172590903696106,finish=1541172592062927090,duration=1159230984
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#Pull-Requests-and-Security-Restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-5.0
---
[00:04:05] * stacker 
[00:04:05] some tidy checks failed
[00:04:05] 
[00:04:05] 
[00:04:05] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:05] 
[00:04:05] 
[00:04:05] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:05] Build completed unsuccessfully in 0:00:48
[00:04:05] make: *** [tidy] Error 1
[00:04:05] Makefile:79: recipe for target 'tidy' failed

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:11008ef8
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
travis_time:end:12b4f8f4:start=1541172848851444780,finish=1541172848857785787,duration=6341007
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:021dd4b6
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:199160e8
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:30b90c34
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@@ -34,6 +34,7 @@ byteorder = { version = "1.1", features = ["i128"]}
chalk-engine = { version = "0.8.0", default-features=false }
rustc_fs_util = { path = "../librustc_fs_util" }
smallvec = { version = "0.6.5", features = ["union"] }
stacker = "0.1.3"

@alexcrichton

alexcrichton Nov 2, 2018

Member

I'm not sure stacker is necessarily rustc-ready right now, but that doesn't mean that it can't be! Some issues I can think of are:

  • It only has support for x86 platforms basically, and only Windows/Mac/Linux. It should be easy enough to "add support" for other platforms by basically doing nothing. Full support could be added over time as necessary
  • I don't think stacker does anything with guard pages, but ideally it'd also be sure to allocate guard pages for larger segments to protect against accidental stack overflow
  • I'm not entirely sure how well panics and the like work with it. It should be relatively easy to catch_unwind and resume_unwind when necessary, though (that just needs to be done)

These are all pretty minor, but I'd want to be sure to handle them before merging if possible!
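The third point can be sketched with std alone: when a closure runs on a different stack, a panic has to be caught there and re-raised on the caller's stack. This sketch assumes a scoped thread with its own stack as a stand-in for a stacker segment (stacker itself stays on the same thread), and `run_on_new_stack` is a hypothetical name.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::thread;

/// Hypothetical helper: runs `f` on a scoped thread with its own stack of
/// `stack_size` bytes, then re-raises any panic on the calling thread.
fn run_on_new_stack<R: Send, F: FnOnce() -> R + Send>(stack_size: usize, f: F) -> R {
    let outcome: Result<R, Box<dyn Any + Send>> = thread::scope(|s| {
        thread::Builder::new()
            .stack_size(stack_size)
            // Catch the panic on the new stack instead of letting it escape there.
            .spawn_scoped(s, move || panic::catch_unwind(AssertUnwindSafe(f)))
            .expect("failed to spawn thread")
            .join()
            .expect("the panic was already caught inside the thread")
    });
    match outcome {
        Ok(value) => value,
        // Resume unwinding on the original stack with the original payload.
        Err(payload) => panic::resume_unwind(payload),
    }
}

fn main() {
    assert_eq!(run_on_new_stack(1 << 20, || 6 * 7), 42);
    let caught = panic::catch_unwind(|| run_on_new_stack(1 << 20, || -> u32 { panic!("boom") }));
    println!("panic propagated: {}", caught.is_err());
}
```

Catching explicitly (rather than relying on `join` returning `Err`) mirrors what an in-place stack switch would need, since there is no thread boundary to carry the payload.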

src/librustc/ty/query/plumbing.rs (outdated)
@@ -1460,6 +1460,8 @@ fn parse_crate_attrs<'a>(sess: &'a Session, input: &Input) -> PResult<'a, Vec<as
}
}

const STACK_SIZE: usize = 4 * 1024 * 1024; // 4MB

@eddyb

eddyb Nov 3, 2018

Member

Maybe make this dependent on whether stacker actually works?

@oli-obk

oli-obk Nov 3, 2018

Author Contributor

How's that relevant? When stacker doesn't work, these values don't matter, because they don't do anything

@eddyb

eddyb Nov 3, 2018

Member

I thought this was the default thread stack size, unrelated to stacker. My bad if that's not the case.

@oli-obk

oli-obk Nov 4, 2018

Author Contributor

I'm gonna rename the constant to be clearer about this

@oli-obk

oli-obk Nov 4, 2018

Author Contributor

Oh you're right, I was looking at the wrong value. But this change is just for crater as noted by @nagisa in #55617 (comment)

We'll bump it back after crater succeeds
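As eddyb and oli-obk conclude, `STACK_SIZE` is the default stack size of the compiler's thread, separate from stacker's growth. Assuming it is applied the usual way, via `std::thread::Builder::stack_size`, the pattern looks roughly like this; `run_with_stack` and the recursive `depth` workload are hypothetical.

```rust
use std::panic;
use std::thread;

// Same value as the diff above (4 MiB, lowered for the crater run).
const STACK_SIZE: usize = 4 * 1024 * 1024;

/// Hypothetical helper: runs `work` on a new thread whose stack is fixed at
/// `STACK_SIZE` bytes and returns its result, re-raising any panic.
fn run_with_stack<R, F>(work: F) -> R
where
    R: Send + 'static,
    F: FnOnce() -> R + Send + 'static,
{
    thread::Builder::new()
        .stack_size(STACK_SIZE)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .unwrap_or_else(|payload| panic::resume_unwind(payload))
}

/// Plain, unguarded recursion: every level must fit in the fixed stack.
fn depth(n: u32) -> u32 {
    if n == 0 { 0 } else { 1 + depth(n - 1) }
}

fn main() {
    println!("{}", run_with_stack(|| depth(10_000)));
}
```

The size is fixed once the thread starts, so any recursion not routed through a growth call has to fit inside it. That is why lowering it for a crater run would surface call paths that still lack a growth call.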

@michaelwoerister

Contributor

commented Nov 5, 2018

Let's do a perf run.
@bors try

@bors

Contributor

commented Nov 5, 2018

🔒 Merge conflict

This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again.

How do I rebase?

Assuming self is your fork and upstream is this repository, you can resolve the conflict following these steps:

  1. git checkout stacker (switch to your branch)
  2. git fetch upstream master (retrieve the latest master)
  3. git rebase upstream/master -p (rebase on top of it)
  4. Follow the on-screen instruction to resolve conflicts (check git status if you got lost).
  5. git push self stacker --force-with-lease (update this PR)

You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial.

Please avoid the "Resolve conflicts" button on GitHub. It uses git merge instead of git rebase which makes the PR commit history more difficult to read.

Sometimes step 4 will complete without asking for resolution. This is usually due to a difference between how Cargo.lock conflicts are handled during merge and rebase. This is normal, and you should still perform step 5 to update this PR.

Error message
warning: Cannot merge binary files: src/Cargo.lock (HEAD vs. heads/homu-tmp)
Auto-merging src/librustc_typeck/check/mod.rs
Auto-merging src/librustc_traits/dropck_outlives.rs
Auto-merging src/librustc_mir/monomorphize/collector.rs
Auto-merging src/librustc_driver/lib.rs
Auto-merging src/librustc/traits/select.rs
Auto-merging src/librustc/traits/query/normalize.rs
Auto-merging src/librustc/traits/project.rs
Auto-merging src/librustc/hir/lowering.rs
Auto-merging src/Cargo.lock
CONFLICT (content): Merge conflict in src/Cargo.lock
Automatic merge failed; fix conflicts and then commit the result.

oli-obk force-pushed the oli-obk:stacker branch from 7ecd20f to 9f61d00 Nov 6, 2018

@oli-obk

Contributor Author

commented Nov 6, 2018

@bors try

@bors

Contributor

commented Nov 6, 2018

⌛️ Trying commit 9f61d00 with merge 2b10b3d...

bors added a commit that referenced this pull request Nov 6, 2018

Auto merge of #55617 - oli-obk:stacker, r=<try>
Prevent compiler stack overflow for deeply recursive code

@bors

Contributor

commented Nov 6, 2018

☀️ Test successful - status-travis
State: approved= try=True

@oli-obk

Contributor Author

commented Nov 6, 2018

@rust-timer


commented Nov 6, 2018

Success: Queued 2b10b3d with parent f90aab7, comparison URL.

@rust-timer


commented Nov 6, 2018

Finished benchmarking try commit 2b10b3d

@oli-obk

Contributor Author

commented Nov 6, 2018

Improvements for the ctfe stress tests (spurious?); regressions of up to 3% for everything else except the clean-incremental runs.

@michaelwoerister

Contributor

commented Nov 7, 2018

Makes sense that the clean-incremental cases don't see a difference since they hardly execute any of the changed code. The other regressions are rather unfortunate though. Can we optimize this?

@oli-obk

Contributor Author

commented Nov 7, 2018

Well, this is an operation we now run on every single query (not every call, just every evaluation). There are loads of queries. It seems logical that this introduces some regression that we can't get rid of.

@michaelwoerister

Contributor

commented Nov 7, 2018

But

  • do we need to run the operation really for every query invocation?
  • can we make the operation less expensive, especially in the case where the stack doesn't have to be grown?
@eddyb

Member

commented Nov 7, 2018

I feel like the actual stack check shouldn't be noticeable compared to hashing and looking up a key in a hashmap. Maybe we're growing the stack more often than we need to?

@nikic

Contributor

commented Nov 7, 2018

Looking at the stacker implementation, a possible issue might be that we're sitting somewhere close to the stack limit and regularly go over it and below it again. This will allocate and deallocate a new stack every time. Maybe retaining the last allocation would help to reduce the performance impact?
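A toy model of this hypothesis, not stacker's actual code: the stack is a series of segments holding `capacity` frames each, and the model counts how many segments get allocated when a workload hovers right at a segment boundary, with and without retaining the last freed segment. `SegmentModel` and `oscillate` are hypothetical names.

```rust
/// Toy model of a segmented stack that counts segment allocations.
struct SegmentModel {
    capacity: usize,    // frames per segment
    depth: usize,       // total frames currently on the stack
    spare: bool,        // whether one freed segment is being retained
    keep_last: bool,    // retain the last freed segment instead of freeing it
    allocations: usize, // fresh segments allocated so far
}

impl SegmentModel {
    fn new(capacity: usize, keep_last: bool) -> Self {
        SegmentModel { capacity, depth: 0, spare: false, keep_last, allocations: 0 }
    }

    fn push(&mut self) {
        // Crossing into a segment we don't have: reuse the spare or allocate.
        if self.depth > 0 && self.depth % self.capacity == 0 {
            if self.spare {
                self.spare = false;
            } else {
                self.allocations += 1;
            }
        }
        self.depth += 1;
    }

    fn pop(&mut self) {
        self.depth -= 1;
        // Just left a segment entirely: keep it around, or (implicitly) free it.
        if self.depth > 0 && self.depth % self.capacity == 0 && self.keep_last {
            self.spare = true;
        }
    }
}

/// Fill the first segment exactly, then repeatedly cross the boundary.
fn oscillate(keep_last: bool) -> usize {
    let mut s = SegmentModel::new(100, keep_last);
    for _ in 0..100 {
        s.push();
    }
    for _ in 0..1000 {
        s.push();
        s.pop();
    }
    s.allocations
}

fn main() {
    println!("free on return: {} allocations", oscillate(false));
    println!("keep last:      {} allocations", oscillate(true));
}
```

In this model, retaining one segment turns one allocation per boundary crossing into a single allocation; whether the real perf regression comes from this pattern is exactly what the suggestion would need to confirm.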

@nagisa

Contributor

commented Nov 8, 2018

@rust-timer

This comment has been minimized.

Copy link

commented Dec 13, 2018

Finished benchmarking try commit d1556fa

@oli-obk

Contributor Author

commented Dec 13, 2018

cc @rust-lang/infra: the comparison URL says "commit not found" even though rust-timer has finished.

@kennytm

Member

commented Dec 13, 2018

@oli-obk: The missing commit is the one on the master branch. It is still queued. See https://perf.rust-lang.org/status.html for the complete queue.

@Mark-Simulacrum

Member

commented Dec 13, 2018

We're waiting for the parent commit to finish benchmarking; the bot (correctly) indicated that we finished the try commit, if a bit misleadingly.

https://perf.rust-lang.org/status.html if you want to follow along

@mati865

Contributor

commented Dec 14, 2018

Perf results are up 🎉

@nagisa

Contributor

commented Jan 11, 2019

I was unable to write a test that

runs in under 1s
overflows on my machine without this patch

It would be fine to add a test that runs for longer, but given that the compiler changes all the time, I don't see such a test serving its purpose for long. I expect there to be very many different locations in the compiler where many stack frames could be created at once.

oli-obk force-pushed the oli-obk:stacker branch from 8e7e4d7 to 8df7c9b Jan 18, 2019

@rust-highfive

Collaborator

commented Jan 18, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:106246d4:start=1547816006121000851,finish=1547816007188768019,duration=1067767168
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
##########################################################                81.8%
######################################################################## 100.0%
[00:01:53] extracting /checkout/obj/build/cache/2019-01-04/cargo-beta-x86_64-unknown-linux-gnu.tar.gz
[00:01:53]     Updating crates.io index
[00:02:03] error: the lock file /checkout/Cargo.lock needs to be updated but --locked was passed to prevent this
[00:02:03] Build completed unsuccessfully in 0:00:23
[00:02:03] make: *** [prepare] Error 1
[00:02:03] Makefile:71: recipe for target 'prepare' failed
[00:02:04] Command failed. Attempt 2/5:
[00:02:04]     Updating crates.io index
[00:02:04] error: the lock file /checkout/Cargo.lock needs to be updated but --locked was passed to prevent this
[00:02:04] Build completed unsuccessfully in 0:00:00
[00:02:04] Makefile:71: recipe for target 'prepare' failed
[00:02:04] make: *** [prepare] Error 1
[00:02:06] Command failed. Attempt 3/5:
[00:02:06]     Updating crates.io index
[00:02:07] error: the lock file /checkout/Cargo.lock needs to be updated but --locked was passed to prevent this
[00:02:07] Build completed unsuccessfully in 0:00:00
[00:02:07] Makefile:71: recipe for target 'prepare' failed
[00:02:07] make: *** [prepare] Error 1
[00:02:10] Command failed. Attempt 4/5:
[00:02:10]     Updating crates.io index
[00:02:10] error: the lock file /checkout/Cargo.lock needs to be updated but --locked was passed to prevent this
[00:02:10] Build completed unsuccessfully in 0:00:00
[00:02:10] make: *** [prepare] Error 1
[00:02:10] Makefile:71: recipe for target 'prepare' failed
[00:02:14] Command failed. Attempt 5/5:
[00:02:14]     Updating crates.io index
[00:02:15] error: the lock file /checkout/Cargo.lock needs to be updated but --locked was passed to prevent this
[00:02:15] Build completed unsuccessfully in 0:00:00
[00:02:15] Makefile:71: recipe for target 'prepare' failed
[00:02:15] The command has failed after 5 attempts.
[00:02:15] make: *** [prepare] Error 1
---
9528 ./.git/modules/src/tools/rustfmt/objects/pack
9428 ./src/tools/lldb/unittests/Process
9344 ./src/tools/lldb/unittests/Process/minidump
9328 ./src/doc
9316 ./src/tools/lldb/unittests/Process/minidump/Inputs
9160 ./src/llvm/lib/CodeGen
9076 ./src/tools/lldb/packages/Python/lldbsuite/test/functionalities/postmortem/wow64_minidump
8780 ./src/llvm/test/DebugInfo
8568 ./src/llvm-emscripten/lib/CodeGen
travis_time:end:1cc80f38:start=1547816155057659653,finish=1547816155505010016,duration=447350363
travis_fold:end:after_failure.1

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors

Contributor

commented Feb 7, 2019

☔️ The latest upstream changes (presumably #58254) made this pull request unmergeable. Please resolve the merge conflicts.

@Dylan-DPC

Member

commented Mar 4, 2019

ping from triage @oli-obk you have pending conflicts to resolve.

@Dylan-DPC

Member

commented Mar 11, 2019

ping from triage @oli-obk what's the update on this?

@oli-obk

Contributor Author

commented Mar 12, 2019

I think this is still blocked on @nagisa's stacker backend rewrite.

TimNN added S-blocked and removed S-waiting-on-author labels Mar 19, 2019

@michaelwoerister

Contributor

commented May 31, 2019

r? @pnkfelix for re-assignment (I'll be on leave for a while)
