Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hooks for Miri panic unwinding #60026

Open
wants to merge 1 commit into
base: master
from

Conversation

@Aaron1011
Copy link
Contributor

commented Apr 17, 2019

This commits adds in some additional hooks to allow Miri to properly
handle panic unwinding. None of this should have any impact on CTFE mode

This supports rust-lang/miri#693

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Apr 17, 2019

r? @estebank

(rust_highfive has picked a reviewer for you, use r? to override)

@Aaron1011

This comment has been minimized.

Copy link
Contributor Author

commented Apr 17, 2019

@rust-highfive rust-highfive assigned RalfJung and unassigned estebank Apr 17, 2019

@Aaron1011

This comment has been minimized.

Copy link
Contributor Author

commented Apr 23, 2019

@RalfJung: Are there any changes that you'd like me to make?

@RalfJung

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

Sorry, I didn't yet have the chance to look at this. It's on my list though!

@RalfJung

This comment has been minimized.

Copy link
Member

commented Apr 29, 2019

I am a bit surprised by the implementation strategy here. Logically speaking, what happens with unwinding is that every function has two "return continuations", as in, two "return addresses" that it might jump to when execution is done: the successful one for normal completion, and the "unwind" address for when unwinding is happening.

So, what I'd expect is that StackPopCleanup::Goto stores to addresses to continue at, one for return and one for unwind. Then when a stack frame gets popped, whether that is a "return" or an "unwind" pop decides which of the two addresses we jump to.

I think with that approach you can also keep using the normal interpreter loop in run, instead of having to run your own loop (which, as @bjorn3 mentioned, won't work well for Priroda). (EDIT: Ah, turns out that loop was for the box_me_up call, so that's different.)

Is there a particular reason you chose the approach you did (which I am still trying to understand)?

@RalfJung

This comment has been minimized.

Copy link
Member

commented Apr 29, 2019

Also, this is super cool! I didn't expect unwind support for Miri to happen any time soon, so I am very happy to see you tackle this. :)

@Aaron1011

This comment has been minimized.

Copy link
Contributor Author

commented Apr 29, 2019

@RalfJung: My main goal was to make the minimum amount of changes to rustc, while leaving the core logic in Miri.

@RalfJung

This comment has been minimized.

Copy link
Member

commented May 19, 2019

If you want to keep that kind of information out of the engine, make StackPopCleanup generic over Machine, add a StackPopCleanup assoc type to Machine and add a Machine variant to StackPopCleanup that contains a single value of Machine::StackPopCleanup type. Then miri can fill in its own enum there.

TBH that sounds like overkill to me. The action that happens on a pop is so simple (just take the unwind continuation instead of the return continuation), I don't think this warrants the mental overhead of a machine hook. I mean, it's literally just let target_block = if unwind { unwind_block } else { return_block }; self.goto(target_block).

@oli-obk

This comment has been minimized.

Copy link
Contributor

commented May 19, 2019

I have no personal preference. I was mainly aiming to explain how to achieve @Aaron1011 's target

@Aaron1011 Aaron1011 force-pushed the Aaron1011:feature/miri-unwind branch 2 times, most recently from 5a5b99e to 8aa3d5d May 29, 2019

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented May 29, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:02752a56:start=1559097167387587572,finish=1559097168209689654,duration=822102082
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:04:47] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:47] tidy error: /checkout/src/librustc_mir/interpret/terminator.rs:10: line longer than 100 chars
[00:04:47] tidy error: /checkout/src/librustc_mir/interpret/eval_context.rs:56: trailing whitespace
[00:04:47] tidy error: /checkout/src/librustc_mir/interpret/eval_context.rs:574: line longer than 100 chars
[00:04:47] tidy error: /checkout/src/librustc_mir/interpret/eval_context.rs:578: line longer than 100 chars
[00:04:47] tidy error: /checkout/src/librustc_mir/interpret/eval_context.rs:664: line longer than 100 chars
[00:04:47] tidy error: /checkout/src/librustc_mir/interpret/eval_context.rs:684: line longer than 100 chars
[00:04:47] tidy error: /checkout/src/librustc_mir/interpret/eval_context.rs:697: trailing whitespace
[00:04:52] some tidy checks failed
[00:04:52] 
[00:04:52] 
[00:04:52] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:52] 
[00:04:52] 
[00:04:52] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:52] Build completed unsuccessfully in 0:01:12
[00:04:52] Build completed unsuccessfully in 0:01:12
[00:04:52] make: *** [tidy] Error 1
[00:04:52] Makefile:67: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:27789af2
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Wed May 29 02:37:51 UTC 2019
---
travis_time:end:1f038874:start=1559097471905737640,finish=1559097471910094757,duration=4357117
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0bfb89c9
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:08298745
travis_time:start:08298745
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:0c2d8d9f
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@Aaron1011 Aaron1011 force-pushed the Aaron1011:feature/miri-unwind branch from 1ae1ec5 to 32542b4 May 30, 2019

@Aaron1011

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2019

@RalfJung: I've made the changes you requested

@bors

This comment has been minimized.

Copy link
Contributor

commented Jun 2, 2019

☔️ The latest upstream changes (presumably #61278) made this pull request unmergeable. Please resolve the merge conflicts.

@Aaron1011 Aaron1011 force-pushed the Aaron1011:feature/miri-unwind branch from 32542b4 to 3b7be70 Jun 3, 2019

@bors

This comment has been minimized.

Copy link
Contributor

commented Jun 4, 2019

☔️ The latest upstream changes (presumably #61467) made this pull request unmergeable. Please resolve the merge conflicts.

/// 'in_cleanup' indicates whether or not we are currently running the
/// 'unwind' block assocaited with a terminator. When 'in_cleanup' is true,
/// execution proceeds normally until we hit a 'Resume' terminator
Unwinding { in_cleanup: bool }

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 4, 2019

Member

Why is this information relevant? The comment should explain that. Isn't hitting Resume enough to know that we are unwinding?

This comment has been minimized.

Copy link
@Aaron1011

Aaron1011 Jun 5, 2019

Author Contributor

I wanted to ensure that we explicitly track whether or not we're currently unwinding. This allows us to detect double panics independent of libstd, and to detect any weird MIR (e.g. an unwind block that doesn't eventually lead to Resume).

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 8, 2019

Member

What is wrong with such an Unwind block? That's effectively catching a panic, right?

I'd say detecting double-panics independently of libstd is explicitly a non-goal. At least if it requires so much extra state and complexity as the tracking you are doing here.

/// Whether or not the machine is currently unwinding.
/// This affects the target block that will be jumped
/// to after the stack frame is popped.
pub unwinding: bool

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 4, 2019

Member

Why is that needed here? Seems like whether we are running a Return or Resume terminator should be enough to know this?

This comment has been minimized.

Copy link
@Aaron1011

Aaron1011 Jun 5, 2019

Author Contributor

This is used by the machine implementation to indicate when to stop unwinding. We tag the catch_unwind stack frame in Miri, so Miri needs some way of indicating to rustc that it should stop unwinding.

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 8, 2019

Member

The way to stop unwinding is to jump into code that doesn't unwind.

So one thing that might make sense here is to change the stack_pop machine hook a bit. It could take as a parameter whether we are executing a "resume" or "return"-style pop, and it takes the StackPopCleanup, and it returns which block to jump to (if any). That could be used both to implement catch_panic, and also to make sure that CTFE execution will never do "resume".

That said, I am still not sure why catch_panic cannot just set up the frame such that the "unwind" edge points to the same thing as the "return" edge. But that might be a bit too hacky; clearly that stack frame needs special treatment either way.

info!("LEAVING({}) {}", self.cur_frame(), self.frame().instance);
::log_settings::settings().indentation -= 1;
let frame = self.stack.pop().expect(
"tried to pop a stack frame, but there were none",
);
M::stack_pop(self, frame.extra)?;
let stack_pop_info = M::stack_pop(self, frame.extra)?;
let prev = self.unwinding;

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 4, 2019

Member

prev is not very descriptive... maybe prev_unwinding?

Also I am still trying to figure out what this state machine does and what it corresponds to in a real execution.

This comment has been minimized.

Copy link
@Aaron1011

Aaron1011 Jun 5, 2019

Author Contributor

There are three possible states that we can be in:

  1. Normal execution
  2. Stack unwinding
  3. Cleanup block during stack unwinding

I'm not sure if libpanic_unwind explicitly does this kind of validation, as it relies on platform-specific unwinding mechanisms for part of the work. This match is mainly to catch errors in Miri or the MIR itself.

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 8, 2019

Member

I remain doubtful about doing this kind of state tracking at all, but I am very doubtful about distinguishing (2) and (3) in your list. Do I understand correctly that this is about distinguishing whether we are inside a destructor or whether we are inside the glue code that calls all destructors?

@@ -546,57 +566,143 @@ impl<'a, 'mir, 'tcx: 'mir, M: Machine<'a, 'mir, 'tcx>> InterpretCx<'a, 'mir, 'tc
}
}

pub(super) fn pop_stack_frame(&mut self) -> EvalResult<'tcx> {

pub(super) fn pop_stack_frame_internal(&mut self) -> EvalResult<'tcx, StackPopCleanup> {

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 4, 2019

Member

My expectation was that this function is told, via a parameter, whether this is a returning or an unwinding pop. Whoever triggers the pop should know that.

This comment has been minimized.

Copy link
@Aaron1011

Aaron1011 Jun 5, 2019

Author Contributor

That parameter would either be redundant (if it agrees with the tracked unwinding state), or wrong (if it disagrees with the tracked unwinding state).

Besides, I use this method in pop_stack_frame, when we need to pop off frames that don't contain unwind terminators.

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 8, 2019

Member

The thing that is redundant here is the tracked unwinding state. :) This kind of indirect passing of "where do we even unwind to" makes the code very hard to follow. In particular, this entire state machine here IMO shows that there is redundant state.

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 8, 2019

Member

Besides, I use this method in pop_stack_frame, when we need to pop off frames that don't contain unwind terminators.

Frames that don't contain Unwind terminators should never occur during unwinding, that sounds like UB to me.

/// Just do nohing: Used by Main and for the box_alloc hook in miri.
/// `cleanup` says whether locals are deallocated. Static computation
/// wants them leaked to intern what they need (and just throw away
/// the entire `ecx` when it is done).
None { cleanup: bool },
/// `wundin` stores the block used for cleanup during unwinding
None { cleanup: bool, unwind: Option<mir::BasicBlock> },

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 4, 2019

Member

This very special StackPopCleanup is only used for a few cases as indicated, and I don't think there is a possibility of unwinding there. So why is an unwind field needed in None?

exchange_malloc even has a comment saying "This function must not unwind.", and start_fn fairly immediately calls catch_unwind.

This comment has been minimized.

Copy link
@Aaron1011

Aaron1011 Jun 5, 2019

Author Contributor

I can remove it.

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jun 8, 2019

Member

Please do.

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jun 10, 2019

Good catch, there is!

So @Aaron1011 you wouldn't even have to add a field to Frame that tracks if we are cleaning up, you can add a method instead that checks if the basic block we are currently in is marked as a cleanup block.

@bjorn3

This comment has been minimized.

Copy link
Contributor

commented Jun 10, 2019

isn't there an "is this a cleanup block" flag in MIR?

Yes, but #58892 shows that it has been set to the wrong value in some cases in the past, so it may be nice to keep a per-frame unwind flag as sanity check.

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jun 10, 2019

Well that sounds like a good argument for having Miri check it then. ;)

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

@Aaron1011 we haven't heard from you in a while. What is the current status of this? Do you still intend to work on it?

@Aaron1011

This comment has been minimized.

Copy link
Contributor Author

commented Jun 26, 2019

@RalfJung: Sorry - I've been fairly busy recently. I do intend to pick this PR back up soon.

@Aaron1011

This comment has been minimized.

Copy link
Contributor Author

commented Jun 30, 2019

@RalfJung: I took a shot at removing UnwindingState in favor of the simplified single-boolean approach you mentioned. However, I still think my UnwindingState approach makes more sense:

With your approach, we need Miri to be able to signal that unwinding has stopped or started (e.g. via the stack_pop callback). However, rustc will still need to keep track of the unwinding state for each frame, in order to determine which block to jump to (normal or unwind) when the frame is popped.

This would actually end up being more complicated than my current approach. We still have logic split across miri and rustc. However, we now have even more states we need to deal with:

  1. Miri signals unwinding start during an unwind frame
  2. Miri signals unwinding start during a non-unwind frame
  3. Miri signals unwinding stop during an unwind frame
  4. Miri signals unwinding stop during a non-unwind frame.

This is what UnwindingState is designed to represent. However, the two states will not be split across Miri and rustc, making things much harder to follow.

@Aaron1011 Aaron1011 force-pushed the Aaron1011:feature/miri-unwind branch from 252c217 to fe26ed9 Jun 30, 2019

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jul 5, 2019

(An important deadline is approaching and I didn't get around to page all this back in before. The deadline is next week, I'll respond as soon after that as I can.)

@Aaron1011 Aaron1011 force-pushed the Aaron1011:feature/miri-unwind branch 3 times, most recently from 94d5f1b to b11d080 Jul 6, 2019

@joelpalmer

This comment has been minimized.

Copy link

commented Jul 22, 2019

Hi @Aaron1011 & @RalfJung - this is a ping from Triage. Any updates on this PR? I'll leave the labels as is for now. Thanks!

@joelpalmer

This comment has been minimized.

Copy link

commented Jul 22, 2019

Triage: Changing to waiting on review

@rustbot modify labels to -S-waiting-on-author, +S-waiting-on-review

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jul 22, 2019

Sorry, it's been taken me a while to get back to this as I had a deadline and a correlated overflow of my inbox.

@bors

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2019

☔️ The latest upstream changes (presumably #62902) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jul 24, 2019

@Aaron1011

I took a shot at removing UnwindingState in favor of the simplified single-boolean approach you mentioned. However, I still think my UnwindingState approach makes more sense

I do not understand what problem you are trying to describe here. I think you are saying there would be duplicated state, why that? What is the rustc state ("rustc will still need to keep track of the unwinding state") you are describing above?

Note that, as I mentioned above, you don't even need to add a boolean -- it already exists: to test if a stack frame is unwinding, look at the is_cleanup flag of its current basic block.

@Aaron1011

This comment has been minimized.

Copy link
Contributor Author

commented Jul 27, 2019

@RalfJung:

What is the rustc state ("rustc will still need to keep track of the unwinding state") you are describing above?

The state is 'whether or not we are currently in the process of unwinding'. This affects how we pop normal frames:

if let Unwinding { in_cleanup: false } = self.unwinding {

If we're unwinding, we want to jump to the cleanup block in the StackPopCleanup data.

Note that, as I mentioned above, you don't even need to add a boolean -- it already exists: to test if a stack frame is unwinding, look at the is_cleanup flag of its current basic block.

That's not the same information. pop_stack_frame needs to know if the current ordinary frame is being popped during unwinding or not. If we're not unwinding, we want to jump to the normal return block from StackPopCleaup. If we are unwinding, we want to jump to the unwind cleanup block.

The is_cleanup flag just statically tells us if a particular block is an unwinding block. This isn't want we want to know - we need to make decisions about how to pop blocks where is_cleanup = false - that is, blocks which can be exited normally (by returning) or abnormally (as a result of a panic).

Add hooks for Miri panic unwinding
This commits adds in some additional hooks to allow Miri to properly
handle panic unwinding. None of this should have any impact on CTFE mode

@Aaron1011 Aaron1011 force-pushed the Aaron1011:feature/miri-unwind branch from b11d080 to 09a6150 Jul 27, 2019

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

The state is 'whether or not we are currently in the process of unwinding'. This affects how we pop normal frames:

You don't need any state for that. Whether you jump to the cleanup or return block should be determined directly by whether you are executing an Resume or Return terminator.

The state you are adding here AFAIK does not exist at run-time, so it cannot be needed to actually execute the program.

The rest of your post seems to be written under the assumption that somehow it's hard to figure out which of the two continuations to use, so I don't follow.

@bors

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2019

☔️ The latest upstream changes (presumably #63214) made this pull request unmergeable. Please resolve the merge conflicts.

@Aaron1011

This comment has been minimized.

Copy link
Contributor Author

commented Aug 4, 2019

@RalfJung: This allows us to implement the sanity check mentioned by @bjorn3 - that is, checking that we don't attempt to execute an incorrect terminator for the current state (Return during cleanup, or Resume during normal execution).

@RalfJung

This comment has been minimized.

Copy link
Member

commented Aug 4, 2019

I feel we are going in circles.

My proposal above was to do the sanity check by doing, in terminator.rs:

        match terminator.kind {
            Return => {
                assert!(!self.frame_bb().is_cleanup, "returning from cleanup block?");
                // ...
            }
            Resume => {
                assert!(self.frame_bb().is_cleanup, "resuming from non-cleanup block?");
                // ...
            }
        }

After I proposed this you claimed you still need the extra state. The question stands: why?

@joelpalmer

This comment has been minimized.

Copy link

commented Aug 12, 2019

Ping from Triage. Hi, @Aaron1011 - any update on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants
You can’t perform that action at this time.