Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort on stack overflow instead of re-raising SIGSEGV #31333

Merged
merged 1 commit into from
Feb 6, 2016

Conversation

lambda
Copy link
Contributor

@lambda lambda commented Jan 31, 2016

Abort on stack overflow instead of re-raising SIGSEGV

We use guard pages that cause the process to abort to protect against
undefined behavior in the event of stack overflow. We have a handler
that catches segfaults, prints out an error message if the segfault was
due to a stack overflow, then unregisters itself and returns to allow
the signal to be re-raised and kill the process.

This caused some confusion, as it was unexpected that safe code would be
able to cause a segfault, while it's easy to overflow the stack in safe
code. To avoid this confusion, when we detect a segfault in the guard
page, abort instead of the previous behavior of re-raising SIGSEGV.

To test this, we need to adapt the tests for segfault to actually check
the exit status. Doing so revealed that the existing test for segfault
behavior was actually invalid; LLVM optimizes the explicit null pointer
reference down to an illegal instruction, so the program aborts with
SIGILL instead of SIGSEGV and the test didn't actually trigger the
signal handler at all. Use a C helper function to get a null pointer
that LLVM can't optimize away, so we get our segfault instead.

This is a [breaking-change] if anyone is relying on the exact signal
raised to kill a process on stack overflow.

Closes #31273

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@lambda
Copy link
Contributor Author

lambda commented Feb 1, 2016

I've only tested this on Mac OS X, but I'm pretty sure that everything it does is kosher according to POSIX, and everything that's Unix-specific has been protected with #[cfg(unix)]), so I don't think this should break any platforms. The one thing that isn't in POSIX is MAP_ANON in the mmap that's used to map an unmap a page in the tests to test out that our processes are killed with the appropriate signals; however, every platform I checked in a quick search (Linux, FreeBSD, NetBSD, OpenBSD, Solaris, Mac OS X) supports that.


let mut buf = libc::mmap(0 as *mut libc::c_void,
pagesize as libc::size_t,
libc::PROT_WRITE | libc::PROT_READ,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps just do not pass in PROT_WRITE here to avoid the potent race condition where pointer could become valid between the unmap and dereference?

Not that it matters much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tried that, but then I was getting SIGBUS instead of SIGSEGV like with mprotect. I figured that just unmapping would be a better way to get a guaranteed SIGSEGV. I don't know of any circumstances in which the kernel would re-map that memory for us without anything going on on our process to cause it to do so, so I don't think we need to worry about such a race here; also, POSIX requires that after calling munmap, further references to those pages will produce SIGSEGV.

@lambda lambda force-pushed the 31273-abort-on-stack-overflow branch from 49cb4e1 to bba4272 Compare February 1, 2016 02:43
@lambda
Copy link
Contributor Author

lambda commented Feb 1, 2016

cc @geofft @Zoxc @brson


// See comment above for why this function returns.
libc::raise(libc::SIGABRT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To behave more like rtabort! this may wish to use intrinsics::abort() as it mimics other runtime-abort behavior and means we don't need to mess around with signals or anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but intrinsics::abort() does so via an illegal instruction, giving us SIGILL, which doesn't seem ideal if we're trying to clear up confusion caused by the particular signal received.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah I was just going off @brson's desire that stack overflow "should abort like any other fatal error". Our other fatal errors today use rtabort! which ends up translating to intrinsics::abort.

It would indeed do so via an illegal instruction, resulting in SIGILL, and likely resulting in a core dump as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to change rtabort! to call libc::abort instead of intrinsics::abort, now that we have libc::abort? libc::abort is supposed to handle all of the edge cases like this, either unregistering the signal handler or killing itself another way if the signal handler returns or just infinitely looping if all else fails.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be down with that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, to be clear, on Unix that seems fine but on Windows we're trying to avoid the CRT wherever possible, so in that sense it may be a platform-specific abort process. I guess it makes less sense in that case :(

@alexcrichton
Copy link
Member

r? @brson

You seem more opinionated than I, but both the current strategy an the proposed strategy are fine by me!

@rust-highfive rust-highfive assigned brson and unassigned alexcrichton Feb 1, 2016
@lambda lambda force-pushed the 31273-abort-on-stack-overflow branch from bba4272 to 563904e Compare February 1, 2016 05:38
@brson
Copy link
Contributor

brson commented Feb 2, 2016

Yeah, but intrinsics::abort() does so via an illegal instruction, giving us SIGILL, which doesn't seem ideal if we're trying to clear up confusion caused by the particular signal received.

My preference is for all aborts to be the same, and would prefer this to be an rtabort!. If it would be more proper to terminate with SIGABORT than SIGILL let's do that for all aborts, not just this one (but in a separate PR). Please do turn this into an rtabort.

@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Feb 2, 2016
@brson
Copy link
Contributor

brson commented Feb 2, 2016

This is a breaking change, but probably one nobody is relying on.

@lambda lambda force-pushed the 31273-abort-on-stack-overflow branch from 563904e to 557313b Compare February 2, 2016 05:18
@lambda
Copy link
Contributor Author

lambda commented Feb 2, 2016

Switched to rtabort! instead of manually raising SIGABRT, updated the commit message to indicate possible breakage.

@lambda lambda force-pushed the 31273-abort-on-stack-overflow branch from 557313b to f608abb Compare February 2, 2016 05:28
use std::os::unix::process::ExitStatusExt;

assert!(!status.success());
assert!(status.signal() != Some(libc::SIGSEGV));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this is also true on Windows?

@brson
Copy link
Contributor

brson commented Feb 5, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Feb 5, 2016

📌 Commit f608abb has been approved by brson

@bors
Copy link
Contributor

bors commented Feb 6, 2016

⌛ Testing commit f608abb with merge 8a0874b...

@bors
Copy link
Contributor

bors commented Feb 6, 2016

💔 Test failed - auto-mac-32-opt

We use guard pages that cause the process to abort to protect against
undefined behavior in the event of stack overflow.  We have a handler
that catches segfaults, prints out an error message if the segfault was
due to a stack overflow, then unregisters itself and returns to allow
the signal to be re-raised and kill the process.

This caused some confusion, as it was unexpected that safe code would be
able to cause a segfault, while it's easy to overflow the stack in safe
code.  To avoid this confusion, when we detect a segfault in the guard
page, abort instead of the previous behavior of re-raising the SIGSEGV.

To test this, we need to adapt the tests for segfault to actually check
the exit status.  Doing so revealed that the existing test for segfault
behavior was actually invalid; LLVM optimizes the explicit null pointer
reference down to an illegal instruction, so the program aborts with
SIGILL instead of SIGSEGV and the test didn't actually trigger the
signal handler at all.  Use a C helper function to get a null pointer
that LLVM can't optimize away, so we get our segfault instead.

This is a [breaking-change] if anyone is relying on the exact signal
raised to kill a process on stack overflow.

Closes rust-lang#31273
@lambda lambda force-pushed the 31273-abort-on-stack-overflow branch from f608abb to ee79bfa Compare February 6, 2016 01:42
@lambda
Copy link
Contributor Author

lambda commented Feb 6, 2016

Whoops, some architectures throw SIGBUS on a null pointer dereference rather than SIGSEGV. Pushed a fix that will check for either of those in the tests.

@alexcrichton
Copy link
Member

@bors: r=brson ee79bfa

@bors
Copy link
Contributor

bors commented Feb 6, 2016

⌛ Testing commit ee79bfa with merge 3be9ca1...

bors added a commit that referenced this pull request Feb 6, 2016
Abort on stack overflow instead of re-raising SIGSEGV

We use guard pages that cause the process to abort to protect against
undefined behavior in the event of stack overflow.  We have a handler
that catches segfaults, prints out an error message if the segfault was
due to a stack overflow, then unregisters itself and returns to allow
the signal to be re-raised and kill the process.

This caused some confusion, as it was unexpected that safe code would be
able to cause a segfault, while it's easy to overflow the stack in safe
code.  To avoid this confusion, when we detect a segfault in the guard
page, abort instead of the previous behavior of re-raising SIGSEGV.

To test this, we need to adapt the tests for segfault to actually check
the exit status.  Doing so revealed that the existing test for segfault
behavior was actually invalid; LLVM optimizes the explicit null pointer
reference down to an illegal instruction, so the program aborts with
SIGILL instead of SIGSEGV and the test didn't actually trigger the
signal handler at all.  Use a C helper function to get a null pointer
that LLVM can't optimize away, so we get our segfault instead.

This is a [breaking-change] if anyone is relying on the exact signal
raised to kill a process on stack overflow.

Closes #31273
@bors
Copy link
Contributor

bors commented Feb 6, 2016

💔 Test failed - auto-win-msvc-32-opt

@lambda
Copy link
Contributor Author

lambda commented Feb 6, 2016

Failure looks spurious.

@brson
Copy link
Contributor

brson commented Feb 6, 2016

@bors retry

@brson
Copy link
Contributor

brson commented Feb 6, 2016

cc @rust-lang/lang In some ways this is a pretty significant change to how processes terminate on stack overflow. It makes stack overflow terminate the process with rtabort! (which produces SIGILL), instead of the SIGSEGV / SEGBUS signal delivered by the OS. The rationale is that the segfault is an implementation detail of how std catches the stack overflow, we may want to change stack over behavior, etc.

@bors
Copy link
Contributor

bors commented Feb 6, 2016

⌛ Testing commit ee79bfa with merge 35635ae...

bors added a commit that referenced this pull request Feb 6, 2016
Abort on stack overflow instead of re-raising SIGSEGV

We use guard pages that cause the process to abort to protect against
undefined behavior in the event of stack overflow.  We have a handler
that catches segfaults, prints out an error message if the segfault was
due to a stack overflow, then unregisters itself and returns to allow
the signal to be re-raised and kill the process.

This caused some confusion, as it was unexpected that safe code would be
able to cause a segfault, while it's easy to overflow the stack in safe
code.  To avoid this confusion, when we detect a segfault in the guard
page, abort instead of the previous behavior of re-raising SIGSEGV.

To test this, we need to adapt the tests for segfault to actually check
the exit status.  Doing so revealed that the existing test for segfault
behavior was actually invalid; LLVM optimizes the explicit null pointer
reference down to an illegal instruction, so the program aborts with
SIGILL instead of SIGSEGV and the test didn't actually trigger the
signal handler at all.  Use a C helper function to get a null pointer
that LLVM can't optimize away, so we get our segfault instead.

This is a [breaking-change] if anyone is relying on the exact signal
raised to kill a process on stack overflow.

Closes #31273
@bors bors merged commit ee79bfa into rust-lang:master Feb 6, 2016
lambda added a commit to lambda/rust that referenced this pull request Feb 6, 2016
intrinsics::abort compiles down to an illegal instruction, which on
Unix-like platforms causes the process to be killed with SIGILL.  A more
appropriate way to kill the process would be SIGABRT; this indicates
better that the runtime has explicitly aborted, rather than some kind of
compiler bug or architecture mismatch that SIGILL might indicate.

For rtassert!, replace this with libc::abort.  libc::abort raises
SIGABRT, but is defined to do so in such a way that it will terminate
the process even if SIGABRT is currently masked or caught by a signal
handler that returns.

On non-Unix platforms, retain the existing behavior.  On Windows we
prefer to avoid depending on the C runtime, and we need a fallback for
any other platforms that may be defined.  An alternative on Windows
would be to call TerminateProcess, but this seems less essential than
switching to using SIGABRT on Unix-like platforms, where it is common
for the process-killing signal to be printed out or logged.

This is a [breaking-change] for any code that depends on the exact
signal raised to abort a process via rtabort!

cc rust-lang#31273
cc rust-lang#31333
lambda added a commit to lambda/rust that referenced this pull request May 23, 2016
intrinsics::abort compiles down to an illegal instruction, which on
Unix-like platforms causes the process to be killed with SIGILL.  A more
appropriate way to kill the process would be SIGABRT; this indicates
better that the runtime has explicitly aborted, rather than some kind of
compiler bug or architecture mismatch that SIGILL might indicate.

For rtassert!, replace this with libc::abort.  libc::abort raises
SIGABRT, but is defined to do so in such a way that it will terminate
the process even if SIGABRT is currently masked or caught by a signal
handler that returns.

On non-Unix platforms, retain the existing behavior.  On Windows we
prefer to avoid depending on the C runtime, and we need a fallback for
any other platforms that may be defined.  An alternative on Windows
would be to call TerminateProcess, but this seems less essential than
switching to using SIGABRT on Unix-like platforms, where it is common
for the process-killing signal to be printed out or logged.

This is a [breaking-change] for any code that depends on the exact
signal raised to abort a process via rtabort!

cc rust-lang#31273
cc rust-lang#31333
bors added a commit that referenced this pull request May 23, 2016
Use libc::abort, not intrinsics::abort, in rtabort!

intrinsics::abort compiles down to an illegal instruction, which on
Unix-like platforms causes the process to be killed with SIGILL.  A more
appropriate way to kill the process would be SIGABRT; this indicates
better that the runtime has explicitly aborted, rather than some kind of
compiler bug or architecture mismatch that SIGILL might indicate.

For rtassert!, replace this with libc::abort.  libc::abort raises
SIGABRT, but is defined to do so in such a way that it will terminate
the process even if SIGABRT is currently masked or caught by a signal
handler that returns.

On non-Unix platforms, retain the existing behavior.  On Windows we
prefer to avoid depending on the C runtime, and we need a fallback for
any other platforms that may be defined.  An alternative on Windows
would be to call TerminateProcess, but this seems less essential than
switching to using SIGABRT on Unix-like platforms, where it is common
for the process-killing signal to be printed out or logged.

This is a [breaking-change] for any code that depends on the exact
signal raised to abort a process via rtabort!

cc #31273
cc #31333
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants