Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'solaris' runtime stack guard conflicts with OS stack-clash mitigation #52577

Open
pfmooney opened this Issue Jul 20, 2018 · 15 comments

Comments

Projects
None yet
6 participants
@pfmooney
Copy link

pfmooney commented Jul 20, 2018

We have found the stack guard logic present in rust 1.27 conflicts with the OS-provided mitigation present in illumos distributions. This became clear on SmartOS when the 2018Q2 pkgsrc update moved from rust 1.24 to 1.27. Those binaries, compiled on an old platform without the mitigation, bail immediately on machines that do possess it

# rustc
thread 'main' panicked at 'failed to allocate a guard page', libstd/sys/unix/thread.rs:337:17
note: Run with `RUST_BACKTRACE=1` for a backtrace.
# RUST_BACKTRACE=full rustc
thread 'main' panicked at 'failed to allocate a guard page', libstd/sys/unix/thread.rs:337:17
stack backtrace:
   0: 0xfffffd7fde402f3e - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::ha22b267de1618f10
   1: 0xfffffd7fde3cdee6 - std::sys_common::backtrace::print::h9be6459ca953ea86
   2: 0xfffffd7fde410a1b - std::panicking::default_hook::{{closure}}::h11d519068135b02a
   3: 0xfffffd7fde4106e1 - std::panicking::default_hook::he7d6fa94ad994305
   4: 0xfffffd7fde41110c - std::panicking::rust_panic_with_hook::hef4aa8f9edd49f7f
   5: 0xfffffd7fde410ef6 - std::panicking::begin_panic::h350ed1a97f2676c6
   6: 0xfffffd7fde3e7b24 - std::sys::unix::thread::guard::init::h66e847be7b85328f
   7: 0xfffffd7fde3e44fd - std::rt::update_stack_guard::h3749f1240dd6decc
   8: 0xfffffd7fde786185 - rustc_driver::run::hf99f9e1bb314cfb8
   9: 0xfffffd7fde7954da - rustc_driver::main::he45a01066d380276
  10:           0x401842 - std::rt::lang_start::{{closure}}::h50ce2fc38c241613
  11: 0xfffffd7fde410be2 - std::panicking::try::do_call::h260c23fa696aee03
  12: 0xfffffd7fde417919 - __rust_maybe_catch_panic
  13: 0xfffffd7fde3e4325 - std::rt::lang_start_internal::hef1606e7d03cbe2a
  14:           0x4018a3 - main
  15:           0x40167b - _start

The details of how the two guards are conflicting can be found written up in OS-7059. The stack-clash mitigation was added to SmartOS and OmniOS back in October 2017. It was merged into illumos-gate this past week, marking its availability for any remaining downstream distributions.

Our temporary mitigation has been to float a patch in pkgsrc to treat the 'solaris' platform like 'linux', disabling the stack-guard mapping:

--- src/libstd/sys/unix/thread.rs.orig 2018-06-24 22:42:29.203295357 +0000
+++ src/libstd/sys/unix/thread.rs
@@ -309,7 +309,7 @@ pub mod guard {

         let stackaddr = get_stack_start_aligned()?;

-        if cfg!(target_os = "linux") {
+        if cfg!(any(target_os = "linux", target_os = "solaris")) {
             // Linux doesn't allocate the whole stack right away, and
             // the kernel has its own stack-guard mechanism to fault
             // when growing too close to an existing mapping.  If we map
@@ -345,7 +354,7 @@ pub mod guard {
     }

     pub unsafe fn deinit() {
-        if !cfg!(target_os = "linux") {
+        if cfg!(not(any(target_os = "linux", target_os = "solaris"))) {
             if let Some(stackaddr) = get_stack_start_aligned() {
                 // Remove the protection on the guard page.
// FIXME: we cannot unmap the page, because when we mmap()

Considering the rest of the illumos ecosystem is (or soon will be) effected by this issue, it would be nice to get that stack-guard exclusion moved into upstream rust.

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jul 21, 2018

Please submit a PR with the patch.

@nagisa nagisa added the O-solaris label Jul 21, 2018

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jul 21, 2018

The most important question here probably concerns the backwards compatibility. "Just" disabling the code path for solaris will also remove the stack guard for all the older/other solarises that do not have their built-in stack mitigation implemented. So in the best case we end up with code that checks the mmap call result and does not panic if it fails (potentially ignoring the genuine errors).

Seems to me that "fixing" this in illumos would be better if we care about back-compat.

@pfmooney

This comment has been minimized.

Copy link
Author

pfmooney commented Jul 23, 2018

Please submit a PR with the patch.

I planned to wait until consensus was reached about the right path forward before submitting a PR with anything.

"Just" disabling the code path for solaris will also remove the stack guard for all the older/other solarises that do not have their built-in stack mitigation implemented.

The same can be said about older deployments of Linux which lack the enhanced stack-clash protections. Is it inappropriate to rely on its simpler OS-provided guard page on systems without stack-clash protection?

The mapping of ld.so on older illumos platforms which lack the stack-clash protections is deterministic. It still results in a page gap between the end of the stack and the next valid mapping. Programs overrunning the stack limit will still take a fault.

Seems to me that "fixing" this in illumos would be better if we care about back-compat.

Considering that there are over 9 months of illumos platforms in the wild on which rust's latest bits will abort, "fixing" illumos does not seem so attractive. How would it differ from Linux's change to its own stack handling?

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jul 25, 2018

Is it true that this bug has only begun with 1.27 (I’ve only noticed this point now)? Given that the only changes present in 1.27 in that file are from #48575 perhaps reverting that PR would fix the issue altogether?

The only other approach I’m comfortable with is ignoring the errors from syscalls on solaris, which will hide the actual bugs if they ever happen, but at least both new and old solarises will be properly supported.

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jul 25, 2018

@ishitatsuyuki

This comment has been minimized.

Copy link
Member

ishitatsuyuki commented Jul 25, 2018

@nagisa I think stack guard is always enabled so it has nothing to do with my PR?

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jul 25, 2018

This became clear on SmartOS when the 2018Q2 pkgsrc update moved from rust 1.24 to 1.27.

seems to imply that 1.24 was working but 1.27 is not, everything else being the same?

@pfmooney is that right? Is the "2018Q2 update" something resembling an update between, say, Ubuntu releases? If so, can you please check versions 1.26 to 1.24 with other parts of your system being the same?

@cuviper

This comment has been minimized.

Copy link
Member

cuviper commented Jul 25, 2018

@pfmooney

Is it inappropriate to rely on its simpler OS-provided guard page on systems without stack-clash protection?

My argument for #43072 was that this is fine. Rust was only adding a simple single-page guard, so if the OS already has this, we're not protecting it any better. Then with the additional protections for Stack Clash, Rust's guard page made things actually worse.

Do the older illumos systems still have some basic protection for overflow? If so, then I think you can use the same logic I did. Trust that the older OS guard is at least as good as our own, and that the newer OS will do even better with Stack Clash. So perhaps illumos can follow what linux is doing since #43072.

@ishitatsuyuki

I think stack guard is always enabled so it has nothing to do with my PR?

If you look at the backtrace, this error is happening from the guard::init within rust_driver::main -- i.e. this is the re-initialization added in your PR. The initial guard that rt::lang_start would have added was apparently OK, since it got past that.

@pfmooney

This comment has been minimized.

Copy link
Author

pfmooney commented Jul 25, 2018

seems to imply that 1.24 was working but 1.27 is not, everything else being the same?

Correct. The rustc 1.24.1 shipped in 2018Q1 works fine on a system bearing the stack-clash protections. In pre-release testing of 1.27.0 for 2018Q2, it was found to abort due to the added guard. (In order to make it useful, 2018Q2 bears the included patch)

@pfmooney is that right? Is the "2018Q2 update" something resembling an update between, say, Ubuntu releases?

SmartOS ships images based on branches/snapshots of pkgsrc. The OS platform image, consisting of the kernel and system libraries, is independent of these images and can vary from machine to machine. My test environment, for example, happens to be running a joyent_20171215T051107Z platform to host instances based on both 2018Q1 and 2018Q2 images.

If that is right, can you please check versions 1.26 to 1.24 with other parts of your system being the same?

I only have the 1.24.0 binaries presently at hand, but I can look into building a few of the intermediary versions.

Do the older illumos systems still have some basic protection for overflow? If so, then I think you can use the same logic I did.

On older platforms, the deterministic layout of the address space results in an unoccupied page between the upper limit of the stack and the last mapping of ld.so.

So perhaps illumos can follow what linux is doing since #43072.

Yep, that sounds ideal to me.

@pfmooney

This comment has been minimized.

Copy link
Author

pfmooney commented Jul 31, 2018

Binaries from 1.26.2 appear to work fine on a system with the stack-clash protection.

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Nov 29, 2018

discussed at T-compiler meeting. Assigning to self to take point on this. Not 100% sure whether we will treat this as P-medium (which is my current inclination) or P-high, so I'm leaving the P- label unset for now until I can investigate further

@pnkfelix pnkfelix self-assigned this Nov 29, 2018

@pfmooney

This comment has been minimized.

Copy link
Author

pfmooney commented Nov 29, 2018

Thanks for looking at this @pnkfelix.

The pkgsrc builds still maintain the floated patch from above (Seen here) although it would be nice to get it upstream on the assumption that the illumos platform will have rustup builds one day.

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Nov 29, 2018

By the way @pfmooney are there any instances of illumos getting their own target triplet?

@pfmooney

This comment has been minimized.

Copy link
Author

pfmooney commented Nov 29, 2018

I've heard nothing besides what you wrote up in #55553.

@jasonbking

This comment has been minimized.

Copy link

jasonbking commented Feb 7, 2019

Just to follow up on this, we're in the process of creating and testing a new illumos OS target for rust. The current proposed plan is that the illumos target will disable rust stack probes in its spec file. Newer versions of the rust toolchain in pkgsrc will be updated to use this new target as the default (instead of x86_64-sun-solaris).

This should I believe fix the problem, while allowing anyone using rust on actual Solaris platforms to benefit from the rust stack guards while allowing illumos platforms to use the built-in stack-clash mitigation. Once ready, we would also like to contribute back the code for the new target (essentially the spec files + a small amount of updates other crates to make them aware of the illumos target as necessary, plus a few bits fo code to fill in a few gaps).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.