Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::process::unix: Command: Do not unwind past fork(), in child #80263

Closed
wants to merge 1 commit into from

Conversation

ijackson
Copy link
Contributor

@ijackson ijackson commented Dec 21, 2020

Unwinding past fork() in the child causes whatever traps the unwind
to return twice. This is very strange and clearly not desirable.

With the default behaviour of the thread library, this can even result
in a panic in the child being transformed into zero exit status (ie,
success) as seen in the parent!

If unwinding reaches the fork point, the child should abort.

@rust-highfive
Copy link
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 21, 2020
@ijackson
Copy link
Contributor Author

@rustbot modify labels +T-libs +A-runtime

@rustbot rustbot added A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Dec 21, 2020
@cuviper
Copy link
Member

cuviper commented Dec 21, 2020

I don't think the current behavior is strange, or at least no more than fork itself is strange. You've created a new process as a copy of the first, and it could certainly return from the fork point, so why wouldn't unwinding do that too? Think of a forking daemon, for example.

@cuviper
Copy link
Member

cuviper commented Dec 21, 2020

Oh, but you're only changing the fork in Command -- that's much more specific than your PR description implies. I thought you meant all forks.

@ijackson
Copy link
Contributor Author

ijackson commented Dec 21, 2020 via email

@ijackson
Copy link
Contributor Author

ijackson commented Dec 21, 2020 via email

@ijackson ijackson changed the title std::process::unix: Do not unwind past fork(), in child std::process::unix: Command: Do not unwind past fork(), in child Dec 21, 2020
@ijackson
Copy link
Contributor Author

I edited the title to clarify the scope. Thanks for your attention.

Copy link
Member

@m-ou-se m-ou-se left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

library/std/src/sys/unix/process/process_unix/tests.rs Outdated Show resolved Hide resolved
@@ -53,7 +57,8 @@ impl Command {

let pid = unsafe {
match result {
0 => {
0 => (#[unwind(aborts)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the first usage of #[unwind(aborts)] outside a test, and the first usage of #[unwind] on a closure. Not sure if this is stable enough.

(I believe this attribute was originally only meant for extern functions.)

@Mark-Simulacrum Do you know, or do you know who would know?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review.

In defence of using this here: my MR adds a test case which exercises this panic -> unwind -> abort path, and of course there is no non-abort return path because the closure ends with _exit (and returns !)

If #[unwind(aborts)] is not stable enough, should I open-code something with catch_unwind? It seemed to me that using a built-in library facility (even an unstable one) for this was better than an ad-hoc reimplementation of the same functionality.

@m-ou-se m-ou-se added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 30, 2020
@ijackson
Copy link
Contributor Author

ijackson commented Jan 7, 2021

@rustbot modify labels -S-waiting-on-author +S-waiting-on-review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 7, 2021
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 25, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Jan 25, 2021

r? @Mark-Simulacrum to validate this usage of #[unwind(aborts)]. See above.

@Mark-Simulacrum
Copy link
Member

Yes, catch_unwind is probably the better option. I'm pretty sure the unwind attribute in theory should work, but I wouldn't want to rely on it in this context personally. catch_unwind should still compile to similar or identical assembly as the unwind attr.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 26, 2021
@Mark-Simulacrum
Copy link
Member

@ijackson it also looks like your git commits aren't associated with your github account, fwiw (fine for contributing, just saying in case it's unintentional).

If you want to squash commits on the next push that'll also avoid another runaround.

@rust-log-analyzer
Copy link
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
Successfully built 3b6c825b9799
Successfully tagged rust-ci:latest
Built container sha256:3b6c825b9799bc30f348d39c6ca4570b9feb1f8611b2087d36eab3c6fc2a1baa
Uploading finished image to https://ci-caches.rust-lang.org/docker/80afe504501370b4d310121e20e04a989f302196b07831c4375b96e05bc067556c2046e20ab2062b28a9dc9b2ae132b37d419cc55a065dfcd25501527e829ab9
upload failed: - to s3://rust-lang-ci-sccache2/docker/80afe504501370b4d310121e20e04a989f302196b07831c4375b96e05bc067556c2046e20ab2062b28a9dc9b2ae132b37d419cc55a065dfcd25501527e829ab9 Unable to locate credentials
[CI_JOB_NAME=mingw-check]
---
configure: rust.channel         := nightly
configure: rust.debug-assertions := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
configure: 
---
skip untracked path cpu-usage.csv during rustfmt invocations
skip untracked path src/doc/book/ during rustfmt invocations
skip untracked path src/doc/rust-by-example/ during rustfmt invocations
skip untracked path src/llvm-project/ during rustfmt invocations
Diff in /checkout/library/std/src/sys/unix/process/process_unix.rs at line 80:
                     libc::_exit(1)
                 })) {
                     Err(_) => crate::process::abort(),
+                },
                 n => n,
             }
         };
         };
Running `"/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/rustfmt" "--config-path" "/checkout" "--edition" "2018" "--unstable-features" "--skip-children" "--check" "/checkout/library/std/src/sys/unix/process/process_unix.rs"` failed.
If you're running `tidy`, try again with `--bless`. Or, if you just want to format code, run `./x.py fmt` instead.
Build completed unsuccessfully in 0:00:22

@bors bors added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 28, 2021
@ijackson
Copy link
Contributor Author

The failure is this, from my new test case:

---- sys::unix::process::process_inner::tests::test_command_fork_no_unwind stdout ----
ExitStatus(ExitStatus(139))
got=Ok(ExitStatus(ExitStatus(139)))
thread 'sys::unix::process::process_inner::tests::test_command_fork_no_unwind' panicked at 'assertion failed: signal == libc::SIGABRT || signal == libc::SIGILL', library/std/src/sys/unix/process/process_unix/tests.rs:21:5

139 is SIGSEGV with a coredump.

I could add SIGSEGV to the permitted list. But, is this really right? It seems odd. I think std::process::abort ought not to produce SIGSEGV, at least unless I am missing something. If this is expected then, fine, I'll add it to the list. But if it is not expected then I think there may be a problem with catch_unwind or abort or something.

@Mark-Simulacrum do you know if this is an expected result from abort ?

@jonas-schievink
Copy link
Contributor

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 30, 2021
@ijackson ijackson marked this pull request as draft February 1, 2021 17:30
@ijackson
Copy link
Contributor Author

ijackson commented Feb 1, 2021

I want to reproduce this failure locally. Can someone help me with a build/install problem, which is blocking me?

I think I need a cross rustc targeting i686-unknown-linux-musl. I think if I manage to make a working rustc with that target I will be able to run the resulting binaries on my x86_64 laptop, so I will be able to debug the situation properly and decide if the SIGSEGV is expected.

But I had trouble finding how to build that cross toolchain. Rust's ./x.py wants a "musl root". Where would I get one of those? My distro (Debian) has some musl packages but they are in multiarch paths so not under a single root.

I looked for instructions in various places including general search engines and the in-tree README.md.

@ijackson
Copy link
Contributor Author

ijackson commented Feb 1, 2021

@rustbot modify labels +E-help-wanted +O-musl +A-rustbuild -T-libs -A-runtime

@rustbot rustbot added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) E-help-wanted Call for participation: Help is requested to fix this issue. O-musl Target: The musl libc and removed A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Feb 1, 2021
@Mark-Simulacrum
Copy link
Member

My recommendation is to test via the Docker container (./src/ci/docker/run.sh dist-i586-gnu-i586-i686-musl) it's possible you can reproduce the environment outside it, though, by inspecting the setup scripts called.

@ijackson
Copy link
Contributor Author

ijackson commented Feb 4, 2021

My recommendation is to test via the Docker container (./src/ci/docker/run.sh dist-i586-gnu-i586-i686-musl) it's possible you can reproduce the environment outside it, though, by inspecting the setup scripts called.

I managed to get a musl i686 build by inspecting the rules and doing stuff by hand. It worked just fine both before and after my change. I'm now wrestling docker.

@ijackson
Copy link
Contributor Author

ijackson commented Feb 5, 2021

Well, eventually my formal docker run with ./src/ci/docker/run.sh dist-i586-gnu-i586-i686-musl (as recommended by @Mark-Simulacrum) completed. Unfortunately, my new test case succeeded:

test sys::unix::process::process_inner::tests::test_command_fork_no_unwind ... ok

That's with 20e2172 which is tree-identical to the 73ae9d635b31311e980b2d8fad7277e384f6f2cb which bors built. (Strictly, I added one further change, to drop --rm from the arguments to docker in src/ci/docker/run.sh.)

So I apparently cannot reproduce this failure locally. I hesitate to suggest just trying it again since "random lossage" is really not a very convincing explanation. Is there some way to debug this in something more closely resembling the CI environment ?

@ijackson
Copy link
Contributor Author

ijackson commented Feb 5, 2021

OTOH this does give me confidence that the test case is correct not to consider SIGSEGV a pass.

So the fact that this segfaulted in the CI suggests a real bug. I doubt that's in my patch to the stdlib (which doesn't even introduce any new unsafe). It also seems to me that there is nothing in my test case which ought to cause UB. The ony unsafe is this:

        unsafe {
            c.pre_exec(|| panic!("crash now!"));
        }

So I am led to think there is a pre-existing bug in panic handling :-/

@ghost
Copy link

ghost commented Feb 5, 2021

I doubt that's in my patch to the stdlib (which doesn't even introduce any new unsafe). It also seems to me that there is nothing in my test case which ought to cause UB.

I think panicking from the child process is already UB (in a multithreaded program)?

The documentation of CommandExt::pre_exec() says:

This closure will be run in the context of the child process after a fork.

The fork(2) manual page says:

After a fork() in a multithreaded program, the child can safely call only async-signal-safe functions (see signal-safety(7)) until such time as it calls execve(2).

(EDIT: May not be relevant here - and the signal-safety(7) manual page explains why violating signal safety rules may cause UBs)

An async-signal-safe function is one that can be safely called from within a signal handler. Many functions are not async-signal-safe. In particular, nonreentrant functions are generally unsafe to call from a signal handler.

The kinds of issues that render a function unsafe can be quickly understood when one considers the implementation of the stdio library, all of whose functions are not async-signal-safe.

When performing buffered I/O on a file, the stdio functions must maintain a statically allocated data buffer along with associated counters and indexes (or pointers) that record the amount of data and the current position in the buffer. Suppose that the main program is in the middle of a call to a stdio function such as printf(3) where the buffer and associated variables have been partially updated. If, at that moment, the program is interrupted by a signal handler that also calls printf(3), then the second call to printf(3) will operate on inconsistent data, with unpredictable results.

I believe panicking or catching unwinding is not signal-safe, because std::panic::catch_unwind returns a Box if an unwinding is caught, which means they must allocate some memory, which is not signal-safe.

@ijackson
Copy link
Contributor Author

ijackson commented Feb 5, 2021

Hmmm.

I don't think I really agree. In C, malloc after fork is not, in general, UB. The spec says

If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.

When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not async-signal-safe, the behavior is undefined.

"Errors" is quite vague but my interpretation is that eg trying to acquire a stdio lock might fail because it was locked by a thread in the parent at the time of the fork. But maybe musl is more restrictive about this.

What can be done about this? I don't think we can really have a programming environment where panicking is UB.

Perhaps the right answer is to install an aborting panic hook so that we never unwind. Even with std::panic::set_hook, panic! would still go wrong if you asked it to format. I bet there are things in the stdlib that do that (eg, array bounds check). So perhaps some kind of special case with a private (or unstable) raw panic hook that doesn't require panic! to allocate before aborting.

[edited to link to SuS]

@ijackson
Copy link
Contributor Author

ijackson commented Feb 5, 2021

More digging found me these links:

https://lists.uclibc.org/pipermail/uclibc/2011-March/045130.html
https://wiki.strongswan.org/issues/990

The first one is from the musl authors. One striking statement there is that malloc after fork in a multithreaded program used to be specified to work but nowadays the libc is allowed to make it UB. I'm glad that I'm not going completely mad!

In practical terms it seems that at least for musl this is difficult to make work and not likely to be done any time soon, if at all. And the Rust stdlib should be conservative. So I think that means preventing any calls to malloc after fork. I'll see what I can do to ensure that at least in a plausible subset of cases (which I think has to include at least array bounds violations!).

Maybe I can also somehow nobble the global allocator to abort immediately. The footgun here is quite large...

@ijackson
Copy link
Contributor Author

ijackson commented Feb 7, 2021

I think I have succeeded. The result doesn't look much like this MR and I think it is probably more sensible to start afresh. So I will close this one and make a new MR shortly.

@ijackson ijackson closed this Feb 7, 2021
@ijackson ijackson deleted the fork-no-unwind branch February 7, 2021 18:12
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request May 7, 2021
Do not allocate or unwind after fork

### Objective scenarios

 * Make (simple) panics safe in `Command::pre_exec_hook`, including most `panic!` calls, `Option::unwrap`, and array bounds check failures.
 * Make it possible to `libc::fork` and then safely panic in the child (needed for the above, but this requirement means exposing the new raw hook API which the `Command` implementation needs).
 * In singlethreaded programs, where panic in `pre_exec_hook` is already memory-safe, prevent the double-unwinding malfunction rust-lang#79740.

I think we want to make panic after fork safe even though the post-fork child environment is only experienced by users of `unsafe`, beause the subset of Rust in which any panic is UB is really far too hazardous and unnatural.

#### Approach

 * Provide a way for a program to, at runtime, switch to having panics abort.  This makes it possible to panic without making *any* heap allocations, which is needed because on some platforms malloc is UB in a child forked from a multithreaded program (see rust-lang#80263 (comment), and maybe also the SuS [spec](https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html)).
 * Make that change in the child spawned by `Command`.
 * Document the rules comprehensively enough that a programmer has a fighting chance of writing correct code.
 * Test that this all works as expected (and in particular, that there aren't any heap allocations we missed)

Fixes rust-lang#79740

#### Rejected (or previously attempted) approaches

 * Change the panic machinery to be able to unwind without allocating, at least when the payload and message are both `'static`.  This seems like it would be even more subtle.  Also that is a potentially-hot path which I don't want to mess with.
 * Change the existing panic hook mechanism to not convert the message to a `String` before calling the hook.  This would be a surprising change for existing code and would not be detected by the type system.
 * Provide a `raw_panic_hook` function to intercept panics in a way that doesn't allocate.  (That was an earlier version of this MR.)

### History

This MR could be considered a v2 of rust-lang#80263.  Thanks to everyone who commented there.  In particular, thanks to `@m-ou-se,` `@Mark-Simulacrum` and `@hyd-dev.`  (Tagging you since I think you might be interested in this new MR.)  Compared to rust-lang#80263, this MR has very substantial changes and additions.

Additionally, I have recently (2021-04-20) completely revised this series following very helpful comments from `@m-ou-se.`

r? `@m-ou-se`
bors added a commit to rust-lang-ci/rust that referenced this pull request May 15, 2021
Do not allocate or unwind after fork

### Objective scenarios

 * Make (simple) panics safe in `Command::pre_exec_hook`, including most `panic!` calls, `Option::unwrap`, and array bounds check failures.
 * Make it possible to `libc::fork` and then safely panic in the child (needed for the above, but this requirement means exposing the new raw hook API which the `Command` implementation needs).
 * In singlethreaded programs, where panic in `pre_exec_hook` is already memory-safe, prevent the double-unwinding malfunction rust-lang#79740.

I think we want to make panic after fork safe even though the post-fork child environment is only experienced by users of `unsafe`, beause the subset of Rust in which any panic is UB is really far too hazardous and unnatural.

#### Approach

 * Provide a way for a program to, at runtime, switch to having panics abort.  This makes it possible to panic without making *any* heap allocations, which is needed because on some platforms malloc is UB in a child forked from a multithreaded program (see rust-lang#80263 (comment), and maybe also the SuS [spec](https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html)).
 * Make that change in the child spawned by `Command`.
 * Document the rules comprehensively enough that a programmer has a fighting chance of writing correct code.
 * Test that this all works as expected (and in particular, that there aren't any heap allocations we missed)

Fixes rust-lang#79740

#### Rejected (or previously attempted) approaches

 * Change the panic machinery to be able to unwind without allocating, at least when the payload and message are both `'static`.  This seems like it would be even more subtle.  Also that is a potentially-hot path which I don't want to mess with.
 * Change the existing panic hook mechanism to not convert the message to a `String` before calling the hook.  This would be a surprising change for existing code and would not be detected by the type system.
 * Provide a `raw_panic_hook` function to intercept panics in a way that doesn't allocate.  (That was an earlier version of this MR.)

### History

This MR could be considered a v2 of rust-lang#80263.  Thanks to everyone who commented there.  In particular, thanks to `@m-ou-se,` `@Mark-Simulacrum` and `@hyd-dev.`  (Tagging you since I think you might be interested in this new MR.)  Compared to rust-lang#80263, this MR has very substantial changes and additions.

Additionally, I have recently (2021-04-20) completely revised this series following very helpful comments from `@m-ou-se.`

r? `@m-ou-se`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-help-wanted Call for participation: Help is requested to fix this issue. O-musl Target: The musl libc S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet