Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::os::unix::process::CommandExt::before_exec should be unsafe #39575

Closed
fweimer opened this issue Feb 6, 2017 · 74 comments · Fixed by #58059
Closed

std::os::unix::process::CommandExt::before_exec should be unsafe #39575

fweimer opened this issue Feb 6, 2017 · 74 comments · Fixed by #58059
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@fweimer
Copy link

fweimer commented Feb 6, 2017

The before_exec method should be marked unsafe because after a fork from a multi-threaded process, POSIX allows only allows async-signal-safe functions to be called. Otherwise, the result is undefined behavior.

The only way to enforce this in Rust is to mark the function as unsafe.

So to be clear, this should not compile:

use std::process::Command;
use std::os::unix::process::CommandExt;

fn main() {
    let status = Command::new("sh")
        .arg("-c")
             .arg("echo world")
        .before_exec(|| { println!("hello"); Ok(()) })
        .status().unwrap();
    println!("status: {}", status);
}
@alexcrichton
Copy link
Member

Could you clarify the unsafety that can happen here? We chose to stabilize this function on the grounds that we couldn't come up with any memory safety reasons this would cause problems, just deadlocks and such.

@fweimer
Copy link
Author

fweimer commented Feb 6, 2017

It's probably best to treat anything that is undefined according to POSIX as a memory safety violation. Once POSIX says that you cannot assume anything about the process's behavior, you need to assume that memory safety is violated as well.

I'll try to give two examples why this facility is unsafe in practice. I have to argue based on the glibc implementation, but we can hopefully assume that it is a valid implementation, matching the POSIX requirements in play here.

First, assume that you have a PRNG for use for cryptography written in Rust. This PRNG uses locks or thread-local variables to protect its internal state. It does not have fork protection because there is no way to implement that in Rust. This PRNG will return the same sequence of bytes in the before_exec callback and the parent thread, violating the expectations associated with a PRNG. Maybe this does not count as a memory safety violation, but I think it is still a problem for PRNGs.

For the second problem, we look at the implementation of recursive mutexes in libpthread. They embed a PID/TID, a small numeric value which gets reused over time. Let's look at this sequence of events:

  1. A thread in the parent process acquires a recursive mutex.
  2. Another thread invokes an external command with a before_exec callback.
  3. The before_exec callback invokes another external command with another before_exec callback.
  4. The initial parent process exits (so the thread in step 1 ceases to exist and its TID can be reused).
  5. The second before_exec callback spawns a new thread. The TID of that thread happens to match the TID of the thread in step 1.
  6. This new thread attempts to acquire the recursive mutex which has been acquired in step 1.

The attempt in step 6 will succeed because libpthread incorrectly assumes that the same thread tried to acquire the mutex. But the process was copied while the thread in step 1 was holding the mutex, so the data protected by it may not be consistent, and the thread in steps 5 and 6 may access intermediate results which are not supposed to be visible. This is a thread safety violation, and hence a memory safety violation.

I think the second example at least is quite compelling.

@alexcrichton
Copy link
Member

That'd do it!

@Mark-Simulacrum
Copy link
Member

This is not marked as unsafe today and to do so would be backwards incompatible; however, a quick search on GitHub didn't seem to turn up any code that uses it.

@Mark-Simulacrum Mark-Simulacrum added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels May 20, 2017
@brson brson added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label May 23, 2017
@alexcrichton
Copy link
Member

The libs team discussed this a few weeks ago at the last triage and the conclusion was that we'd prefer to see concrete evidence (e.g. a proof-of-concept) showing the memory unsafety here. At that point we'll consider remediations but for now we're fine just adding some documentation that "care should be taken"

@Stebalien
Copy link
Contributor

In general, anything involving files may not behave as expected.

For example, a concrete memory safety issue would be a memory mapped allocator and assumes that the memory mapped file can't be accessed by multiple processes (e.g., by creating a file with the O_TMPFILE on Linux or enforced through some other means (e.g., by contract)).

Other shenanigans include:

  • A database that locks the underlying file and thus assumes that there is at most one writer can end up corrupting the database.
  • Loggers can end up writing corrupted logs.
  • Values can be dropped twice per constructor call (although the values will be dropped by different processes).

@bstrie
Copy link
Contributor

bstrie commented Jun 8, 2017

Not to denigrate theoretical concerns (which is to say, just because one hasn't yet demonstrated that this violates memory safety doesn't mean that we're going to reject this out-of-hand, and it doesn't mean that it won't eventually become a severe bug somehow), but I second Alex's desire to see a working proof-of-concept in code.

@steveklabnik
Copy link
Member

@Stebalien
Copy link
Contributor

I honestly don't have time to write a convincing POC but this breaks all wayland implementations (assuming memory corruption counts as a safety bug) because wayland clients and servers communicate by memory mapped files. Any attempt (accidental or otherwise) to update a wayland buffer from within a before_exec callback might corrupt the shared buffer. Now, this can be mitigated by taking a global lock and then checking the PID before writing to memmapped buffers but that's not something one wants to do every frame.


@steveklabnik

Those aren't quite the same:

  1. They're more useful than before_exec (a reason to make an exception).
  2. The unsafety only shows up when doing FFI.

Unlike those cases, this affects a fundamental rust concept: move/copy semantics. Without before_exec (the ability to fork without unsafe code), it's impossible to (safely) copy a value of a type that doesn't implement Copy. With before_exec, this guarantee is relaxed by allowing copies (in safe code) as long as those copies reside in different processes. This prevents processes from relying on any form of "one-constructor-call -> at-most-one-value" guarantee.


As before_exec can be used to emulate fork, these threads also apply:

@bstrie
Copy link
Contributor

bstrie commented Jun 20, 2017

@alexcrichton I like to see all soundness issues with an associated priority tag, does your comment above imply that this is considered P-low? Alternatively, if you don't consider this a soundness issue at all, would you like to remove the I-unsound label?

@alexcrichton
Copy link
Member

Ah yes the libs team decided this was basically P-medium, so I will tag as such.

@alexcrichton alexcrichton added the P-medium Medium priority label Jun 20, 2017
@briansmith
Copy link
Contributor

Even if there were no way to trigger memory unsafety today, at any time a change to libpthreads or libc or the operating system in the future could cause memory unsafety where there was previously none. Accordingly, I don't think blocking the change on a PoC makes sense, and further I think requiring somebody to make a PoC before the change is made would be a waste of those people's time. Any time we trigger undefined behavior, memory unsafety should just be assumed.

@Kixunil
Copy link
Contributor

Kixunil commented Jul 7, 2017

BTW have anyone considered that on macOS if an application is using libdispatch, it must not call any libdispatch code after fork? I don't have PoC now but I think it'd not be difficult to create.

@bstrie
Copy link
Contributor

bstrie commented Oct 19, 2017

I concur with the arguments here that this is a different beast than remove_var and set_var. Thus far we have chosen to always interpret specified undefined behavior as memory unsafety, demonstrable or otherwise. It would be a unique exception to our policy of "all memory unsafety may be traced back to an unsafe block" to add "(...unless using std::os::unix::process::CommandExt::before_exec)". The longer we wait, the more painful this will be. I propose that we begin emitting a warning ASAP with a plan to mark this unsafe at a specified future date.

@vi
Copy link
Contributor

vi commented Nov 5, 2017

With some OS help one can always cause memory unsafety:

#![feature(getpid)]

fn hacky_ptr_write<T>(ptr: &T, value: u32) {
    std::process::Command::new("gdb").arg("-batch").arg("-quiet")
        .arg("-pid").arg(format!("{}", std::process::id())).arg("-ex")
        .arg(format!("set {{unsigned long}}{:p}={}",ptr,value))
        .output() .unwrap();
}
fn main() {
    let q = &mut Box::new(55);
    hacky_ptr_write(&q, 0);
    *q = Box::new(44); // Segmentation fault
}

But this does not mean Command should be unsafe.

@Stebalien
Copy link
Contributor

@vi

The problem here is that the type-based assumption that a non-Copy type can't be copied is broken (even though the copies end up in different processes). Any code relying on this assumption is potentially incorrect and any unsafe code relying on it is potentially memory unsafe. However, given the "different process" constraint, this bug necessarily involves the operating system.

Yes, you can usually shoot yourself in the foot using the OS. However, you generally have to explicitly request this behavior. In this case, you can take two apparently safe and independent APIs (e.g., wayland and before_exec) and combine them to corrupt memory.

It's actually harder than you might think as long as you aren't root. Most modern Linux distros, at least, don't allow non-root processes to debug (or access/modify the memory of) non-child processes.

@vi
Copy link
Contributor

vi commented Nov 5, 2017

I expect there to be some unsafe{} inside that Wayland code. Can one reproduce the unsafety without using non-std unsafe{} code?

Does the question boil down to "Should every unsafe{} block also expect forks or should the fork itself just be be unsafe"?

non-Copy type can't be copied

During fork() I expect it to stay on the same virtual address, so that it is as if not copied. !Copy from Rust perspective does not mean non-copyable from OS kernel perspective.

@Stebalien
Copy link
Contributor

I expect there to be some unsafe{} inside that Wayland code.

Yes. It memory maps a shared buffer.

Can one reproduce the unsafety without using non-std unsafe{} code?

Not unless you were to move the shared mmap functionality from wayland into std. It's not really about where the code lives but what are valid assumptions and what are not (although any assumptions made by std are assumed to be valid).

Does the question boil down to "Should every unsafe{} block also expect forks or should the fork itself just be be unsafe"?

Pretty much, yes. If we do say "this API is fine", we should also provide a fork function (the same way we made mem::forget safe).

During fork() I expect it to stay on the same virtual address, so that it is as if not copied. !Copy from Rust perspective does not mean non-copyable from OS kernel perspective.

From an outside perspective, the value is copied and:

  1. Will be dropped twice.
  2. Will share any os-level objects (e.g. file descriptors).

@vi
Copy link
Contributor

vi commented Nov 5, 2017

From an outside perspective, the value is copied and:

Will be dropped twice.
Will share any os-level objects (e.g. file descriptors).

Just what is typically needed. Consider a C program:

#include <stdlib.h>
#include <unistd.h>

int main() {
    int fd = open("myfile.txt", O_RDONLY);
    void* mem = malloc(300);
    if (fork()) {
        close(fd); // first "drop" of fd
        free(mem); // first "drop" of mem
    } else {
        close(fd); // second "drop" of fd
        free(mem); // second "drop" of mem
    }
}

There are two frees, but it's not that "double free" error. Likewise file descriptors should be closed twice after a fork.

Unsafety of fork may be like unsafety of FromRawFd - not directly messing with memory, just confusing other parts of system with surprises, not causing unsafety from within Rust, but causing operating system to behave unsafely for Rust program. And it's workaroundable by std::fs::File::open("/proc/self/fd/0") or something like that.

Relationship between processes after fork() is IPC. Shared memory buffers is also a sort of IPC. Triggering program crash by using IPC is unsafety of the program...

What exactly happens in Wayland if evil peer (or even evil Wayland server) deliberately corrupts our buffer? Shound't Rust program not trust memory-mapped buffers that are writable from outside anyway? I expect storing pointers inside a memory-mapped file is unsafe anyways. And if no pointers getting stored there then it would be wrong, but safe behaviour.

@vi
Copy link
Contributor

vi commented Nov 5, 2017

Not unless you were to move the shared mmap functionality from wayland into std.

What function? map? But it's unsafe anyways. Also it gives you [u8], and those bytes can jump around unpredictably because of it's IPC and somebody may be interfering. Storing safety-important things in that [u8] is your decision and that's where the wrong assumption ("when I get mapping, it should be safe to work with") kicks in.

@Diggsey
Copy link
Contributor

Diggsey commented Dec 16, 2017

There's also this comment from the implementation of Command:

    // Currently we try hard to ensure that the call to `.exec()` doesn't
    // actually allocate any memory. While many platforms try to ensure that
    // memory allocation works after a fork in a multithreaded process, it's
    // been observed to be buggy and somewhat unreliable, so we do our best to
    // just not do it at all!

If this is true it seems pretty clear that this function should be considered unsafe. If it's not true then there's a whole lot of unnecessary and complex unsafe code in the standard library.

@RalfJung
Copy link
Member

So crater found 32 root regressions. Looks like if we decide we want to take a stanza for tracking ownership of external resources, we'd have to go the gradual-deprecation route.

@alexcrichton
Copy link
Member

Thanks for gathering the data @RalfJung! This is a nominated issue for the next T-libs meeting where we can try to discuss and discover a new name here. Do others have suggestions for what to call this method alternatively? Some ideas I might have are:

  • before_exec2
  • pre_exec
  • after_fork

@Centril
Copy link
Contributor

Centril commented Nov 25, 2018

Nominating for discussion on next T-lang meeting which is on the 29th.
If @RalfJung and @alexcrichton can participate that'd be good.

Ralf couldn't participate on the 29th.

@Centril
Copy link
Contributor

Centril commented Nov 25, 2018

My preference is to deprecate the method temporarily with:

#[deprecated =
    "This method will become unsafe because $reason.\
     Check whether what you are doing is safe and then wrap the call in `unsafe { ... }`."]

And then we can let say 3-6 months pass while we file fixing PRs; after that we make before_exec into unsafe fn.

I don't mind having an extra method; but I would like before_exec to eventually become unsafe fn.
As for naming, all of the mentioned candidates seem fine; another potential name could be before_exec_unchecked.

@alexcrichton
Copy link
Member

This was discussed briefly with the libs team during today's triage, and there was no objection to moving forward with a deprecation. The leading contender for a name was before_exec2 which clearly indicates it's "basically the same" as the prior method modulo a few small details (in this case the safety).

Moving this issue forward largely just needs someone to champion the PR and implementation to push it through!

@mitsuhiko
Copy link
Contributor

Not my call to make but I really do not like the idea of just incrementing the count on methods. This made a lot of code in Python just very ugly to begin with.

If you want to find some better names here are similar APIs in other languages: function with the same behavior in python is called prexec_fn for subprocesses, in boost it's called on_exec_setup. In glib the closest equivalent is GSpawnChildSetupFunc (setup_func).

@Diggsey
Copy link
Contributor

Diggsey commented Nov 28, 2018

If editions could be used to solve this in the future, that would be a huge benefit. (I realise this is difficult given that libstd has to only be compiled once, but perhaps there is a way to attach some metadata to the function signature so that the compiler can do the mapping).

eg. an attribute like:

#[rename("before_exec", edition=2018)]
fn before_exec2(...)

@Centril
Copy link
Contributor

Centril commented Nov 28, 2018

@Diggsey ostensibly you could do that with "edition visibility", e.g. pub(>= 2018) and pub(<2018) as an unstable mechanism. I'd like to use some mechanism like that to say e.g. pub(< 2021) fn uninitialized<T>() -> T.

@RalfJung
Copy link
Member

The leading contender for a name was before_exec2 which clearly indicates it's "basically the same" as the prior method modulo a few small details (in this case the safety).

This makes sense for someone coming from today's Rust, but in a future where before_exec is not used at all any more, many people will be confused by what the 2 means. Basically, before_exec2 means we will always have to explain the name as "successor of a now-deprecated function".

@oli-obk
Copy link
Contributor

oli-obk commented Nov 29, 2018

We can have "soft unsafe" operations which only produce a (deny by default) lint that states that one should be wrapping it in unsafe, but also mention that the only reason it's a lint is because of backwards compatibility.

Implementing such a scheme in the unsafe checker is quite simple. I am volunteering to write a PR if such a change is desired (or if we just want to see how such a change would look).

@vi
Copy link
Contributor

vi commented Nov 29, 2018

"soft unsafe"

Like what happens when taking a reference of a field of a repr(packed) struct?

@alexcrichton
Copy link
Member

I think there's definitely enough valid pushback to not call it before_exec2! I like preexec or maybe on_exec_setup myself perhaps? (I don't feel too strongly one way or another).

@oli-obk I think I'd personally prefer to see a different function, because there's also the matter of creating a function pointer to this method which today is safe and creates a safe function pointer, but aftewards would need to ideally be safe and create an unsafe function pointer but would in practice have to unsafely create a safe function pointer

@joshtriplett
Copy link
Member

The language team agreed that before_exec should be unsafe, and leaves the details of a transition plan to the libs team.

@joshtriplett
Copy link
Member

Speaking only for myself: I like pre_exec.

@nikomatsakis
Copy link
Contributor

I do feel like there is a meta question lurking here of unsafe composition and how we should manage it. For now, it seems good to err on the side of caution where possible, though, and avoid thorny questions -- but -- as has been amptly demonstrated here -- it's sort of hard to tell what the limits ought to be on what unsafe code can and cannot do when it comes to e.g. external resources. I wonder if at some point we're going to want to try and allow unsafely implemented libraries to declare more precisely the kinds of things they rely on other libraries not to do.

Centril added a commit to Centril/rust that referenced this issue Feb 20, 2019
deprecate before_exec in favor of unsafe pre_exec

Fixes rust-lang#39575

As per the [lang team decision](rust-lang#39575 (comment)):

> The language team agreed that before_exec should be unsafe, and leaves the details of a transition plan to the libs team.

Cc @alexcrichton @rust-lang/libs how would you like to proceed?
Centril added a commit to Centril/rust that referenced this issue Feb 22, 2019
deprecate before_exec in favor of unsafe pre_exec

Fixes rust-lang#39575

As per the [lang team decision](rust-lang#39575 (comment)):

> The language team agreed that before_exec should be unsafe, and leaves the details of a transition plan to the libs team.

Cc @alexcrichton @rust-lang/libs how would you like to proceed?
Centril added a commit to Centril/rust that referenced this issue Feb 22, 2019
deprecate before_exec in favor of unsafe pre_exec

Fixes rust-lang#39575

As per the [lang team decision](rust-lang#39575 (comment)):

> The language team agreed that before_exec should be unsafe, and leaves the details of a transition plan to the libs team.

Cc @alexcrichton @rust-lang/libs how would you like to proceed?
@mahkoh
Copy link
Contributor

mahkoh commented May 24, 2020

Here is a simple example showing memory unsafety in the presence of fork:

#![feature(thread_id_value)]

use std::sync::atomic::{AtomicU64, AtomicPtr};
use std::sync::atomic::Ordering::SeqCst;
use std::{thread, ptr};

pub fn f() {
    static I: AtomicU64 = AtomicU64::new(0);
    static D: AtomicPtr<i32> = AtomicPtr::new(ptr::null_mut());

    let data = Box::leak(Box::new(0));
    let tid = thread::current().id().as_u64().get();
    match I.load(SeqCst) {
        0 => {
            I.store(tid, SeqCst);
            /*
             * Assumption for safety: No call to `f` can have the same `tid` while we're
             * in the critical section that spans this comment.
             */
            D.store(data, SeqCst);
        },
        n if n == tid => {
			/* D has been set to a valid pointer because of the assumption above */
            let _v = unsafe { *D.load(SeqCst) };
        },
        _ => { },
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.