Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for linux_pidfd #82971

Open
5 of 8 tasks
voidc opened this issue Mar 10, 2021 · 26 comments
Open
5 of 8 tasks

Tracking Issue for linux_pidfd #82971

voidc opened this issue Mar 10, 2021 · 26 comments
Labels
C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. F-linux_pidfd `#![feature(linux_pidfd)]` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@voidc
Copy link
Contributor

voidc commented Mar 10, 2021

Feature gate: #![feature(linux_pidfd)]

This is a tracking issue for Linux-specific extension methods allowing to obtain process file descriptors for processes spawned with the standard Command API.

Public API

// std::os::linux::process

pub struct PidFd;

impl PidFd {
   pub fn kill(&self) -> Result<()> {...}
   pub fn wait(&self) -> Result<ExitStatus> {...}
   pub fn try_wait(&self) -> Result<Option<ExitStatus>> {...}
}

impl AsRawFd for PidFd {}
impl FromRawFd for PidFd {}
impl IntoRawFd for PidFd {}

// sealed trait, implemented for std::process::Child
pub trait ChildExt {
    fn pidfd(&self) -> Result<&PidFd>;
    fn into_pidfd(self) -> Result<PidFd, Self>;
}

// sealed trait, implemented for std::process::Command
pub trait CommandExt {
    fn create_pidfd(&mut self, val: bool) -> &mut process::Command;
}

Steps / History

Unresolved Questions

  • How can we properly open a pidfd for the new process? The current implementation using a manual clone3 means we can't safely call libc in the child: cargo 1.56 beta hang when run inside Gentoo's sandbox #89522 (comment)
    • pidfd_open may work, but it has conditions on avoiding pid-recycling races.
  • should Child::pidfd(&self) be removed? It can lead to Child::wait returning errors instead of a saved exit status if PidFd::wait obtains the exit status first, which may be surprising behavior.
@voidc voidc added C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Mar 10, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Mar 14, 2021
Add Linux-specific pidfd process extensions (take 2)

Continuation of rust-lang#77168.
I addressed the following concerns from the original PR:

- make `CommandExt` and `ChildExt` sealed traits
- wrap file descriptors in `PidFd` struct representing ownership over the fd
- add `take_pidfd` to take the fd out of `Child`
- close fd when dropped

Tracking Issue: rust-lang#82971
@the8472
Copy link
Member

the8472 commented Mar 16, 2021

The PR adds support for obtaining PidFds, but you can't actually do anything with them. Do we want additional methods on PidFd in std to wait, send signals, obtain the /proc directory etc. or should that be left to 3rd party crates?

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Mar 18, 2021
Add Linux-specific pidfd process extensions (take 2)

Continuation of rust-lang#77168.
I addressed the following concerns from the original PR:

- make `CommandExt` and `ChildExt` sealed traits
- wrap file descriptors in `PidFd` struct representing ownership over the fd
- add `take_pidfd` to take the fd out of `Child`
- close fd when dropped

Tracking Issue: rust-lang#82971
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Mar 18, 2021
Add Linux-specific pidfd process extensions (take 2)

Continuation of rust-lang#77168.
I addressed the following concerns from the original PR:

- make `CommandExt` and `ChildExt` sealed traits
- wrap file descriptors in `PidFd` struct representing ownership over the fd
- add `take_pidfd` to take the fd out of `Child`
- close fd when dropped

Tracking Issue: rust-lang#82971
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Mar 19, 2021
Add Linux-specific pidfd process extensions (take 2)

Continuation of rust-lang#77168.
I addressed the following concerns from the original PR:

- make `CommandExt` and `ChildExt` sealed traits
- wrap file descriptors in `PidFd` struct representing ownership over the fd
- add `take_pidfd` to take the fd out of `Child`
- close fd when dropped

Tracking Issue: rust-lang#82971
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Mar 26, 2021
Add Linux-specific pidfd process extensions (take 2)

Continuation of rust-lang#77168.
I addressed the following concerns from the original PR:

- make `CommandExt` and `ChildExt` sealed traits
- wrap file descriptors in `PidFd` struct representing ownership over the fd
- add `take_pidfd` to take the fd out of `Child`
- close fd when dropped

Tracking Issue: rust-lang#82971
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 1, 2021
Add Linux-specific pidfd process extensions (take 2)

Continuation of rust-lang#77168.
I addressed the following concerns from the original PR:

- make `CommandExt` and `ChildExt` sealed traits
- wrap file descriptors in `PidFd` struct representing ownership over the fd
- add `take_pidfd` to take the fd out of `Child`
- close fd when dropped

Tracking Issue: rust-lang#82971
@cuviper
Copy link
Member

cuviper commented Oct 15, 2021

I added an unresolved question about using clone3, given what we found in #89522.

@the8472
Copy link
Member

the8472 commented Oct 15, 2021

Could we use clone instead of clone3?

From #77168 (comment)

The reason that I used __clone3 is that glibc's clone() implementation requires the caller to allocate a stack for the new process (even though the kernel allows passing a null stack pointer to the raw clone syscall). We could allocate a stack ourself, but I think it would be better to avoid that if at all possible.

@richfelker
Copy link

It's not safe to call either clone syscall directly. It's possible that the clone function handles this correctly and makes a consistent child state, but this isn't properly documented anywhere, requires the function to be aware of the argument semantics, and does not seem to be happening in practice. It's an open question on musl what we should do with this; we'll probably end up making it work like fork eventually, but it's in limbo for now. In short: don't use clone. Prefer posix_spawn, and use fork and execve when you can't.

@the8472
Copy link
Member

the8472 commented Oct 15, 2021

In short: don't use clone. Prefer posix_spawn, and use fork and execve when you can't.

The issue is that neither of those approaches provides a pidfd atomically.

@cuviper mentions pidfd_open as an option but I don't think we can uphold its requirements for race-freedom. Practically speaking using it will still be an improvement over using pids but it still undermines one of the promises of pidfds.

@richfelker
Copy link

What atomicity requirements would you be concerned about? As the parent of the child you just created, you are the one process who absolutely can know, with no ambiguity due to identifier reuse races, that the pid refers to the child process you created. The only way this can be undermined is by the process's own bad behavior, e.g. calling wait rather than waitpid with a pid (which is inherently a resource-lifetime-unsafe operation, so don't do that).

@detly
Copy link

detly commented Apr 22, 2022

Ah thanks, I had completely overlooked that. Indeed the stdlib docs even mention that as a potential exhaustion risk. Oops.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Sep 22, 2022
…se-in-std, r=Mark-Simulacrum

Remove use of `io::ErrorKind::Other` in std

The documentation states that this `ErrorKind` is not used by the standard library. Instead, `io::ErrorKind::Uncategorized` should be used.

The two instances are in the unstable API [linux_pidfd](rust-lang#82971).
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Sep 22, 2022
…se-in-std, r=Mark-Simulacrum

Remove use of `io::ErrorKind::Other` in std

The documentation states that this `ErrorKind` is not used by the standard library. Instead, `io::ErrorKind::Uncategorized` should be used.

The two instances are in the unstable API [linux_pidfd](rust-lang#82971).
thomcc pushed a commit to tcdi/postgrestd that referenced this issue Feb 10, 2023
…, r=Mark-Simulacrum

Remove use of `io::ErrorKind::Other` in std

The documentation states that this `ErrorKind` is not used by the standard library. Instead, `io::ErrorKind::Uncategorized` should be used.

The two instances are in the unstable API [linux_pidfd](rust-lang/rust#82971).
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 9, 2023
…imulacrum

open pidfd in child process and send to the parent via SOCK_SEQPACKET+CMSG

This avoids using `clone3` when a pidfd is requested while still getting it in a 100% race-free manner by passing it up from the child process.
This should solve most concerns in rust-lang#82971
@the8472
Copy link
Member

the8472 commented Aug 9, 2023

#113939 solves the clone3 issue by replacing it with pidfd_open in the child and then transferring the file descriptor.
With that create_pidfd should now be usable in more environments.

@the8472
Copy link
Member

the8472 commented Nov 21, 2023

For anyone following along: with #117957 pidfds are now used by by a few more methods. And two bugs have been fixed.

@NobodyXu
Copy link
Contributor

The new pidfd_open + send over unix socket uses fork AFAIK, it would be great if it can use vfork for better perf and less memory usage.

@the8472
Copy link
Member

the8472 commented Nov 22, 2023

We have a posix_spawn path that uses vfork-like clone() under the hood. But that requires glibc to implement pidfd support in posix_spawn, which hasn't happend yet
Using vfork directly already would have issues, it's extremely tricky to use correctly and it doesn't call pthread_atfork handlers. Though I don't know if we guarantee that they're called on process spawning.
Using clone3 directly would be even more troublesome since we can't use glibc after it's called. We'd have to do raw syscalls.

If @joshtriplett gets his iouring_spawn work into the kernel then we could use that instead and avoid all those problems.

@NobodyXu
Copy link
Contributor

NobodyXu commented Nov 22, 2023

We have a posix_spawn path that uses vfork-like clone() under the hood. But that requires glibc to implement pidfd support in posix_spawn, which hasn't happend yet

Good to see that they have plan adding that!

Though to use it, I think Rust would have to bump the minimum glibc version?
Otherwise it would have to check for availability of the new functionality and fallback to existing implementation.

Using vfork directly already would have issues, it's extremely tricky to use correctly and it doesn't call pthread_atfork handlers. Though I don't know if we guarantee that they're called on process spawning.

Yes but I think it will be the most efficient implementation given that posix_spawn currently does not support pidfd.

Since posix_spawn internally uses clone on linux and vfork on unix, I think it's possible to write it in Rust.

Using clone3 directly would be even more troublesome since we can't use glibc after it's called. We'd have to do raw syscalls.

Thanks, now I see why this is a problem.

IIRC last time I tried calling clone syscall directly it's really painful, not just the arg passing, but also have to manually setup new stack.

If @joshtriplett gets his iouring_spawn work into the kernel then we could use that instead and avoid all those problems.

But it would require an io-uring to be initialised?

It would definitely be reasonable for async runtime like tokio to use it, but for stdlib to use it it would have a heavy initialisation cost to pay and not sure it's reasonable for all use cases?

@the8472
Copy link
Member

the8472 commented Nov 22, 2023

Since posix_spawn internally uses clone on linux and vfork on unix, I think it's possible to write it in Rust.

That's not necessarily true. glibc can use vfork more easily since it controls the libc state, like glibc can use clone3 internally but user code can't. At the very least in rust we'd have to use the unstable #[ffi_returns_twice] and there probably are other problems with it such as not touching anything upstack or global state.

But it would require an io-uring to be initialised?
It would definitely be reasonable for async runtime like tokio to use it, but for stdlib to use it it would have a heavy initialisation cost to pay and not sure it's reasonable for all use cases?

I assume that setting up a ring once and caching it isn't that expensive, especially in comparison to bringing up a whole process (consider all the stuff that happens after exec).

Anyway, using vfork is a separate topic. We should discuss this in a new issue or on zulip. The fork + exec code-path already exists and is used for everything that posix_spawn doesn't cover. So this isn't exactly a novel issue.

@NobodyXu
Copy link
Contributor

That's not necessarily true. glibc can use vfork more easily since it controls the libc state, like glibc can use clone3 internally but user code can't. At the very least in rust we'd have to use the unstable #[ffi_returns_twice] and there probably are other problems with it such as not touching anything upstack or global state.

Thanks I didn't know ffi_returns_twice is required.

I assume that setting up a ring once and caching it isn't that expensive, especially in comparison to bringing up a whole process (consider all the stuff that happens after exec).

True but I think that it would probably be better if it can be configured and is optional.

@NobodyXu
Copy link
Contributor

The PR adds support for obtaining PidFds, but you can't actually do anything with them. Do we want additional methods on PidFd in std to wait, send signals, obtain the /proc directory etc. or should that be left to 3rd party crates?

I just opened tokio-rs/tokio#6281 and realized that Pidfd has no method.

IMO having methods to wait and send signals is indeed useful, while other more exotic ones (e.g. process_madvice) is better left for third-party crates

@NobodyXu
Copy link
Contributor

NobodyXu commented Feb 1, 2024

glibc just adds support for pidfd_spawn in glibc 2.39.

Once rust bumps minimum glibc ton2.39, I think we can use it instead, which will probably enable use of vfork again.

@cuviper
Copy link
Member

cuviper commented Feb 1, 2024

We can use it as a weak symbol even before we ever raise our minimum that far, just like we do for spawning with chdir.

@the8472
Copy link
Member

the8472 commented Feb 7, 2024

Alas, the released pidfd_spawn API lacks functionality that clone3 has, namely the ability to return pid and pidfd at the same time. And pidfd_getpid requires procfs. This can be worked around by pre-opening /proc/self before trying to use the new API but it does add some complexity and additional sources of failure.

@traviscross traviscross added the F-linux_pidfd `#![feature(linux_pidfd)]` label Jun 13, 2024
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Jun 21, 2024
Add PidFd::{kill, wait, try_wait}

rust-lang#117957 changed `Child` kill/wait/try_wait to use its pidfd instead of the pid, when one is available.
This PR extracts those implementations and makes them available on `PidFd` directly.

The `PidFd` implementations differ significantly from the corresponding `Child` methods:

* the methods can be called after the child has been reaped, which will result in an error but will be safe. This state is not observable in `Child` unless something stole the zombie child
* the `ExitStatus` is not kept, meaning that only the first time a wait succeeds it will be returned
* `wait` does not close stdin
* `wait` only requires `&self` instead of `&mut self` since there is no state to maintain and subsequent calls are safe

Tracking issue: rust-lang#82971
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jun 21, 2024
Add PidFd::{kill, wait, try_wait}

rust-lang#117957 changed `Child` kill/wait/try_wait to use its pidfd instead of the pid, when one is available.
This PR extracts those implementations and makes them available on `PidFd` directly.

The `PidFd` implementations differ significantly from the corresponding `Child` methods:

* the methods can be called after the child has been reaped, which will result in an error but will be safe. This state is not observable in `Child` unless something stole the zombie child
* the `ExitStatus` is not kept, meaning that only the first time a wait succeeds it will be returned
* `wait` does not close stdin
* `wait` only requires `&self` instead of `&mut self` since there is no state to maintain and subsequent calls are safe

Tracking issue: rust-lang#82971
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 21, 2024
Add PidFd::{kill, wait, try_wait}

rust-lang#117957 changed `Child` kill/wait/try_wait to use its pidfd instead of the pid, when one is available.
This PR extracts those implementations and makes them available on `PidFd` directly.

The `PidFd` implementations differ significantly from the corresponding `Child` methods:

* the methods can be called after the child has been reaped, which will result in an error but will be safe. This state is not observable in `Child` unless something stole the zombie child
* the `ExitStatus` is not kept, meaning that only the first time a wait succeeds it will be returned
* `wait` does not close stdin
* `wait` only requires `&self` instead of `&mut self` since there is no state to maintain and subsequent calls are safe

Tracking issue: rust-lang#82971
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 22, 2024
Add PidFd::{kill, wait, try_wait}

rust-lang#117957 changed `Child` kill/wait/try_wait to use its pidfd instead of the pid, when one is available.
This PR extracts those implementations and makes them available on `PidFd` directly.

The `PidFd` implementations differ significantly from the corresponding `Child` methods:

* the methods can be called after the child has been reaped, which will result in an error but will be safe. This state is not observable in `Child` unless something stole the zombie child
* the `ExitStatus` is not kept, meaning that only the first time a wait succeeds it will be returned
* `wait` does not close stdin
* `wait` only requires `&self` instead of `&mut self` since there is no state to maintain and subsequent calls are safe

Tracking issue: rust-lang#82971
@brauner
Copy link

brauner commented Jun 24, 2024

Alas, the released pidfd_spawn API lacks functionality that clone3 has, namely the ability to return pid and pidfd at the same time. And pidfd_getpid requires procfs. This can be worked around by pre-opening /proc/self before trying to use the new API but it does add some complexity and additional sources of failure.

I'd be open to merging a patch that extends pidfs by adding an ioctl() that returns the pid associated with the pidfd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. F-linux_pidfd `#![feature(linux_pidfd)]` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants