syscall: add PidFD, CgroupFD, and UseCgroupFD options for Linux clone to SysProcAttr #51246

kolyshkin · 2022-02-17T22:00:45Z

Linux 5.3 added a new system call, clone3(), which provides a superset of the functionality of the older clone(). In particular, it adds a way for a child to be spawned in a different cgroup, which solves a number of issues (see CLONE_INTO_CGROUP flag in clone3(2) man page for details).

It would be great for some software written in Go to benefit from this new functionality. Obviously, calling clone3() syscall directly won't work since Go runtime needs to perform specific steps around fork and exec.

So, the support for clone3 needs to be in the syscall package, where clone() is called.

Looks like this can be implemented by amending linux SysProcAttr structure with a pointer to CloneArgs (a new structure that mirrors that of C.clone_args). Then, forkAndExecInChild would use clone3 instead of clone if this pointer is non-nil. This can be used from e.g. os/exec can set cmd.SysProcArgs.CloneArgs.

The text was updated successfully, but these errors were encountered:

cuiweixie · 2022-02-19T00:24:37Z

i think this kernel version is too new, the minimal kernel version is still 2.6.23。

kolyshkin · 2022-02-19T00:47:45Z

think this kernel version is too new, the minimal kernel version is still 2.6.23。

This is not a proposal to switch to the new syscall, this is a proposal to make its functionality available to userspace.

OneOfOne · 2022-02-20T18:49:10Z

i think this kernel version is too new, the minimal kernel version is still 2.6.23。

Just to add perspective, 5.3 came out in Sept 2019.

rsc · 2022-02-23T18:53:12Z

Retitled. There are three parts to this:

Add CloneArgs struct
Add CloneArgs *CloneArgs field to SysProcAttr
When SysProcAttr.CloneArgs is set, use clone3 during syscall.ForkExec.

All of this seems unobjectionable.

rsc · 2022-02-23T19:59:43Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

prattmic · 2022-02-23T20:17:33Z

Is the intention for SysProcAttr.CloneArgs to be the entirety of the full C clone_args structure?

struct clone_args {
  u64 flags;        /* Flags bit mask */
  u64 pidfd;        /* Where to store PID file descriptor (int *) */
  u64 child_tid;    /* Where to store child TID, in child's memory (pid_t *) */
  u64 parent_tid;   /* Where to store child TID, in parent's memory (pid_t *) */
  u64 exit_signal;  /* Signal to deliver to parent on child termination */
  u64 stack;        /* Pointer to lowest byte of stack */
  u64 stack_size;   /* Size of stack */
  u64 tls;          /* Location of new TLS */
  u64 set_tid;      /* Pointer to a pid_t array (since Linux 5.5) */
  u64 set_tid_size; /* Number of elements in set_tid (since Linux 5.5) */
  u64 cgroup;       /* File descriptor for target cgroup of child (since Linux 5.7) */
};

This is a superset of the arguments to clone(2), and it seems like a significant departure from the existing SysProcAttr to allow unfettered access, rather than providing specific desired features.

Many of these arguments cannot be safely set by Go code. e.g., setting stack, tls, or certain flags is likely to cause the child (or parent!) to crash before it can complete syscall.ForkExec.

As an alternative, we could add specific useful features enabled by this syscall. e.g.,

SysProcAttr.PidFD *int which is assigned to the pidfd of the child.
SysProcAttr.CgroupFD *int which the fd for target cgroup of child.

kolyshkin · 2022-02-23T22:56:19Z

Is the intention for SysProcAttr.CloneArgs to be the entirety of the full C clone_args structure?

Definitely not.

Indeed, this proposal can be reduced to adding PidFD and CgroupFD, and making use of clone3 CgroupFD is set (note that PidFD can be obtained via clone(2) already, so it's not clear if the code should switch to using clone3 in this case).

I fully agree with your analysis, and should have proposed what you had.

(Not really related to the proposal, but as far as the implementation goes, my biggest roadblock is obsoleted code for generating per-arch syscall numbers in src/syscall. The files that are supposed to be auto-generated are now changed manually so I am not exactly sure how to add a new syscall number.)

prattmic · 2022-02-24T17:43:13Z

(Not really related to the proposal, but as far as the implementation goes, my biggest roadblock is obsoleted code for generating per-arch syscall numbers in src/syscall. The files that are supposed to be auto-generated are now changed manually so I am not exactly sure how to add a new syscall number.)

Indeed, our system call code is a mess. We hope to clean it up (see #51087 and #15282), but right now it is a pain to change.

prattmic · 2022-02-24T17:48:32Z

Indeed, this proposal can be reduced to adding PidFD and CgroupFD.

Thanks. In some sense this is really two proposals, I'm not sure if we should split it up.

Following the API we came to in #47049 (comment), I'll adjust the proposed API to adding the following to SysProcAttr:

PidFD *int // set to pid FD of child if not nil
UseCgroup bool
Cgroup int // fd of cgroup to place child in

prattmic · 2022-02-24T18:02:20Z

Support for the cgroup FD seems fine and uncontroversial to me.

The pid FD case is a bit more interesting. I certainly think it should be possible to get a pid FD. In fact, I think we should use a pid FD whenever possible. I wonder if instead of providing only a low-level SysProcAttr field, we should automatically make os.StartProcess return an os.Process backed by a pid FD rather than the raw pid (Wait, Kill, etc perform pid FD operations). A new os.Process.Fd() could return the pid FD for additional direct use. os.Process closes the pid FD in os.Process.Release (which is also called with a finalizer).

Down a level, syscall.StartProcess could return the pid FD in the handle return slot, which is currently unused on Linux. Some extra argument or SysProcAttr field would be needed to request this, as callers other than os.StartProcess wouldn't know to close the FD by default. (syscall.ForkExec doesn't have this return value and would need a new variant or SysProcAttr field).

One possibility to merge the limitations above would be to add PidFD *int to SysProcAttr, but also make os.StartProcess automatically set this field if not set by callers.

kolyshkin · 2022-02-28T19:07:38Z

PidFD *int // set to pid FD of child if not nil

So, I played with this a bit, and it seems it's easier to have something like this:

       // PidFD is set to a PID file descriptor referring to the child process,
       // if CLONE_PIDFD flag is set in Cloneflags. Available since Linux 5.2.
       PidFD int

The only thing about it is a slightly unusual way of returning a value. The gist of implementation, using existing clone() call, is this:

@@ -213,12 +218,12 @@ func forkAndExecInChild1(argv0 *byte, argv, envv []*byte, chroot, dir *byte, att
        runtime_BeforeFork()
        locked = true
        switch {
-       case sys.Cloneflags&CLONE_NEWUSER == 0 && sys.Unshareflags&CLONE_NEWUSER == 0:
+       case sys.Cloneflags&(CLONE_NEWUSER|CLONE_PIDFD) == 0 && sys.Unshareflags&CLONE_NEWUSER == 0:
                r1, err1 = rawVforkSyscall(SYS_CLONE, uintptr(SIGCHLD|CLONE_VFORK|CLONE_VM)|sys.Cloneflags)
        case runtime.GOARCH == "s390x":
-               r1, _, err1 = RawSyscall6(SYS_CLONE, 0, uintptr(SIGCHLD)|sys.Cloneflags, 0, 0, 0, 0)
+               r1, _, err1 = RawSyscall6(SYS_CLONE, 0, uintptr(SIGCHLD)|sys.Cloneflags, 0, uintptr(unsafe.Point
er(&sys.PidFD)), 0, 0)
        default:
-               r1, _, err1 = RawSyscall6(SYS_CLONE, uintptr(SIGCHLD)|sys.Cloneflags, 0, 0, 0, 0, 0)
+               r1, _, err1 = RawSyscall6(SYS_CLONE, uintptr(SIGCHLD)|sys.Cloneflags, 0, uintptr(unsafe.Pointer(
&sys.PidFD)), 0, 0, 0)
        }
        if err1 != 0 || r1 != 0 {
                // If we're in the parent, we must return immediately

We can certainly do it with a pointer, too, if that is preferred.

The only problem is adding new constants; at this point, I guess, I'd rather patch all the needed files in place, since the generator is outright broken and fixing this is very out of scope for this.

ianlancetaylor · 2022-02-28T19:17:35Z

Currently SysProcAttr is not changed by StartProcess. I would be reluctant to introduce such a case. So I think that a pointer is better. Also, using a pointer makes clear when the caller expects the new descriptor; we don't want to create it if the caller doesn't want it.

rsc · 2022-03-02T18:18:10Z

Retitled per discussion above. The current proposal is #51246 (comment), to add PidFD, Cgroup, Cgroup.

Certainly you would expect "Pid *int" to be a process ID, not a process ID file descriptor. Hence "PidFD *int".
What about Cgroup? Are there integer IDs for cgroups already? Should it be CgroupFD?

(In contrast, we have "Ctty int" but I think most people would expect a tty int to be a file descriptor, not a tty number. The question is whether Cgroup is more like Ctty or more like Pid.)

prattmic · 2022-03-02T18:55:37Z

Certainly you would expect "Pid *int" to be a process ID, not a process ID file descriptor. Hence "PidFD *int". What about Cgroup? Are there integer IDs for cgroups already? Should it be CgroupFD?

As far as I know there are no standard integer IDs for cgroups. cgroups are primarily managed as directories within a cgroupfs mount. Technically those directories would have integer inode IDs, but I've never heard of those being used for anything.

Even proc files identify cgroups with their string path within the cgroupfs. e.g., the last field below,

$ cat /proc/225775/cgroup 
13:freezer:/
12:memory:/user.slice/user-200669.slice/session-c16.scope
11:perf_event:/
10:pids:/user.slice/user-200669.slice/session-c16.scope
9:misc:/
8:rdma:/
7:hugetlb:/
6:devices:/user.slice
5:net_cls,net_prio:/
4:cpuset:/
3:cpu,cpuacct:/user.slice/user-200669.slice/session-c16.scope
2:blkio:/user.slice
1:name=systemd:/user.slice/user-200669.slice/session-c16.scope
0::/user.slice/user-200669.slice/session-c16.scope

The first field is the "hierarchy ID", which is a numeric description for the cgroup type:

$ cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  4       2       1
cpu     3       96      1
cpuacct 3       96      1
blkio   2       87      1
memory  12      124     1
devices 6       87      1
freezer 13      2       1
net_cls 5       2       1
perf_event      11      2       1
net_prio        5       2       1
hugetlb 7       2       1
pids    10      96      1
rdma    8       1       1
misc    9       1       1

I suppose that could be confusing, but again this describes a cgroup type, not a specific cgroup.

Despite all that, in hindsight I think CgroupFD is a better name. I suggested Cgroup because that is the name used in Linux's clone_args, but CgroupFD IMO only adds clarity.

Additionally, as far as I know, "cgroup FDs" are not a common concept [1]. In fact, I'm not certain if there is anything you can do with this FD besides pass it to clone. Most cgroup operations are performed on files in the cgroup directory. e.g., an existing process is added to a cgroup by opening the cgroup.procs file in the directory and writing a PID. So I think adding clarity in the name is helpful. (The documentation on the field should also explain how to get this FD).

[1] A "cgroup FD" in the context of clone is just open(O_RDONLY) of the cgroup directory.

prattmic · 2022-03-02T18:57:36Z

In fact, I'm not certain if there is anything you can do with this FD besides pass it to clone.

Oh, of course, the other obvious thing you could do is use this with openat (e.g., openat(cgroupFD, "cgroup.procs")) to avoid needing to know the name of the cgroup (or even the cgroupfs mount point). But I'm not sure if that is something commonly done.

kolyshkin · 2022-03-07T23:37:28Z

@prattmic what you look at is cgroup v1, which is a set of cgroups per controller, i.e. we have a forest. This is being superseded by cgroup v2, where we have a single unified cgroup directory for each process (IOW a tree, not a forest). Cgroup v2 is enabled by default in some modern distros, e.g. recent Fedora. For Ubuntu, you need a kernel options

The cgroup parameter to clone3 only supports cgroup v2, and it's a file description to cgroup directory.

You're right that there is not much use for cgroupfd outside of this clone3. One other use I can think of is openat2, but implementing openat2 is a separate unrelated topic. (Update: I missed your last comment, @prattmic, where you say the same about openat)

rsc · 2022-03-09T18:25:59Z

It sounds like there is enough confusion about cgroups that calling it CgroupFD would help clarify matters.

rsc · 2022-03-09T18:27:25Z

Retitled. Does anyone object to adding PidFD, CgroupFD, and UseCgroupFD to the Linux SysProcAttr?

rsc · 2022-03-09T18:29:32Z

@kolyshkin, we might want to reserve access to the pidfd to Go itself, so that we have the ability to use it for signal delivery on new enough Linux systems. If we add this option to SysProcAttr, we are essentially giving away the pidfd and giving up the ability to use it in the Go runtime or os package in the future. We might not want to do that.

What use for pidfd did you have? Perhaps we should figure out an API that would let you do what you need while the Go runtime or os package still owns the pidfd.

(Or maybe if we need two in the future we could use Dup.)

rsc · 2022-03-16T17:19:08Z

@kolyshkin We are still interested in whether the pidfd itself needs to be exposed, and for what use. Thanks!

kolyshkin · 2022-03-16T21:55:18Z

Sorry for not replying earlier, was on vacay.

Support for pidfd (I mean full and transparent support, such as embedding it into os.Process and using from e.g. Wait (that would be waitid(P_PIDFD, ...)) and Kill methods) would be nice to have, since using it will protect from pid reuse, and I've seen a few projects that have their own semi-crappy ways of dealing with it (usually by storing and checking process start time).

Obviously, those projects need a way to see if pidfd is supported by the runtime/kernel.

Some low-level software, like runc, may find other uses for pidfd (I'm not sure what would be it exactly, perhaps pidfd_getfd(2), and I guess future kernels may add more ways to use pidfd), so having some method to get it (hidden under Sys() or the like) would be nice to have, too, and could also be used as a way to check if pidfd is supported.

The constants for these were auto-generated from the C includes into zerrors_linux* files quite some time ago. The generator is currently broken, but some new flags need to be added nevertheless. As the flags won't change and the values are the same for all architectures, we can just define them statically (as it's already done in the runtime package): - remove the CLONE_* constants from zerrors_linux_*.go; - patch mkerrors.sh to not generate CLONE_ constants (in case it will be fixed and used in the future); - add the constants and some comments about them to exec_linux.go, using Linux v5.17 include/uapi/sched.h as the ultimate source. This adds the following new flags: - CLONE_CLEAR_SIGHAND - CLONE_INTO_CGROUP - CLONE_NEWCGROUP - CLONE_NEWTIME - CLONE_PIDFD For #51246. Change-Id: I0c635723926218bd403d37e113ee4d62194463a8 Reviewed-on: https://go-review.googlesource.com/c/go/+/407574 Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joedian Reid <joedian@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com>

Implement CLONE_INTO_CGROUP feature, allowing to put a child in a specified cgroup in a clean and simple way. Note that the feature only works for cgroup v2, and requires Linux kernel 5.7 or newer. Using the feature requires a new syscall, clone3. Currently this is the only reason to use clone3, but the code is structured in a way so that other cases may be easily added in the future. Add a test case. While at it, try to simplify the syscall calling code in forkAndExecInChild1, which became complicated over time because: 1. It was using either rawVforkSyscall or RawSyscall6 depending on whether CLONE_NEWUSER was set. 2. On Linux/s390, the first two arguments to clone(2) system call are swapped (which deserved a mention in Linux ABI hall of shame). It was worked around in rawVforkSyscall on s390, but had to be implemented via a switch/case when using RawSyscall6, making the code less clear. Let's - modify rawVforkSyscall to have two arguments (which is also required for clone3); - remove the arguments workaround from s390 asm, instead implementing arguments swap in the caller (which still looks ugly but at least it's done once and is clearly documented now); - use rawVforkSyscall for all cases (since it is essentially similar to RawSyscall6, except for having less parameters, not returning r2, and saving/restoring the return address before/after syscall on 386 and amd64). Updates #51246. Change-Id: Ifcd418ebead9257177338ffbcccd0bdecb94474e Reviewed-on: https://go-review.googlesource.com/c/go/+/417695 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>

zhuangqh · 2023-08-02T02:30:21Z

Can we backport this feature to go1.19? Because linux5.3 has been released in 2019 and go1.20 is too new for us. thanks
@rsc @kolyshkin

kolyshkin · 2023-08-02T02:45:09Z

Can we backport this feature to go1.19? Because linux5.3 has been released in 2019 and go1.20 is too new for us. thanks @rsc @kolyshkin

I'm not a Go maintainer, but I do maintain a bunch of Go packages. From that perspective, I can suggest you to

move to Go 1.20 ASAP (since Go 1.19 won't be supported once Go 1.21 is released later this month);
in general, be proactive about supporting new Go versions (before they are out -- this is what betas and RCs are for);
do not add comments not directly related to the issue being discussed (doing a backport is clearly a separate issue)

The constants for these were auto-generated from the C includes into zerrors_linux* files quite some time ago. The generator is currently broken, but some new flags need to be added nevertheless. As the flags won't change and the values are the same for all architectures, we can just define them statically (as it's already done in the runtime package): - remove the CLONE_* constants from zerrors_linux_*.go; - patch mkerrors.sh to not generate CLONE_ constants (in case it will be fixed and used in the future); - add the constants and some comments about them to exec_linux.go, using Linux v5.17 include/uapi/sched.h as the ultimate source. This adds the following new flags: - CLONE_CLEAR_SIGHAND - CLONE_INTO_CGROUP - CLONE_NEWCGROUP - CLONE_NEWTIME - CLONE_PIDFD For golang#51246. Change-Id: I0c635723926218bd403d37e113ee4d62194463a8 Reviewed-on: https://go-review.googlesource.com/c/go/+/407574 Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joedian Reid <joedian@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com>

Implement CLONE_INTO_CGROUP feature, allowing to put a child in a specified cgroup in a clean and simple way. Note that the feature only works for cgroup v2, and requires Linux kernel 5.7 or newer. Using the feature requires a new syscall, clone3. Currently this is the only reason to use clone3, but the code is structured in a way so that other cases may be easily added in the future. Add a test case. While at it, try to simplify the syscall calling code in forkAndExecInChild1, which became complicated over time because: 1. It was using either rawVforkSyscall or RawSyscall6 depending on whether CLONE_NEWUSER was set. 2. On Linux/s390, the first two arguments to clone(2) system call are swapped (which deserved a mention in Linux ABI hall of shame). It was worked around in rawVforkSyscall on s390, but had to be implemented via a switch/case when using RawSyscall6, making the code less clear. Let's - modify rawVforkSyscall to have two arguments (which is also required for clone3); - remove the arguments workaround from s390 asm, instead implementing arguments swap in the caller (which still looks ugly but at least it's done once and is clearly documented now); - use rawVforkSyscall for all cases (since it is essentially similar to RawSyscall6, except for having less parameters, not returning r2, and saving/restoring the return address before/after syscall on 386 and amd64). Updates golang#51246. Change-Id: Ifcd418ebead9257177338ffbcccd0bdecb94474e Reviewed-on: https://go-review.googlesource.com/c/go/+/417695 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>

ianlancetaylor · 2023-08-04T19:39:28Z

@zhuangqh Sorry, we don't backport new features like this to older Go releases. See https://go.dev/wiki/MinorReleases for our backport policy.

gopherbot · 2023-08-17T02:43:54Z

Change https://go.dev/cl/520266 mentions this issue: syscall: add PidFD support to SysProcAttr

Add PidFD support, so that if the PidFD pointer in SysProcAttr is not nil, ForkExec (and thus all its users) obtains a pidfd from the kernel during clone(), and writes the result (or -1, if the functionality is not supported by the kernel) into *PidFD. The functionality to get pidfd is implemented for both clone3 and clone. For the latter, an extra argument to rawVforkSyscall is needed, thus the change in asm files. Add a trivial test case checking the obtained pidfd can be used to send a signal to a process, using pidfd_send_signal. To test clone3 code path, add a flag available to tests only. Updates #51246. Change-Id: I2212b69e1a657163c31b4a6245b076bc495777a3 Reviewed-on: https://go-review.googlesource.com/c/go/+/520266 Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com>

gopherbot · 2023-09-14T16:57:37Z

Change https://go.dev/cl/528436 mentions this issue: internal/syscall/unix: add PidFDSendSignal for linux

gopherbot · 2023-09-14T21:58:12Z

Change https://go.dev/cl/528439 mentions this issue: os: use PidfdSendSignal on linux if possible

gopherbot · 2023-09-16T02:57:32Z

Change https://go.dev/cl/528798 mentions this issue: os: make use of pidfd on linux

kolyshkin · 2023-09-19T18:14:02Z

Support for the cgroup FD seems fine and uncontroversial to me.

The pid FD case is a bit more interesting. I certainly think it should be possible to get a pid FD. In fact, I think we should use a pid FD whenever possible. I wonder if instead of providing only a low-level SysProcAttr field, we should automatically make os.StartProcess return an os.Process backed by a pid FD rather than the raw pid (Wait, Kill, etc perform pid FD operations). A new os.Process.Fd() could return the pid FD for additional direct use. os.Process closes the pid FD in os.Process.Release (which is also called with a finalizer).

Down a level, syscall.StartProcess could return the pid FD in the handle return slot, which is currently unused on Linux. Some extra argument or SysProcAttr field would be needed to request this, as callers other than os.StartProcess wouldn't know to close the FD by default. (syscall.ForkExec doesn't have this return value and would need a new variant or SysProcAttr field).

One possibility to merge the limitations above would be to add PidFD *int to SysProcAttr, but also make os.StartProcess automatically set this field if not set by callers.

@prattmic Opened https://go.dev/issue/62654 based on this, and wrote a partial implementation in https://go-review.googlesource.com/c/go/+/528438 and https://go-review.googlesource.com/c/go/+/528798/2?usp=related-change

gopherbot · 2023-11-16T09:00:20Z

Change https://go.dev/cl/542698 mentions this issue: syscall: fix getting pidfd when using CLONE_NEWUSER

While working on CL 528798, I found out that sys.PidFD field (added in CL 520266) is not filled in when CLONE_NEWUSER is used. This happens because the code assumed that the parent and the child run in the same memory space. This assumption is right only when CLONE_VM is used for clone syscall, and the code only sets CLONE_VM when CLONE_NEWUSER is not used. Fix this, and add a test case (which fails before the fix). Updates #51246. Change-Id: I805203c1369cadd63d769568b132a9ffd92cc184 Reviewed-on: https://go-review.googlesource.com/c/go/+/542698 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>

gopherbot · 2023-11-21T20:41:52Z

Change https://go.dev/cl/544318 mentions this issue: syscall: check SyscallIsNotSupported in TestPidFDWithUserNS

For #51246. Change-Id: Ief2e2e14f039123a6580cb60be7ee74f4a20a649 Reviewed-on: https://go-review.googlesource.com/c/go/+/544318 TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

CL 520266 added pidfd_send_signal linux syscall numbers to the syscall package for the sake of a unit test. As pidfd_send_signal will be used from the os package, let's revert the changes to syscall package, add the pidfd_send_signal syscall numbers and the implementation to internal/syscall/unix, and change the above test to use it. Updates #51246. For #62654. Change-Id: I862174c3c1a64baf1080792bdb3a1c1d1b417bb4 Reviewed-on: https://go-review.googlesource.com/c/go/+/528436 Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>

gopherbot · 2024-03-11T21:34:48Z

Change https://go.dev/cl/570681 mentions this issue: os: make FindProcess use pidfd on Linux

kolyshkin · 2024-03-11T21:47:04Z

I think this could be closed now since the proposal is fully implemented as of Go 1.22. The continuation of pidfd use from golang runtime discussion happens in #62654.

This is a continuation of CL 570036. Amend FindProcess to use pidfdFind, and make it return a special Process with Pid of pidDone (-2) if the process is not found. Amend Wait and Signal to return ErrProcessDone if pid == pidDone. The alternative to the above would be to make FindProcess return ErrProcessDone, but this is unexpected and incompatible API change, as discussed in #65866 and #51246. For #62654. Rework of CL 542699 (which got reverted in CL 566476). Change-Id: Ifb4cd3ad1433152fd72ee685d0b85d20377f8723 Reviewed-on: https://go-review.googlesource.com/c/go/+/570681 TryBot-Bypass: Dmitri Shuralyov <dmitshur@golang.org> Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>

kolyshkin added the Proposal label Feb 17, 2022

gopherbot added this to the Proposal milestone Feb 17, 2022

rsc changed the title ~~proposal: syscall: support clone3 / clone_args on Linux~~ proposal: syscall: add CloneArgs struct, add SysProcAttr.CloneArgs field, use clone3 on Linux Feb 23, 2022

prattmic changed the title ~~proposal: syscall: add CloneArgs struct, add SysProcAttr.CloneArgs field, use clone3 on Linux~~ proposal: syscall: add pidFD and cgroupFD options from Linux clone3 to SysProcAttr Feb 24, 2022

prattmic changed the title ~~proposal: syscall: add pidFD and cgroupFD options from Linux clone3 to SysProcAttr~~ proposal: syscall: add pidFD and cgroupFD options from Linux clone to SysProcAttr Feb 24, 2022

rsc changed the title ~~proposal: syscall: add pidFD and cgroupFD options from Linux clone to SysProcAttr~~ proposal: syscall: add PidFD, UseCgroup, and Cgroup options from Linux clone to SysProcAttr Mar 2, 2022

rsc changed the title ~~proposal: syscall: add PidFD, UseCgroup, and Cgroup options from Linux clone to SysProcAttr~~ proposal: syscall: add PidFD, CgroupFD, and UseCgroupFD options for Linux clone to SysProcAttr Mar 9, 2022

shoenig mentioned this issue Sep 3, 2022

idea: use CLONE_NEWCGROUP to fix racey cgroup assignment shoenig/nomad-pledge-driver#2

Closed

prattmic mentioned this issue Dec 6, 2022

api: audit for Go 1.20 #57126

Closed

rsc removed the Proposal-FinalCommentPeriod label Jul 26, 2023

kolyshkin mentioned this issue Sep 14, 2023

os: make use of pidfd for linux #62654

Closed

mknyszek mentioned this issue Nov 22, 2023

api: audit for Go 1.22 #64343

Closed

kolyshkin closed this as completed Mar 11, 2024

dmitshur modified the milestones: Backlog, Go1.22 May 23, 2024

dustinspecker mentioned this issue Jul 26, 2024

docs: add RFD for linux job worker service dustinspecker/linux-job-worker-service#1

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

syscall: add PidFD, CgroupFD, and UseCgroupFD options for Linux clone to SysProcAttr #51246

syscall: add PidFD, CgroupFD, and UseCgroupFD options for Linux clone to SysProcAttr #51246

kolyshkin commented Feb 17, 2022

cuiweixie commented Feb 19, 2022

kolyshkin commented Feb 19, 2022

OneOfOne commented Feb 20, 2022

rsc commented Feb 23, 2022

rsc commented Feb 23, 2022

prattmic commented Feb 23, 2022

kolyshkin commented Feb 23, 2022 •

edited

Loading

prattmic commented Feb 24, 2022

prattmic commented Feb 24, 2022

prattmic commented Feb 24, 2022 •

edited

Loading

kolyshkin commented Feb 28, 2022

ianlancetaylor commented Feb 28, 2022

rsc commented Mar 2, 2022

prattmic commented Mar 2, 2022

prattmic commented Mar 2, 2022

kolyshkin commented Mar 7, 2022 •

edited

Loading

rsc commented Mar 9, 2022

rsc commented Mar 9, 2022

rsc commented Mar 9, 2022 •

edited

Loading

rsc commented Mar 16, 2022

kolyshkin commented Mar 16, 2022

zhuangqh commented Aug 2, 2023

kolyshkin commented Aug 2, 2023

ianlancetaylor commented Aug 4, 2023

gopherbot commented Aug 17, 2023

gopherbot commented Sep 14, 2023

gopherbot commented Sep 14, 2023

gopherbot commented Sep 16, 2023

kolyshkin commented Sep 19, 2023

gopherbot commented Nov 16, 2023

gopherbot commented Nov 21, 2023

gopherbot commented Mar 11, 2024

kolyshkin commented Mar 11, 2024

syscall: add PidFD, CgroupFD, and UseCgroupFD options for Linux clone to SysProcAttr #51246

syscall: add PidFD, CgroupFD, and UseCgroupFD options for Linux clone to SysProcAttr #51246

Comments

kolyshkin commented Feb 17, 2022

cuiweixie commented Feb 19, 2022

kolyshkin commented Feb 19, 2022

OneOfOne commented Feb 20, 2022

rsc commented Feb 23, 2022

rsc commented Feb 23, 2022

prattmic commented Feb 23, 2022

kolyshkin commented Feb 23, 2022 • edited Loading

prattmic commented Feb 24, 2022

prattmic commented Feb 24, 2022

prattmic commented Feb 24, 2022 • edited Loading

kolyshkin commented Feb 28, 2022

ianlancetaylor commented Feb 28, 2022

rsc commented Mar 2, 2022

prattmic commented Mar 2, 2022

prattmic commented Mar 2, 2022

kolyshkin commented Mar 7, 2022 • edited Loading

rsc commented Mar 9, 2022

rsc commented Mar 9, 2022

rsc commented Mar 9, 2022 • edited Loading

rsc commented Mar 16, 2022

kolyshkin commented Mar 16, 2022

zhuangqh commented Aug 2, 2023

kolyshkin commented Aug 2, 2023

ianlancetaylor commented Aug 4, 2023

gopherbot commented Aug 17, 2023

gopherbot commented Sep 14, 2023

gopherbot commented Sep 14, 2023

gopherbot commented Sep 16, 2023

kolyshkin commented Sep 19, 2023

gopherbot commented Nov 16, 2023

gopherbot commented Nov 21, 2023

gopherbot commented Mar 11, 2024

kolyshkin commented Mar 11, 2024

kolyshkin commented Feb 23, 2022 •

edited

Loading

prattmic commented Feb 24, 2022 •

edited

Loading

kolyshkin commented Mar 7, 2022 •

edited

Loading

rsc commented Mar 9, 2022 •

edited

Loading