Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: make use of pidfd for linux #62654

Open
kolyshkin opened this issue Sep 14, 2023 · 25 comments
Open

os: make use of pidfd for linux #62654

kolyshkin opened this issue Sep 14, 2023 · 25 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@kolyshkin
Copy link
Contributor

kolyshkin commented Sep 14, 2023

Based on a discussion in #51246 (in particular, #51246 (comment)), this is a proposal to implement pidfd support for process methods in the os package -- StartProcess, FindProcess, Kill, Wait.

  1. Reuse the handle field of struct Process to keep pidfd. Since 0 is a valid value, use ^uintptr(0) as a default value (if pidfd is not known).

  2. Instrument syscall.StartProcess os.StartProcess to get pidfd from the kernel and return it as a handle. The initial idea was to modify syscall.StartProcess as it returns a handle (which we're reusing for pidfd) but that would be a change is not backward-compatible and can result in leaking pidfd as callers are not expecting it.

  3. Use pidfdSendSignal from (*process).Signal, if handle is known and the syscall is available; fallback to old code using syscall.Kill if syscall is not available (e.g. disabled by seccomp).

  4. Use waitid(P_PIDFD, ...) in Wait, if pidfd is set; fall back to old code if P_PIDFD is not supported.

  1. Amend FindProcess to use pidfd_open(2), optionally returning ErrProcessGone, and modify the documentation accordingly (adding that it can return ErrProcessGone on Linux).
  1. (Optional) Add (*Process).Handle() uintptr method to return process handle on Windows and pidfd on Linux. This might be useful for low-level operations that require handle/pidfd (e.g. pidfd_getfd on Linux), or to ensure that pidfd (rather than pid) is being used for kill/wait.
@gopherbot gopherbot added this to the Proposal milestone Sep 14, 2023
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/528436 mentions this issue: internal/syscall/unix: add PidFDSendSignal for Linux

@ianlancetaylor
Copy link
Contributor

I don't think there is an API change here (before optional item 7) so taking this out of the proposal process.

CC @golang/runtime

@ianlancetaylor ianlancetaylor changed the title proposal: os: make use of pidfd for linux os: make use of pidfd for linux Sep 15, 2023
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed Proposal labels Sep 15, 2023
@ianlancetaylor ianlancetaylor modified the milestones: Proposal, Backlog Sep 15, 2023
@kolyshkin
Copy link
Contributor Author

I don't think there is an API change here (before optional item 7) so taking this out of the proposal process.

@ianlancetaylor can you please elaborate? Is this about FindProcess returning ErrProcessGone, or something else?

If this is about FindProcess returning an error, as I pointed out in 4 above, this is entirely optional. We can make it succeed every time.

@ianlancetaylor
Copy link
Contributor

What I mean is that we only need a proposal for an API change or a significant functionality change. This issue doesn't seem to be either, so we should treat it as an ordinary feature request issue, rather than running it through the proposal process. I hope that makes sense. Thanks.

@kolyshkin
Copy link
Contributor Author

Thanks for the explanation, makes sense.I will keep working on implementation.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/528438 mentions this issue: syscall: make StartProcess return pidfd on Linux

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/528798 mentions this issue: os: make use of pidfd on linux

@metux
Copy link

metux commented Oct 3, 2023

I'm a bit unhappy w/ #4:
API shouldn't behave different on individual OS. So I'd put it this way:

ErrProcessGone could happen on any OS. Just practically only happens on Linux for now. So, developers always need to be prepared for this.

@kolyshkin
Copy link
Contributor Author

@golang/runtime could you please take a look at this proposal as well as CLs implementing it?
This is https://go.dev/cl/528436 and the two related ones, implementing items 1 to 4 described above.

I would appreciate any feedback.

@prattmic
Copy link
Member

Hi @kolyshkin,

Sorry for the delay. I took a look at your CLs. I'm leaving some feedback here since it is about potential API changes.

  1. https://go.dev/cl/528438 changes syscall.StartProcess to automatically set ProcAttr.Sys.PidFD in order to always return a pidfd via handle. I don't think this is a safe change. Existing callers will not realize that handle is a file descriptor which needs to be closed, and will thus leak the file descriptor on every call. I think it would be better for os.startProcess to set ProcAttr.Sys.PidFD prior to calling syscall.StartProcess. os.Process will retain ownership of the file descriptor and know when to close it.

  2. Should os.Process.Wait close the pidfd? Right now, only os.Process.Release does so.

  3. Regarding os.FindProcess returning ErrProcessGone, I think this is a nice improvement to the API, but I worry that this might break some callers. If we do this, I think we should add a new GODEBUG flag to switch back to the old behavior.

@kolyshkin
Copy link
Contributor Author

Hi @kolyshkin,

Sorry for the delay. I took a look at your CLs. I'm leaving some feedback here since it is about potential API changes.

  1. https://go.dev/cl/528438 changes syscall.StartProcess to automatically set ProcAttr.Sys.PidFD in order to always return a pidfd via handle. I don't think this is a safe change. Existing callers will not realize that handle is a file descriptor which needs to be closed, and will thus leak the file descriptor on every call. I think it would be better for os.startProcess to set ProcAttr.Sys.PidFD prior to calling syscall.StartProcess. os.Process will retain ownership of the file descriptor and know when to close it.
  2. Should os.Process.Wait close the pidfd? Right now, only os.Process.Release does so.
  3. Regarding os.FindProcess returning ErrProcessGone, I think this is a nice improvement to the API, but I worry that this might break some callers. If we do this, I think we should add a new GODEBUG flag to switch back to the old behavior.

@prattmic thanks for review comments, I updated the code based on your suggestions (except for the item 3, see below). In the process, I found a bug in my older PidFD implementation (see https://go.dev/cl/542698 for the fix). The series is now ready for (re-)review, I'd be grateful about any feedback.

As for using GODEBUG, I feel the overall change is major and might break some setups (in particular, new syscalls are used, which might be blocked on some older systems, and the runtime workarounds may be too complex (see e.g. canUsePidfdSendSignal in https://go.dev/cl/528438). Therefore I guess it might make sense to gate the whole "use pidfd from os" feature (including FindProcess change) by having something like GODEBUG=osusepidfd=0.

What do you think?

CC @golang/runtime

@prattmic
Copy link
Member

Therefore I guess it might make sense to gate the whole "use pidfd from os" feature (including FindProcess change) by having something like GODEBUG=osusepidfd=0.

I agree, this seems reasonable.

gopherbot pushed a commit that referenced this issue Nov 21, 2023
CL 520266 added pidfd_send_signal linux syscall numbers to the
syscall package for the sake of a unit test.

As pidfd_send_signal will be used from the os package, let's revert the
changes to syscall package, add the pidfd_send_signal syscall numbers
and the implementation to internal/syscall/unix, and change the above
test to use it.

Updates #51246.
For #62654.

Change-Id: I862174c3c1a64baf1080792bdb3a1c1d1b417bb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/528436
Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
@kolyshkin
Copy link
Contributor Author

I agree, this seems reasonable.

@prattmic Implemented as discussed; PTAL https://go.dev/cl/528438

gopherbot pushed a commit that referenced this issue Feb 21, 2024
Use Process.handle field to store pidfd, and make use of it. Only use
pidfd functionality if all the needed syscalls are available.

1. StartProcess: obtain the pidfd from the kernel, if available,
   using the functionality added by CL 520266. Note we could not modify
   syscall.StartProcess to return pidfd directly because it is a public
   API and its callers do not expect it, so we have to use ensurePidfd
   and getPidfd.

2. (*Process).Kill: use pidfdSendSignal, if the syscall is available
   and pidfd is known. This is slightly more complicated than it should
   be, since the syscall can be blocked by e.g. seccomp security policy,
   therefore the need for a function to check if it's actually working,
   and a soft fallback to kill. Perhaps this precaution is not really
   needed.

3. (*Process).Wait: use pidfdWait, if available, otherwise fall back to
   using waitid/wait4. This is also more complicated than expected due
   to struct siginfo_t idiosyncrasy.

NOTE pidfdSendSignal and pidfdWait are used without a race workaround
(blockUntilWaitable and sigMu, added by CL 23967) because with pidfd,
PID recycle issue doesn't exist (IOW, pidfd, unlike PID, is guaranteed
to refer to one particular process) and thus the race doesn't exist
either.

For #62654.
Updates #13987.

Change-Id: I22ebcc7142b16a3a94c422d2f32504d1a80e8a8f
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/528438
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/566179 mentions this issue: os: FindProcess must not return nil

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/566180 mentions this issue: os: add TestFindProcessDone for Linux

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/566181 mentions this issue: os: fix FindError doc typo

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/566182 mentions this issue: os: fix TestUNIXProcessAlive wrt FindProcess

@mvdan
Copy link
Member

mvdan commented Feb 23, 2024

Therefore I guess it might make sense to gate the whole "use pidfd from os" feature (including FindProcess change) by having something like GODEBUG=osusepidfd=0.

Seeing the main CL land in master, I was a bit surprised to not see any GODEBUG support included, given the discussion here. For those reading along like me, it seems like it was discarded in favor of a fallback to the old mechanism, per https://go-review.googlesource.com/c/go/+/528438/comment/12693995_7cc3bdc5/.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/566476 mentions this issue: Revert "os: make FindProcess use pidfd on Linux"

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/566477 mentions this issue: Revert "os: make use of pidfd on linux"

gopherbot pushed a commit that referenced this issue Feb 23, 2024
This reverts CL 542699.

Reason for revert: Some applications assume FindProcess does not return
errors.

For #62654.
Fixes #65866.

Change-Id: Ic185a6253c8e508b08150b618c39a9905f6cdd60
Reviewed-on: https://go-review.googlesource.com/c/go/+/566476
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Feb 23, 2024
This reverts CL 528438.

Reason for revert: Implicated in "bad FD" test failures. Full extent of
issue still unclear.

For #62654.
Fixes #65857.

Change-Id: I066e38040544c506917e90255bd0e330964a0276
Reviewed-on: https://go-review.googlesource.com/c/go/+/566477
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@ianlancetaylor
Copy link
Contributor

@mvdan Note that CL 542699 (now rolled back) did use a GODEBUG setting for the behavior change in FindProcess.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/570036 mentions this issue: os: make use of pidfd on linux

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/570681 mentions this issue: os: make FindProcess use pidfd on Linux

gopherbot pushed a commit that referenced this issue May 17, 2024
Use Process.handle field to store pidfd, and make use of it. Only use
pidfd functionality if all the needed syscalls are available.

1. Add/use pidfdWorks, which checks that all needed pidfd-related
   functionality works.

2. os.StartProcess: obtain the pidfd from the kernel, if possible, using
   the functionality added by CL 520266. Note we could not modify
   syscall.StartProcess to return pidfd directly because it is a public
   API and its callers do not expect it, so we have to use ensurePidfd
   and getPidfd.

3. (*Process).Kill: use pidfdSendSignal, if available and the pidfd is
   known. Otherwise, fall back to the old implementation.

4. (*Process).Wait: use pidfdWait, if available, otherwise fall back to
   using waitid/wait4. This is more complicated than expected due to
   struct siginfo_t idiosyncrasy.

NOTE pidfdSendSignal and pidfdWait are used without a race workaround
(blockUntilWaitable and sigMu, added by CL 23967) because with pidfd,
PID recycle issue doesn't exist (IOW, pidfd, unlike PID, is guaranteed
to refer to one particular process) and thus the race doesn't exist
either.

Rework of CL 528438 (reverted in CL 566477 because of #65857).

For #62654.
Updates #13987.

Change-Id: If5ef8920bd8619dc428b6282ffe4fb8c258ca224
Reviewed-on: https://go-review.googlesource.com/c/go/+/570036
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue May 21, 2024
This is a continuation of CL 570036.

Amend FindProcess to use pidfdFind, and make it return a special
Process with Pid of pidDone (-2) if the process is not found.

Amend Wait and Signal to return ErrProcessDone if pid == pidDone.

The alternative to the above would be to make FindProcess return
ErrProcessDone, but this is unexpected and incompatible API change,
as discussed in #65866 and #51246.

For #62654.

Rework of CL 542699 (which got reverted in CL 566476).

Change-Id: Ifb4cd3ad1433152fd72ee685d0b85d20377f8723
Reviewed-on: https://go-review.googlesource.com/c/go/+/570681
TryBot-Bypass: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@kolyshkin
Copy link
Contributor Author

As of today, items 1-5 in the description are implemented (with a few bugfixes on top) and will be part of Go 1.23.

For item 6, I think I will create a separate proposal.

@mauri870
Copy link
Member

mauri870 commented Jun 16, 2024

It is common to read a fd list from /proc/self/fd before/after an action and detect leaking fds by subtracting the values, eg:

func getFdCount() (count int) {
	count = -1
	if entries, err := os.ReadDir("/proc/self/fd"); err == nil {
		count = len(entries)
	}
	return
}

In 1.23 there will be an additional pidfd reported in this list so the returning count will be off by one.

Perhaps we can mention it briefly in the release notes of Go 1.23?

Another valid argument to support #67639.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants