Skip to content

test/testenv: run dial-stdio with Pdeathsig and process group#980

Merged
cpuguy83 merged 1 commit intoproject-dalec:mainfrom
invidian:fix-leaking-dial-stdio
Mar 2, 2026
Merged

test/testenv: run dial-stdio with Pdeathsig and process group#980
cpuguy83 merged 1 commit intoproject-dalec:mainfrom
invidian:fix-leaking-dial-stdio

Conversation

@invidian
Copy link
Copy Markdown
Contributor

@invidian invidian commented Mar 2, 2026

What this PR does / why we need it:

That should ensure we don't leave orphaned dial-stdio processes if the test suite is interrupted or crashes, and that all processes are cleaned up properly when the test finishes.

This approach is easier than explicit cleanup, since we delegate the cleanup to the OS.

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Closes #974

Special notes for your reviewer:

AI generated patch with manual testing. I am not familiar with nuances of Pdeathsig and runtime.LockOSThread(), but it seems to be addressing my issue.

That should ensure we don't leave orphaned dial-stdio processes if the
test suite is interrupted or crashes, and that all processes are cleaned
up properly when the test finishes.

This approach is easier than explicit cleanup, since we delegate the
cleanup to the OS.

Closes project-dalec#974

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
Copilot AI review requested due to automatic review settings March 2, 2026 07:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the integration-test Buildx “dial-stdio” launcher to rely more on OS-level process cleanup semantics (process groups / parent-death signal) so interrupted or crashing test runs don’t leave orphaned docker buildx dial-stdio processes behind.

Changes:

  • Configure the spawned docker buildx dial-stdio process with a new process group and Pdeathsig.
  • Restructure process start/wait handling to signal completion via a channel and propagate start errors through the stderr pipe.
  • Add runtime.LockOSThread() around the child process lifetime (intended to support Pdeathsig behavior).

Comment thread test/testenv/buildx.go
Comment on lines +109 to +113
cmd.SysProcAttr = &syscall.SysProcAttr{
// Put the child in its own process group so we can kill the entire
// group (docker + docker-buildx plugin) during cleanup.
Setpgid: true,
// Send SIGTERM to the child process when the parent (test process) dies.
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syscall.SysProcAttr{Setpgid: ..., Pdeathsig: ...} is Linux/Unix-specific and this file has no build tags, so go test ./... will fail to compile on platforms where SysProcAttr lacks these fields (e.g., darwin/windows). Consider moving the SysProcAttr setup behind per-OS build-tagged helpers (no-op on unsupported OSes) or gating the entire implementation to Linux if these tests are Linux-only.

Copilot uses AI. Check for mistakes.
Comment thread test/testenv/buildx.go
Comment on lines 186 to 190
select {
case <-chWait:
case <-processDone:
case <-time.After(10 * time.Second):
// If it still doesn't exit, force kill
cmd.Process.Kill() //nolint:errcheck // Force kill if it doesn't exit after interrupt
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code sets Setpgid: true (and the comment says cleanup will kill the whole process group), but cleanup only signals cmd.Process and never targets the process group. If the intent is to avoid orphaned buildx plugin subprocesses, consider signaling/killing the process group (e.g., via a negative PID on platforms that support it) or dropping Setpgid if it’s not used.

Copilot uses AI. Check for mistakes.
Comment thread test/testenv/buildx.go
@cpuguy83 cpuguy83 merged commit 5fa6484 into project-dalec:main Mar 2, 2026
42 of 48 checks passed
Comment thread test/testenv/buildx.go
// Send SIGTERM to the child process when the parent (test process) dies.
// This prevents dial-stdio processes from being orphaned when the test
// suite is interrupted or crashes.
Pdeathsig: syscall.SIGTERM,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized this makes it so we can't invoke tests from non-Linux. We should at least gate this. But also definitely not a full solution.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was aware, hence the revert :) #981

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] docker-buildx buildx dial-stdio --progress=plain processes leaks from integration tests when suite is interrupted

3 participants