
Fix: Orphaned process and goroutine leak in ScriptRun/CustomSync leading to split-brain deployments #6735

Open
pulkitvats2007-crypto wants to merge 1 commit into pipe-cd:master from pulkitvats2007-crypto:fix-zombie-process-leak

@pulkitvats2007-crypto

What this PR does:

  • Fixes goroutine and zombie process leaks in ScriptRun, CustomSync, and Kubernetes rollback executors.
  • Binds external command execution to a cancellable context using exec.CommandContext.
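  • Ensures that a timeout or cancellation kills the spawned process instead of leaving it running in the background.
  • Sets cmd.WaitDelay = time.Second so that waiting on the command cannot hang indefinitely when background jobs started by the shell inherit its output pipes and keep them open.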
  • Updates all affected executors:
    • pkg/app/piped/executor/scriptrun/scriptrun.go
    • pkg/app/piped/executor/customsync/customsync.go
    • pkg/app/piped/executor/customsync/rollback.go
    • pkg/app/piped/executor/kubernetes/rollback.go
  • Adds tests (TestExecuteCommandCancellation) that verify cancellation behavior and guard against regressions.

Why we need it:

  • Prevents split-brain deployments, in which a script orphaned by a cancellation keeps running alongside the next deployment and both mutate the same infrastructure.
  • Eliminates goroutine leaks and zombie processes that can exhaust system resources (CPU, memory, file descriptors).
  • Fixes silent failure scenarios where pipelines report "FAILED" or "CANCELLED" but scripts continue mutating infrastructure.
  • Restores correctness and reliability of PipeCD’s cancellation and timeout mechanisms.

Which issue(s) this PR fixes:

Fixes #6734

Does this PR introduce a user-facing change?:

  • No

  • How are users affected by this change:

    • Users will experience correct and deterministic cancellation behavior.
    • Cancelled or timed-out stages will no longer continue executing in the background.
    • Improved system stability under frequent cancellations or retries.
  • Is this breaking change:

    • No
  • How to migrate (if breaking change):

    • N/A

Signed-off-by: pulkitvats2007-crypto <pulkitvats2007@gmail.com>


Successfully merging this pull request may close these issues.

Zombie processes and goroutine leak in ScriptRun / CustomSync executors on cancellation
