Bound checkpoint git subprocesses to their context deadline#1282
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens checkpoint-remote git subprocess execution so context timeouts/cancellation reliably terminate hung git push/git fetch operations (especially when custom transports spawn helper descendants that keep stdout/stderr pipes open).
Changes:
- Apply subprocess hardening in
newCommandfor all checkpoint-remote git invocations. - Introduce cross-platform
WaitDelaybackstop plus Unix-only process-group kill on cancellation. - Add unit and behavioral tests validating
WaitDelayand process-group termination behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/checkpoint/remote/git.go | Ensures all checkpoint-remote git commands created via newCommand are hardened against hangs. |
| cmd/entire/cli/checkpoint/remote/command_harden.go | Adds shared hardening logic (WaitDelay + platform hook). |
| cmd/entire/cli/checkpoint/remote/command_harden_windows.go | Implements Windows no-op for process-group kill (relies on WaitDelay + default cancellation). |
| cmd/entire/cli/checkpoint/remote/command_harden_unix.go | Implements Unix process-group isolation (Setpgid) and group-wide SIGKILL on cancel. |
| cmd/entire/cli/checkpoint/remote/command_harden_unix_test.go | Adds Unix tests validating process-group kill behavior and pipe-holding descendant scenario. |
| cmd/entire/cli/checkpoint/remote/command_harden_test.go | Adds cross-platform tests validating newCommand/hardenSubprocess set WaitDelay and cancellation behavior. |
A `git push`/`git fetch` over a custom transport (e.g. entire://) could
hang the pre-push hook for ~an hour despite a 2-minute timeout. The
deadline SIGKILLs the `git` process, but the spawned remote-helper
grandchild (e.g. git-remote-entire) keeps the output pipe open, so
cmd.CombinedOutput() blocks until that helper finally exits (~58 min
observed for a stuck checkpoint push).
Harden every git subprocess built by newCommand:
- WaitDelay backstop (all platforms) so Wait force-closes the pipes
shortly after the process is killed instead of waiting on a
pipe-holding descendant.
- On unix, run the child in its own process group and SIGKILL the whole
group on cancellation so the remote helper is terminated too, not
orphaned. Windows relies on the WaitDelay backstop (reliable
process-tree kill there needs a Job Object, out of scope).
Turns "stuck forever" into a bounded failure: checkpoints stay saved
locally and the user's own push proceeds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1d96e0531da7
36085bb to
4dfd744
Compare
The group-kill regression test asserted a hardcoded 5s deadline. That deadline must stay strictly below killWaitDelay (10s): the WaitDelay backstop would itself force the pipe closed once it elapses, so a deadline >= killWaitDelay would let the test pass even with group-kill removed, silently turning it into a no-op. Derive the deadline from killWaitDelay so the safety margin can't drift out of sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 29915cf2d150
…eckpoint-git-timeout
Address Copilot review on the checkpoint cancel tests: - Make TestNewCommand_TerminatesOnCancel and TestKillProcessGroupOnCancel_SetsSetpgidAndCancel hermetic by clearing ENTIRE_CHECKPOINT_TOKEN (and dropping t.Parallel for t.Setenv). With a token set, newCommand resolves "origin" via GetRemoteURL and would spawn a git subprocess against the ambient repo. - Rewrite TestTerminateOnCancel_KillsProcessGroup to cancel only after a stdout "ready" marker confirms the backgrounded grandchild exists, instead of a 300ms timeout that could fire before process start and silently no-op the test. - Bound cmd.Wait() in TestKillProcessGroupOnCancel_TerminatesLeader so a group-kill regression fails fast instead of hanging ~60s (this cmd sets no WaitDelay backstop). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 545d8ff9b01d
A stuck transport made the pre-push hook block ~4 minutes: each push attempt minted its own 2-minute timeout and the post-sync retry got a fresh one, so the deadline-killed initial push and retry ran back to back. Replace the per-attempt timeouts in tryPushSessionsCommon and fetchAndRebaseSessionsCommon with a single 60s budget threaded from doPushBranch. The initial push, fetch+rebase recovery, and retry now share one deadline, so a stuck push that consumes the budget makes the recovery and retry fail fast instead of each waiting a full timeout. The retry still works for fast non-fast-forward rejections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: ca420d837db1
Soph
previously approved these changes
May 28, 2026
Entire-Checkpoint: 5c5cd90917d0
…tireio/cli into harden-checkpoint-git-timeout
Collapse the multi-paragraph docstrings and inline explanations added in this branch down to one or two dense lines each, keeping the non-obvious WHY (transport-helper grandchildren holding the pipe; per-attempt timeouts stacking; the deadline < killWaitDelay invariant) and dropping narration of what the code already says. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: f2c1abb5a175
pfleidi
approved these changes
May 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
https://entire.io/gh/entireio/cli/trails/440
Problem
A
git push/git fetchover a custom transport (e.g.entire://) can hang the pre-push hook for ~an hour despite a 2-minute timeout. Adoctor traceof a stuck checkpoint push:The context deadline SIGKILLs the
gitprocess, but the spawned remote-helper grandchild (e.g.git-remote-entire) keeps the output pipe open, socmd.CombinedOutput()blocks until that helper finally exits.Fix
Harden every git subprocess built by
newCommand(push / fetch / fetch-pack / ls-remote / cat-file):Turns "stuck forever" into a bounded failure: checkpoints stay saved locally and the user's own push proceeds.
Testing
newCommandsetsWaitDelay+ a group-killCancel(unixSetpgid).Notes
git-remote-entireside; this PR just stops it from hanging the hook.hooks-slow-push-db-mirror).🤖 Generated with Claude Code
Note
Medium Risk
Changes how all checkpoint remote git invocations terminate on timeout/cancel; incorrect group kill could affect child processes, though scope is limited to the checkpoint remote package and is well-tested on Unix.
Overview
Checkpoint git subprocesses built by
newCommand(push, fetch, fetch-pack, ls-remote, cat-file) are hardened so context timeouts are not defeated by orphaned transport helpers.Every such command gets a 10s
WaitDelaysoCombinedOutput/Waitcannot block indefinitely on pipe-holding descendants after the parent is killed. On Unix, children run in their own process group with a customCancelthat SIGKILLs the whole group, so helpers likegit-remote-entireare torn down instead of keeping stdout open for tens of minutes. Windows only gets theWaitDelaybackstop (process-tree kill deferred).Unit and behavioral tests cover
WaitDelay, group kill on Unix, and the pipe-holding grandchild scenario.Reviewed by Cursor Bugbot for commit 36085bb. Configure here.