fix: retry stale coordinator lease before command #158
Codex review: needs real behavior proof before merge. Reviewed May 25, 2026, 4:40 AM ET / 08:40 UTC.
Summary
Reproducibility: no. No live reproduction was established in this review. Source inspection shows current main returns the before-command SSH wait error directly, and the PR adds the replacement branch for that failure path.
Review metrics: 1 noteworthy metric.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Land the bounded retry after redacted live proof shows stale lease release, one replacement acquire, resync, and command start while preserving the documented exclusions for explicit, kept, no-sync, sync-only, and custom-slug runs.
Do we have a high-confidence way to reproduce the issue? No live reproduction was established in this review. Source inspection shows current main returns the before-command SSH wait error directly, and the PR adds the replacement branch for that failure path.
Is this the best way to solve the issue? Yes, the bounded one-shot replacement is a narrow maintainable fix for this failure mode. The remaining blocker is proof that the live coordinator path behaves as intended, not a different implementation direction.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 388c99fe94e7.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc6d61b4f3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    strings.TrimSpace(requestedSlug) != "" {
    return false
}
return shouldReleaseRunLease(acquired, keep, keepOnFailure, stopAfter, err)
Skip replacement when keep-on-failure is requested
shouldReplaceLeaseAfterBeforeCommandSSHFailure currently delegates to shouldReleaseRunLease, which makes replacement depend on --stop-after; as a result, runs with --keep-on-failure still replace the lease when --stop-after=always or --stop-after=failure is set. That contradicts the new run behavior contract for keep-on-failure runs and can unexpectedly discard the original failing environment (and trigger an extra acquire/sync cycle) exactly when users asked to preserve failure state.
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
Summary
Verification
- go test ./internal/cli
- go test ./...
- npm run docs:check
- /Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode local (clean)