fix(worker): roll back partial Hetzner provisioning #783
Codex review: needs real behavior proof before merge.
Reviewed July 2, 2026, 3:40 PM ET / 19:40 UTC.
Summary
Reproducibility: yes. Source inspection of the PR head shows a delete=false release can leave providerKeyCleanupPending in the stored lease, and the alarm predicate later treats that ended lease as cleanup-needed.
Review metrics: 2 noteworthy metrics.
Proposal only: this assessment does not dispatch repair, suppress jobs, mutate sibling items, close, or merge anything.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Preserve no-delete releases as a hard stop for failed Hetzner cleanup while keeping exact-ID rollback/retry cleanup for normal delete paths, remove changelog churn, and require redacted live Hetzner proof before merge.
Do we have a high-confidence way to reproduce the issue? Yes. Source inspection of the PR head shows a delete=false release can leave providerKeyCleanupPending in the stored lease, and the alarm predicate later treats that ended lease as cleanup-needed.
Is this the best way to solve the issue? No. The rollback/retry direction is narrow, but this implementation is not the best landing path until it honors no-delete release semantics and real provider proof confirms actual Hetzner cleanup behavior.
Overall correctness: patch is incorrect
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against ab1c29f3fd40.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
7f59c64 to ff66ca2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff66ca2444
    !leaseIsLive(lease) &&
    ((lease.cloudID && (lease.cleanupError || lease.cleanupStartedAt)) ||
      lease.providerKeyCleanupPending),
Honor no-delete releases with pending Hetzner cleanup
When a partial Hetzner create persists providerKeyCleanupPending together with a serverID/cloudID, the owner can still release the failed lease with {"delete": false}. That path marks releaseDeletesServer=false but leaves providerKeyCleanupPending, so this predicate schedules background cleanup anyway; HetznerProvider.releaseLease then deletes the server before deleting the SSH key. This violates the no-delete release request and can remove a server the user explicitly chose to retain.
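One way to honor the no-delete request is to check it before any cleanup condition is evaluated. The sketch below is a minimal illustration, not the PR's code: the `Lease` interface, the `live` field, the body of `leaseIsLive`, and the `needsBackgroundCleanup` name are all assumptions; only the field names `cloudID`, `cleanupError`, `cleanupStartedAt`, `providerKeyCleanupPending`, and `releaseDeletesServer` come from the review.

```typescript
// Hypothetical lease shape: only the fields named in the review are included.
interface Lease {
  live: boolean; // assumed stand-in for whatever the real leaseIsLive() inspects
  cloudID?: string;
  cleanupError?: string;
  cleanupStartedAt?: string;
  providerKeyCleanupPending?: boolean;
  releaseDeletesServer?: boolean;
}

// Assumed stand-in for the real helper, whose body is not shown in the review.
function leaseIsLive(lease: Lease): boolean {
  return lease.live;
}

// Sketch of a cleanup predicate that treats a no-delete release as a hard stop.
function needsBackgroundCleanup(lease: Lease): boolean {
  if (leaseIsLive(lease)) return false;
  // A {"delete": false} release must never schedule provider cleanup,
  // even when a partial create left providerKeyCleanupPending set.
  if (lease.releaseDeletesServer === false) return false;
  const serverCleanupPending =
    Boolean(lease.cloudID) &&
    Boolean(lease.cleanupError || lease.cleanupStartedAt);
  return serverCleanupPending || Boolean(lease.providerKeyCleanupPending);
}
```

An alternative with the same effect would be to clear `providerKeyCleanupPending` when the release is recorded with `delete=false`; the guard above keeps the stored lease untouched, which may be preferable if the pending flag is still useful for diagnostics.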
ff66ca2 to 7106f10
7106f10 to 732a909
Summary
Closes #761
Verification
Exact head: `732a909fcd4e5495cb47c6fdf61fdd2fdc4d11d6` (rebased onto the 0.34.1 development head). Exact-head CI is terminal green: https://github.com/openclaw/crabbox/actions/runs/28615334393
- `npm test --prefix worker`: 762 tests passed after rebasing onto current `main`
- `npm run format:check --prefix worker`
- `npm run lint --prefix worker`
- `npm run check --prefix worker`
- `npm run build --prefix worker`
- `scripts/check-docs.sh`: 51 command docs, 69 providers, 202 Markdown files, and the generated docs site

No new live Hetzner canary was run because the credential session remains locked. The regression suite exercises successful rollback, server and key 404 idempotency, fallback continuation, durable server/key ID recovery after rollback failure, server-before-key cleanup, exact-ID persistence/retry, retry backoff and early-alarm suppression, and shared/reused-key retention.
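The retry backoff and early-alarm suppression mentioned in the regression coverage could be modeled roughly as follows. This is a sketch under stated assumptions: the `CleanupRetryState` shape, the function names, and the delay constants are hypothetical, and the PR's actual schedule is not shown in this conversation.

```typescript
// Hypothetical retry state; field names are illustrative, not from the PR.
interface CleanupRetryState {
  attempts: number; // failed cleanup attempts so far
  nextRetryAt: number; // epoch milliseconds before which an alarm should not retry
}

const BASE_DELAY_MS = 30000; // assumed base delay (30 seconds)
const MAX_DELAY_MS = 1800000; // assumed cap (30 minutes)

// Record a failed cleanup attempt and schedule the next retry with
// exponential backoff, doubling per attempt up to the cap.
function recordFailure(state: CleanupRetryState, now: number): CleanupRetryState {
  const attempts = state.attempts + 1;
  const delay = Math.min(BASE_DELAY_MS * Math.pow(2, attempts - 1), MAX_DELAY_MS);
  return { attempts, nextRetryAt: now + delay };
}

// Early-alarm suppression: an alarm that fires before nextRetryAt does nothing.
function shouldRetryNow(state: CleanupRetryState, now: number): boolean {
  return now >= state.nextRetryAt;
}
```

Keeping `nextRetryAt` in the persisted state, rather than deriving it from the alarm time, means an alarm scheduled for an unrelated reason cannot trigger a cleanup retry ahead of the backoff window.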