Add upgrade apply for the CLI and API #5411
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation
Alternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##   upgrade-4-applier    #5411     +/-   ##
==========================================
+ Coverage      68.83%   68.88%   +0.04%
==========================================
  Files            634      634
  Lines          64301    64341     +40
==========================================
+ Hits           44263    44319     +56
+ Misses         16758    16742     -16
  Partials        3280     3280
Large PR justification has been provided. Thank you!
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.
With the Applier in place, expose it to users. This lets CLI users and
API clients apply an upgrade while preserving their configuration,
instead of manually re-running a workload with a new image.
Add a thv upgrade apply <name> command. It runs the check, shows the
candidate image, new env vars, and any permission/transport/network
posture drift, then prompts for confirmation. --dry-run reports the plan
without applying; --env/--secret supply values for newly required
variables; --yes (or a non-interactive shell) skips the prompt and fails
loudly on missing required values; --image-verification mirrors thv run.
Add POST /api/v1beta/workloads/{name}/upgrade, delegating to the same
Applier so all clients share one apply path. The API path is always
non-interactive (detached validator) and sources image verification from
server config; the request body can only supply env/secret values, never
redirect the image or weaken verification. Apply failures return a
sanitized 422 with the detailed cause logged server-side, so secret
references in an error chain are never echoed to the caller.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
if err != nil {
	// Map preparation failures (resolve/verify/pull/build/validate) to a 422:
	// the request was well-formed but the candidate could not be applied. The
	// running workload is untouched on any of these.
	//
	// apierrors.ErrorHandler returns the error message verbatim to the client
	// for 4xx codes, so the underlying error must NOT be wrapped into the
	// response: it may reference the request's secret parameters (e.g. an
	// env/secret validation failure). Log the detailed cause server-side and
	// return a sanitized, secret-free message to the caller. The log line
	// carries only the error chain (which itself references secrets by name,
	// never resolved values).
	slog.Error("failed to apply workload upgrade", "workload", name, "error", err)
	return httperr.WithCode(
		fmt.Errorf("failed to apply upgrade for workload %q", name),
		http.StatusUnprocessableEntity,
	)
This maps every Apply error to 422, but the "the running workload is untouched on any of these" comment only holds for the preparation phase. Once Apply reaches the destructive recreate, two of its error paths fire after the workload has already been torn down:
- UpdateWorkload failing (stop/delete already initiated)
- completion() failing (old workload deleted, replacement didn't start)
In those cases a 422 is misleading: it tells the client the request couldn't be processed and nothing changed, when the workload may actually be gone and not running. A client seeing 422 will reasonably assume it's safe to retry against an intact workload, when it's really a 5xx-class state.
Could we distinguish the two? A typed/sentinel error out of Apply (e.g. preparation vs recreate) lets the handler return 422 for prep failures and 500 for recreate failures. At minimum it'd be worth fixing the comment so it doesn't claim "untouched" for the post-UpdateWorkload paths.
Summary
With the Applier in place (PR #5410), expose it to users so they can apply an upgrade while preserving configuration, instead of manually re-running a workload with a new image. Part of RFC THV-0068, local scope.
- thv upgrade apply <name>: runs the check, shows the candidate image, new env vars, and any permission/transport/network posture drift, then prompts for confirmation. --dry-run reports the plan without applying; --env/--secret supply values for newly required variables; --yes (or a non-interactive shell) skips the prompt and fails loudly on missing required values; --image-verification mirrors thv run.
- POST /api/v1beta/workloads/{name}/upgrade, delegating to the same Applier so all clients share one apply path. The API path is always non-interactive (detached validator) and sources image verification from server config; the request body can only supply env/secret values and cannot redirect the image or weaken verification. Apply failures return a sanitized 422 with the detailed cause logged server-side, so secret references in an error chain are never echoed to the caller.
Type of change
Test plan
- task test). API: happy path, no-op, 404/400, no-secret-leak (asserts a secret reference in the request never appears in success or 422 responses), route ordering. CLI: output-formatting helper.
- task lint-fix)
Changes
- cmd/thv/app/upgrade.go: apply subcommand + flags + confirmation
- pkg/api/v1/workloads_upgrade.go: POST .../upgrade handler
- pkg/api/v1/workloads.go, workload_types.go
- docs/cli/*, docs/server/*
Does this introduce a user-facing change?
Yes: thv upgrade apply and POST /api/v1beta/workloads/{name}/upgrade. The command stops and replaces the existing workload (verifying and pulling the candidate image first); there is no automatic rollback, which the help text states.
Large PR Justification
This PR is ~1,007 lines, but only 330 are hand-written source; the remainder is generated reference and tests:
- docs/server/*) and CLI reference (docs/cli/*), produced by task docs. They must be regenerated and committed together (CI verifies them) and cannot be split.
- apply subcommand and the API POST handler: two thin callers of the same Applier that the user explicitly chose to ship together so all clients share one apply path. Splitting CLI from API here would duplicate the wiring and undercut the "all clients benefit" goal of the RFC, and both are well under the threshold individually.
Special notes for reviewers
PR 5 of 6 in the RFC THV-0068 stack; based on #5410. The API path hardcodes the detached env-var validator and sources the verify mode from server config — a client cannot weaken verification or redirect the image via the request body.
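One way to picture the "body can only supply env/secret values" constraint is a request type that simply has no image or verification fields. The sketch below is an assumption-laden illustration, not the handler in this PR: `upgradeRequest`, its JSON field names, and `decodeUpgradeRequest` are hypothetical, and rejecting unknown fields (rather than silently ignoring them) is a choice made here for clarity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// upgradeRequest is a hypothetical body shape: it carries only values for
// newly required variables. There is deliberately no image or verification
// field, so those settings can only come from server-side state and config.
type upgradeRequest struct {
	Env     map[string]string `json:"env,omitempty"`
	Secrets []string          `json:"secrets,omitempty"`
}

// decodeUpgradeRequest parses a body strictly. Unknown fields such as
// "image" are rejected in this sketch instead of being dropped.
func decodeUpgradeRequest(body string) (upgradeRequest, error) {
	var req upgradeRequest
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&req); err != nil {
		return upgradeRequest{}, fmt.Errorf("invalid upgrade request: %w", err)
	}
	return req, nil
}

func main() {
	// A body that only supplies values is accepted.
	req, err := decodeUpgradeRequest(`{"env":{"API_URL":"https://example.test"},"secrets":["token"]}`)
	fmt.Println(req.Env["API_URL"], req.Secrets, err)

	// A body that tries to redirect the image fails to decode.
	_, err = decodeUpgradeRequest(`{"image":"example.test/other:latest"}`)
	fmt.Println(err != nil)
}
```

Even without `DisallowUnknownFields`, Go's decoder would drop the extra field because the struct has nowhere to store it; the strict mode only makes the rejection visible to the client.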
🤖 Generated with Claude Code