Skip to content

Add upgrade apply for the CLI and API#5411

Open
JAORMX wants to merge 1 commit into
upgrade-4-applierfrom
upgrade-5-apply
Open

Add upgrade apply for the CLI and API#5411
JAORMX wants to merge 1 commit into
upgrade-4-applierfrom
upgrade-5-apply

Conversation

@JAORMX
Copy link
Copy Markdown
Collaborator

@JAORMX JAORMX commented Jun 1, 2026

Summary

With the Applier in place (PR #5410), expose it to users so they can apply an upgrade while preserving configuration, instead of manually re-running a workload with a new image. Part of RFC THV-0068, local scope.

  • Add thv upgrade apply <name>: runs the check, shows the candidate image, new env vars, and any permission/transport/network posture drift, then prompts for confirmation. --dry-run reports the plan without applying; --env/--secret supply values for newly required variables; --yes (or a non-interactive shell) skips the prompt and fails loudly on missing required values; --image-verification mirrors thv run.
  • Add POST /api/v1beta/workloads/{name}/upgrade, delegating to the same Applier so all clients share one apply path. The API path is always non-interactive (detached validator) and sources image verification from server config; the request body can only supply env/secret values — it cannot redirect the image or weaken verification. Apply failures return a sanitized 422 with the detailed cause logged server-side, so secret references in an error chain are never echoed to the caller.

Type of change

  • New feature

Test plan

  • Unit tests (task test) — API: happy path, no-op, 404/400, no-secret-leak (asserts a secret reference in the request never appears in success or 422 responses), route ordering. CLI: output-formatting helper.
  • Linting (task lint-fix)

Changes

File Change
cmd/thv/app/upgrade.go apply subcommand + flags + confirmation
pkg/api/v1/workloads_upgrade.go POST .../upgrade handler
pkg/api/v1/workloads.go, workload_types.go Route + request type
docs/cli/*, docs/server/* Regenerated reference

Does this introduce a user-facing change?

Yes — thv upgrade apply and POST /api/v1beta/workloads/{name}/upgrade. The command stops and replaces the existing workload (verifying and pulling the candidate image first); there is no automatic rollback, which the help text states.

Large PR Justification

This PR is ~1,007 lines, but only 330 are hand-written source; the remainder is generated reference and tests:

  • 374 lines are auto-generated OpenAPI (docs/server/*) and CLI reference (docs/cli/*), produced by task docs. They must be regenerated and committed together (CI verifies them) and cannot be split.
  • 303 lines are tests for the new handler and CLI helper.
  • The 330 source lines are the CLI apply subcommand and the API POST handler — two thin callers of the same Applier that the user explicitly chose to ship together so all clients share one apply path. Splitting CLI from API here would duplicate the wiring and the "all clients benefit" goal of the RFC, and both are well under the threshold individually.

Special notes for reviewers

PR 5 of 6 in the RFC THV-0068 stack; based on #5410. The API path hardcodes the detached env-var validator and sources the verify mode from server config — a client cannot weaken verification or redirect the image via the request body.

🤖 Generated with Claude Code

@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label Jun 1, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@JAORMX JAORMX force-pushed the upgrade-4-applier branch from 391531e to ba2195c Compare June 1, 2026 09:32
@JAORMX JAORMX force-pushed the upgrade-5-apply branch from f901540 to a9af988 Compare June 1, 2026 09:32
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 1, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 55.00000% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.88%. Comparing base (a87510f) to head (9a11b14).

Files with missing lines Patch % Lines
pkg/api/v1/workloads.go 0.00% 9 Missing ⚠️
pkg/api/v1/workloads_upgrade.go 70.96% 6 Missing and 3 partials ⚠️
Additional details and impacted files
@@                  Coverage Diff                  @@
##           upgrade-4-applier    #5411      +/-   ##
=====================================================
+ Coverage              68.83%   68.88%   +0.04%     
=====================================================
  Files                    634      634              
  Lines                  64301    64341      +40     
=====================================================
+ Hits                   44263    44319      +56     
+ Misses                 16758    16742      -16     
  Partials                3280     3280              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JAORMX JAORMX force-pushed the upgrade-4-applier branch from ba2195c to a93ff85 Compare June 1, 2026 10:12
@JAORMX JAORMX force-pushed the upgrade-5-apply branch from a9af988 to 74e4064 Compare June 1, 2026 10:12
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 1, 2026
@JAORMX JAORMX marked this pull request as ready for review June 1, 2026 10:21
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 1, 2026
@github-actions github-actions Bot dismissed their stale review June 1, 2026 10:26

Large PR justification has been provided. Thank you!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@JAORMX JAORMX force-pushed the upgrade-4-applier branch from a93ff85 to 4ad3554 Compare June 1, 2026 13:16
@JAORMX JAORMX force-pushed the upgrade-5-apply branch from 74e4064 to 664b47b Compare June 1, 2026 13:16
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 1, 2026
With the Applier in place, expose it to users. This lets CLI users and
API clients apply an upgrade while preserving their configuration,
instead of manually re-running a workload with a new image.

Add a thv upgrade apply <name> command. It runs the check, shows the
candidate image, new env vars, and any permission/transport/network
posture drift, then prompts for confirmation. --dry-run reports the plan
without applying; --env/--secret supply values for newly required
variables; --yes (or a non-interactive shell) skips the prompt and fails
loudly on missing required values; --image-verification mirrors thv run.

Add POST /api/v1beta/workloads/{name}/upgrade, delegating to the same
Applier so all clients share one apply path. The API path is always
non-interactive (detached validator) and sources image verification from
server config; the request body can only supply env/secret values, never
redirect the image or weaken verification. Apply failures return a
sanitized 422 with the detailed cause logged server-side, so secret
references in an error chain are never echoed to the caller.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX force-pushed the upgrade-4-applier branch from 4ad3554 to a87510f Compare June 1, 2026 13:27
@JAORMX JAORMX force-pushed the upgrade-5-apply branch from 664b47b to 9a11b14 Compare June 1, 2026 13:27
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 1, 2026
Comment on lines +165 to +181
if err != nil {
// Map preparation failures (resolve/verify/pull/build/validate) to a 422:
// the request was well-formed but the candidate could not be applied. The
// running workload is untouched on any of these.
//
// apierrors.ErrorHandler returns the error message verbatim to the client
// for 4xx codes, so the underlying error must NOT be wrapped into the
// response: it may reference the request's secret parameters (e.g. an
// env/secret validation failure). Log the detailed cause server-side and
// return a sanitized, secret-free message to the caller. The log line
// carries only the error chain (which itself references secrets by name,
// never resolved values).
slog.Error("failed to apply workload upgrade", "workload", name, "error", err)
return httperr.WithCode(
fmt.Errorf("failed to apply upgrade for workload %q", name),
http.StatusUnprocessableEntity,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maps every Apply error to 422, but the "the running workload is untouched on any of these" comment only holds for the preparation phase. Once Apply reaches the destructive recreate, two of its error paths fire after the workload has already been torn down:

  • UpdateWorkload failing (stop/delete already initiated)
  • completion() failing (old workload deleted, replacement didn't start)

In those cases a 422 is misleading: it tells the client the request couldn't be processed and nothing changed, when the workload may actually be gone and not running. A client seeing 422 will reasonably assume it's safe to retry against an intact workload, when it's really a 5xx-class state.

Could we distinguish the two? A typed/sentinel error out of Apply (e.g. preparation vs recreate) lets the handler return 422 for prep failures and 500 for recreate failures. At minimum it'd be worth fixing the comment so it doesn't claim "untouched" for the post-UpdateWorkload paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants