fix(bulk,types): align bulk + ChartConfig wire contracts with backend + add live test suite#98
Conversation
…ints
Two contract drifts surfaced while running cli/test/live/ against the
live k8s-stack-manager backend. Both were hidden because the existing
unit tests stub their own server and decode into whichever type stackctl
ships, so request shape was only validated against itself, never against
what the backend actually expects.
REQUEST SHAPE
- Stack-instance bulk endpoints (POST /api/v1/stack-instances/bulk/{deploy,
stop,clean,delete}) require `{"instance_ids": [...]}` with
`binding:"required"`. stackctl was sending `{"ids": [...]}` — backend
replied 400 "instance_ids is required" on every call.
- Template bulk endpoints (POST /api/v1/templates/bulk/{delete,publish,
unpublish}) require `{"template_ids": [...]}`. Same drift, same 400.
Split BulkRequest into two named request types so each endpoint group
encodes the right JSON field:
- BulkInstancesRequest { InstanceIDs []string `json:"instance_ids"` }
- BulkTemplatesRequest { TemplateIDs []string `json:"template_ids"` }
The 4 stack-instance + 3 template client methods now use the matching
type, and the stub-based unit tests decode into the same type so the
contract is locked in two places.
RESPONSE SHAPE
- Backend's BulkOperationResultItem ships `{instance_id, instance_name,
status: "success"|"error", error, log_id}` for stack-instance bulk and
the template-id equivalent for template bulk. stackctl's
BulkOperationResult had `{id, success bool, error}` — none of the
fields matched, so every response decoded as zero values. cmd/bulk.go's
printer would have rendered every result as "failed with empty ID".
Replaced the fields with the backend's actual shape (both InstanceID and
TemplateID as omitempty, plus Status as string, plus log_id) and exposed
two accessor methods so the cmd-side printer stays endpoint-agnostic:
- .ID() returns InstanceID if set, else TemplateID
- .Success() returns Status == "success"
BulkResponse also gained Total/Successful/Failed counter fields the
backend already returns.
LIVE TESTS
cli/test/live/live_test.go was last updated before the UUID migration —
parameters were typed `uint`, the health-probe path was `/api/v1/health`
(now `/health/live`), and `DeployResponse.ID` had been renamed to
`LogID`. Brought it back into compile so the suite actually runs:
- uint → string on every stack ID parameter and slice
- /api/v1/health → /health/live for the connectivity check
- deployLog.ID → deployLog.LogID (and stop/clean/redeploy)
- Added an STACKCTL_LIVE_API_KEY env path so the suite can authenticate
via the X-API-Key header instead of password Login(). The password
path still works; tests skip only when neither env is set.
VERIFICATION
- Full unit suite green: 7 packages pass.
- End-to-end against rancher-desktop k8s-stack-manager (built from
origin/main + the merged inline-charts fix):
stackctl bulk deploy olofm → 200 OK, "success"
stackctl bulk template unpublish <id> → 200 OK, "success"
stackctl bulk template publish <id> → 200 OK, "success"
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Expands cli/test/live/ from 2 deploy-heavy lifecycle tests into a
fast wire-contract smoke suite per endpoint group. Designed to catch
the class of bug we just shipped twice — request/response shape drift
between stackctl and the backend that stubbed unit tests can't see.
NEW FILES (all build-tag-gated by `live`):
helpers_test.go — shared fixtures + require/skip helpers
audit_live_test.go — list / filter by entity_type / export JSON
bulk_live_test.go — bulk template + bulk stack-instance wire
shape (drafts only, no real deploys)
cluster_live_test.go — list / get / quota / health / nodes
definition_live_test.go — list / get / export-import round-trip /
create-bare
favorite_live_test.go — add / list / remove (cleans up after)
notification_live_test.go — list / prefs round-trip
template_live_test.go — list / get / create-with-inline-charts
(the regression test for k8s-sm#264) /
publish lifecycle
EXISTING TESTS:
live_test.go's TestLiveWorkflow_FullLifecycle and _BulkOperations
actually roll out stack instances (helm install, PV allocation, golden-
db pull). On rancher-desktop they ate ~80 GiB and put the node into
DiskPressure during this very investigation. Both are now gated by
STACKCTL_LIVE_HEAVY=1 so the default suite stays lightweight and the
expensive coverage is opt-in for engineers with cluster capacity.
PRINCIPLES (also documented in helpers_test.go comment header):
1. Use newLiveClient(t) — handles connectivity + auth (API key OR
password login) and skips when neither env path is configured.
2. Never create real workloads in the default suite. Tests must be
safe to interleave on shared dev infra.
3. Register cleanup via t.Cleanup so failed assertions never leave
named resources behind.
4. Skip (don't fail) when the backend doesn't have the precondition
so partial-config environments stay green.
VERIFICATION:
Full suite against rancher-desktop k8s-stack-manager (origin/main +
the bulk-wire-contract fix in the parent commit):
16 passed, 2 skipped (heavy-gated), 0 failed in 1.5s
Already paid for itself: while writing the template test I noticed
stackctl's ChartConfig type silently drops five backend fields
(chart_path / locked_values / deploy_order / required /
build_pipeline_id) on read. That's a separate follow-up — issues #96
and #97 cover the CI + swagger-contract long-term plan.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… dropping 5 fields
stackctl's ChartConfig only carried 5 of the 10 fields the backend ships,
so reads of models.ChartConfig (definition charts) and
models.TemplateChartConfig (template charts) silently dropped:
- chart_path (both)
- deploy_order (both)
- build_pipeline_id (both)
- locked_values (template only)
- required (template only)
This was invisible in unit tests because httptest stubs decode requests
into stackctl's own ChartConfig — same blind spot that hid the bulk
field-name drift in the parent commit.
Type is now the union of the two backend models; LockedValues / Required
just stay zero on definition-chart reads. All five fields are omitempty
so create-request bodies for callers that don't use them are unaffected.
Test coverage:
- cli/pkg/types/types_test.go: extends the StackDefinition JSON
round-trip to populate every union field and assert decode integrity.
- cli/test/live/template_live_test.go: TestLiveTemplate_CreateWithInlineCharts
now sets chart_path / deploy_order / required / locked_values /
build_pipeline_id on the inline chart and asserts each round-trips
through POST /api/v1/templates and GET /api/v1/templates/:id.
Verified live against rancher-desktop k8s-stack-manager:
16 passed, 2 skipped (heavy-gated), 0 failed
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRefactors bulk request/response types and expands ChartConfig, updates client/command code and mocks to the new wire contracts, and adds live integration tests and helpers. Bulk operations now use typed instance/template requests, per-item results carry string statuses and helper methods, and responses include aggregate counters. ChangesBulk Schema, ChartConfig, and Integration Tests
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (2)
cli/pkg/client/client_test.go (1)
1800-1889: ⚡ Quick winLock request-body contract checks for all instance bulk endpoints.
TestBulkStop_Success,TestBulkClean_Success, andTestBulkDelete_Successdon’t assert the posted wire key, so a regression away frominstance_idswould still pass. MirrorTestBulkDeploy_Successby decodingtypes.BulkInstancesRequestand assertingbody.InstanceIDs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/pkg/client/client_test.go` around lines 1800 - 1889, The three success tests (TestBulkStop_Success, TestBulkClean_Success, TestBulkDelete_Success) are missing assertions on the request body key; update each test's httptest handler to decode the incoming JSON into types.BulkInstancesRequest (as done in TestBulkDeploy_Success) and assert that body.InstanceIDs contains the expected IDs before writing the response; locate the handlers inside those test functions and add the decode+assert for body.InstanceIDs to ensure the wire key remains instance_ids.cli/cmd/bulk_test.go (1)
25-33: ⚡ Quick winAdd a template-result fixture to validate
template_idpaths.
sampleBulkResponse()only emitsInstanceID, but template command tests now validate template request contracts. This leaves template response contract (template_id) untested and can hide regressions inr.ID()/output serialization for template bulk flows.Suggested test-fixture split
func sampleBulkResponse() types.BulkResponse { return types.BulkResponse{ Results: []types.BulkOperationResult{ {InstanceID: "1", Status: "success"}, {InstanceID: "2", Status: "success"}, {InstanceID: "3", Status: "error", Error: "not found"}, }, } } + +func sampleBulkTemplateResponse() types.BulkResponse { + return types.BulkResponse{ + Results: []types.BulkOperationResult{ + {TemplateID: "1", Status: "success"}, + {TemplateID: "2", Status: "success"}, + {TemplateID: "3", Status: "error", Error: "not found"}, + }, + } +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/cmd/bulk_test.go` around lines 25 - 33, Update the test fixture so template bulk flows include and validate template IDs: modify sampleBulkResponse (or add a new fixture like sampleBulkResponseWithTemplates) to set the template identifier field on types.BulkOperationResult entries (e.g., add TemplateID: "template-1" / "template-2" for the success results and for the error case as appropriate) to ensure template_id paths are exercised and r.ID()/output serialization for template bulk responses is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cli/test/live/cluster_live_test.go`:
- Around line 47-51: The current t.Skipf calls skip on any error from the quota
lookup (variables cluster and err), which hides real failures; change the logic
around the quota fetch in the test to inspect err and only call t.Skipf when the
error matches explicit "quota optional/unavailable" conditions (e.g., a sentinel
error value, a specific error type or a well-known error message returned by the
quota lookup function), and otherwise call t.Fatalf or t.Error so unexpected
errors fail the test; update both places that call t.Skipf (the block
referencing cluster.ID and the later similar block) to perform this targeted
error check before skipping.
In `@cli/test/live/definition_live_test.go`:
- Around line 64-66: The test currently inspects bundle["charts"] which is
incorrect for payloads shaped as {schema_version, definition}; change the check
to read the nested charts under bundle["definition"] and assert against the
round-tripped definition's charts. Concretely, access origCharts via
bundle["definition"].(map[string]any)["charts"] (safely type-asserting) and
replace assert.NotEmpty(t, roundtrip.Charts, ...) with assert.NotEmpty(t,
roundtrip.Definition.Charts, ...) so the test verifies the nested
definition.charts field rather than a top-level charts key.
In `@cli/test/live/favorite_live_test.go`:
- Around line 60-63: Replace the manual loop and t.Errorf check after
RemoveFavorite with a testify assertion (assert/require) so failures are
consistent: instead of iterating over list2 and calling t.Errorf when
f.EntityType==entityType && f.EntityID==def.ID, use a testify helper (e.g.,
assert.NotContains or assert.False/assert.NoError with a computed found boolean)
from github.com/stretchr/testify to assert the favorite is absent — reference
the variables list2, f.EntityType, f.EntityID, def.ID and the RemoveFavorite
call when making the assertion.
In `@cli/test/live/helpers_test.go`:
- Around line 102-112: In skipUnlessHTTP200, replace the direct http.Get with an
http.Client that has a timeout (e.g. &http.Client{Timeout: 5*time.Second}) to
avoid hanging calls, use client.Get instead of http.Get, ensure resp.Body is
closed (defer resp.Body.Close()), and gate on resp.StatusCode == http.StatusOK:
if it's http.StatusNotFound call t.Skipf("endpoint %s returned 404 — feature not
enabled on this backend", path) and for any other non-200 status call
t.Skipf("endpoint %s returned non-200 status: %d", path, resp.StatusCode); keep
error handling to t.Skipf("endpoint %s unreachable: %v", path, err).
In `@cli/test/live/live_test.go`:
- Around line 34-38: The precondition currently allows API-key-only mode (checks
apiKey) which conflicts with the required live-test contract; change the check
in live_test.go so the test skips unless STACKCTL_LIVE_USER and
STACKCTL_LIVE_PASS are set (i.e., require user and pass rather than apiKey),
update the t.Skip message to state that STACKCTL_LIVE_USER and
STACKCTL_LIVE_PASS must be set (and note STACKCTL_LIVE_URL defaults to
http://localhost:8081 if needed), and make the same change for the second check
around lines 68-70 so both guards enforce the user/pass requirement.
- Around line 96-98: The heavy-test gate currently checks
os.Getenv("STACKCTL_LIVE_HEAVY") != "" which treats any non-empty value (e.g.,
"0" or "false") as enabled; change both checks of
os.Getenv("STACKCTL_LIVE_HEAVY") to require the explicit value "1" (i.e.,
os.Getenv("STACKCTL_LIVE_HEAVY") == "1") so heavy workflows only run when
STACKCTL_LIVE_HEAVY=1 is set; update the condition in both places in
cli/test/live/live_test.go and keep the t.Skip message as-is or clarify it to
mention "=1" if desired.
---
Nitpick comments:
In `@cli/cmd/bulk_test.go`:
- Around line 25-33: Update the test fixture so template bulk flows include and
validate template IDs: modify sampleBulkResponse (or add a new fixture like
sampleBulkResponseWithTemplates) to set the template identifier field on
types.BulkOperationResult entries (e.g., add TemplateID: "template-1" /
"template-2" for the success results and for the error case as appropriate) to
ensure template_id paths are exercised and r.ID()/output serialization for
template bulk responses is covered.
In `@cli/pkg/client/client_test.go`:
- Around line 1800-1889: The three success tests (TestBulkStop_Success,
TestBulkClean_Success, TestBulkDelete_Success) are missing assertions on the
request body key; update each test's httptest handler to decode the incoming
JSON into types.BulkInstancesRequest (as done in TestBulkDeploy_Success) and
assert that body.InstanceIDs contains the expected IDs before writing the
response; locate the handlers inside those test functions and add the
decode+assert for body.InstanceIDs to ensure the wire key remains instance_ids.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ce52dfe5-51a5-4e5c-81df-67bf099a3fb6
📒 Files selected for processing (15)
cli/cmd/bulk.gocli/cmd/bulk_test.gocli/pkg/client/client.gocli/pkg/client/client_test.gocli/pkg/types/types.gocli/pkg/types/types_test.gocli/test/live/audit_live_test.gocli/test/live/bulk_live_test.gocli/test/live/cluster_live_test.gocli/test/live/definition_live_test.gocli/test/live/favorite_live_test.gocli/test/live/helpers_test.gocli/test/live/live_test.gocli/test/live/notification_live_test.gocli/test/live/template_live_test.go
| apiKey := os.Getenv("STACKCTL_LIVE_API_KEY") | ||
| user, pass := os.Getenv("STACKCTL_LIVE_USER"), os.Getenv("STACKCTL_LIVE_PASS") | ||
| if apiKey == "" && (user == "" || pass == "") { | ||
| t.Skip("STACKCTL_LIVE_API_KEY or (STACKCTL_LIVE_USER + STACKCTL_LIVE_PASS) must be set for live tests") | ||
| } |
There was a problem hiding this comment.
Live auth precondition no longer matches the documented live-test contract.
Lines 34-38 and 68-70 make API-key-only mode valid, but the current live-test rule requires STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS and skipping when unset. Please either enforce that here or update the live-test instruction contract in the same PR to keep policy and implementation aligned.
As per coding guidelines: “Live tests must require env vars STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS and skip (not fail) if unset, with optional STACKCTL_LIVE_URL defaulting to http://localhost:8081.”
Also applies to: 68-70
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cli/test/live/live_test.go` around lines 34 - 38, The precondition currently
allows API-key-only mode (checks apiKey) which conflicts with the required
live-test contract; change the check in live_test.go so the test skips unless
STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS are set (i.e., require user and pass
rather than apiKey), update the t.Skip message to state that STACKCTL_LIVE_USER
and STACKCTL_LIVE_PASS must be set (and note STACKCTL_LIVE_URL defaults to
http://localhost:8081 if needed), and make the same change for the second check
around lines 68-70 so both guards enforce the user/pass requirement.
CI was red because the e2e mock servers
(startE2ETemplateDefMockServer + startE2EQuietPipingMockServer) still
emitted the pre-fix bulk shape — {id, success} keyed by "ids". With the
new typed BulkInstancesRequest / BulkTemplatesRequest types, every bulk
test in cli/test/e2e/ decoded an empty results array and failed every
assertion.
Mocks now mirror the real backend:
- Request decode keys are "template_ids" / "instance_ids".
- Result entries carry "template_id" / "instance_id" + status:"success".
- Response envelope carries total / successful / failed counters.
Also addresses review comments from CodeRabbit on PR #98:
- live_test.go: STACKCTL_LIVE_HEAVY gating now requires exactly "1"
instead of any non-empty value, so STACKCTL_LIVE_HEAVY=0 stops
accidentally triggering ~80GB workload tests.
- helpers_test.go: skipUnlessHTTP200 now actually gates on 200,
uses a 5s timeout, and defers Body.Close.
- definition_live_test.go: chart-presence check reads the nested
bundle["definition"]["charts"] (was bundle["charts"], which is
always nil under the {schema_version, definition} envelope and
silently no-op'd the assertion).
- favorite_live_test.go: replaced t.Errorf with assert.False to match
the testify convention used elsewhere.
- cluster_live_test.go: quota + nodes skip paths now narrow on
documented "not found" / "not reachable" substrings; any other
error fails the test so contract regressions still surface.
The "live tests must require STACKCTL_LIVE_USER+PASS" comment is
intentionally not applied — API-key auth was added in this PR as a
parallel path and is the configuration used by CI.
Verified:
- go test ./... all packages pass
- go test -tags live ./test/live/... 16 passed, 2 skipped
- go vet ./... clean
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cli/test/e2e/cli_e2e_test.go`:
- Around line 890-909: The mock bulk-template handlers (the HTTP handler cases
for "/api/v1/templates/bulk/publish", "/api/v1/templates/bulk/unpublish" and
"/api/v1/templates/bulk/delete") currently only check JSON decode errors and
thus accept empty bodies or empty template_ids; update each handler to validate
that the decoded req.TemplateIDs slice is non-nil and has length > 0 and, if
empty, respond with http.StatusBadRequest and a JSON error (e.g.,
{"error":"template_ids required"}). Locate the req struct and results
construction in each case (the variables named req and results and the code that
appends map[string]interface{}{"template_id": id, "status": "success"}) and add
the guard immediately after json.NewDecoder(r.Body).Decode(&req) to return 400
when len(req.TemplateIDs) == 0.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ab58b0fb-4ea1-4402-b930-25372053016b
📒 Files selected for processing (6)
cli/test/e2e/cli_e2e_test.gocli/test/live/cluster_live_test.gocli/test/live/definition_live_test.gocli/test/live/favorite_live_test.gocli/test/live/helpers_test.gocli/test/live/live_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- cli/test/live/helpers_test.go
- cli/test/live/definition_live_test.go
- cli/test/live/cluster_live_test.go
- cli/test/live/live_test.go
Mirrors the existing instance-bulk handlers and matches real backend behaviour: `template_ids` is `binding:"required"` server-side, so missing or empty arrays produce 400. The mock now does the same so a future bug where stackctl forgets to send the IDs fails the e2e test instead of silently appearing to "succeed" with an empty results array. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Actionable comments posted: 0 |
…kend Closes #96. Boots k8s-stack-manager (origin/main, api-only docker-compose profile) in the same job, applies a SQL fixture so require* helpers don't skip, mints a 1-day API key via the public auth endpoints, then runs the live suite. PR-time signal that the wire contracts still match — the class of bug stub-based unit tests can't see (recently surfaced in PR #98). What ships: .github/workflows/live-tests.yml - Triggers on push to main + every PR targeting main. - Clones k8s-sm main, `docker compose up backend` (no --profile → api-only, since frontend is gated to "full"). - Polls /health/live for up to 60s. - Pipes cli/test/live/testdata/ci-seed.sql into the mysql container. - Logs in as the env-seeded admin, mints a key via POST /api/v1/users/<id>/api-keys with expires_in_days=1, masks it in logs, exports as STACKCTL_LIVE_API_KEY. - `go test -tags live -timeout 10m ./test/live/...` - Dumps backend logs on failure. cli/test/live/testdata/ci-seed.sql - 1 cluster (no kubeconfig — test-connection will fail, tests expect that), 1 stack_definition + chart_config, 1 published stack_template + template_chart_config. - owner_id is looked up via SELECT … WHERE username='admin' so we don't need to inject the non-deterministic admin UUID. cli/test/live/doc.go - Documents the env vars + how to reproduce the CI flow locally (clone k8s-sm, compose up, apply seed, mint key, run suite). Design decisions (from the open questions in #96): - Bootstrap: backend SQL seed for fixtures, login-then-mint for the API key. Avoids hardcoding bcrypt/SHA-256 hashes in test data. - Backend pin: track origin/main. Surfaces cross-repo contract breaks immediately — that's exactly the signal #96 wants. - Workflow location: separate file. Easy to mark non-required if it ever gets flaky without blocking the rest of CI. Two contract-drift gotchas surfaced while validating against rancher-desktop k8s-sm (the value this workflow will catch on PRs): - POST /api/v1/users/<id>/api-keys requires expires_at OR expires_in_days. Initial draft sent neither and got 400. - The minted-key response field is `raw_key`, not `key`. Both fixed in the workflow + doc.go before pushing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The CI live-tests workflow runs against a stub cluster (no kubeconfig, no real kube-apiserver). The backend correctly returns 500 with "Failed to connect to cluster" for diagnostic endpoints in that state, so two subtests of TestLiveCluster_HealthAndTest were failing in CI while passing locally against rancher-desktop. Two changes: 1. The `health` subtest now skips on the same "cluster unreachable" conditions the `nodes` subtest already handles (added in #98 via CodeRabbit's tightening). Symmetric with `nodes` so a wire-shape regression on either endpoint still surfaces as a failure. 2. The substring set is extracted into isClusterUnreachable() and widened to include "failed to connect to cluster" — the actual literal in the backend's 500 response when kubeconfig is empty or the cluster is down. Verified against rancher-desktop k8s-sm: all three subtests still pass (real cluster, real kubeconfig). The CI api-only path will now skip the two diagnostic subtests cleanly instead of failing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…kend (#99) * ci(live): run cli/test/live/ on every PR against a freshly-booted backend Closes #96. Boots k8s-stack-manager (origin/main, api-only docker-compose profile) in the same job, applies a SQL fixture so require* helpers don't skip, mints a 1-day API key via the public auth endpoints, then runs the live suite. PR-time signal that the wire contracts still match — the class of bug stub-based unit tests can't see (recently surfaced in PR #98). What ships: .github/workflows/live-tests.yml - Triggers on push to main + every PR targeting main. - Clones k8s-sm main, `docker compose up backend` (no --profile → api-only, since frontend is gated to "full"). - Polls /health/live for up to 60s. - Pipes cli/test/live/testdata/ci-seed.sql into the mysql container. - Logs in as the env-seeded admin, mints a key via POST /api/v1/users/<id>/api-keys with expires_in_days=1, masks it in logs, exports as STACKCTL_LIVE_API_KEY. - `go test -tags live -timeout 10m ./test/live/...` - Dumps backend logs on failure. cli/test/live/testdata/ci-seed.sql - 1 cluster (no kubeconfig — test-connection will fail, tests expect that), 1 stack_definition + chart_config, 1 published stack_template + template_chart_config. - owner_id is looked up via SELECT … WHERE username='admin' so we don't need to inject the non-deterministic admin UUID. cli/test/live/doc.go - Documents the env vars + how to reproduce the CI flow locally (clone k8s-sm, compose up, apply seed, mint key, run suite). Design decisions (from the open questions in #96): - Bootstrap: backend SQL seed for fixtures, login-then-mint for the API key. Avoids hardcoding bcrypt/SHA-256 hashes in test data. - Backend pin: track origin/main. Surfaces cross-repo contract breaks immediately — that's exactly the signal #96 wants. - Workflow location: separate file. Easy to mark non-required if it ever gets flaky without blocking the rest of CI. Two contract-drift gotchas surfaced while validating against rancher-desktop k8s-sm (the value this workflow will catch on PRs): - POST /api/v1/users/<id>/api-keys requires expires_at OR expires_in_days. Initial draft sent neither and got 400. - The minted-key response field is `raw_key`, not `key`. Both fixed in the workflow + doc.go before pushing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci(live): reset upstream dev bind-mount so production image starts k8s-stack-manager's docker-compose.yml bind-mounts ./backend:/app for hot-reload during dev. With target: production, that overlays the image's baked-in /app/main binary with the host source tree, which has no compiled binary — container exits immediately with "stat ./main: no such file or directory". Ships testdata/docker-compose.ci.yml as a compose override that resets the volumes list (Compose ≥1.28 !reset semantics), and wires both -f files into every compose call in the workflow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci(live): redact JWT and raw API key from failure-path logs If login or api-key mint shapes drift, the workflow dumps the raw response to stderr for debugging. Both responses carry credentials (JWT in login, raw_key in mint) that shouldn't end up in CI logs even on the failure branch — masked outputs still surface the field names and status, which is the actual diagnostic value. Addresses CodeRabbit on PR #99. The sister findings about pinning actions/checkout@v6 and actions/setup-go@v6 to commit SHAs and adding persist-credentials: false are deliberately skipped — none of the four existing workflows in this repo follow that policy, and the right place to harden is a dedicated repo-wide change, not a one-workflow special case. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci(live): disable container-side healthcheck (alpine image has no curl) Upstream backend healthcheck runs `curl -f http://localhost:8081/health/live` inside the container, but the production alpine image only ships ./main + helm + ca-certificates — no curl. The healthcheck fails forever and `docker compose up --wait` hits its timeout. We already poll /health/live from the host in the next step, so the in-container check is redundant. Override with `test: ["NONE"]` and keep the host-side gate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci(live): use wget for healthcheck (alpine image has no curl) Last attempt set `test: ["NONE"]` to disable the broken curl-based healthcheck, but `docker compose up --wait` errors out with "no healthcheck configured" instead of treating "running" as ready. Alpine's base image ships busybox `wget` (curl is not installed), so override the healthcheck command to use that. Shorter interval + retries chosen so a healthy backend is flagged ready within ~10s rather than upstream's 75s start_period. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(live): skip cluster health/nodes when backend can't reach cluster The CI live-tests workflow runs against a stub cluster (no kubeconfig, no real kube-apiserver). The backend correctly returns 500 with "Failed to connect to cluster" for diagnostic endpoints in that state, so two subtests of TestLiveCluster_HealthAndTest were failing in CI while passing locally against rancher-desktop. Two changes: 1. The `health` subtest now skips on the same "cluster unreachable" conditions the `nodes` subtest already handles (added in #98 via CodeRabbit's tightening). Symmetric with `nodes` so a wire-shape regression on either endpoint still surfaces as a failure. 2. The substring set is extracted into isClusterUnreachable() and widened to include "failed to connect to cluster" — the actual literal in the backend's 500 response when kubeconfig is empty or the cluster is down. Verified against rancher-desktop k8s-sm: all three subtests still pass (real cluster, real kubeconfig). The CI api-only path will now skip the two diagnostic subtests cleanly instead of failing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(live): tighten cluster-unreachable substring set Drop "unavailable" — too generic, could mask a legitimate 503 from a wire-shape regression. Add "unreachable" instead, which matches the backend's models.ClusterUnreachable status constant and is the more accurate token for a cluster-down condition. Addresses CodeRabbit on PR #99. The companion suggestion to convert TestLiveCluster_HealthAndTest to a table-driven + t.Parallel() form is deliberately skipped: none of the four sibling *_live_test.go files in this package follow that pattern (the convention applies to unit tests; live tests against a shared backend run serially by design), and a mass-refactor of the live suite is out of scope for a workflow PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Olof Mattsson <olof.mattsson@klaravik.se> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…er (#101) * test(contract): assert pkg/types request shapes against backend swagger Closes #97. Adds a unit-test-only contract layer that validates every stackctl request struct against the backend's published OpenAPI (Swagger 2.0) schema. Catches the class of bug shipped four times this week (#95, #98, k8s-sm#264, the BulkOperationResult shape) where unit / integration / e2e tests all stub the backend and decode requests into stackctl's OWN types — so a json-tag drift between stackctl and the backend decodes cleanly in tests but 400s the moment a real backend reads the bytes. What ships: cli/test/contract/testdata/swagger.json Vendored copy of omattsson/k8s-stack-manager:backend/docs/swagger.json. Pinned (not auto-fetched) so backend-side wire changes show up as explicit diffs in this repo. cli/test/contract/refresh-swagger.sh One-shot refresh: `./refresh-swagger.sh [ref]`. Validates the payload is JSON, sha256-compares before overwriting, and tells you which test to re-run. Pinned to a ref so partial backend rollouts don't break stackctl CI. cli/test/contract/contract_test.go Embeds swagger.json via go:embed, walks 9 request structs via reflection, and asserts bidirectionally: 1. Every Go json tag exists as a property in swagger. Catches stale or typoed Go-side fields. 2. Every swagger "required" field has a Go json tag. Catches missing-required-field bugs. 3. Field types align (string/boolean/integer/number/array/object). Covers: CreateClusterRequest, UpdateClusterRequest, BulkInstancesRequest, BulkTemplatesRequest, RegisterRequest, LoginRequest, CreateAPIKeyRequest, ResetPasswordRequest, CreateCleanupPolicyRequest. Plus TestSwaggerVendorIntegrity — sanity floor on the vendored copy so a truncated swagger.json doesn't manifest as a pile of confusing per-case failures. Failure-mode validated with a synthetic drift (matches the pre-fix/bulk-wire-contract bug exactly): Error: Go struct field Ids (json:"ids") has no matching property in swagger schema — typo, or backend doesn't accept this field Error: swagger schema requires field "instance_ids" but the Go struct has no field with that json tag — request will fail validation V1 deliberately skips: - Response types (issue calls out polymorphism for GET endpoints; a separate iteration). - CreateTemplateRequest / UpdateTemplateRequest — backend's swag annotation uses an unexported struct (createTemplateRequest) so no schema is published. Worth a tiny upstream PR to make the type exported + annotated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(contract): refresh-swagger.sh — list python3 in deps, validate via stdin Both address CodeRabbit on PR #101: the script invokes python3 for JSON validation but the requirements comment listed only curl and shasum, and the validation embedded the temp filename in the inline Python string (mktemp output is normally safe, but stdin redirection is cleaner regardless and removes any quoting concern). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Olof Mattsson <olof.mattsson@klaravik.se> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Three related fixes after running the live integration suite against rancher-desktop k8s-stack-manager.
1. Bulk wire contract (
65fc94c)stackctl's bulk endpoints sent
{"ids": [...]}but the backend handler binds into{"instance_ids": [...]}(stack-instance bulk) or{"template_ids": [...]}(template bulk) withbinding:"required"— every bulk call returned400 instance_ids is required. Response shape was also wrong: stackctl decoded{id, success bool}but the backend ships{instance_id|template_id, status: "success"|"error", log_id}.Split into two named request types (
BulkInstancesRequest/BulkTemplatesRequest); reshapedBulkOperationResultwithID()/Success()accessor methods so callers don't need to know which side they're on.2. Live integration test suite (
144dd88)Stub-based unit tests can't catch contract drift between stackctl and the backend (they decode requests into stackctl's own types — the bug is invisible). Expanded
cli/test/live/from 2 deploy-heavy lifecycle tests into 7 endpoint-group wire-contract smoke files (audit / bulk / cluster / definition / favorite / notification / template). Designed to be fast and not create real workloads on shared infra.The existing
TestLiveWorkflow_FullLifecycle+_BulkOperationsactually roll out stack instances (~80 GiB of golden-db data each); both are now gated behindSTACKCTL_LIVE_HEAVY=1so the default suite stays lightweight.3. ChartConfig 5-field gap (
a072bdf)The new template test surfaced that stackctl's
ChartConfigonly carried 5 of the 10 fields the backend models ship —chart_path/deploy_order/build_pipeline_id/locked_values/requiredwere silently dropped on read. Same stub-test blind spot as #1. Widened to the union ofmodels.ChartConfig+models.TemplateChartConfig; all 5 new fieldsomitempty.Verification
Full suite against rancher-desktop k8s-stack-manager (origin/main):
Unit + integration suites also pass;
go vet ./...clean.Related follow-ups (tracked separately)
cli/test/live/against a booted backend (so this class of bug is caught before merge, not by hand).Test plan
go test ./...(unit + integration)go test -tags live -count=1 ./test/live/...against rancher-desktop with API keygo vet ./...🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests