fix(bulk,types): align bulk + ChartConfig wire contracts with backend + add live test suite by omattsson · Pull Request #98 · omattsson/stackctl

omattsson · 2026-05-28T09:49:07Z

Summary

Three related fixes after running the live integration suite against rancher-desktop k8s-stack-manager.

1. Bulk wire contract (`65fc94c`)

stackctl's bulk endpoints sent {"ids": [...]} but the backend handler binds into {"instance_ids": [...]} (stack-instance bulk) or {"template_ids": [...]} (template bulk) with binding:"required" — every bulk call returned 400 instance_ids is required. Response shape was also wrong: stackctl decoded {id, success bool} but the backend ships {instance_id|template_id, status: "success"|"error", log_id}.

Split into two named request types (BulkInstancesRequest / BulkTemplatesRequest); reshaped BulkOperationResult with ID() / Success() accessor methods so callers don't need to know which side they're on.

2. Live integration test suite (`144dd88`)

Stub-based unit tests can't catch contract drift between stackctl and the backend (they decode requests into stackctl's own types — the bug is invisible). Expanded cli/test/live/ from 2 deploy-heavy lifecycle tests into 7 endpoint-group wire-contract smoke files (audit / bulk / cluster / definition / favorite / notification / template). Designed to be fast and not create real workloads on shared infra.

The existing TestLiveWorkflow_FullLifecycle + _BulkOperations actually roll out stack instances (~80 GiB of golden-db data each); both are now gated behind STACKCTL_LIVE_HEAVY=1 so the default suite stays lightweight.

3. ChartConfig 5-field gap (`a072bdf`)

The new template test surfaced that stackctl's ChartConfig only carried 5 of the 10 fields the backend models ship — chart_path / deploy_order / build_pipeline_id / locked_values / required were silently dropped on read. Same stub-test blind spot as #1. Widened to the union of models.ChartConfig + models.TemplateChartConfig; all 5 new fields omitempty.

Verification

Full suite against rancher-desktop k8s-stack-manager (origin/main):

16 passed, 2 skipped (heavy-gated), 0 failed in 1.5s

Unit + integration suites also pass; go vet ./... clean.

Related follow-ups (tracked separately)

test(live): run cli/test/live/ in CI against a booted backend #96 — CI hook to run cli/test/live/ against a booted backend (so this class of bug is caught before merge, not by hand).
test(contract): assert pkg/types request shapes against backend swagger #97 — Swagger-contract assertions in unit tests, so stub decodes catch type drift instead of masking it.

Test plan

go test ./... (unit + integration)
go test -tags live -count=1 ./test/live/... against rancher-desktop with API key
go vet ./...

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Bulk operations now return per-item status strings plus aggregate counters (total/successful/failed); CLI shows item IDs and colored status in table and quiet modes.
- Chart configuration expanded with richer metadata (chart path/version, release name, deploy order, required/locked flags, build pipeline).
Tests
- Updated unit/e2e tests for new bulk wire contracts and added extensive live integration tests for audit logs, clusters, definitions, templates, favorites, notifications, and bulk workflows.

…ints Two contract drifts surfaced while running cli/test/live/ against the live k8s-stack-manager backend. Both were hidden because the existing unit tests stub their own server and decode into whichever type stackctl ships, so request shape was only validated against itself, never against what the backend actually expects. REQUEST SHAPE - Stack-instance bulk endpoints (POST /api/v1/stack-instances/bulk/{deploy, stop,clean,delete}) require `{"instance_ids": [...]}` with `binding:"required"`. stackctl was sending `{"ids": [...]}` — backend replied 400 "instance_ids is required" on every call. - Template bulk endpoints (POST /api/v1/templates/bulk/{delete,publish, unpublish}) require `{"template_ids": [...]}`. Same drift, same 400. Split BulkRequest into two named request types so each endpoint group encodes the right JSON field: - BulkInstancesRequest { InstanceIDs []string `json:"instance_ids"` } - BulkTemplatesRequest { TemplateIDs []string `json:"template_ids"` } The 4 stack-instance + 3 template client methods now use the matching type, and the stub-based unit tests decode into the same type so the contract is locked in two places. RESPONSE SHAPE - Backend's BulkOperationResultItem ships `{instance_id, instance_name, status: "success"|"error", error, log_id}` for stack-instance bulk and the template-id equivalent for template bulk. stackctl's BulkOperationResult had `{id, success bool, error}` — none of the fields matched, so every response decoded as zero values. cmd/bulk.go's printer would have rendered every result as "failed with empty ID". Replaced the fields with the backend's actual shape (both InstanceID and TemplateID as omitempty, plus Status as string, plus log_id) and exposed two accessor methods so the cmd-side printer stays endpoint-agnostic: - .ID() returns InstanceID if set, else TemplateID - .Success() returns Status == "success" BulkResponse also gained Total/Successful/Failed counter fields the backend already returns. LIVE TESTS cli/test/live/live_test.go was last updated before the UUID migration — parameters were typed `uint`, the health-probe path was `/api/v1/health` (now `/health/live`), and `DeployResponse.ID` had been renamed to `LogID`. Brought it back into compile so the suite actually runs: - uint → string on every stack ID parameter and slice - /api/v1/health → /health/live for the connectivity check - deployLog.ID → deployLog.LogID (and stop/clean/redeploy) - Added an STACKCTL_LIVE_API_KEY env path so the suite can authenticate via the X-API-Key header instead of password Login(). The password path still works; tests skip only when neither env is set. VERIFICATION - Full unit suite green: 7 packages pass. - End-to-end against rancher-desktop k8s-stack-manager (built from origin/main + the merged inline-charts fix): stackctl bulk deploy olofm → 200 OK, "success" stackctl bulk template unpublish <id> → 200 OK, "success" stackctl bulk template publish <id> → 200 OK, "success" Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Expands cli/test/live/ from 2 deploy-heavy lifecycle tests into a fast wire-contract smoke suite per endpoint group. Designed to catch the class of bug we just shipped twice — request/response shape drift between stackctl and the backend that stubbed unit tests can't see. NEW FILES (all build-tag-gated by `live`): helpers_test.go — shared fixtures + require/skip helpers audit_live_test.go — list / filter by entity_type / export JSON bulk_live_test.go — bulk template + bulk stack-instance wire shape (drafts only, no real deploys) cluster_live_test.go — list / get / quota / health / nodes definition_live_test.go — list / get / export-import round-trip / create-bare favorite_live_test.go — add / list / remove (cleans up after) notification_live_test.go — list / prefs round-trip template_live_test.go — list / get / create-with-inline-charts (the regression test for k8s-sm#264) / publish lifecycle EXISTING TESTS: live_test.go's TestLiveWorkflow_FullLifecycle and _BulkOperations actually roll out stack instances (helm install, PV allocation, golden- db pull). On rancher-desktop they ate ~80 GiB and put the node into DiskPressure during this very investigation. Both are now gated by STACKCTL_LIVE_HEAVY=1 so the default suite stays lightweight and the expensive coverage is opt-in for engineers with cluster capacity. PRINCIPLES (also documented in helpers_test.go comment header): 1. Use newLiveClient(t) — handles connectivity + auth (API key OR password login) and skips when neither env path is configured. 2. Never create real workloads in the default suite. Tests must be safe to interleave on shared dev infra. 3. Register cleanup via t.Cleanup so failed assertions never leave named resources behind. 4. Skip (don't fail) when the backend doesn't have the precondition so partial-config environments stay green. VERIFICATION: Full suite against rancher-desktop k8s-stack-manager (origin/main + the bulk-wire-contract fix in the parent commit): 16 passed, 2 skipped (heavy-gated), 0 failed in 1.5s Already paid for itself: while writing the template test I noticed stackctl's ChartConfig type silently drops five backend fields (chart_path / locked_values / deploy_order / required / build_pipeline_id) on read. That's a separate follow-up — issues #96 and #97 cover the CI + swagger-contract long-term plan. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… dropping 5 fields stackctl's ChartConfig only carried 5 of the 10 fields the backend ships, so reads of models.ChartConfig (definition charts) and models.TemplateChartConfig (template charts) silently dropped: - chart_path (both) - deploy_order (both) - build_pipeline_id (both) - locked_values (template only) - required (template only) This was invisible in unit tests because httptest stubs decode requests into stackctl's own ChartConfig — same blind spot that hid the bulk field-name drift in the parent commit. Type is now the union of the two backend models; LockedValues / Required just stay zero on definition-chart reads. All five fields are omitempty so create-request bodies for callers that don't use them are unaffected. Test coverage: - cli/pkg/types/types_test.go: extends the StackDefinition JSON round-trip to populate every union field and assert decode integrity. - cli/test/live/template_live_test.go: TestLiveTemplate_CreateWithInlineCharts now sets chart_path / deploy_order / required / locked_values / build_pipeline_id on the inline chart and asserts each round-trips through POST /api/v1/templates and GET /api/v1/templates/:id. Verified live against rancher-desktop k8s-stack-manager: 16 passed, 2 skipped (heavy-gated), 0 failed Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai · 2026-05-28T09:49:33Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9e4b1718-84e0-4338-8925-9f75be2bd5b5

📥 Commits

Reviewing files that changed from the base of the PR and between 588db83 and 69e8588.

📒 Files selected for processing (1)

cli/test/e2e/cli_e2e_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

cli/test/e2e/cli_e2e_test.go

📝 Walkthrough

Walkthrough

Refactors bulk request/response types and expands ChartConfig, updates client/command code and mocks to the new wire contracts, and adds live integration tests and helpers. Bulk operations now use typed instance/template requests, per-item results carry string statuses and helper methods, and responses include aggregate counters.

Changes

Bulk Schema, ChartConfig, and Integration Tests

Layer / File(s)	Summary
Bulk operation schema refactor `cli/pkg/types/types.go`	`BulkOperationResult` switched from `id`+`success` boolean to instance/template keyed identifiers with string `status`, optional `log_id`, and helper methods `ID()`/`Success()`. `BulkResponse` adds `total`/`successful`/`failed`. `BulkInstancesRequest` and `BulkTemplatesRequest` replace generic `BulkRequest`.
ChartConfig schema expansion `cli/pkg/types/types.go`, `cli/pkg/types/types_test.go`	`ChartConfig` expanded with chart path/version, release name, default/locked values, deploy order, required flag, and build pipeline ID; tests updated to assert round-trip preservation.
Client bulk endpoints migration `cli/pkg/client/client.go`, `cli/pkg/client/client_test.go`	Bulk instance endpoints (`BulkDeploy`, `BulkStop`, `BulkClean`, `BulkDelete`) and template endpoints now POST typed request structs (`BulkInstancesRequest`, `BulkTemplatesRequest`). Tests/mocks updated to new `BulkOperationResult`/`BulkResponse` shapes and use `Success()`/`ID()` helper calls.
Command bulk result printing & tests `cli/cmd/bulk.go`, `cli/cmd/bulk_test.go`	`printBulkResults` updated to use `r.Success()` and `r.ID()`; table status uses `printer.StatusColor(r.Status)`. YAML/JSON test assertions updated to `instance_id`/`status` and status semantics.
E2E mock servers `cli/test/e2e/cli_e2e_test.go`	E2E mock handlers updated to accept `instance_ids`/`template_ids`, reject empty lists with `instance_ids required`, and return `results` plus aggregate counters.
Live test infrastructure `cli/test/live/helpers_test.go`, `cli/test/live/live_test.go`	Add live helpers (unique prefix, cleanup helpers, prerequisite checks, skip helper). `newLiveClient` supports `STACKCTL_LIVE_API_KEY` (header-based) with conditional login; instance IDs and log IDs handled as strings; health probe path changed to `/health/live`.
Live integration tests `cli/test/live/*`	Add live tests for bulk (templates and instances), definitions (list/get, export/import, create), templates (inline charts, publish lifecycle), audit logs, clusters (list/get/quota/health/nodes), notifications (list/preferences), and favorites (add/list/remove).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

test(contract): assert pkg/types request shapes against backend swagger #97: PR aligns CLI and tests with the updated bulk API wire contract (typed requests, per-item status fields, and aggregate counters).
test(live): run cli/test/live/ in CI against a booted backend #96: Adds live-test artifacts and API-key test wiring referenced by the issue; the PR implements those live-test changes.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 60.66% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the three main changes: bulk wire-contract alignment, ChartConfig field additions, and live test suite expansion.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/bulk-wire-contract

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (2)

cli/pkg/client/client_test.go (1)

1800-1889: ⚡ Quick win

Lock request-body contract checks for all instance bulk endpoints.

TestBulkStop_Success, TestBulkClean_Success, and TestBulkDelete_Success don’t assert the posted wire key, so a regression away from instance_ids would still pass. Mirror TestBulkDeploy_Success by decoding types.BulkInstancesRequest and asserting body.InstanceIDs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/pkg/client/client_test.go` around lines 1800 - 1889, The three success
tests (TestBulkStop_Success, TestBulkClean_Success, TestBulkDelete_Success) are
missing assertions on the request body key; update each test's httptest handler
to decode the incoming JSON into types.BulkInstancesRequest (as done in
TestBulkDeploy_Success) and assert that body.InstanceIDs contains the expected
IDs before writing the response; locate the handlers inside those test functions
and add the decode+assert for body.InstanceIDs to ensure the wire key remains
instance_ids.

cli/cmd/bulk_test.go (1)

25-33: ⚡ Quick win

Add a template-result fixture to validate template_id paths.

sampleBulkResponse() only emits InstanceID, but template command tests now validate template request contracts. This leaves template response contract (template_id) untested and can hide regressions in r.ID()/output serialization for template bulk flows.

Suggested test-fixture split

 func sampleBulkResponse() types.BulkResponse {
 	return types.BulkResponse{
 		Results: []types.BulkOperationResult{
 			{InstanceID: "1", Status: "success"},
 			{InstanceID: "2", Status: "success"},
 			{InstanceID: "3", Status: "error", Error: "not found"},
 		},
 	}
 }
+
+func sampleBulkTemplateResponse() types.BulkResponse {
+	return types.BulkResponse{
+		Results: []types.BulkOperationResult{
+			{TemplateID: "1", Status: "success"},
+			{TemplateID: "2", Status: "success"},
+			{TemplateID: "3", Status: "error", Error: "not found"},
+		},
+	}
+}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/cmd/bulk_test.go` around lines 25 - 33, Update the test fixture so
template bulk flows include and validate template IDs: modify sampleBulkResponse
(or add a new fixture like sampleBulkResponseWithTemplates) to set the template
identifier field on types.BulkOperationResult entries (e.g., add TemplateID:
"template-1" / "template-2" for the success results and for the error case as
appropriate) to ensure template_id paths are exercised and r.ID()/output
serialization for template bulk responses is covered.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cli/test/live/cluster_live_test.go`:
- Around line 47-51: The current t.Skipf calls skip on any error from the quota
lookup (variables cluster and err), which hides real failures; change the logic
around the quota fetch in the test to inspect err and only call t.Skipf when the
error matches explicit "quota optional/unavailable" conditions (e.g., a sentinel
error value, a specific error type or a well-known error message returned by the
quota lookup function), and otherwise call t.Fatalf or t.Error so unexpected
errors fail the test; update both places that call t.Skipf (the block
referencing cluster.ID and the later similar block) to perform this targeted
error check before skipping.

In `@cli/test/live/definition_live_test.go`:
- Around line 64-66: The test currently inspects bundle["charts"] which is
incorrect for payloads shaped as {schema_version, definition}; change the check
to read the nested charts under bundle["definition"] and assert against the
round-tripped definition's charts. Concretely, access origCharts via
bundle["definition"].(map[string]any)["charts"] (safely type-asserting) and
replace assert.NotEmpty(t, roundtrip.Charts, ...) with assert.NotEmpty(t,
roundtrip.Definition.Charts, ...) so the test verifies the nested
definition.charts field rather than a top-level charts key.

In `@cli/test/live/favorite_live_test.go`:
- Around line 60-63: Replace the manual loop and t.Errorf check after
RemoveFavorite with a testify assertion (assert/require) so failures are
consistent: instead of iterating over list2 and calling t.Errorf when
f.EntityType==entityType && f.EntityID==def.ID, use a testify helper (e.g.,
assert.NotContains or assert.False/assert.NoError with a computed found boolean)
from github.com/stretchr/testify to assert the favorite is absent — reference
the variables list2, f.EntityType, f.EntityID, def.ID and the RemoveFavorite
call when making the assertion.

In `@cli/test/live/helpers_test.go`:
- Around line 102-112: In skipUnlessHTTP200, replace the direct http.Get with an
http.Client that has a timeout (e.g. &http.Client{Timeout: 5*time.Second}) to
avoid hanging calls, use client.Get instead of http.Get, ensure resp.Body is
closed (defer resp.Body.Close()), and gate on resp.StatusCode == http.StatusOK:
if it's http.StatusNotFound call t.Skipf("endpoint %s returned 404 — feature not
enabled on this backend", path) and for any other non-200 status call
t.Skipf("endpoint %s returned non-200 status: %d", path, resp.StatusCode); keep
error handling to t.Skipf("endpoint %s unreachable: %v", path, err).

In `@cli/test/live/live_test.go`:
- Around line 34-38: The precondition currently allows API-key-only mode (checks
apiKey) which conflicts with the required live-test contract; change the check
in live_test.go so the test skips unless STACKCTL_LIVE_USER and
STACKCTL_LIVE_PASS are set (i.e., require user and pass rather than apiKey),
update the t.Skip message to state that STACKCTL_LIVE_USER and
STACKCTL_LIVE_PASS must be set (and note STACKCTL_LIVE_URL defaults to
http://localhost:8081 if needed), and make the same change for the second check
around lines 68-70 so both guards enforce the user/pass requirement.
- Around line 96-98: The heavy-test gate currently checks
os.Getenv("STACKCTL_LIVE_HEAVY") != "" which treats any non-empty value (e.g.,
"0" or "false") as enabled; change both checks of
os.Getenv("STACKCTL_LIVE_HEAVY") to require the explicit value "1" (i.e.,
os.Getenv("STACKCTL_LIVE_HEAVY") == "1") so heavy workflows only run when
STACKCTL_LIVE_HEAVY=1 is set; update the condition in both places in
cli/test/live/live_test.go and keep the t.Skip message as-is or clarify it to
mention "=1" if desired.

---

Nitpick comments:
In `@cli/cmd/bulk_test.go`:
- Around line 25-33: Update the test fixture so template bulk flows include and
validate template IDs: modify sampleBulkResponse (or add a new fixture like
sampleBulkResponseWithTemplates) to set the template identifier field on
types.BulkOperationResult entries (e.g., add TemplateID: "template-1" /
"template-2" for the success results and for the error case as appropriate) to
ensure template_id paths are exercised and r.ID()/output serialization for
template bulk responses is covered.

In `@cli/pkg/client/client_test.go`:
- Around line 1800-1889: The three success tests (TestBulkStop_Success,
TestBulkClean_Success, TestBulkDelete_Success) are missing assertions on the
request body key; update each test's httptest handler to decode the incoming
JSON into types.BulkInstancesRequest (as done in TestBulkDeploy_Success) and
assert that body.InstanceIDs contains the expected IDs before writing the
response; locate the handlers inside those test functions and add the
decode+assert for body.InstanceIDs to ensure the wire key remains instance_ids.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ce52dfe5-51a5-4e5c-81df-67bf099a3fb6

📥 Commits

Reviewing files that changed from the base of the PR and between 1e2e941 and a072bdf.

📒 Files selected for processing (15)

cli/cmd/bulk.go
cli/cmd/bulk_test.go
cli/pkg/client/client.go
cli/pkg/client/client_test.go
cli/pkg/types/types.go
cli/pkg/types/types_test.go
cli/test/live/audit_live_test.go
cli/test/live/bulk_live_test.go
cli/test/live/cluster_live_test.go
cli/test/live/definition_live_test.go
cli/test/live/favorite_live_test.go
cli/test/live/helpers_test.go
cli/test/live/live_test.go
cli/test/live/notification_live_test.go
cli/test/live/template_live_test.go

coderabbitai · 2026-05-28T09:57:35Z

+	apiKey := os.Getenv("STACKCTL_LIVE_API_KEY")
+	user, pass := os.Getenv("STACKCTL_LIVE_USER"), os.Getenv("STACKCTL_LIVE_PASS")
+	if apiKey == "" && (user == "" || pass == "") {
+		t.Skip("STACKCTL_LIVE_API_KEY or (STACKCTL_LIVE_USER + STACKCTL_LIVE_PASS) must be set for live tests")
 	}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Live auth precondition no longer matches the documented live-test contract.

Lines 34-38 and 68-70 make API-key-only mode valid, but the current live-test rule requires STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS and skipping when unset. Please either enforce that here or update the live-test instruction contract in the same PR to keep policy and implementation aligned.

As per coding guidelines: “Live tests must require env vars STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS and skip (not fail) if unset, with optional STACKCTL_LIVE_URL defaulting to http://localhost:8081.”

Also applies to: 68-70

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/test/live/live_test.go` around lines 34 - 38, The precondition currently allows API-key-only mode (checks apiKey) which conflicts with the required live-test contract; change the check in live_test.go so the test skips unless STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS are set (i.e., require user and pass rather than apiKey), update the t.Skip message to state that STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS must be set (and note STACKCTL_LIVE_URL defaults to http://localhost:8081 if needed), and make the same change for the second check around lines 68-70 so both guards enforce the user/pass requirement.

CI was red because the e2e mock servers (startE2ETemplateDefMockServer + startE2EQuietPipingMockServer) still emitted the pre-fix bulk shape — {id, success} keyed by "ids". With the new typed BulkInstancesRequest / BulkTemplatesRequest types, every bulk test in cli/test/e2e/ decoded an empty results array and failed every assertion. Mocks now mirror the real backend: - Request decode keys are "template_ids" / "instance_ids". - Result entries carry "template_id" / "instance_id" + status:"success". - Response envelope carries total / successful / failed counters. Also addresses review comments from CodeRabbit on PR #98: - live_test.go: STACKCTL_LIVE_HEAVY gating now requires exactly "1" instead of any non-empty value, so STACKCTL_LIVE_HEAVY=0 stops accidentally triggering ~80GB workload tests. - helpers_test.go: skipUnlessHTTP200 now actually gates on 200, uses a 5s timeout, and defers Body.Close. - definition_live_test.go: chart-presence check reads the nested bundle["definition"]["charts"] (was bundle["charts"], which is always nil under the {schema_version, definition} envelope and silently no-op'd the assertion). - favorite_live_test.go: replaced t.Errorf with assert.False to match the testify convention used elsewhere. - cluster_live_test.go: quota + nodes skip paths now narrow on documented "not found" / "not reachable" substrings; any other error fails the test so contract regressions still surface. The "live tests must require STACKCTL_LIVE_USER+PASS" comment is intentionally not applied — API-key auth was added in this PR as a parallel path and is the configuration used by CI. Verified: - go test ./... all packages pass - go test -tags live ./test/live/... 16 passed, 2 skipped - go vet ./... clean Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cli/test/e2e/cli_e2e_test.go`:
- Around line 890-909: The mock bulk-template handlers (the HTTP handler cases
for "/api/v1/templates/bulk/publish", "/api/v1/templates/bulk/unpublish" and
"/api/v1/templates/bulk/delete") currently only check JSON decode errors and
thus accept empty bodies or empty template_ids; update each handler to validate
that the decoded req.TemplateIDs slice is non-nil and has length > 0 and, if
empty, respond with http.StatusBadRequest and a JSON error (e.g.,
{"error":"template_ids required"}). Locate the req struct and results
construction in each case (the variables named req and results and the code that
appends map[string]interface{}{"template_id": id, "status": "success"}) and add
the guard immediately after json.NewDecoder(r.Body).Decode(&req) to return 400
when len(req.TemplateIDs) == 0.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ab58b0fb-4ea1-4402-b930-25372053016b

📥 Commits

Reviewing files that changed from the base of the PR and between a072bdf and 588db83.

📒 Files selected for processing (6)

cli/test/e2e/cli_e2e_test.go
cli/test/live/cluster_live_test.go
cli/test/live/definition_live_test.go
cli/test/live/favorite_live_test.go
cli/test/live/helpers_test.go
cli/test/live/live_test.go

🚧 Files skipped from review as they are similar to previous changes (4)

cli/test/live/helpers_test.go
cli/test/live/definition_live_test.go
cli/test/live/cluster_live_test.go
cli/test/live/live_test.go

Mirrors the existing instance-bulk handlers and matches real backend behaviour: `template_ids` is `binding:"required"` server-side, so missing or empty arrays produce 400. The mock now does the same so a future bug where stackctl forgets to send the IDs fails the e2e test instead of silently appearing to "succeed" with an empty results array. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai · 2026-05-28T10:51:04Z

Actionable comments posted: 0

…kend Closes #96. Boots k8s-stack-manager (origin/main, api-only docker-compose profile) in the same job, applies a SQL fixture so require* helpers don't skip, mints a 1-day API key via the public auth endpoints, then runs the live suite. PR-time signal that the wire contracts still match — the class of bug stub-based unit tests can't see (recently surfaced in PR #98). What ships: .github/workflows/live-tests.yml - Triggers on push to main + every PR targeting main. - Clones k8s-sm main, `docker compose up backend` (no --profile → api-only, since frontend is gated to "full"). - Polls /health/live for up to 60s. - Pipes cli/test/live/testdata/ci-seed.sql into the mysql container. - Logs in as the env-seeded admin, mints a key via POST /api/v1/users/<id>/api-keys with expires_in_days=1, masks it in logs, exports as STACKCTL_LIVE_API_KEY. - `go test -tags live -timeout 10m ./test/live/...` - Dumps backend logs on failure. cli/test/live/testdata/ci-seed.sql - 1 cluster (no kubeconfig — test-connection will fail, tests expect that), 1 stack_definition + chart_config, 1 published stack_template + template_chart_config. - owner_id is looked up via SELECT … WHERE username='admin' so we don't need to inject the non-deterministic admin UUID. cli/test/live/doc.go - Documents the env vars + how to reproduce the CI flow locally (clone k8s-sm, compose up, apply seed, mint key, run suite). Design decisions (from the open questions in #96): - Bootstrap: backend SQL seed for fixtures, login-then-mint for the API key. Avoids hardcoding bcrypt/SHA-256 hashes in test data. - Backend pin: track origin/main. Surfaces cross-repo contract breaks immediately — that's exactly the signal #96 wants. - Workflow location: separate file. Easy to mark non-required if it ever gets flaky without blocking the rest of CI. Two contract-drift gotchas surfaced while validating against rancher-desktop k8s-sm (the value this workflow will catch on PRs): - POST /api/v1/users/<id>/api-keys requires expires_at OR expires_in_days. Initial draft sent neither and got 400. - The minted-key response field is `raw_key`, not `key`. Both fixed in the workflow + doc.go before pushing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The CI live-tests workflow runs against a stub cluster (no kubeconfig, no real kube-apiserver). The backend correctly returns 500 with "Failed to connect to cluster" for diagnostic endpoints in that state, so two subtests of TestLiveCluster_HealthAndTest were failing in CI while passing locally against rancher-desktop. Two changes: 1. The `health` subtest now skips on the same "cluster unreachable" conditions the `nodes` subtest already handles (added in #98 via CodeRabbit's tightening). Symmetric with `nodes` so a wire-shape regression on either endpoint still surfaces as a failure. 2. The substring set is extracted into isClusterUnreachable() and widened to include "failed to connect to cluster" — the actual literal in the backend's 500 response when kubeconfig is empty or the cluster is down. Verified against rancher-desktop k8s-sm: all three subtests still pass (real cluster, real kubeconfig). The CI api-only path will now skip the two diagnostic subtests cleanly instead of failing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…kend (#99) * ci(live): run cli/test/live/ on every PR against a freshly-booted backend Closes #96. Boots k8s-stack-manager (origin/main, api-only docker-compose profile) in the same job, applies a SQL fixture so require* helpers don't skip, mints a 1-day API key via the public auth endpoints, then runs the live suite. PR-time signal that the wire contracts still match — the class of bug stub-based unit tests can't see (recently surfaced in PR #98). What ships: .github/workflows/live-tests.yml - Triggers on push to main + every PR targeting main. - Clones k8s-sm main, `docker compose up backend` (no --profile → api-only, since frontend is gated to "full"). - Polls /health/live for up to 60s. - Pipes cli/test/live/testdata/ci-seed.sql into the mysql container. - Logs in as the env-seeded admin, mints a key via POST /api/v1/users/<id>/api-keys with expires_in_days=1, masks it in logs, exports as STACKCTL_LIVE_API_KEY. - `go test -tags live -timeout 10m ./test/live/...` - Dumps backend logs on failure. cli/test/live/testdata/ci-seed.sql - 1 cluster (no kubeconfig — test-connection will fail, tests expect that), 1 stack_definition + chart_config, 1 published stack_template + template_chart_config. - owner_id is looked up via SELECT … WHERE username='admin' so we don't need to inject the non-deterministic admin UUID. cli/test/live/doc.go - Documents the env vars + how to reproduce the CI flow locally (clone k8s-sm, compose up, apply seed, mint key, run suite). Design decisions (from the open questions in #96): - Bootstrap: backend SQL seed for fixtures, login-then-mint for the API key. Avoids hardcoding bcrypt/SHA-256 hashes in test data. - Backend pin: track origin/main. Surfaces cross-repo contract breaks immediately — that's exactly the signal #96 wants. - Workflow location: separate file. Easy to mark non-required if it ever gets flaky without blocking the rest of CI. Two contract-drift gotchas surfaced while validating against rancher-desktop k8s-sm (the value this workflow will catch on PRs): - POST /api/v1/users/<id>/api-keys requires expires_at OR expires_in_days. Initial draft sent neither and got 400. - The minted-key response field is `raw_key`, not `key`. Both fixed in the workflow + doc.go before pushing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci(live): reset upstream dev bind-mount so production image starts k8s-stack-manager's docker-compose.yml bind-mounts ./backend:/app for hot-reload during dev. With target: production, that overlays the image's baked-in /app/main binary with the host source tree, which has no compiled binary — container exits immediately with "stat ./main: no such file or directory". Ships testdata/docker-compose.ci.yml as a compose override that resets the volumes list (Compose ≥1.28 !reset semantics), and wires both -f files into every compose call in the workflow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci(live): redact JWT and raw API key from failure-path logs If login or api-key mint shapes drift, the workflow dumps the raw response to stderr for debugging. Both responses carry credentials (JWT in login, raw_key in mint) that shouldn't end up in CI logs even on the failure branch — masked outputs still surface the field names and status, which is the actual diagnostic value. Addresses CodeRabbit on PR #99. The sister findings about pinning actions/checkout@v6 and actions/setup-go@v6 to commit SHAs and adding persist-credentials: false are deliberately skipped — none of the four existing workflows in this repo follow that policy, and the right place to harden is a dedicated repo-wide change, not a one-workflow special case. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci(live): disable container-side healthcheck (alpine image has no curl) Upstream backend healthcheck runs `curl -f http://localhost:8081/health/live` inside the container, but the production alpine image only ships ./main + helm + ca-certificates — no curl. The healthcheck fails forever and `docker compose up --wait` hits its timeout. We already poll /health/live from the host in the next step, so the in-container check is redundant. Override with `test: ["NONE"]` and keep the host-side gate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci(live): use wget for healthcheck (alpine image has no curl) Last attempt set `test: ["NONE"]` to disable the broken curl-based healthcheck, but `docker compose up --wait` errors out with "no healthcheck configured" instead of treating "running" as ready. Alpine's base image ships busybox `wget` (curl is not installed), so override the healthcheck command to use that. Shorter interval + retries chosen so a healthy backend is flagged ready within ~10s rather than upstream's 75s start_period. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(live): skip cluster health/nodes when backend can't reach cluster The CI live-tests workflow runs against a stub cluster (no kubeconfig, no real kube-apiserver). The backend correctly returns 500 with "Failed to connect to cluster" for diagnostic endpoints in that state, so two subtests of TestLiveCluster_HealthAndTest were failing in CI while passing locally against rancher-desktop. Two changes: 1. The `health` subtest now skips on the same "cluster unreachable" conditions the `nodes` subtest already handles (added in #98 via CodeRabbit's tightening). Symmetric with `nodes` so a wire-shape regression on either endpoint still surfaces as a failure. 2. The substring set is extracted into isClusterUnreachable() and widened to include "failed to connect to cluster" — the actual literal in the backend's 500 response when kubeconfig is empty or the cluster is down. Verified against rancher-desktop k8s-sm: all three subtests still pass (real cluster, real kubeconfig). The CI api-only path will now skip the two diagnostic subtests cleanly instead of failing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(live): tighten cluster-unreachable substring set Drop "unavailable" — too generic, could mask a legitimate 503 from a wire-shape regression. Add "unreachable" instead, which matches the backend's models.ClusterUnreachable status constant and is the more accurate token for a cluster-down condition. Addresses CodeRabbit on PR #99. The companion suggestion to convert TestLiveCluster_HealthAndTest to a table-driven + t.Parallel() form is deliberately skipped: none of the four sibling *_live_test.go files in this package follow that pattern (the convention applies to unit tests; live tests against a shared backend run serially by design), and a mass-refactor of the live suite is out of scope for a workflow PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Olof Mattsson <olof.mattsson@klaravik.se> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…er (#101) * test(contract): assert pkg/types request shapes against backend swagger Closes #97. Adds a unit-test-only contract layer that validates every stackctl request struct against the backend's published OpenAPI (Swagger 2.0) schema. Catches the class of bug shipped four times this week (#95, #98, k8s-sm#264, the BulkOperationResult shape) where unit / integration / e2e tests all stub the backend and decode requests into stackctl's OWN types — so a json-tag drift between stackctl and the backend decodes cleanly in tests but 400s the moment a real backend reads the bytes. What ships: cli/test/contract/testdata/swagger.json Vendored copy of omattsson/k8s-stack-manager:backend/docs/swagger.json. Pinned (not auto-fetched) so backend-side wire changes show up as explicit diffs in this repo. cli/test/contract/refresh-swagger.sh One-shot refresh: `./refresh-swagger.sh [ref]`. Validates the payload is JSON, sha256-compares before overwriting, and tells you which test to re-run. Pinned to a ref so partial backend rollouts don't break stackctl CI. cli/test/contract/contract_test.go Embeds swagger.json via go:embed, walks 9 request structs via reflection, and asserts bidirectionally: 1. Every Go json tag exists as a property in swagger. Catches stale or typoed Go-side fields. 2. Every swagger "required" field has a Go json tag. Catches missing-required-field bugs. 3. Field types align (string/boolean/integer/number/array/object). Covers: CreateClusterRequest, UpdateClusterRequest, BulkInstancesRequest, BulkTemplatesRequest, RegisterRequest, LoginRequest, CreateAPIKeyRequest, ResetPasswordRequest, CreateCleanupPolicyRequest. Plus TestSwaggerVendorIntegrity — sanity floor on the vendored copy so a truncated swagger.json doesn't manifest as a pile of confusing per-case failures. Failure-mode validated with a synthetic drift (matches the pre-fix/bulk-wire-contract bug exactly): Error: Go struct field Ids (json:"ids") has no matching property in swagger schema — typo, or backend doesn't accept this field Error: swagger schema requires field "instance_ids" but the Go struct has no field with that json tag — request will fail validation V1 deliberately skips: - Response types (issue calls out polymorphism for GET endpoints; a separate iteration). - CreateTemplateRequest / UpdateTemplateRequest — backend's swag annotation uses an unexported struct (createTemplateRequest) so no schema is published. Worth a tiny upstream PR to make the type exported + annotated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(contract): refresh-swagger.sh — list python3 in deps, validate via stdin Both address CodeRabbit on PR #101: the script invokes python3 for JSON validation but the requirements comment listed only curl and shasum, and the validation embedded the temp filename in the inline Python string (mktemp output is normally safe, but stdin redirection is cleaner regardless and removes any quoting concern). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Olof Mattsson <olof.mattsson@klaravik.se> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Olof Mattsson and others added 3 commits May 28, 2026 10:14

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Comment thread cli/test/e2e/cli_e2e_test.go

omattsson merged commit 565745d into main May 28, 2026
7 checks passed

omattsson deleted the fix/bulk-wire-contract branch May 28, 2026 10:54

omattsson mentioned this pull request May 28, 2026

ci(live): run cli/test/live/ on every PR against a freshly-booted backend #99

Merged

3 tasks

omattsson mentioned this pull request May 28, 2026

test(contract): assert pkg/types request shapes against backend swagger #101

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(bulk,types): align bulk + ChartConfig wire contracts with backend + add live test suite#98

fix(bulk,types): align bulk + ChartConfig wire contracts with backend + add live test suite#98
omattsson merged 5 commits into
mainfrom
fix/bulk-wire-contract

omattsson commented May 28, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 28, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

omattsson commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Bulk wire contract (65fc94c)

2. Live integration test suite (144dd88)

3. ChartConfig 5-field gap (a072bdf)

Verification

Related follow-ups (tracked separately)

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

omattsson commented May 28, 2026 •

edited by coderabbitai Bot

Loading

1. Bulk wire contract (`65fc94c`)

2. Live integration test suite (`144dd88`)

3. ChartConfig 5-field gap (`a072bdf`)

coderabbitai Bot commented May 28, 2026 •

edited

Loading