Skip to content

fix(bulk,types): align bulk + ChartConfig wire contracts with backend + add live test suite#98

Merged
omattsson merged 5 commits into
mainfrom
fix/bulk-wire-contract
May 28, 2026
Merged

fix(bulk,types): align bulk + ChartConfig wire contracts with backend + add live test suite#98
omattsson merged 5 commits into
mainfrom
fix/bulk-wire-contract

Conversation

@omattsson

@omattsson omattsson commented May 28, 2026

Copy link
Copy Markdown
Owner

Summary

Three related fixes after running the live integration suite against rancher-desktop k8s-stack-manager.

1. Bulk wire contract (65fc94c)

stackctl's bulk endpoints sent {"ids": [...]} but the backend handler binds into {"instance_ids": [...]} (stack-instance bulk) or {"template_ids": [...]} (template bulk) with binding:"required" — every bulk call returned 400 instance_ids is required. Response shape was also wrong: stackctl decoded {id, success bool} but the backend ships {instance_id|template_id, status: "success"|"error", log_id}.

Split into two named request types (BulkInstancesRequest / BulkTemplatesRequest); reshaped BulkOperationResult with ID() / Success() accessor methods so callers don't need to know which side they're on.

2. Live integration test suite (144dd88)

Stub-based unit tests can't catch contract drift between stackctl and the backend (they decode requests into stackctl's own types — the bug is invisible). Expanded cli/test/live/ from 2 deploy-heavy lifecycle tests into 7 endpoint-group wire-contract smoke files (audit / bulk / cluster / definition / favorite / notification / template). Designed to be fast and not create real workloads on shared infra.

The existing TestLiveWorkflow_FullLifecycle + _BulkOperations actually roll out stack instances (~80 GiB of golden-db data each); both are now gated behind STACKCTL_LIVE_HEAVY=1 so the default suite stays lightweight.

3. ChartConfig 5-field gap (a072bdf)

The new template test surfaced that stackctl's ChartConfig only carried 5 of the 10 fields the backend models ship — chart_path / deploy_order / build_pipeline_id / locked_values / required were silently dropped on read. Same stub-test blind spot as #1. Widened to the union of models.ChartConfig + models.TemplateChartConfig; all 5 new fields omitempty.

Verification

Full suite against rancher-desktop k8s-stack-manager (origin/main):

16 passed, 2 skipped (heavy-gated), 0 failed in 1.5s

Unit + integration suites also pass; go vet ./... clean.

Related follow-ups (tracked separately)

Test plan

  • go test ./... (unit + integration)
  • go test -tags live -count=1 ./test/live/... against rancher-desktop with API key
  • go vet ./...

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Bulk operations now return per-item status strings plus aggregate counters (total/successful/failed); CLI shows item IDs and colored status in table and quiet modes.
    • Chart configuration expanded with richer metadata (chart path/version, release name, deploy order, required/locked flags, build pipeline).
  • Tests

    • Updated unit/e2e tests for new bulk wire contracts and added extensive live integration tests for audit logs, clusters, definitions, templates, favorites, notifications, and bulk workflows.

Review Change Stack

Olof Mattsson and others added 3 commits May 28, 2026 10:14
…ints

Two contract drifts surfaced while running cli/test/live/ against the
live k8s-stack-manager backend. Both were hidden because the existing
unit tests stub their own server and decode into whichever type stackctl
ships, so request shape was only validated against itself, never against
what the backend actually expects.

REQUEST SHAPE
- Stack-instance bulk endpoints (POST /api/v1/stack-instances/bulk/{deploy,
  stop,clean,delete}) require `{"instance_ids": [...]}` with
  `binding:"required"`. stackctl was sending `{"ids": [...]}` — backend
  replied 400 "instance_ids is required" on every call.
- Template bulk endpoints (POST /api/v1/templates/bulk/{delete,publish,
  unpublish}) require `{"template_ids": [...]}`. Same drift, same 400.

Split BulkRequest into two named request types so each endpoint group
encodes the right JSON field:

  - BulkInstancesRequest { InstanceIDs []string `json:"instance_ids"` }
  - BulkTemplatesRequest { TemplateIDs []string `json:"template_ids"` }

The 4 stack-instance + 3 template client methods now use the matching
type, and the stub-based unit tests decode into the same type so the
contract is locked in two places.

RESPONSE SHAPE
- Backend's BulkOperationResultItem ships `{instance_id, instance_name,
  status: "success"|"error", error, log_id}` for stack-instance bulk and
  the template-id equivalent for template bulk. stackctl's
  BulkOperationResult had `{id, success bool, error}` — none of the
  fields matched, so every response decoded as zero values. cmd/bulk.go's
  printer would have rendered every result as "failed with empty ID".

Replaced the fields with the backend's actual shape (both InstanceID and
TemplateID as omitempty, plus Status as string, plus log_id) and exposed
two accessor methods so the cmd-side printer stays endpoint-agnostic:

  - .ID()       returns InstanceID if set, else TemplateID
  - .Success()  returns Status == "success"

BulkResponse also gained Total/Successful/Failed counter fields the
backend already returns.

LIVE TESTS
cli/test/live/live_test.go was last updated before the UUID migration —
parameters were typed `uint`, the health-probe path was `/api/v1/health`
(now `/health/live`), and `DeployResponse.ID` had been renamed to
`LogID`. Brought it back into compile so the suite actually runs:

- uint → string on every stack ID parameter and slice
- /api/v1/health → /health/live for the connectivity check
- deployLog.ID → deployLog.LogID (and stop/clean/redeploy)
- Added an STACKCTL_LIVE_API_KEY env path so the suite can authenticate
  via the X-API-Key header instead of password Login(). The password
  path still works; tests skip only when neither env is set.

VERIFICATION
- Full unit suite green: 7 packages pass.
- End-to-end against rancher-desktop k8s-stack-manager (built from
  origin/main + the merged inline-charts fix):
    stackctl bulk deploy olofm                   → 200 OK, "success"
    stackctl bulk template unpublish <id>        → 200 OK, "success"
    stackctl bulk template publish <id>          → 200 OK, "success"

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Expands cli/test/live/ from 2 deploy-heavy lifecycle tests into a
fast wire-contract smoke suite per endpoint group. Designed to catch
the class of bug we just shipped twice — request/response shape drift
between stackctl and the backend that stubbed unit tests can't see.

NEW FILES (all build-tag-gated by `live`):

  helpers_test.go             — shared fixtures + require/skip helpers
  audit_live_test.go          — list / filter by entity_type / export JSON
  bulk_live_test.go           — bulk template + bulk stack-instance wire
                                shape (drafts only, no real deploys)
  cluster_live_test.go        — list / get / quota / health / nodes
  definition_live_test.go     — list / get / export-import round-trip /
                                create-bare
  favorite_live_test.go       — add / list / remove (cleans up after)
  notification_live_test.go   — list / prefs round-trip
  template_live_test.go       — list / get / create-with-inline-charts
                                (the regression test for k8s-sm#264) /
                                publish lifecycle

EXISTING TESTS:

  live_test.go's TestLiveWorkflow_FullLifecycle and _BulkOperations
  actually roll out stack instances (helm install, PV allocation, golden-
  db pull). On rancher-desktop they ate ~80 GiB and put the node into
  DiskPressure during this very investigation. Both are now gated by
  STACKCTL_LIVE_HEAVY=1 so the default suite stays lightweight and the
  expensive coverage is opt-in for engineers with cluster capacity.

PRINCIPLES (also documented in helpers_test.go comment header):

  1. Use newLiveClient(t) — handles connectivity + auth (API key OR
     password login) and skips when neither env path is configured.
  2. Never create real workloads in the default suite. Tests must be
     safe to interleave on shared dev infra.
  3. Register cleanup via t.Cleanup so failed assertions never leave
     named resources behind.
  4. Skip (don't fail) when the backend doesn't have the precondition
     so partial-config environments stay green.

VERIFICATION:

Full suite against rancher-desktop k8s-stack-manager (origin/main +
the bulk-wire-contract fix in the parent commit):

    16 passed, 2 skipped (heavy-gated), 0 failed in 1.5s

Already paid for itself: while writing the template test I noticed
stackctl's ChartConfig type silently drops five backend fields
(chart_path / locked_values / deploy_order / required /
build_pipeline_id) on read. That's a separate follow-up — issues #96
and #97 cover the CI + swagger-contract long-term plan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… dropping 5 fields

stackctl's ChartConfig only carried 5 of the 10 fields the backend ships,
so reads of models.ChartConfig (definition charts) and
models.TemplateChartConfig (template charts) silently dropped:

  - chart_path        (both)
  - deploy_order      (both)
  - build_pipeline_id (both)
  - locked_values     (template only)
  - required          (template only)

This was invisible in unit tests because httptest stubs decode requests
into stackctl's own ChartConfig — same blind spot that hid the bulk
field-name drift in the parent commit.

Type is now the union of the two backend models; LockedValues / Required
just stay zero on definition-chart reads. All five fields are omitempty
so create-request bodies for callers that don't use them are unaffected.

Test coverage:

  - cli/pkg/types/types_test.go: extends the StackDefinition JSON
    round-trip to populate every union field and assert decode integrity.
  - cli/test/live/template_live_test.go: TestLiveTemplate_CreateWithInlineCharts
    now sets chart_path / deploy_order / required / locked_values /
    build_pipeline_id on the inline chart and asserts each round-trips
    through POST /api/v1/templates and GET /api/v1/templates/:id.

Verified live against rancher-desktop k8s-stack-manager:

    16 passed, 2 skipped (heavy-gated), 0 failed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9e4b1718-84e0-4338-8925-9f75be2bd5b5

📥 Commits

Reviewing files that changed from the base of the PR and between 588db83 and 69e8588.

📒 Files selected for processing (1)
  • cli/test/e2e/cli_e2e_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cli/test/e2e/cli_e2e_test.go

📝 Walkthrough

Walkthrough

Refactors bulk request/response types and expands ChartConfig, updates client/command code and mocks to the new wire contracts, and adds live integration tests and helpers. Bulk operations now use typed instance/template requests, per-item results carry string statuses and helper methods, and responses include aggregate counters.

Changes

Bulk Schema, ChartConfig, and Integration Tests

Layer / File(s) Summary
Bulk operation schema refactor
cli/pkg/types/types.go
BulkOperationResult switched from id+success boolean to instance/template keyed identifiers with string status, optional log_id, and helper methods ID()/Success(). BulkResponse adds total/successful/failed. BulkInstancesRequest and BulkTemplatesRequest replace generic BulkRequest.
ChartConfig schema expansion
cli/pkg/types/types.go, cli/pkg/types/types_test.go
ChartConfig expanded with chart path/version, release name, default/locked values, deploy order, required flag, and build pipeline ID; tests updated to assert round-trip preservation.
Client bulk endpoints migration
cli/pkg/client/client.go, cli/pkg/client/client_test.go
Bulk instance endpoints (BulkDeploy, BulkStop, BulkClean, BulkDelete) and template endpoints now POST typed request structs (BulkInstancesRequest, BulkTemplatesRequest). Tests/mocks updated to new BulkOperationResult/BulkResponse shapes and use Success()/ID() helper calls.
Command bulk result printing & tests
cli/cmd/bulk.go, cli/cmd/bulk_test.go
printBulkResults updated to use r.Success() and r.ID(); table status uses printer.StatusColor(r.Status). YAML/JSON test assertions updated to instance_id/status and status semantics.
E2E mock servers
cli/test/e2e/cli_e2e_test.go
E2E mock handlers updated to accept instance_ids/template_ids, reject empty lists with instance_ids required, and return results plus aggregate counters.
Live test infrastructure
cli/test/live/helpers_test.go, cli/test/live/live_test.go
Add live helpers (unique prefix, cleanup helpers, prerequisite checks, skip helper). newLiveClient supports STACKCTL_LIVE_API_KEY (header-based) with conditional login; instance IDs and log IDs handled as strings; health probe path changed to /health/live.
Live integration tests
cli/test/live/*
Add live tests for bulk (templates and instances), definitions (list/get, export/import, create), templates (inline charts, publish lifecycle), audit logs, clusters (list/get/quota/health/nodes), notifications (list/preferences), and favorites (add/list/remove).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.66% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the three main changes: bulk wire-contract alignment, ChartConfig field additions, and live test suite expansion.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/bulk-wire-contract

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
cli/pkg/client/client_test.go (1)

1800-1889: ⚡ Quick win

Lock request-body contract checks for all instance bulk endpoints.

TestBulkStop_Success, TestBulkClean_Success, and TestBulkDelete_Success don’t assert the posted wire key, so a regression away from instance_ids would still pass. Mirror TestBulkDeploy_Success by decoding types.BulkInstancesRequest and asserting body.InstanceIDs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/pkg/client/client_test.go` around lines 1800 - 1889, The three success
tests (TestBulkStop_Success, TestBulkClean_Success, TestBulkDelete_Success) are
missing assertions on the request body key; update each test's httptest handler
to decode the incoming JSON into types.BulkInstancesRequest (as done in
TestBulkDeploy_Success) and assert that body.InstanceIDs contains the expected
IDs before writing the response; locate the handlers inside those test functions
and add the decode+assert for body.InstanceIDs to ensure the wire key remains
instance_ids.
cli/cmd/bulk_test.go (1)

25-33: ⚡ Quick win

Add a template-result fixture to validate template_id paths.

sampleBulkResponse() only emits InstanceID, but template command tests now validate template request contracts. This leaves template response contract (template_id) untested and can hide regressions in r.ID()/output serialization for template bulk flows.

Suggested test-fixture split
 func sampleBulkResponse() types.BulkResponse {
 	return types.BulkResponse{
 		Results: []types.BulkOperationResult{
 			{InstanceID: "1", Status: "success"},
 			{InstanceID: "2", Status: "success"},
 			{InstanceID: "3", Status: "error", Error: "not found"},
 		},
 	}
 }
+
+func sampleBulkTemplateResponse() types.BulkResponse {
+	return types.BulkResponse{
+		Results: []types.BulkOperationResult{
+			{TemplateID: "1", Status: "success"},
+			{TemplateID: "2", Status: "success"},
+			{TemplateID: "3", Status: "error", Error: "not found"},
+		},
+	}
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/cmd/bulk_test.go` around lines 25 - 33, Update the test fixture so
template bulk flows include and validate template IDs: modify sampleBulkResponse
(or add a new fixture like sampleBulkResponseWithTemplates) to set the template
identifier field on types.BulkOperationResult entries (e.g., add TemplateID:
"template-1" / "template-2" for the success results and for the error case as
appropriate) to ensure template_id paths are exercised and r.ID()/output
serialization for template bulk responses is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cli/test/live/cluster_live_test.go`:
- Around line 47-51: The current t.Skipf calls skip on any error from the quota
lookup (variables cluster and err), which hides real failures; change the logic
around the quota fetch in the test to inspect err and only call t.Skipf when the
error matches explicit "quota optional/unavailable" conditions (e.g., a sentinel
error value, a specific error type or a well-known error message returned by the
quota lookup function), and otherwise call t.Fatalf or t.Error so unexpected
errors fail the test; update both places that call t.Skipf (the block
referencing cluster.ID and the later similar block) to perform this targeted
error check before skipping.

In `@cli/test/live/definition_live_test.go`:
- Around line 64-66: The test currently inspects bundle["charts"] which is
incorrect for payloads shaped as {schema_version, definition}; change the check
to read the nested charts under bundle["definition"] and assert against the
round-tripped definition's charts. Concretely, access origCharts via
bundle["definition"].(map[string]any)["charts"] (safely type-asserting) and
replace assert.NotEmpty(t, roundtrip.Charts, ...) with assert.NotEmpty(t,
roundtrip.Definition.Charts, ...) so the test verifies the nested
definition.charts field rather than a top-level charts key.

In `@cli/test/live/favorite_live_test.go`:
- Around line 60-63: Replace the manual loop and t.Errorf check after
RemoveFavorite with a testify assertion (assert/require) so failures are
consistent: instead of iterating over list2 and calling t.Errorf when
f.EntityType==entityType && f.EntityID==def.ID, use a testify helper (e.g.,
assert.NotContains or assert.False/assert.NoError with a computed found boolean)
from github.com/stretchr/testify to assert the favorite is absent — reference
the variables list2, f.EntityType, f.EntityID, def.ID and the RemoveFavorite
call when making the assertion.

In `@cli/test/live/helpers_test.go`:
- Around line 102-112: In skipUnlessHTTP200, replace the direct http.Get with an
http.Client that has a timeout (e.g. &http.Client{Timeout: 5*time.Second}) to
avoid hanging calls, use client.Get instead of http.Get, ensure resp.Body is
closed (defer resp.Body.Close()), and gate on resp.StatusCode == http.StatusOK:
if it's http.StatusNotFound call t.Skipf("endpoint %s returned 404 — feature not
enabled on this backend", path) and for any other non-200 status call
t.Skipf("endpoint %s returned non-200 status: %d", path, resp.StatusCode); keep
error handling to t.Skipf("endpoint %s unreachable: %v", path, err).

In `@cli/test/live/live_test.go`:
- Around line 34-38: The precondition currently allows API-key-only mode (checks
apiKey) which conflicts with the required live-test contract; change the check
in live_test.go so the test skips unless STACKCTL_LIVE_USER and
STACKCTL_LIVE_PASS are set (i.e., require user and pass rather than apiKey),
update the t.Skip message to state that STACKCTL_LIVE_USER and
STACKCTL_LIVE_PASS must be set (and note STACKCTL_LIVE_URL defaults to
http://localhost:8081 if needed), and make the same change for the second check
around lines 68-70 so both guards enforce the user/pass requirement.
- Around line 96-98: The heavy-test gate currently checks
os.Getenv("STACKCTL_LIVE_HEAVY") != "" which treats any non-empty value (e.g.,
"0" or "false") as enabled; change both checks of
os.Getenv("STACKCTL_LIVE_HEAVY") to require the explicit value "1" (i.e.,
os.Getenv("STACKCTL_LIVE_HEAVY") == "1") so heavy workflows only run when
STACKCTL_LIVE_HEAVY=1 is set; update the condition in both places in
cli/test/live/live_test.go and keep the t.Skip message as-is or clarify it to
mention "=1" if desired.

---

Nitpick comments:
In `@cli/cmd/bulk_test.go`:
- Around line 25-33: Update the test fixture so template bulk flows include and
validate template IDs: modify sampleBulkResponse (or add a new fixture like
sampleBulkResponseWithTemplates) to set the template identifier field on
types.BulkOperationResult entries (e.g., add TemplateID: "template-1" /
"template-2" for the success results and for the error case as appropriate) to
ensure template_id paths are exercised and r.ID()/output serialization for
template bulk responses is covered.

In `@cli/pkg/client/client_test.go`:
- Around line 1800-1889: The three success tests (TestBulkStop_Success,
TestBulkClean_Success, TestBulkDelete_Success) are missing assertions on the
request body key; update each test's httptest handler to decode the incoming
JSON into types.BulkInstancesRequest (as done in TestBulkDeploy_Success) and
assert that body.InstanceIDs contains the expected IDs before writing the
response; locate the handlers inside those test functions and add the
decode+assert for body.InstanceIDs to ensure the wire key remains instance_ids.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ce52dfe5-51a5-4e5c-81df-67bf099a3fb6

📥 Commits

Reviewing files that changed from the base of the PR and between 1e2e941 and a072bdf.

📒 Files selected for processing (15)
  • cli/cmd/bulk.go
  • cli/cmd/bulk_test.go
  • cli/pkg/client/client.go
  • cli/pkg/client/client_test.go
  • cli/pkg/types/types.go
  • cli/pkg/types/types_test.go
  • cli/test/live/audit_live_test.go
  • cli/test/live/bulk_live_test.go
  • cli/test/live/cluster_live_test.go
  • cli/test/live/definition_live_test.go
  • cli/test/live/favorite_live_test.go
  • cli/test/live/helpers_test.go
  • cli/test/live/live_test.go
  • cli/test/live/notification_live_test.go
  • cli/test/live/template_live_test.go

Comment thread cli/test/live/cluster_live_test.go
Comment thread cli/test/live/definition_live_test.go Outdated
Comment thread cli/test/live/favorite_live_test.go Outdated
Comment thread cli/test/live/helpers_test.go
Comment on lines +34 to 38
apiKey := os.Getenv("STACKCTL_LIVE_API_KEY")
user, pass := os.Getenv("STACKCTL_LIVE_USER"), os.Getenv("STACKCTL_LIVE_PASS")
if apiKey == "" && (user == "" || pass == "") {
t.Skip("STACKCTL_LIVE_API_KEY or (STACKCTL_LIVE_USER + STACKCTL_LIVE_PASS) must be set for live tests")
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Live auth precondition no longer matches the documented live-test contract.

Lines 34-38 and 68-70 make API-key-only mode valid, but the current live-test rule requires STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS and skipping when unset. Please either enforce that here or update the live-test instruction contract in the same PR to keep policy and implementation aligned.

As per coding guidelines: “Live tests must require env vars STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS and skip (not fail) if unset, with optional STACKCTL_LIVE_URL defaulting to http://localhost:8081.”

Also applies to: 68-70

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/test/live/live_test.go` around lines 34 - 38, The precondition currently
allows API-key-only mode (checks apiKey) which conflicts with the required
live-test contract; change the check in live_test.go so the test skips unless
STACKCTL_LIVE_USER and STACKCTL_LIVE_PASS are set (i.e., require user and pass
rather than apiKey), update the t.Skip message to state that STACKCTL_LIVE_USER
and STACKCTL_LIVE_PASS must be set (and note STACKCTL_LIVE_URL defaults to
http://localhost:8081 if needed), and make the same change for the second check
around lines 68-70 so both guards enforce the user/pass requirement.

Comment thread cli/test/live/live_test.go Outdated
CI was red because the e2e mock servers
(startE2ETemplateDefMockServer + startE2EQuietPipingMockServer) still
emitted the pre-fix bulk shape — {id, success} keyed by "ids". With the
new typed BulkInstancesRequest / BulkTemplatesRequest types, every bulk
test in cli/test/e2e/ decoded an empty results array and failed every
assertion.

Mocks now mirror the real backend:

  - Request decode keys are "template_ids" / "instance_ids".
  - Result entries carry "template_id" / "instance_id" + status:"success".
  - Response envelope carries total / successful / failed counters.

Also addresses review comments from CodeRabbit on PR #98:

  - live_test.go: STACKCTL_LIVE_HEAVY gating now requires exactly "1"
    instead of any non-empty value, so STACKCTL_LIVE_HEAVY=0 stops
    accidentally triggering ~80GB workload tests.
  - helpers_test.go: skipUnlessHTTP200 now actually gates on 200,
    uses a 5s timeout, and defers Body.Close.
  - definition_live_test.go: chart-presence check reads the nested
    bundle["definition"]["charts"] (was bundle["charts"], which is
    always nil under the {schema_version, definition} envelope and
    silently no-op'd the assertion).
  - favorite_live_test.go: replaced t.Errorf with assert.False to match
    the testify convention used elsewhere.
  - cluster_live_test.go: quota + nodes skip paths now narrow on
    documented "not found" / "not reachable" substrings; any other
    error fails the test so contract regressions still surface.

The "live tests must require STACKCTL_LIVE_USER+PASS" comment is
intentionally not applied — API-key auth was added in this PR as a
parallel path and is the configuration used by CI.

Verified:
  - go test ./...                              all packages pass
  - go test -tags live ./test/live/...         16 passed, 2 skipped
  - go vet ./...                               clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cli/test/e2e/cli_e2e_test.go`:
- Around line 890-909: The mock bulk-template handlers (the HTTP handler cases
for "/api/v1/templates/bulk/publish", "/api/v1/templates/bulk/unpublish" and
"/api/v1/templates/bulk/delete") currently only check JSON decode errors and
thus accept empty bodies or empty template_ids; update each handler to validate
that the decoded req.TemplateIDs slice is non-nil and has length > 0 and, if
empty, respond with http.StatusBadRequest and a JSON error (e.g.,
{"error":"template_ids required"}). Locate the req struct and results
construction in each case (the variables named req and results and the code that
appends map[string]interface{}{"template_id": id, "status": "success"}) and add
the guard immediately after json.NewDecoder(r.Body).Decode(&req) to return 400
when len(req.TemplateIDs) == 0.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ab58b0fb-4ea1-4402-b930-25372053016b

📥 Commits

Reviewing files that changed from the base of the PR and between a072bdf and 588db83.

📒 Files selected for processing (6)
  • cli/test/e2e/cli_e2e_test.go
  • cli/test/live/cluster_live_test.go
  • cli/test/live/definition_live_test.go
  • cli/test/live/favorite_live_test.go
  • cli/test/live/helpers_test.go
  • cli/test/live/live_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • cli/test/live/helpers_test.go
  • cli/test/live/definition_live_test.go
  • cli/test/live/cluster_live_test.go
  • cli/test/live/live_test.go

Comment thread cli/test/e2e/cli_e2e_test.go
Mirrors the existing instance-bulk handlers and matches real backend
behaviour: `template_ids` is `binding:"required"` server-side, so missing
or empty arrays produce 400. The mock now does the same so a future
bug where stackctl forgets to send the IDs fails the e2e test instead
of silently appearing to "succeed" with an empty results array.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown

Actionable comments posted: 0

@omattsson omattsson merged commit 565745d into main May 28, 2026
7 checks passed
@omattsson omattsson deleted the fix/bulk-wire-contract branch May 28, 2026 10:54
omattsson pushed a commit that referenced this pull request May 28, 2026
…kend

Closes #96.

Boots k8s-stack-manager (origin/main, api-only docker-compose profile)
in the same job, applies a SQL fixture so require* helpers don't skip,
mints a 1-day API key via the public auth endpoints, then runs the
live suite. PR-time signal that the wire contracts still match — the
class of bug stub-based unit tests can't see (recently surfaced in
PR #98).

What ships:

  .github/workflows/live-tests.yml
      - Triggers on push to main + every PR targeting main.
      - Clones k8s-sm main, `docker compose up backend` (no --profile
        → api-only, since frontend is gated to "full").
      - Polls /health/live for up to 60s.
      - Pipes cli/test/live/testdata/ci-seed.sql into the mysql
        container.
      - Logs in as the env-seeded admin, mints a key via
        POST /api/v1/users/<id>/api-keys with expires_in_days=1,
        masks it in logs, exports as STACKCTL_LIVE_API_KEY.
      - `go test -tags live -timeout 10m ./test/live/...`
      - Dumps backend logs on failure.

  cli/test/live/testdata/ci-seed.sql
      - 1 cluster (no kubeconfig — test-connection will fail, tests
        expect that), 1 stack_definition + chart_config, 1 published
        stack_template + template_chart_config.
      - owner_id is looked up via SELECT … WHERE username='admin' so
        we don't need to inject the non-deterministic admin UUID.

  cli/test/live/doc.go
      - Documents the env vars + how to reproduce the CI flow locally
        (clone k8s-sm, compose up, apply seed, mint key, run suite).

Design decisions (from the open questions in #96):

  - Bootstrap: backend SQL seed for fixtures, login-then-mint for the
    API key. Avoids hardcoding bcrypt/SHA-256 hashes in test data.
  - Backend pin: track origin/main. Surfaces cross-repo contract
    breaks immediately — that's exactly the signal #96 wants.
  - Workflow location: separate file. Easy to mark non-required if it
    ever gets flaky without blocking the rest of CI.

Two contract-drift gotchas surfaced while validating against
rancher-desktop k8s-sm (the value this workflow will catch on PRs):

  - POST /api/v1/users/<id>/api-keys requires expires_at OR
    expires_in_days. Initial draft sent neither and got 400.
  - The minted-key response field is `raw_key`, not `key`.

Both fixed in the workflow + doc.go before pushing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
omattsson pushed a commit that referenced this pull request May 28, 2026
The CI live-tests workflow runs against a stub cluster (no kubeconfig,
no real kube-apiserver). The backend correctly returns 500 with
"Failed to connect to cluster" for diagnostic endpoints in that
state, so two subtests of TestLiveCluster_HealthAndTest were failing
in CI while passing locally against rancher-desktop.

Two changes:

1. The `health` subtest now skips on the same "cluster unreachable"
   conditions the `nodes` subtest already handles (added in #98 via
   CodeRabbit's tightening). Symmetric with `nodes` so a wire-shape
   regression on either endpoint still surfaces as a failure.

2. The substring set is extracted into isClusterUnreachable() and
   widened to include "failed to connect to cluster" — the actual
   literal in the backend's 500 response when kubeconfig is empty or
   the cluster is down.

Verified against rancher-desktop k8s-sm: all three subtests still
pass (real cluster, real kubeconfig). The CI api-only path will now
skip the two diagnostic subtests cleanly instead of failing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
omattsson added a commit that referenced this pull request May 28, 2026
…kend (#99)

* ci(live): run cli/test/live/ on every PR against a freshly-booted backend

Closes #96.

Boots k8s-stack-manager (origin/main, api-only docker-compose profile)
in the same job, applies a SQL fixture so require* helpers don't skip,
mints a 1-day API key via the public auth endpoints, then runs the
live suite. PR-time signal that the wire contracts still match — the
class of bug stub-based unit tests can't see (recently surfaced in
PR #98).

What ships:

  .github/workflows/live-tests.yml
      - Triggers on push to main + every PR targeting main.
      - Clones k8s-sm main, `docker compose up backend` (no --profile
        → api-only, since frontend is gated to "full").
      - Polls /health/live for up to 60s.
      - Pipes cli/test/live/testdata/ci-seed.sql into the mysql
        container.
      - Logs in as the env-seeded admin, mints a key via
        POST /api/v1/users/<id>/api-keys with expires_in_days=1,
        masks it in logs, exports as STACKCTL_LIVE_API_KEY.
      - `go test -tags live -timeout 10m ./test/live/...`
      - Dumps backend logs on failure.

  cli/test/live/testdata/ci-seed.sql
      - 1 cluster (no kubeconfig — test-connection will fail, tests
        expect that), 1 stack_definition + chart_config, 1 published
        stack_template + template_chart_config.
      - owner_id is looked up via SELECT … WHERE username='admin' so
        we don't need to inject the non-deterministic admin UUID.

  cli/test/live/doc.go
      - Documents the env vars + how to reproduce the CI flow locally
        (clone k8s-sm, compose up, apply seed, mint key, run suite).

Design decisions (from the open questions in #96):

  - Bootstrap: backend SQL seed for fixtures, login-then-mint for the
    API key. Avoids hardcoding bcrypt/SHA-256 hashes in test data.
  - Backend pin: track origin/main. Surfaces cross-repo contract
    breaks immediately — that's exactly the signal #96 wants.
  - Workflow location: separate file. Easy to mark non-required if it
    ever gets flaky without blocking the rest of CI.

Two contract-drift gotchas surfaced while validating against
rancher-desktop k8s-sm (the value this workflow will catch on PRs):

  - POST /api/v1/users/<id>/api-keys requires expires_at OR
    expires_in_days. Initial draft sent neither and got 400.
  - The minted-key response field is `raw_key`, not `key`.

Both fixed in the workflow + doc.go before pushing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(live): reset upstream dev bind-mount so production image starts

k8s-stack-manager's docker-compose.yml bind-mounts ./backend:/app for
hot-reload during dev. With target: production, that overlays the
image's baked-in /app/main binary with the host source tree, which
has no compiled binary — container exits immediately with
"stat ./main: no such file or directory".

Ships testdata/docker-compose.ci.yml as a compose override that
resets the volumes list (Compose ≥1.28 !reset semantics), and wires
both -f files into every compose call in the workflow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(live): redact JWT and raw API key from failure-path logs

If login or api-key mint shapes drift, the workflow dumps the raw
response to stderr for debugging. Both responses carry credentials
(JWT in login, raw_key in mint) that shouldn't end up in CI logs even
on the failure branch — masked outputs still surface the field names
and status, which is the actual diagnostic value.

Addresses CodeRabbit on PR #99. The sister findings about pinning
actions/checkout@v6 and actions/setup-go@v6 to commit SHAs and adding
persist-credentials: false are deliberately skipped — none of the
four existing workflows in this repo follow that policy, and the
right place to harden is a dedicated repo-wide change, not a
one-workflow special case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(live): disable container-side healthcheck (alpine image has no curl)

Upstream backend healthcheck runs `curl -f http://localhost:8081/health/live`
inside the container, but the production alpine image only ships
./main + helm + ca-certificates — no curl. The healthcheck fails
forever and `docker compose up --wait` hits its timeout.

We already poll /health/live from the host in the next step, so the
in-container check is redundant. Override with `test: ["NONE"]` and
keep the host-side gate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(live): use wget for healthcheck (alpine image has no curl)

Last attempt set `test: ["NONE"]` to disable the broken curl-based
healthcheck, but `docker compose up --wait` errors out with "no
healthcheck configured" instead of treating "running" as ready.

Alpine's base image ships busybox `wget` (curl is not installed), so
override the healthcheck command to use that. Shorter interval +
retries chosen so a healthy backend is flagged ready within ~10s
rather than upstream's 75s start_period.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(live): skip cluster health/nodes when backend can't reach cluster

The CI live-tests workflow runs against a stub cluster (no kubeconfig,
no real kube-apiserver). The backend correctly returns 500 with
"Failed to connect to cluster" for diagnostic endpoints in that
state, so two subtests of TestLiveCluster_HealthAndTest were failing
in CI while passing locally against rancher-desktop.

Two changes:

1. The `health` subtest now skips on the same "cluster unreachable"
   conditions the `nodes` subtest already handles (added in #98 via
   CodeRabbit's tightening). Symmetric with `nodes` so a wire-shape
   regression on either endpoint still surfaces as a failure.

2. The substring set is extracted into isClusterUnreachable() and
   widened to include "failed to connect to cluster" — the actual
   literal in the backend's 500 response when kubeconfig is empty or
   the cluster is down.

Verified against rancher-desktop k8s-sm: all three subtests still
pass (real cluster, real kubeconfig). The CI api-only path will now
skip the two diagnostic subtests cleanly instead of failing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(live): tighten cluster-unreachable substring set

Drop "unavailable" — too generic, could mask a legitimate 503 from a
wire-shape regression. Add "unreachable" instead, which matches the
backend's models.ClusterUnreachable status constant and is the more
accurate token for a cluster-down condition.

Addresses CodeRabbit on PR #99. The companion suggestion to convert
TestLiveCluster_HealthAndTest to a table-driven + t.Parallel() form
is deliberately skipped: none of the four sibling *_live_test.go
files in this package follow that pattern (the convention applies to
unit tests; live tests against a shared backend run serially by
design), and a mass-refactor of the live suite is out of scope for a
workflow PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Olof Mattsson <olof.mattsson@klaravik.se>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
omattsson added a commit that referenced this pull request May 28, 2026
…er (#101)

* test(contract): assert pkg/types request shapes against backend swagger

Closes #97.

Adds a unit-test-only contract layer that validates every stackctl
request struct against the backend's published OpenAPI (Swagger 2.0)
schema. Catches the class of bug shipped four times this week (#95,
#98, k8s-sm#264, the BulkOperationResult shape) where unit /
integration / e2e tests all stub the backend and decode requests
into stackctl's OWN types — so a json-tag drift between stackctl and
the backend decodes cleanly in tests but 400s the moment a real
backend reads the bytes.

What ships:

  cli/test/contract/testdata/swagger.json
      Vendored copy of omattsson/k8s-stack-manager:backend/docs/swagger.json.
      Pinned (not auto-fetched) so backend-side wire changes show up
      as explicit diffs in this repo.

  cli/test/contract/refresh-swagger.sh
      One-shot refresh: `./refresh-swagger.sh [ref]`. Validates the
      payload is JSON, sha256-compares before overwriting, and tells
      you which test to re-run. Pinned to a ref so partial backend
      rollouts don't break stackctl CI.

  cli/test/contract/contract_test.go
      Embeds swagger.json via go:embed, walks 9 request structs via
      reflection, and asserts bidirectionally:

        1. Every Go json tag exists as a property in swagger.
           Catches stale or typoed Go-side fields.
        2. Every swagger "required" field has a Go json tag.
           Catches missing-required-field bugs.
        3. Field types align (string/boolean/integer/number/array/object).

      Covers: CreateClusterRequest, UpdateClusterRequest,
      BulkInstancesRequest, BulkTemplatesRequest, RegisterRequest,
      LoginRequest, CreateAPIKeyRequest, ResetPasswordRequest,
      CreateCleanupPolicyRequest.

      Plus TestSwaggerVendorIntegrity — sanity floor on the vendored
      copy so a truncated swagger.json doesn't manifest as a pile of
      confusing per-case failures.

Failure-mode validated with a synthetic drift (matches the
pre-fix/bulk-wire-contract bug exactly):

  Error: Go struct field Ids (json:"ids") has no matching property
         in swagger schema — typo, or backend doesn't accept this field
  Error: swagger schema requires field "instance_ids" but the Go
         struct has no field with that json tag — request will fail
         validation

V1 deliberately skips:
  - Response types (issue calls out polymorphism for GET endpoints;
    a separate iteration).
  - CreateTemplateRequest / UpdateTemplateRequest — backend's swag
    annotation uses an unexported struct (createTemplateRequest) so
    no schema is published. Worth a tiny upstream PR to make the
    type exported + annotated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(contract): refresh-swagger.sh — list python3 in deps, validate via stdin

Both address CodeRabbit on PR #101: the script invokes python3 for
JSON validation but the requirements comment listed only curl and
shasum, and the validation embedded the temp filename in the inline
Python string (mktemp output is normally safe, but stdin redirection
is cleaner regardless and removes any quoting concern).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Olof Mattsson <olof.mattsson@klaravik.se>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant