ci(live): run cli/test/live/ on every PR against a freshly-booted backend by omattsson · Pull Request #99 · omattsson/stackctl

omattsson · 2026-05-28T11:18:15Z

Summary

Closes #96.

Adds a CI job that boots k8s-stack-manager (origin/main, api-only docker-compose profile) in the same runner, applies a SQL fixture, mints a 1-day API key, then runs cli/test/live/... against it. PR-time signal that the wire contracts still match between the two repos — the class of bug stub-based unit tests can't see, and that surfaced four times in the past week alone (#95, k8s-sm#264, #98).

Design decisions

Three open questions in #96 were resolved before writing the workflow:

| Question | Decision | Rationale |
| --- | --- | --- |
| Bootstrap mechanism | Backend SQL seed + login-then-mint via API | Hardcoded bcrypt / SHA-256 hashes in fixtures rot when hashing schemes change; using the public auth endpoints survives migrations. SQL fixtures cover the require* skip-gates that a fresh DB would otherwise trip. |
| k8s-sm pinning | Track `origin/main` | Surfacing cross-repo contract breaks is the whole point of this workflow. Pin later if it becomes too noisy. |
| Workflow location | New `.github/workflows/live-tests.yml` | Separable from the unit/integration job; easy to mark non-required if it ever gets flaky without blocking the rest of CI. |

Validation

Bootstrap flow validated against the rancher-desktop k8s-sm before pushing:

SHOW COLUMNS confirms the SQL fixture matches the GORM-migrated schema for clusters, stack_definitions, chart_configs, stack_templates, template_chart_configs, users.
Seed runs cleanly inside a transaction with no schema errors.
/api/v1/auth/me returns {id, username, role} as expected.
POST /api/v1/users/<id>/api-keys returns .raw_key (validated key minted + revoked).

Two contract-drift gotchas surfaced during that validation (precisely the kind of bug this workflow will surface on PRs):

POST /api/v1/users/<id>/api-keys requires expires_at OR expires_in_days. Initial draft sent neither → 400.
Minted-key field is raw_key, not key.

Both fixed in the workflow + cli/test/live/doc.go before pushing.
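
The two gotchas above can be pinned down in a few lines of Go. This is a minimal sketch, not the code in the workflow or in `cli/test/live/doc.go`: the helper names `buildMintBody` and `parseMintResponse` are hypothetical, the positive-days check is an assumption, and only the `expires_in_days` and `raw_key` field names come from this PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// mintRequest carries expires_in_days, one of the two fields the PR says the
// endpoint accepts (the other being expires_at). Sending neither returned 400.
type mintRequest struct {
	ExpiresInDays int `json:"expires_in_days"`
}

// mintResponse reads only raw_key, the field the PR names. The rest of the
// response shape is not described in the thread and is ignored here.
type mintResponse struct {
	RawKey string `json:"raw_key"`
}

// buildMintBody (hypothetical helper) encodes a mint request.
// Rejecting non-positive values is an assumption of this sketch.
func buildMintBody(days int) ([]byte, error) {
	if days <= 0 {
		return nil, errors.New("expires_in_days must be positive")
	}
	return json.Marshal(mintRequest{ExpiresInDays: days})
}

// parseMintResponse (hypothetical helper) returns raw_key, or an error that
// names the missing field so a rename shows up clearly in test output.
func parseMintResponse(body []byte) (string, error) {
	var resp mintResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decode mint response: %w", err)
	}
	if resp.RawKey == "" {
		return "", errors.New("mint response has no raw_key field")
	}
	return resp.RawKey, nil
}

func main() {
	body, err := buildMintBody(1)
	if err != nil {
		fmt.Println("build:", err)
		return
	}
	fmt.Println(string(body))

	// A response that uses "key" instead of "raw_key" is reported as drift.
	_, err = parseMintResponse([]byte(`{"key":"abc"}`))
	fmt.Println(err)
}
```

Failing on an empty `raw_key` rather than returning an empty string is the point of the sketch: a silent empty key would only surface later as an authentication failure.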

Depends on

#98 (live suite expansion + bulk wire-contract fix). The live tests this job runs were introduced there — needs to merge first.

Test plan

CI green on this PR (live job passes)
Manually verify a follow-up PR that introduces a deliberate field-name drift fails this job with a clear error
Reproducing the CI flow locally via the steps in cli/test/live/doc.go produces the same result

🤖 Generated with Claude Code

Summary by CodeRabbit

Chores
- Added an automated live integration test workflow that runs on pushes to main and on pull requests; it boots the backend and database, seeds test data, mints a short‑lived API key, and executes the live test suite.
Documentation
- Expanded live-test docs with setup steps, required environment variables, authentication examples, and CI reproduction instructions.
Bug Fixes
- Live tests now gracefully skip cluster health diagnostics when the backend indicates the cluster is unreachable, reducing false failures.

coderabbitai · 2026-05-28T11:18:27Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a39fe683-01e0-40e1-bf8d-6c9a3f4a5529

📥 Commits

Reviewing files that changed from the base of the PR and between 9c98f0e and 8a34986.

📒 Files selected for processing (1)

cli/test/live/cluster_live_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

cli/test/live/cluster_live_test.go

📝 Walkthrough

Walkthrough

Adds a GitHub Actions workflow that boots the API+MySQL backend, seeds MySQL, mints a short-lived API key, runs the live test suite (go test -tags live) with injected env vars, expands live test package documentation, and centralizes unreachable-cluster detection in tests.

Changes

Live Integration Test Suite CI Automation and Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Workflow triggers and job setup `.github/workflows/live-tests.yml` | Defines workflow name, triggers on `push`/`pull_request` to `main`, read-only `contents` permission, concurrency, job env, repo checkout, and Go setup using `cli/go.mod`/`cli/go.sum`. |
| Backend stack initialization and health check `.github/workflows/live-tests.yml` | Clones `omattsson/k8s-stack-manager`, starts the API-only compose stack (backend + MySQL) and polls `http://localhost:8081/health/live` until healthy, failing with backend logs if unreachable. |
| Database seeding and API key minting `.github/workflows/live-tests.yml` | Executes `cli/test/live/testdata/ci-seed.sql` inside MySQL container, authenticates against the backend, mints an API key, masks the raw key, and exports it via `GITHUB_OUTPUT`. |
| Live test suite execution and failure handling `.github/workflows/live-tests.yml` | Runs `go test -tags live` from `cli/` against `./test/live/...` with backend URL and API key env vars; on failure prints backend container logs using the cloned compose files. |
| Live test suite documentation and contracts `cli/test/live/doc.go` | Expands package-level documentation describing the `live` build tag, env-var contract (backend URL default, API key vs user/password), `STACKCTL_LIVE_HEAVY` toggle, and concrete local/CI reproduction commands. |
| Cluster unreachable handling in tests `cli/test/live/cluster_live_test.go` | Refactors health and nodes subtests to conditionally skip when the backend indicates the cluster is unreachable, and adds `isClusterUnreachable(err error) bool` to centralize substring checks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

test(live): run cli/test/live/ in CI against a booted backend #96: Implements the proposed CI workflow to boot the backend, seed MySQL, mint credentials, and run the cli/test/live suite via GitHub Actions.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a CI workflow to run live tests on every PR against a freshly-booted backend. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

cli/pkg/client/client_test.go (1)

1800-1806: ⚡ Quick win

Lock instance_ids contract in all stack bulk success stubs.

TestBulkStop_Success, TestBulkClean_Success, and TestBulkDelete_Success don’t decode/assert the request body, so a payload-key regression could pass these tests while only TestBulkDeploy_Success catches it.

Suggested patch

 func TestBulkStop_Success(t *testing.T) {
 	t.Parallel()
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		assert.Equal(t, http.MethodPost, r.Method)
 		assert.Equal(t, "/api/v1/stack-instances/bulk/stop", r.URL.Path)
+		var body types.BulkInstancesRequest
+		require.NoError(t, json.NewDecoder(r.Body).Decode(&body))
+		assert.Equal(t, []string{"1"}, body.InstanceIDs)
 		w.WriteHeader(http.StatusOK)
 		json.NewEncoder(w).Encode(types.BulkResponse{
 			Results: []types.BulkOperationResult{
 				{InstanceID: "1", Status: "success"},
 			},
@@
 func TestBulkClean_Success(t *testing.T) {
 	t.Parallel()
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		assert.Equal(t, http.MethodPost, r.Method)
 		assert.Equal(t, "/api/v1/stack-instances/bulk/clean", r.URL.Path)
+		var body types.BulkInstancesRequest
+		require.NoError(t, json.NewDecoder(r.Body).Decode(&body))
+		assert.Equal(t, []string{"5", "6"}, body.InstanceIDs)
 		w.WriteHeader(http.StatusOK)
 		json.NewEncoder(w).Encode(types.BulkResponse{
 			Results: []types.BulkOperationResult{
 				{InstanceID: "5", Status: "success"},
 				{InstanceID: "6", Status: "success"},
@@
 func TestBulkDelete_Success(t *testing.T) {
 	t.Parallel()
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		assert.Equal(t, http.MethodPost, r.Method)
 		assert.Equal(t, "/api/v1/stack-instances/bulk/delete", r.URL.Path)
+		var body types.BulkInstancesRequest
+		require.NoError(t, json.NewDecoder(r.Body).Decode(&body))
+		assert.Equal(t, []string{"10"}, body.InstanceIDs)
 		w.WriteHeader(http.StatusOK)
 		json.NewEncoder(w).Encode(types.BulkResponse{
 			Results: []types.BulkOperationResult{
 				{InstanceID: "10", Status: "success"},
 			},

Also applies to: 1835-1841, 1870-1876

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/pkg/client/client_test.go` around lines 1800 - 1806, The three tests
TestBulkStop_Success, TestBulkClean_Success, and TestBulkDelete_Success need to
assert the request body contract (ensure the JSON contains "instance_ids") like
TestBulkDeploy_Success does: in each httptest handler for those tests decode the
request body (e.g., via json.NewDecoder(r.Body).Decode into a struct or map) and
assert that the "instance_ids" key (or the expected slice) is present and
correct before writing the success response; update the handlers inside
TestBulkStop_Success, TestBulkClean_Success, and TestBulkDelete_Success to
perform this decode/assert to lock the payload-key contract.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/live-tests.yml:
- Around line 86-94: The failure path currently echoes the full login JSON
(login) including sensitive JWTs and user IDs; change the log to redact or
remove sensitive fields instead of printing the raw payload—replace the echo
"$login" | jq . >&2 call with a sanitized jq transformation that deletes or
masks .token, .access_token, and .user.id (or sets them to "REDACTED") so you
still get useful debug output without exposing credentials; locate the
login/jwt/admin_id usage in the login block where BACKEND_URL, ADMIN_USERNAME,
ADMIN_PASSWORD, jwt, admin_id, and login are referenced and apply the jq
redaction there.
- Around line 36-40: Replace floating action tags and leaking logs: pin
actions/checkout@v6 and actions/setup-go@v6 to specific commit SHAs (replace the
`@v6` references with the corresponding full SHA), add persist-credentials: false
to the checkout step (the step currently using actions/checkout) to avoid
leaving credentials in the workspace, and stop printing raw auth JSON by
removing or replacing the echo "$login" | jq . >&2 and echo "$mint" | jq . >&2
lines—instead log only non-sensitive fields or a redacted summary (e.g., status
and expiry) so tokens/JWTs are never emitted.

In `@cli/cmd/bulk_test.go`:
- Around line 25-31: The template bulk tests are using the instance-scoped
fixture sampleBulkResponse instead of a template-scoped fixture, so swap those
usages to use sampleBulkTemplateResponse; locate the template
publish/unpublish/delete test cases that call sampleBulkResponse (mentions
around the blocks that assert instance_id) and replace those calls with
sampleBulkTemplateResponse, ensuring the assertions check template_id in the
returned types.BulkResponse.Results entries (and that sampleBulkTemplateResponse
returns TemplateID/populates the template-scoped fields expected by the tests).

In `@cli/test/live/notification_live_test.go`:
- Around line 63-66: The test is order-dependent because it compares prefs[i] to
got[i]; change it to check by EventType instead: assert the lengths match, build
a map from got items keyed by EventType (e.g., gotByType :=
map[string]NotificationPreference{...}), then iterate over prefs and for each
prefs[j].EventType look up the corresponding got entry and assert its Enabled
equals prefs[j].Enabled (and that the lookup exists). Update the loop using
prefs and gotByType rather than index-based comparisons to avoid ordering
issues.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: de82d5da-0556-4bd4-8f3e-58b1cb8c3bcc

📥 Commits

Reviewing files that changed from the base of the PR and between 565745d and 24bd762.

⛔ Files ignored due to path filters (1)

cli/test/live/testdata/ci-seed.sql is excluded by !**/testdata/**

📒 Files selected for processing (18)

.github/workflows/live-tests.yml
cli/cmd/bulk.go
cli/cmd/bulk_test.go
cli/pkg/client/client.go
cli/pkg/client/client_test.go
cli/pkg/types/types.go
cli/pkg/types/types_test.go
cli/test/e2e/cli_e2e_test.go
cli/test/live/audit_live_test.go
cli/test/live/bulk_live_test.go
cli/test/live/cluster_live_test.go
cli/test/live/definition_live_test.go
cli/test/live/doc.go
cli/test/live/favorite_live_test.go
cli/test/live/helpers_test.go
cli/test/live/live_test.go
cli/test/live/notification_live_test.go
cli/test/live/template_live_test.go

coderabbitai · 2026-05-28T11:29:22Z

+      - name: Check out stackctl
+        uses: actions/checkout@v6
+
+      - name: Set up Go
+        uses: actions/setup-go@v6


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify all GitHub actions are pinned by commit SHA and checkout disables persisted credentials.
fd -a '\.ya?ml$' .github/workflows | while read -r wf; do
  echo "== $wf =="
  rg -n '^\s*uses:\s*[^@]+@' "$wf"
  rg -n 'actions/checkout@' "$wf" -A3 -B1
done

Repository: omattsson/stackctl

Length of output: 112

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show the exact relevant sections in live-tests.yml (with line numbers)
echo "---- live-tests.yml (lines 1-140) ----"
nl -ba .github/workflows/live-tests.yml | sed -n '1,140p'

# 2) Check whether persist-credentials is set anywhere in live-tests.yml
echo "---- persist-credentials occurrences ----"
rg -n "persist-credentials" .github/workflows/live-tests.yml || true

# 3) Inspect .github/instructions for rules related to pinned actions + secret/log handling
echo "---- .github/instructions/*.md files ----"
ls -la .github/instructions || true
echo "---- instruction keyword matches ----"
rg -n "(pin(ned)?|commit sha|checkout|persist-credentials|JWT|token|secret|logs?|supply chain)" .github/instructions/*.md || true

# 4) Also find any login/echo/jq payload printing patterns in live-tests.yml
echo "---- login/echo/jq printing patterns ----"
rg -n "(name:\s*Login|login\b|mint\b|jq\b|echo\s+\"\\$login\"|echo\s+\"\\$mint\"|validation failure|fail(ed)?)" .github/workflows/live-tests.yml || true

Repository: omattsson/stackctl

Length of output: 162

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Basic file existence
echo "---- file checks ----"
test -f .github/workflows/live-tests.yml && echo "live-tests.yml: exists" || echo "live-tests.yml: missing"
test -d .github/instructions && echo "instructions/: exists" || echo "instructions/: missing"

# Show live-tests.yml with line numbers (1-180)
echo "---- live-tests.yml (lines 1-180) ----"
awk '{printf "%5d %s\n", NR, $0}' .github/workflows/live-tests.yml | sed -n '1,180p'

# Search action usage lines
echo "---- uses: actions/checkout@ / setup-go@ ----"
rg -n 'uses:\s*actions/(checkout|setup-go)@' .github/workflows/live-tests.yml || true

# Check persist-credentials
echo "---- persist-credentials occurrences ----"
rg -n 'persist-credentials' .github/workflows/live-tests.yml || true

# Check for any jq+echo of login/mint patterns
echo "---- login/mint/jq/echo sensitive payload printing patterns ----"
rg -n '(name:\s*Login|login\b|mint\b|jq\b|echo\s+"\\$login"|echo\s+"\\$mint"|echo\s+["'\'']\\$login|echo\s+["'\'']\\$mint|validation failure|fail(ed|ing)?|Authorization|Bearer|JWT|token)' .github/workflows/live-tests.yml || true

# Instructions directory listing
echo "---- .github/instructions listing ----"
ls -la .github/instructions || true

# Keyword matches in instructions
echo "---- instruction keyword matches ----"
rg -n '(pin(ned)?|commit sha|checkout|persist-credentials|JWT|token|secret|log(s)?|supply chain|credential)' .github/instructions/*.md || true

Repository: omattsson/stackctl

Length of output: 9369

Pin GitHub Actions to commit SHAs; redact auth payloads in CI logs.

.github/workflows/live-tests.yml uses floating tags (actions/checkout@v6, actions/setup-go@v6) and doesn’t set persist-credentials: false for checkout, weakening CI supply-chain/security posture.

On auth failures, the workflow prints full JSON responses (echo "$login" | jq . >&2 and echo "$mint" | jq . >&2), which can include the JWT/token fields—avoid dumping raw auth payloads; redact or print only non-sensitive fields.

Suggested hardening

-      - name: Check out stackctl
-        uses: actions/checkout@<pinned-commit-sha>
-        with:
-          persist-credentials: false
+      - name: Check out stackctl
+        uses: actions/checkout@<pinned-commit-sha>
+        with:
+          persist-credentials: false

-      - name: Set up Go
-        uses: actions/setup-go@<pinned-commit-sha>
+      - name: Set up Go
+        uses: actions/setup-go@<pinned-commit-sha>

🧰 Tools

🪛 zizmor (1.25.2)

[warning] 36-37: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

[error] 37-37: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

[error] 40-40: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/live-tests.yml around lines 36 - 40, Replace floating action tags and leaking logs: pin actions/checkout@v6 and actions/setup-go@v6 to specific commit SHAs (replace the `@v6` references with the corresponding full SHA), add persist-credentials: false to the checkout step (the step currently using actions/checkout) to avoid leaving credentials in the workspace, and stop printing raw auth JSON by removing or replacing the echo "$login" | jq . >&2 and echo "$mint" | jq . >&2 lines—instead log only non-sensitive fields or a redacted summary (e.g., status and expiry) so tokens/JWTs are never emitted.

coderabbitai · 2026-05-28T11:29:22Z

 func sampleBulkResponse() types.BulkResponse {
 	return types.BulkResponse{
 		Results: []types.BulkOperationResult{
-			{ID: "1", Success: true},
-			{ID: "2", Success: true},
-			{ID: "3", Success: false, Error: "not found"},
+			{InstanceID: "1", Status: "success"},
+			{InstanceID: "2", Status: "success"},
+			{InstanceID: "3", Status: "error", Error: "not found"},
 		},


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Template bulk tests are asserting the wrong result field (instance_id).

Template bulk responses should be template-scoped. Reusing an instance-scoped fixture here means these tests won’t catch regressions where template_id is broken or missing.

Suggested fix

-func sampleBulkResponse() types.BulkResponse {
+func sampleBulkInstanceResponse() types.BulkResponse {
 	return types.BulkResponse{
 		Results: []types.BulkOperationResult{
 			{InstanceID: "1", Status: "success"},
 			{InstanceID: "2", Status: "success"},
 			{InstanceID: "3", Status: "error", Error: "not found"},
 		},
 	}
 }
+
+func sampleBulkTemplateResponse() types.BulkResponse {
+	return types.BulkResponse{
+		Results: []types.BulkOperationResult{
+			{TemplateID: "1", Status: "success"},
+			{TemplateID: "2", Status: "success"},
+			{TemplateID: "3", Status: "error", Error: "not found"},
+		},
+	}
+}

-assert.Contains(t, out, "instance_id: \"1\"")
+assert.Contains(t, out, "template_id: \"1\"")

Use sampleBulkTemplateResponse() across template publish/unpublish/delete output tests.

Also applies to: 892-894, 1022-1024, 1228-1230

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/cmd/bulk_test.go` around lines 25 - 31, The template bulk tests are using the instance-scoped fixture sampleBulkResponse instead of a template-scoped fixture, so swap those usages to use sampleBulkTemplateResponse; locate the template publish/unpublish/delete test cases that call sampleBulkResponse (mentions around the blocks that assert instance_id) and replace those calls with sampleBulkTemplateResponse, ensuring the assertions check template_id in the returned types.BulkResponse.Results entries (and that sampleBulkTemplateResponse returns TemplateID/populates the template-scoped fields expected by the tests).

coderabbitai · 2026-05-28T11:29:22Z

+	for i := range prefs {
+		assert.Equal(t, prefs[i].EventType, got[i].EventType, "event_type[%d]", i)
+		assert.Equal(t, prefs[i].Enabled, got[i].Enabled, "enabled[%d]", i)
+	}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid order-dependent preference assertions.

At Line 63, comparing prefs[i] with got[i] assumes stable ordering from the API. If ordering changes, this test can fail even when payload content is correct.

Proposed fix

 	got, err := c.UpdateNotificationPreferences(prefs)
 	require.NoError(t, err, "update notification preferences (echo)")
 	require.Len(t, got, len(prefs), "response length must match input")
-	for i := range prefs {
-		assert.Equal(t, prefs[i].EventType, got[i].EventType, "event_type[%d]", i)
-		assert.Equal(t, prefs[i].Enabled, got[i].Enabled, "enabled[%d]", i)
-	}
+	assert.ElementsMatch(t, prefs, got, "response must match input regardless of order")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/test/live/notification_live_test.go` around lines 63 - 66, The test is order-dependent because it compares prefs[i] to got[i]; change it to check by EventType instead: assert the lengths match, build a map from got items keyed by EventType (e.g., gotByType := map[string]NotificationPreference{...}), then iterate over prefs and for each prefs[j].EventType look up the corresponding got entry and assert its Enabled equals prefs[j].Enabled (and that the lookup exists). Update the loop using prefs and gotByType rather than index-based comparisons to avoid ordering issues.
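
The map-based comparison described in the prompt above could look roughly like the following. It is a standalone sketch under stated assumptions: `NotificationPreference` here is a stand-in with only the two fields the thread mentions (the repo's real type may have more), `comparePrefs` is a hypothetical helper, and the event type names are invented for the example.

```go
package main

import "fmt"

// NotificationPreference is a stand-in type for this sketch; only EventType
// and Enabled are taken from the review thread.
type NotificationPreference struct {
	EventType string
	Enabled   bool
}

// comparePrefs (hypothetical helper) returns one message per mismatch and
// ignores the order of the response slice.
func comparePrefs(want, got []NotificationPreference) []string {
	var problems []string
	if len(want) != len(got) {
		problems = append(problems,
			fmt.Sprintf("length: want %d, got %d", len(want), len(got)))
	}

	// Index the response by event type so lookups do not depend on position.
	gotByType := make(map[string]NotificationPreference, len(got))
	for _, p := range got {
		gotByType[p.EventType] = p
	}

	for _, w := range want {
		g, ok := gotByType[w.EventType]
		if !ok {
			problems = append(problems,
				fmt.Sprintf("%s: missing from response", w.EventType))
			continue
		}
		if g.Enabled != w.Enabled {
			problems = append(problems,
				fmt.Sprintf("%s: enabled want %t, got %t", w.EventType, w.Enabled, g.Enabled))
		}
	}
	return problems
}

func main() {
	// Event type names are made up for the example.
	want := []NotificationPreference{
		{EventType: "deploy", Enabled: true},
		{EventType: "cleanup", Enabled: false},
	}
	// Same content, different order: no mismatches are reported.
	got := []NotificationPreference{
		{EventType: "cleanup", Enabled: false},
		{EventType: "deploy", Enabled: true},
	}
	fmt.Println(len(comparePrefs(want, got)))
}
```

Compared with `assert.ElementsMatch`, the keyed lookup checks only the fields it names, so extra server-populated fields on the response would not cause a failure. Which behaviour is preferable depends on how strict the echo contract is meant to be.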

coderabbitai · 2026-05-28T11:32:44Z

Actionable comments posted: 0

…kend

Closes #96.

Boots k8s-stack-manager (origin/main, api-only docker-compose profile) in the same job, applies a SQL fixture so require* helpers don't skip, mints a 1-day API key via the public auth endpoints, then runs the live suite. PR-time signal that the wire contracts still match: the class of bug stub-based unit tests can't see (recently surfaced in PR #98).

What ships:

.github/workflows/live-tests.yml
- Triggers on push to main + every PR targeting main.
- Clones k8s-sm main, `docker compose up backend` (no --profile → api-only, since frontend is gated to "full").
- Polls /health/live for up to 60s.
- Pipes cli/test/live/testdata/ci-seed.sql into the mysql container.
- Logs in as the env-seeded admin, mints a key via POST /api/v1/users/<id>/api-keys with expires_in_days=1, masks it in logs, exports as STACKCTL_LIVE_API_KEY.
- `go test -tags live -timeout 10m ./test/live/...`
- Dumps backend logs on failure.

cli/test/live/testdata/ci-seed.sql
- 1 cluster (no kubeconfig; test-connection will fail, tests expect that), 1 stack_definition + chart_config, 1 published stack_template + template_chart_config.
- owner_id is looked up via SELECT … WHERE username='admin' so we don't need to inject the non-deterministic admin UUID.

cli/test/live/doc.go
- Documents the env vars + how to reproduce the CI flow locally (clone k8s-sm, compose up, apply seed, mint key, run suite).

Design decisions (from the open questions in #96):
- Bootstrap: backend SQL seed for fixtures, login-then-mint for the API key. Avoids hardcoding bcrypt/SHA-256 hashes in test data.
- Backend pin: track origin/main. Surfaces cross-repo contract breaks immediately, which is exactly the signal #96 wants.
- Workflow location: separate file. Easy to mark non-required if it ever gets flaky without blocking the rest of CI.

Two contract-drift gotchas surfaced while validating against rancher-desktop k8s-sm (the value this workflow will catch on PRs):
- POST /api/v1/users/<id>/api-keys requires expires_at OR expires_in_days. Initial draft sent neither and got 400.
- The minted-key response field is `raw_key`, not `key`.

Both fixed in the workflow + doc.go before pushing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
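
The health gate in this commit (poll `/health/live` for up to 60s) is a bounded retry loop. The workflow does this in shell, so the Go below is an illustration only. `pollUntilReady`, `httpProbe`, and `errNotReady` are hypothetical names, and treating HTTP 200 as healthy is an assumption about the endpoint.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errNotReady is a sentinel used by the stand-in probe in main.
var errNotReady = errors.New("probe: not ready")

// pollUntilReady calls probe up to attempts times, sleeping pause between
// tries, and returns nil as soon as one probe succeeds.
func pollUntilReady(probe func() error, attempts int, pause time.Duration) error {
	if attempts <= 0 {
		return errors.New("attempts must be positive")
	}
	var last error
	for i := 0; i < attempts; i++ {
		if last = probe(); last == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(pause)
		}
	}
	return fmt.Errorf("not ready after %d attempts: %w", attempts, last)
}

// httpProbe returns a probe that succeeds when url answers 200.
// Treating 200 as "healthy" is an assumption of this sketch.
func httpProbe(url string) func() error {
	client := &http.Client{Timeout: 2 * time.Second}
	return func() error {
		resp, err := client.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		return nil
	}
}

func main() {
	// Stand-in probe that fails twice and then succeeds, so the example runs
	// without a backend. Against a real backend one would pass
	// httpProbe("http://localhost:8081/health/live") with 30 attempts and a
	// 2s pause, which is roughly the 60s budget the commit mentions.
	calls := 0
	probe := func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	}
	if err := pollUntilReady(probe, 5, 10*time.Millisecond); err != nil {
		fmt.Println("backend never became ready:", err)
		return
	}
	fmt.Println("ready after", calls, "probes")
}
```

Passing the probe as a function keeps the retry logic testable without a network, and wrapping the last error means a timeout still reports why the final attempt failed.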

k8s-stack-manager's docker-compose.yml bind-mounts ./backend:/app for hot-reload during dev. With target: production, that overlays the image's baked-in /app/main binary with the host source tree, which has no compiled binary; container exits immediately with "stat ./main: no such file or directory".

Ships testdata/docker-compose.ci.yml as a compose override that resets the volumes list (Compose ≥1.28 !reset semantics), and wires both -f files into every compose call in the workflow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

If login or api-key mint shapes drift, the workflow dumps the raw response to stderr for debugging. Both responses carry credentials (JWT in login, raw_key in mint) that shouldn't end up in CI logs even on the failure branch. Masked outputs still surface the field names and status, which is the actual diagnostic value.

Addresses CodeRabbit on PR #99. The sister findings about pinning actions/checkout@v6 and actions/setup-go@v6 to commit SHAs and adding persist-credentials: false are deliberately skipped: none of the four existing workflows in this repo follow that policy, and the right place to harden is a dedicated repo-wide change, not a one-workflow special case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
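
The masking idea in this commit, keeping field names visible while hiding credential values, can be sketched in Go. The workflow itself does this in its shell steps, so this is an illustration and not the shipped code. `redactJSON` and `redact` are hypothetical helpers, and the key list uses only the field names mentioned in this thread (`token`, `access_token`, `raw_key`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sensitiveKeys lists the credential-bearing fields named in the thread.
// A real login or mint response may contain others; this list is an assumption.
var sensitiveKeys = map[string]bool{
	"token":        true,
	"access_token": true,
	"raw_key":      true,
}

// redact walks a decoded JSON value and replaces the value of every
// sensitive key with a fixed marker, at any nesting depth.
func redact(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			if sensitiveKeys[k] {
				out[k] = "REDACTED"
			} else {
				out[k] = redact(val)
			}
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, val := range t {
			out[i] = redact(val)
		}
		return out
	default:
		return v
	}
}

// redactJSON (hypothetical helper) returns raw with credential values masked.
// Field names are preserved, so shape drift stays visible in logs.
func redactJSON(raw []byte) ([]byte, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, fmt.Errorf("decode payload: %w", err)
	}
	return json.Marshal(redact(v))
}

func main() {
	// Sample payload invented for the example.
	payload := []byte(`{"token":"secret-jwt","user":{"id":"42","username":"admin"}}`)
	out, err := redactJSON(payload)
	if err != nil {
		fmt.Println("redact:", err)
		return
	}
	fmt.Println(string(out))
}
```

Masking by key name rather than dropping the fields keeps the diagnostic signal: a log reader can still see whether the response had `raw_key` or some renamed field.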

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

.github/workflows/live-tests.yml (1)

36-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin action SHAs and disable checkout credential persistence.

Line 37 and Line 40 use floating action tags, and Line 36 does not set persist-credentials: false. This is still a supply-chain/security hardening gap.

Suggested patch

       - name: Check out stackctl
-        uses: actions/checkout@v6
+        uses: actions/checkout@<full_commit_sha_for_v6>
+        with:
+          persist-credentials: false

       - name: Set up Go
-        uses: actions/setup-go@v6
+        uses: actions/setup-go@<full_commit_sha_for_v6>

#!/bin/bash
set -euo pipefail

# Verify actions are pinned to SHAs and checkout disables credential persistence.
nl -ba .github/workflows/live-tests.yml | sed -n '30,55p'
echo
rg -n '^\s*uses:\s*actions/(checkout|setup-go)@' .github/workflows/live-tests.yml
echo
rg -n 'persist-credentials:\s*false' .github/workflows/live-tests.yml

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/live-tests.yml around lines 36 - 40, Update the workflow
to pin actions to specific SHAs and disable credential persistence: replace
floating tags actions/checkout@v6 and actions/setup-go@v6 with their respective
full SHA-pinned refs, and add persist-credentials: false under the checkout step
(referencing the uses: actions/checkout and uses: actions/setup-go entries and
the persist-credentials key) so the checkout does not leave credentials in the
runner.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1a6c76bb-420c-487a-b2a2-435fc701e1ce

📥 Commits

Reviewing files that changed from the base of the PR and between f9faebf and a38daf1.

⛔ Files ignored due to path filters (2)

cli/test/live/testdata/ci-seed.sql is excluded by !**/testdata/**
cli/test/live/testdata/docker-compose.ci.yml is excluded by !**/testdata/**

📒 Files selected for processing (2)

.github/workflows/live-tests.yml
cli/test/live/doc.go

✅ Files skipped from review due to trivial changes (1)

cli/test/live/doc.go

Upstream backend healthcheck runs `curl -f http://localhost:8081/health/live` inside the container, but the production alpine image only ships ./main + helm + ca-certificates, with no curl. The healthcheck fails forever and `docker compose up --wait` hits its timeout.

We already poll /health/live from the host in the next step, so the in-container check is redundant. Override with `test: ["NONE"]` and keep the host-side gate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Last attempt set `test: ["NONE"]` to disable the broken curl-based healthcheck, but `docker compose up --wait` errors out with "no healthcheck configured" instead of treating "running" as ready.

Alpine's base image ships busybox `wget` (curl is not installed), so override the healthcheck command to use that. A shorter interval and fewer retries were chosen so a healthy backend is flagged ready within ~10s rather than after upstream's 75s start_period.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The CI live-tests workflow runs against a stub cluster (no kubeconfig, no real kube-apiserver). The backend correctly returns 500 with "Failed to connect to cluster" for diagnostic endpoints in that state, so two subtests of TestLiveCluster_HealthAndTest were failing in CI while passing locally against rancher-desktop.

Two changes:

1. The `health` subtest now skips on the same "cluster unreachable" conditions the `nodes` subtest already handles (added in #98 via CodeRabbit's tightening). Symmetric with `nodes` so a wire-shape regression on either endpoint still surfaces as a failure.
2. The substring set is extracted into isClusterUnreachable() and widened to include "failed to connect to cluster", the actual literal in the backend's 500 response when kubeconfig is empty or the cluster is down.

Verified against rancher-desktop k8s-sm: all three subtests still pass (real cluster, real kubeconfig). The CI api-only path will now skip the two diagnostic subtests cleanly instead of failing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cli/test/live/cluster_live_test.go (1)
64-112: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Bring this test in line with required table-driven + parallel test conventions.

This still uses ad-hoc subtests and is missing t.Parallel() on the parent and subtests, which violates the repo test rules for this path.

As per coding guidelines, "**/*_test.go: Use testify/assert with table-driven tests and t.Parallel() on parent and subtests" and "cli/**/*_test.go: ... with table-driven test patterns".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/test/live/cluster_live_test.go` around lines 64 - 112, The
TestLiveCluster_HealthAndTest function must be converted to a table-driven test
with t.Parallel() on the parent and each subtest: replace the ad-hoc t.Run calls
with a slice of test cases (e.g., names "health", "test_connection", "nodes")
and iterate calling t.Run(tc.name, func(t *testing.T){ t.Parallel(); ... }),
keeping the existing test logic inside the corresponding case bodies (use the
same calls GetClusterHealth, TestClusterConnection, GetClusterNodes and preserve
isClusterUnreachable handling and assertions); ensure the top-level
TestLiveCluster_HealthAndTest begins with t.Parallel() and use the table-driven
pattern required by the repo.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cli/test/live/cluster_live_test.go`:
- Around line 121-124: The skip-token list used in the test loop (the []string
literal iterated by for _, needle := range ...) is too broad because it includes
"unavailable"; remove that generic token and replace it with a cluster-specific
token such as "unreachable" (if that was the intended wording), leaving only
cluster-related phrases like "not reachable", "unreachable", and "connection
refused" so unrelated backend errors still surface as test failures.

---

Outside diff comments:
In `@cli/test/live/cluster_live_test.go`:
- Around line 64-112: The TestLiveCluster_HealthAndTest function must be
converted to a table-driven test with t.Parallel() on the parent and each
subtest: replace the ad-hoc t.Run calls with a slice of test cases (e.g., names
"health", "test_connection", "nodes") and iterate calling t.Run(tc.name, func(t
*testing.T){ t.Parallel(); ... }), keeping the existing test logic inside the
corresponding case bodies (use the same calls GetClusterHealth,
TestClusterConnection, GetClusterNodes and preserve isClusterUnreachable
handling and assertions); ensure the top-level TestLiveCluster_HealthAndTest
begins with t.Parallel() and use the table-driven pattern required by the repo.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3363f8fc-b532-4f79-82fc-0af4ae935c82

📥 Commits

Reviewing files that changed from the base of the PR and between a38daf1 and 9c98f0e.

⛔ Files ignored due to path filters (1)

cli/test/live/testdata/docker-compose.ci.yml is excluded by !**/testdata/**

📒 Files selected for processing (1)

cli/test/live/cluster_live_test.go

Drop "unavailable": it is too generic and could mask a legitimate 503 from a wire-shape regression. Add "unreachable" instead, which matches the backend's models.ClusterUnreachable status constant and is the more accurate token for a cluster-down condition. Addresses CodeRabbit on PR #99.

The companion suggestion to convert TestLiveCluster_HealthAndTest to a table-driven + t.Parallel() form is deliberately skipped: none of the four sibling *_live_test.go files in this package follow that pattern (the convention applies to unit tests; live tests against a shared backend run serially by design), and a mass-refactor of the live suite is out of scope for a workflow PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai · 2026-05-28T12:21:56Z

Actionable comments posted: 0

…okups

Two correctness fixes from CodeRabbit on PR #100:

- apikey_live_test.go: promote the raw_key length check from assert.Truef to require.Truef. A failing length check followed by created.RawKey[:3] would panic instead of failing cleanly.
- user_live_test.go: after DisableUser/EnableUser, look up the created user in the list response first (find-first pattern, matched by the other live tests) and require.NotNil before asserting on the flag. The previous range-and-skip would silently pass if the user was missing from the response.

The companion table-driven + t.Parallel() refactor suggestion for cleanup_policy_live_test.go is deliberately skipped, for the same reason as on PR #99: every other *_live_test.go file in this package uses ad-hoc subtests and runs serially against a shared backend by design.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…olicies, cluster CRUD (#100)

* test(live): expand suite: apikey CRUD, user lifecycle, template versions, cleanup policies, cluster CRUD

Adds five new endpoint-group live tests covering the highest-blast-radius surfaces still missing from cli/test/live/. All tests are wire-shape focused (no real workloads created), follow the existing helpers/cleanup conventions in this package, and run cleanly under the CI api-only flow introduced in #99.

New files:

apikey_live_test.go
- Create → list → revoke cycle against the calling user (whoami).
- Locks the raw_key contract: sk_-prefixed, returned once, never in list. Was implicitly relied on by the CI bootstrap but never asserted.

user_live_test.go
- Register a throwaway user, list (admin-only path), disable, enable, reset-password, delete. Never operates on admin, because locking out the caller would break the rest of the suite.

template_versions_live_test.go
- Publishes the same template twice (description-only change in between) to materialise two version snapshots, then exercises list → get → diff with shape assertions on left/right/chart_diffs.

cleanup_policy_live_test.go
- Full admin CRUD plus a dry-run execution. The condition "idle_days:9999" deliberately matches nothing so the run never mutates a real instance.

cluster_lifecycle_live_test.go
- Stub-cluster create → get → update → delete. IsDefault stays false so the test never disrupts requireCluster() for other tests. Exercises registry_* + image_pull_secret_name fields (the registry_password drift in PR #95 is the canonical example of why this surface needs a live test).

Bonus findings surfaced during local validation against rancher-desktop (left as commented follow-ups, not blockers for this PR):

- stackctl's UpdateTemplateRequest.Name is `omitempty` but the backend rejects PUT with "name is required" when omitted. Either drop the omitempty or relax the backend.
- stackctl's CreateTemplateRequest has no `version` field, so the template-level Version is unsettable through the CLI; the version snapshot's `version` round-trips empty as a result.
- Backend rejects kubeconfig_data unless KUBECONFIG_ENCRYPTION_KEY is configured (the CI compose env doesn't set it); kubeconfig_path works without that prerequisite.

Verified locally against rancher-desktop k8s-stack-manager: full live suite passes (21 passed, 2 skipped by design, 0 failed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(live): address review: guard raw_key slice + find-first user lookups

Two correctness fixes from CodeRabbit on PR #100:

- apikey_live_test.go: promote the raw_key length check from assert.Truef to require.Truef. A failing length check followed by created.RawKey[:3] would panic instead of failing cleanly.
- user_live_test.go: after DisableUser/EnableUser, look up the created user in the list response first (find-first pattern, matched by the other live tests) and require.NotNil before asserting on the flag. The previous range-and-skip would silently pass if the user was missing from the response.

The companion table-driven + t.Parallel() refactor suggestion for cleanup_policy_live_test.go is deliberately skipped, for the same reason as on PR #99: every other *_live_test.go file in this package uses ad-hoc subtests and runs serially against a shared backend by design.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Olof Mattsson <olof.mattsson@klaravik.se>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
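The find-first fix in user_live_test.go matters because a range-and-skip loop makes no assertion at all when the user is missing from the list. A minimal sketch of the difference follows; the `user` type, its fields, and both helper functions are hypothetical stand-ins for the real client types.

```go
package main

import "fmt"

// user is a hypothetical stand-in for the client's user model.
type user struct {
	ID       string
	Disabled bool
}

// findUser returns a pointer to the first user with the given ID, or nil.
// A test can then require a non-nil result before asserting on any field.
func findUser(users []user, id string) *user {
	for i := range users {
		if users[i].ID == id {
			return &users[i]
		}
	}
	return nil
}

// rangeAndSkipPasses mimics the old pattern: it only checks the flag when the
// user is present, so a missing user is indistinguishable from a pass.
func rangeAndSkipPasses(users []user, id string) bool {
	for _, u := range users {
		if u.ID != id {
			continue
		}
		if !u.Disabled {
			return false
		}
	}
	return true
}

func main() {
	listed := []user{{ID: "admin"}} // the created user is absent from the response

	fmt.Println(rangeAndSkipPasses(listed, "throwaway")) // true: silently passes
	fmt.Println(findUser(listed, "throwaway") == nil)    // true: the absence is detectable
}
```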

coderabbitai Bot reviewed May 28, 2026

Olof Mattsson and others added 3 commits May 28, 2026 13:33

omattsson force-pushed the ci/live-tests-workflow branch from f9faebf to a38daf1 Compare May 28, 2026 11:35

coderabbitai Bot reviewed May 28, 2026

Olof Mattsson and others added 3 commits May 28, 2026 13:41

coderabbitai Bot reviewed May 28, 2026

Comment thread cli/test/live/cluster_live_test.go

omattsson merged commit 72e8a2a into main May 28, 2026
8 checks passed

omattsson deleted the ci/live-tests-workflow branch May 28, 2026 12:28

omattsson mentioned this pull request May 28, 2026

test(live): apikey CRUD, user lifecycle, template versions, cleanup policies, cluster CRUD #100

Merged

3 tasks

-	for i := range prefs {
-		assert.Equal(t, prefs[i].EventType, got[i].EventType, "event_type[%d]", i)
-		assert.Equal(t, prefs[i].Enabled, got[i].Enabled, "enabled[%d]", i)
-	}
+	got, err := c.UpdateNotificationPreferences(prefs)
+	require.NoError(t, err, "update notification preferences (echo)")
+	require.Len(t, got, len(prefs), "response length must match input")
+	assert.ElementsMatch(t, prefs, got, "response must match input regardless of order")
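The diff above swaps an index-by-index comparison for assert.ElementsMatch, so the test no longer depends on the order in which the backend echoes preferences. The sketch below shows what an order-insensitive comparison means, without testify. `sameElements` and the `pref` type are hypothetical; only the field names EventType and Enabled come from the diff.

```go
package main

import "fmt"

// pref is a hypothetical stand-in for a notification preference.
type pref struct {
	EventType string
	Enabled   bool
}

// sameElements reports whether a and b contain the same elements with the
// same multiplicities, regardless of order.
func sameElements[T comparable](a, b []T) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[T]int, len(a))
	for _, v := range a {
		counts[v]++
	}
	for _, v := range b {
		if counts[v] == 0 {
			return false // b has an element a lacks, or has it too many times
		}
		counts[v]--
	}
	return true
}

func main() {
	sent := []pref{{"deploy", true}, {"cleanup", false}}
	echoed := []pref{{"cleanup", false}, {"deploy", true}} // same items, reordered

	fmt.Println(sameElements(sent, echoed))                    // true
	fmt.Println(sameElements(sent, []pref{{"deploy", false}})) // false
}
```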
