Skip to content

ci(live): run cli/test/live/ on every PR against a freshly-booted backend#99

Merged
omattsson merged 7 commits into
mainfrom
ci/live-tests-workflow
May 28, 2026
Merged

ci(live): run cli/test/live/ on every PR against a freshly-booted backend#99
omattsson merged 7 commits into
mainfrom
ci/live-tests-workflow

Conversation

@omattsson

@omattsson omattsson commented May 28, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #96.

Adds a CI job that boots k8s-stack-manager (origin/main, api-only docker-compose profile) in the same runner, applies a SQL fixture, mints a 1-day API key, then runs cli/test/live/... against it. PR-time signal that the wire contracts still match between the two repos — the class of bug stub-based unit tests can't see, and that surfaced four times in the past week alone (#95, k8s-sm#264, #98).

Design decisions

Three open questions in #96 were resolved before writing the workflow:

Question Decision Rationale
Bootstrap mechanism Backend SQL seed + login-then-mint via API Hardcoded bcrypt / SHA-256 hashes in fixtures rot when hashing schemes change; using the public auth endpoints survives migrations. SQL fixtures cover the require* skip-gates that a fresh DB would otherwise trip.
k8s-sm pinning Track origin/main Surfacing cross-repo contract breaks is the whole point of this workflow. Pin later if it becomes too noisy.
Workflow location New .github/workflows/live-tests.yml Separable from the unit/integration job — easy to mark non-required if it ever gets flaky without blocking the rest of CI.

Validation

Bootstrap flow validated against the rancher-desktop k8s-sm before pushing:

  • SHOW COLUMNS confirms the SQL fixture matches the GORM-migrated schema for clusters, stack_definitions, chart_configs, stack_templates, template_chart_configs, users.
  • Seed runs cleanly inside a transaction with no schema errors.
  • /api/v1/auth/me returns {id, username, role} as expected.
  • POST /api/v1/users/<id>/api-keys returns .raw_key (validated key minted + revoked).

Two contract-drift gotchas surfaced during that validation (precisely the kind of bug this workflow will surface on PRs):

  • POST /api/v1/users/<id>/api-keys requires expires_at OR expires_in_days. Initial draft sent neither → 400.
  • Minted-key field is raw_key, not key.

Both fixed in the workflow + cli/test/live/doc.go before pushing.

Depends on

#98 (live suite expansion + bulk wire-contract fix). The live tests this job runs were introduced there — needs to merge first.

Test plan

  • CI green on this PR (live job passes)
  • Manually verify a follow-up PR that introduces a deliberate field-name drift fails this job with a clear error
  • Reproducing the CI flow locally via the steps in cli/test/live/doc.go produces the same result

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores

    • Added an automated live integration test workflow that runs on pushes to main and on pull requests; it boots the backend and database, seeds test data, mints a short‑lived API key, and executes the live test suite.
  • Documentation

    • Expanded live-test docs with setup steps, required environment variables, authentication examples, and CI reproduction instructions.
  • Bug Fixes

    • Live tests now gracefully skip cluster health diagnostics when the backend indicates the cluster is unreachable, reducing false failures.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a39fe683-01e0-40e1-bf8d-6c9a3f4a5529

📥 Commits

Reviewing files that changed from the base of the PR and between 9c98f0e and 8a34986.

📒 Files selected for processing (1)
  • cli/test/live/cluster_live_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cli/test/live/cluster_live_test.go

📝 Walkthrough

Walkthrough

Adds a GitHub Actions workflow that boots the API+MySQL backend, seeds MySQL, mints a short-lived API key, runs the live test suite (go test -tags live) with injected env vars, expands live test package documentation, and centralizes unreachable-cluster detection in tests.

Changes

Live Integration Test Suite CI Automation and Documentation

Layer / File(s) Summary
Workflow triggers and job setup
.github/workflows/live-tests.yml
Defines workflow name, triggers on push/pull_request to main, read-only contents permission, concurrency, job env, repo checkout, and Go setup using cli/go.mod/cli/go.sum.
Backend stack initialization and health check
.github/workflows/live-tests.yml
Clones omattsson/k8s-stack-manager, starts the API-only compose stack (backend + MySQL) and polls http://localhost:8081/health/live until healthy, failing with backend logs if unreachable.
Database seeding and API key minting
.github/workflows/live-tests.yml
Executes cli/test/live/testdata/ci-seed.sql inside MySQL container, authenticates against the backend, mints an API key, masks the raw key, and exports it via GITHUB_OUTPUT.
Live test suite execution and failure handling
.github/workflows/live-tests.yml
Runs go test -tags live from cli/ against ./test/live/... with backend URL and API key env vars; on failure prints backend container logs using the cloned compose files.
Live test suite documentation and contracts
cli/test/live/doc.go
Expands package-level documentation describing the live build tag, env-var contract (backend URL default, API key vs user/password), STACKCTL_LIVE_HEAVY toggle, and concrete local/CI reproduction commands.
Cluster unreachable handling in tests
cli/test/live/cluster_live_test.go
Refactors health and nodes subtests to conditionally skip when the backend indicates the cluster is unreachable, and adds isClusterUnreachable(err error) bool to centralize substring checks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a CI workflow to run live tests on every PR against a freshly-booted backend.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ci/live-tests-workflow

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
cli/pkg/client/client_test.go (1)

1800-1806: ⚡ Quick win

Lock instance_ids contract in all stack bulk success stubs.

TestBulkStop_Success, TestBulkClean_Success, and TestBulkDelete_Success don’t decode/assert the request body, so a payload-key regression could pass these tests while only TestBulkDeploy_Success catches it.

Suggested patch
 func TestBulkStop_Success(t *testing.T) {
 	t.Parallel()
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		assert.Equal(t, http.MethodPost, r.Method)
 		assert.Equal(t, "/api/v1/stack-instances/bulk/stop", r.URL.Path)
+		var body types.BulkInstancesRequest
+		require.NoError(t, json.NewDecoder(r.Body).Decode(&body))
+		assert.Equal(t, []string{"1"}, body.InstanceIDs)
 		w.WriteHeader(http.StatusOK)
 		json.NewEncoder(w).Encode(types.BulkResponse{
 			Results: []types.BulkOperationResult{
 				{InstanceID: "1", Status: "success"},
 			},
@@
 func TestBulkClean_Success(t *testing.T) {
 	t.Parallel()
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		assert.Equal(t, http.MethodPost, r.Method)
 		assert.Equal(t, "/api/v1/stack-instances/bulk/clean", r.URL.Path)
+		var body types.BulkInstancesRequest
+		require.NoError(t, json.NewDecoder(r.Body).Decode(&body))
+		assert.Equal(t, []string{"5", "6"}, body.InstanceIDs)
 		w.WriteHeader(http.StatusOK)
 		json.NewEncoder(w).Encode(types.BulkResponse{
 			Results: []types.BulkOperationResult{
 				{InstanceID: "5", Status: "success"},
 				{InstanceID: "6", Status: "success"},
@@
 func TestBulkDelete_Success(t *testing.T) {
 	t.Parallel()
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		assert.Equal(t, http.MethodPost, r.Method)
 		assert.Equal(t, "/api/v1/stack-instances/bulk/delete", r.URL.Path)
+		var body types.BulkInstancesRequest
+		require.NoError(t, json.NewDecoder(r.Body).Decode(&body))
+		assert.Equal(t, []string{"10"}, body.InstanceIDs)
 		w.WriteHeader(http.StatusOK)
 		json.NewEncoder(w).Encode(types.BulkResponse{
 			Results: []types.BulkOperationResult{
 				{InstanceID: "10", Status: "success"},
 			},

Also applies to: 1835-1841, 1870-1876

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/pkg/client/client_test.go` around lines 1800 - 1806, The three tests
TestBulkStop_Success, TestBulkClean_Success, and TestBulkDelete_Success need to
assert the request body contract (ensure the JSON contains "instance_ids") like
TestBulkDeploy_Success does: in each httptest handler for those tests decode the
request body (e.g., via json.NewDecoder(r.Body).Decode into a struct or map) and
assert that the "instance_ids" key (or the expected slice) is present and
correct before writing the success response; update the handlers inside
TestBulkStop_Success, TestBulkClean_Success, and TestBulkDelete_Success to
perform this decode/assert to lock the payload-key contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/live-tests.yml:
- Around line 86-94: The failure path currently echoes the full login JSON
(login) including sensitive JWTs and user IDs; change the log to redact or
remove sensitive fields instead of printing the raw payload—replace the echo
"$login" | jq . >&2 call with a sanitized jq transformation that deletes or
masks .token, .access_token, and .user.id (or sets them to "REDACTED") so you
still get useful debug output without exposing credentials; locate the
login/jwt/admin_id usage in the login block where BACKEND_URL, ADMIN_USERNAME,
ADMIN_PASSWORD, jwt, admin_id, and login are referenced and apply the jq
redaction there.
- Around line 36-40: Replace floating action tags and leaking logs: pin
actions/checkout@v6 and actions/setup-go@v6 to specific commit SHAs (replace the
`@v6` references with the corresponding full SHA), add persist-credentials: false
to the checkout step (the step currently using actions/checkout) to avoid
leaving credentials in the workspace, and stop printing raw auth JSON by
removing or replacing the echo "$login" | jq . >&2 and echo "$mint" | jq . >&2
lines—instead log only non-sensitive fields or a redacted summary (e.g., status
and expiry) so tokens/JWTs are never emitted.

In `@cli/cmd/bulk_test.go`:
- Around line 25-31: The template bulk tests are using the instance-scoped
fixture sampleBulkResponse instead of a template-scoped fixture, so swap those
usages to use sampleBulkTemplateResponse; locate the template
publish/unpublish/delete test cases that call sampleBulkResponse (mentions
around the blocks that assert instance_id) and replace those calls with
sampleBulkTemplateResponse, ensuring the assertions check template_id in the
returned types.BulkResponse.Results entries (and that sampleBulkTemplateResponse
returns TemplateID/populates the template-scoped fields expected by the tests).

In `@cli/test/live/notification_live_test.go`:
- Around line 63-66: The test is order-dependent because it compares prefs[i] to
got[i]; change it to check by EventType instead: assert the lengths match, build
a map from got items keyed by EventType (e.g., gotByType :=
map[string]NotificationPreference{...}), then iterate over prefs and for each
prefs[j].EventType look up the corresponding got entry and assert its Enabled
equals prefs[j].Enabled (and that the lookup exists). Update the loop using
prefs and gotByType rather than index-based comparisons to avoid ordering
issues.

---

Nitpick comments:
In `@cli/pkg/client/client_test.go`:
- Around line 1800-1806: The three tests TestBulkStop_Success,
TestBulkClean_Success, and TestBulkDelete_Success need to assert the request
body contract (ensure the JSON contains "instance_ids") like
TestBulkDeploy_Success does: in each httptest handler for those tests decode the
request body (e.g., via json.NewDecoder(r.Body).Decode into a struct or map) and
assert that the "instance_ids" key (or the expected slice) is present and
correct before writing the success response; update the handlers inside
TestBulkStop_Success, TestBulkClean_Success, and TestBulkDelete_Success to
perform this decode/assert to lock the payload-key contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: de82d5da-0556-4bd4-8f3e-58b1cb8c3bcc

📥 Commits

Reviewing files that changed from the base of the PR and between 565745d and 24bd762.

⛔ Files ignored due to path filters (1)
  • cli/test/live/testdata/ci-seed.sql is excluded by !**/testdata/**
📒 Files selected for processing (18)
  • .github/workflows/live-tests.yml
  • cli/cmd/bulk.go
  • cli/cmd/bulk_test.go
  • cli/pkg/client/client.go
  • cli/pkg/client/client_test.go
  • cli/pkg/types/types.go
  • cli/pkg/types/types_test.go
  • cli/test/e2e/cli_e2e_test.go
  • cli/test/live/audit_live_test.go
  • cli/test/live/bulk_live_test.go
  • cli/test/live/cluster_live_test.go
  • cli/test/live/definition_live_test.go
  • cli/test/live/doc.go
  • cli/test/live/favorite_live_test.go
  • cli/test/live/helpers_test.go
  • cli/test/live/live_test.go
  • cli/test/live/notification_live_test.go
  • cli/test/live/template_live_test.go

Comment on lines +36 to +40
- name: Check out stackctl
uses: actions/checkout@v6

- name: Set up Go
uses: actions/setup-go@v6

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify all GitHub actions are pinned by commit SHA and checkout disables persisted credentials.
fd -a '\.ya?ml$' .github/workflows | while read -r wf; do
  echo "== $wf =="
  rg -n '^\s*uses:\s*[^@]+@' "$wf"
  rg -n 'actions/checkout@' "$wf" -A3 -B1
done

Repository: omattsson/stackctl

Length of output: 112


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show the exact relevant sections in live-tests.yml (with line numbers)
echo "---- live-tests.yml (lines 1-140) ----"
nl -ba .github/workflows/live-tests.yml | sed -n '1,140p'

# 2) Check whether persist-credentials is set anywhere in live-tests.yml
echo "---- persist-credentials occurrences ----"
rg -n "persist-credentials" .github/workflows/live-tests.yml || true

# 3) Inspect .github/instructions for rules related to pinned actions + secret/log handling
echo "---- .github/instructions/*.md files ----"
ls -la .github/instructions || true

echo "---- instruction keyword matches ----"
rg -n "(pin(ned)?|commit sha|checkout|persist-credentials|JWT|token|secret|logs?|supply chain)" .github/instructions/*.md || true

# 4) Also find any login/echo/jq payload printing patterns in live-tests.yml
echo "---- login/echo/jq printing patterns ----"
rg -n "(name:\s*Login|login\b|mint\b|jq\b|echo\s+\"\\$login\"|echo\s+\"\\$mint\"|validation failure|fail(ed)?)" .github/workflows/live-tests.yml || true

Repository: omattsson/stackctl

Length of output: 162


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Basic file existence
echo "---- file checks ----"
test -f .github/workflows/live-tests.yml && echo "live-tests.yml: exists" || echo "live-tests.yml: missing"
test -d .github/instructions && echo "instructions/: exists" || echo "instructions/: missing"

# Show live-tests.yml with line numbers (1-180)
echo "---- live-tests.yml (lines 1-180) ----"
awk '{printf "%5d  %s\n", NR, $0}' .github/workflows/live-tests.yml | sed -n '1,180p'

# Search action usage lines
echo "---- uses: actions/checkout@ / setup-go@ ----"
rg -n 'uses:\s*actions/(checkout|setup-go)@' .github/workflows/live-tests.yml || true

# Check persist-credentials
echo "---- persist-credentials occurrences ----"
rg -n 'persist-credentials' .github/workflows/live-tests.yml || true

# Check for any jq+echo of login/mint patterns
echo "---- login/mint/jq/echo sensitive payload printing patterns ----"
rg -n '(name:\s*Login|login\b|mint\b|jq\b|echo\s+"\\$login"|echo\s+"\\$mint"|echo\s+["'\'']\\$login|echo\s+["'\'']\\$mint|validation failure|fail(ed|ing)?|Authorization|Bearer|JWT|token)' .github/workflows/live-tests.yml || true

# Instructions directory listing
echo "---- .github/instructions listing ----"
ls -la .github/instructions || true

# Keyword matches in instructions
echo "---- instruction keyword matches ----"
rg -n '(pin(ned)?|commit sha|checkout|persist-credentials|JWT|token|secret|log(s)?|supply chain|credential)' .github/instructions/*.md || true

Repository: omattsson/stackctl

Length of output: 9369


Pin GitHub Actions to commit SHAs; redact auth payloads in CI logs.

  • .github/workflows/live-tests.yml uses floating tags (actions/checkout@v6, actions/setup-go@v6) and doesn’t set persist-credentials: false for checkout, weakening CI supply-chain/security posture.
  • On auth failures, the workflow prints full JSON responses (echo "$login" | jq . >&2 and echo "$mint" | jq . >&2), which can include the JWT/token fields—avoid dumping raw auth payloads; redact or print only non-sensitive fields.
Suggested hardening
-      - name: Check out stackctl
-        uses: actions/checkout@<pinned-commit-sha>
-        with:
-          persist-credentials: false
+      - name: Check out stackctl
+        uses: actions/checkout@<pinned-commit-sha>
+        with:
+          persist-credentials: false

-      - name: Set up Go
-        uses: actions/setup-go@<pinned-commit-sha>
+      - name: Set up Go
+        uses: actions/setup-go@<pinned-commit-sha>
🧰 Tools
🪛 zizmor (1.25.2)

[warning] 36-37: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)


[error] 37-37: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)


[error] 40-40: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/live-tests.yml around lines 36 - 40, Replace floating
action tags and leaking logs: pin actions/checkout@v6 and actions/setup-go@v6 to
specific commit SHAs (replace the `@v6` references with the corresponding full
SHA), add persist-credentials: false to the checkout step (the step currently
using actions/checkout) to avoid leaving credentials in the workspace, and stop
printing raw auth JSON by removing or replacing the echo "$login" | jq . >&2 and
echo "$mint" | jq . >&2 lines—instead log only non-sensitive fields or a
redacted summary (e.g., status and expiry) so tokens/JWTs are never emitted.

Comment thread .github/workflows/live-tests.yml
Comment thread cli/cmd/bulk_test.go
Comment on lines 25 to 31
func sampleBulkResponse() types.BulkResponse {
return types.BulkResponse{
Results: []types.BulkOperationResult{
{ID: "1", Success: true},
{ID: "2", Success: true},
{ID: "3", Success: false, Error: "not found"},
{InstanceID: "1", Status: "success"},
{InstanceID: "2", Status: "success"},
{InstanceID: "3", Status: "error", Error: "not found"},
},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Template bulk tests are asserting the wrong result field (instance_id).

Template bulk responses should be template-scoped. Reusing an instance-scoped fixture here means these tests won’t catch regressions where template_id is broken or missing.

Suggested fix
-func sampleBulkResponse() types.BulkResponse {
+func sampleBulkInstanceResponse() types.BulkResponse {
 	return types.BulkResponse{
 		Results: []types.BulkOperationResult{
 			{InstanceID: "1", Status: "success"},
 			{InstanceID: "2", Status: "success"},
 			{InstanceID: "3", Status: "error", Error: "not found"},
 		},
 	}
}
+
+func sampleBulkTemplateResponse() types.BulkResponse {
+	return types.BulkResponse{
+		Results: []types.BulkOperationResult{
+			{TemplateID: "1", Status: "success"},
+			{TemplateID: "2", Status: "success"},
+			{TemplateID: "3", Status: "error", Error: "not found"},
+		},
+	}
+}
-assert.Contains(t, out, "instance_id: \"1\"")
+assert.Contains(t, out, "template_id: \"1\"")

Use sampleBulkTemplateResponse() across template publish/unpublish/delete output tests.

Also applies to: 892-894, 1022-1024, 1228-1230

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/cmd/bulk_test.go` around lines 25 - 31, The template bulk tests are using
the instance-scoped fixture sampleBulkResponse instead of a template-scoped
fixture, so swap those usages to use sampleBulkTemplateResponse; locate the
template publish/unpublish/delete test cases that call sampleBulkResponse
(mentions around the blocks that assert instance_id) and replace those calls
with sampleBulkTemplateResponse, ensuring the assertions check template_id in
the returned types.BulkResponse.Results entries (and that
sampleBulkTemplateResponse returns TemplateID/populates the template-scoped
fields expected by the tests).

Comment on lines +63 to +66
for i := range prefs {
assert.Equal(t, prefs[i].EventType, got[i].EventType, "event_type[%d]", i)
assert.Equal(t, prefs[i].Enabled, got[i].Enabled, "enabled[%d]", i)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid order-dependent preference assertions.

At Line 63, comparing prefs[i] with got[i] assumes stable ordering from the API. If ordering changes, this test can fail even when payload content is correct.

Proposed fix
  got, err := c.UpdateNotificationPreferences(prefs)
  require.NoError(t, err, "update notification preferences (echo)")
  require.Len(t, got, len(prefs), "response length must match input")
- for i := range prefs {
- 	assert.Equal(t, prefs[i].EventType, got[i].EventType, "event_type[%d]", i)
- 	assert.Equal(t, prefs[i].Enabled, got[i].Enabled, "enabled[%d]", i)
- }
+ assert.ElementsMatch(t, prefs, got, "response must match input regardless of order")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for i := range prefs {
assert.Equal(t, prefs[i].EventType, got[i].EventType, "event_type[%d]", i)
assert.Equal(t, prefs[i].Enabled, got[i].Enabled, "enabled[%d]", i)
}
got, err := c.UpdateNotificationPreferences(prefs)
require.NoError(t, err, "update notification preferences (echo)")
require.Len(t, got, len(prefs), "response length must match input")
assert.ElementsMatch(t, prefs, got, "response must match input regardless of order")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/test/live/notification_live_test.go` around lines 63 - 66, The test is
order-dependent because it compares prefs[i] to got[i]; change it to check by
EventType instead: assert the lengths match, build a map from got items keyed by
EventType (e.g., gotByType := map[string]NotificationPreference{...}), then
iterate over prefs and for each prefs[j].EventType look up the corresponding got
entry and assert its Enabled equals prefs[j].Enabled (and that the lookup
exists). Update the loop using prefs and gotByType rather than index-based
comparisons to avoid ordering issues.

@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown

Actionable comments posted: 0

Olof Mattsson and others added 3 commits May 28, 2026 13:33
…kend

Closes #96.

Boots k8s-stack-manager (origin/main, api-only docker-compose profile)
in the same job, applies a SQL fixture so require* helpers don't skip,
mints a 1-day API key via the public auth endpoints, then runs the
live suite. PR-time signal that the wire contracts still match — the
class of bug stub-based unit tests can't see (recently surfaced in
PR #98).

What ships:

  .github/workflows/live-tests.yml
      - Triggers on push to main + every PR targeting main.
      - Clones k8s-sm main, `docker compose up backend` (no --profile
        → api-only, since frontend is gated to "full").
      - Polls /health/live for up to 60s.
      - Pipes cli/test/live/testdata/ci-seed.sql into the mysql
        container.
      - Logs in as the env-seeded admin, mints a key via
        POST /api/v1/users/<id>/api-keys with expires_in_days=1,
        masks it in logs, exports as STACKCTL_LIVE_API_KEY.
      - `go test -tags live -timeout 10m ./test/live/...`
      - Dumps backend logs on failure.

  cli/test/live/testdata/ci-seed.sql
      - 1 cluster (no kubeconfig — test-connection will fail, tests
        expect that), 1 stack_definition + chart_config, 1 published
        stack_template + template_chart_config.
      - owner_id is looked up via SELECT … WHERE username='admin' so
        we don't need to inject the non-deterministic admin UUID.

  cli/test/live/doc.go
      - Documents the env vars + how to reproduce the CI flow locally
        (clone k8s-sm, compose up, apply seed, mint key, run suite).

Design decisions (from the open questions in #96):

  - Bootstrap: backend SQL seed for fixtures, login-then-mint for the
    API key. Avoids hardcoding bcrypt/SHA-256 hashes in test data.
  - Backend pin: track origin/main. Surfaces cross-repo contract
    breaks immediately — that's exactly the signal #96 wants.
  - Workflow location: separate file. Easy to mark non-required if it
    ever gets flaky without blocking the rest of CI.

Two contract-drift gotchas surfaced while validating against
rancher-desktop k8s-sm (the value this workflow will catch on PRs):

  - POST /api/v1/users/<id>/api-keys requires expires_at OR
    expires_in_days. Initial draft sent neither and got 400.
  - The minted-key response field is `raw_key`, not `key`.

Both fixed in the workflow + doc.go before pushing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
k8s-stack-manager's docker-compose.yml bind-mounts ./backend:/app for
hot-reload during dev. With target: production, that overlays the
image's baked-in /app/main binary with the host source tree, which
has no compiled binary — container exits immediately with
"stat ./main: no such file or directory".

Ships testdata/docker-compose.ci.yml as a compose override that
resets the volumes list (Compose ≥1.28 !reset semantics), and wires
both -f files into every compose call in the workflow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
If login or api-key mint shapes drift, the workflow dumps the raw
response to stderr for debugging. Both responses carry credentials
(JWT in login, raw_key in mint) that shouldn't end up in CI logs even
on the failure branch — masked outputs still surface the field names
and status, which is the actual diagnostic value.

Addresses CodeRabbit on PR #99. The sister findings about pinning
actions/checkout@v6 and actions/setup-go@v6 to commit SHAs and adding
persist-credentials: false are deliberately skipped — none of the
four existing workflows in this repo follow that policy, and the
right place to harden is a dedicated repo-wide change, not a
one-workflow special case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@omattsson omattsson force-pushed the ci/live-tests-workflow branch from f9faebf to a38daf1 Compare May 28, 2026 11:35

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/live-tests.yml (1)

36-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin action SHAs and disable checkout credential persistence.

Line 37 and Line 40 use floating action tags, and Line 36 does not set persist-credentials: false. This is still a supply-chain/security hardening gap.

Suggested patch
       - name: Check out stackctl
-        uses: actions/checkout@v6
+        uses: actions/checkout@<full_commit_sha_for_v6>
+        with:
+          persist-credentials: false

       - name: Set up Go
-        uses: actions/setup-go@v6
+        uses: actions/setup-go@<full_commit_sha_for_v6>
#!/bin/bash
set -euo pipefail

# Verify actions are pinned to SHAs and checkout disables credential persistence.
nl -ba .github/workflows/live-tests.yml | sed -n '30,55p'
echo
rg -n '^\s*uses:\s*actions/(checkout|setup-go)@' .github/workflows/live-tests.yml
echo
rg -n 'persist-credentials:\s*false' .github/workflows/live-tests.yml
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/live-tests.yml around lines 36 - 40, Update the workflow
to pin actions to specific SHAs and disable credential persistence: replace
floating tags actions/checkout@v6 and actions/setup-go@v6 with their respective
full SHA-pinned refs, and add persist-credentials: false under the checkout step
(referencing the uses: actions/checkout and uses: actions/setup-go entries and
the persist-credentials key) so the checkout does not leave credentials in the
runner.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In @.github/workflows/live-tests.yml:
- Around line 36-40: Update the workflow to pin actions to specific SHAs and
disable credential persistence: replace floating tags actions/checkout@v6 and
actions/setup-go@v6 with their respective full SHA-pinned refs, and add
persist-credentials: false under the checkout step (referencing the uses:
actions/checkout and uses: actions/setup-go entries and the persist-credentials
key) so the checkout does not leave credentials in the runner.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1a6c76bb-420c-487a-b2a2-435fc701e1ce

📥 Commits

Reviewing files that changed from the base of the PR and between f9faebf and a38daf1.

⛔ Files ignored due to path filters (2)
  • cli/test/live/testdata/ci-seed.sql is excluded by !**/testdata/**
  • cli/test/live/testdata/docker-compose.ci.yml is excluded by !**/testdata/**
📒 Files selected for processing (2)
  • .github/workflows/live-tests.yml
  • cli/test/live/doc.go
✅ Files skipped from review due to trivial changes (1)
  • cli/test/live/doc.go

Olof Mattsson and others added 3 commits May 28, 2026 13:41
Upstream backend healthcheck runs `curl -f http://localhost:8081/health/live`
inside the container, but the production alpine image only ships
./main + helm + ca-certificates — no curl. The healthcheck fails
forever and `docker compose up --wait` hits its timeout.

We already poll /health/live from the host in the next step, so the
in-container check is redundant. Override with `test: ["NONE"]` and
keep the host-side gate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Last attempt set `test: ["NONE"]` to disable the broken curl-based
healthcheck, but `docker compose up --wait` errors out with "no
healthcheck configured" instead of treating "running" as ready.

Alpine's base image ships busybox `wget` (curl is not installed), so
override the healthcheck command to use that. Shorter interval +
retries chosen so a healthy backend is flagged ready within ~10s
rather than upstream's 75s start_period.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The CI live-tests workflow runs against a stub cluster (no kubeconfig,
no real kube-apiserver). The backend correctly returns 500 with
"Failed to connect to cluster" for diagnostic endpoints in that
state, so two subtests of TestLiveCluster_HealthAndTest were failing
in CI while passing locally against rancher-desktop.

Two changes:

1. The `health` subtest now skips on the same "cluster unreachable"
   conditions the `nodes` subtest already handles (added in #98 via
   CodeRabbit's tightening). Symmetric with `nodes` so a wire-shape
   regression on either endpoint still surfaces as a failure.

2. The substring set is extracted into isClusterUnreachable() and
   widened to include "failed to connect to cluster" — the actual
   literal in the backend's 500 response when kubeconfig is empty or
   the cluster is down.

Verified against rancher-desktop k8s-sm: all three subtests still
pass (real cluster, real kubeconfig). The CI api-only path will now
skip the two diagnostic subtests cleanly instead of failing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cli/test/live/cluster_live_test.go (1)

64-112: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Bring this test in line with required table-driven + parallel test conventions.

This still uses ad-hoc subtests and is missing t.Parallel() on the parent and subtests, which violates the repo test rules for this path.

As per coding guidelines, "**/*_test.go: Use testify/assert with table-driven tests and t.Parallel() on parent and subtests" and "cli/**/*_test.go: ... with table-driven test patterns".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/test/live/cluster_live_test.go` around lines 64 - 112, The
TestLiveCluster_HealthAndTest function must be converted to a table-driven test
with t.Parallel() on the parent and each subtest: replace the ad-hoc t.Run calls
with a slice of test cases (e.g., names "health", "test_connection", "nodes")
and iterate calling t.Run(tc.name, func(t *testing.T){ t.Parallel(); ... }),
keeping the existing test logic inside the corresponding case bodies (use the
same calls GetClusterHealth, TestClusterConnection, GetClusterNodes and preserve
isClusterUnreachable handling and assertions); ensure the top-level
TestLiveCluster_HealthAndTest begins with t.Parallel() and use the table-driven
pattern required by the repo.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cli/test/live/cluster_live_test.go`:
- Around line 121-124: The skip-token list used in the test loop (the []string
literal iterated by for _, needle := range ...) is too broad because it includes
"unavailable"; remove that generic token and replace it with a cluster-specific
token such as "unreachable" (if that was the intended wording), leaving only
cluster-related phrases like "not reachable", "unreachable", and "connection
refused" so unrelated backend errors still surface as test failures.

---

Outside diff comments:
In `@cli/test/live/cluster_live_test.go`:
- Around line 64-112: The TestLiveCluster_HealthAndTest function must be
converted to a table-driven test with t.Parallel() on the parent and each
subtest: replace the ad-hoc t.Run calls with a slice of test cases (e.g., names
"health", "test_connection", "nodes") and iterate calling t.Run(tc.name, func(t
*testing.T){ t.Parallel(); ... }), keeping the existing test logic inside the
corresponding case bodies (use the same calls GetClusterHealth,
TestClusterConnection, GetClusterNodes and preserve isClusterUnreachable
handling and assertions); ensure the top-level TestLiveCluster_HealthAndTest
begins with t.Parallel() and use the table-driven pattern required by the repo.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3363f8fc-b532-4f79-82fc-0af4ae935c82

📥 Commits

Reviewing files that changed from the base of the PR and between a38daf1 and 9c98f0e.

⛔ Files ignored due to path filters (1)
  • cli/test/live/testdata/docker-compose.ci.yml is excluded by !**/testdata/**
📒 Files selected for processing (1)
  • cli/test/live/cluster_live_test.go

Comment thread cli/test/live/cluster_live_test.go
Drop "unavailable" — too generic, could mask a legitimate 503 from a
wire-shape regression. Add "unreachable" instead, which matches the
backend's models.ClusterUnreachable status constant and is the more
accurate token for a cluster-down condition.

Addresses CodeRabbit on PR #99. The companion suggestion to convert
TestLiveCluster_HealthAndTest to a table-driven + t.Parallel() form
is deliberately skipped: none of the four sibling *_live_test.go
files in this package follow that pattern (the convention applies to
unit tests; live tests against a shared backend run serially by
design), and a mass-refactor of the live suite is out of scope for a
workflow PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown

Actionable comments posted: 0

@omattsson omattsson merged commit 72e8a2a into main May 28, 2026
8 checks passed
@omattsson omattsson deleted the ci/live-tests-workflow branch May 28, 2026 12:28
omattsson pushed a commit that referenced this pull request May 28, 2026
…okups

Two correctness fixes from CodeRabbit on PR #100:

  - apikey_live_test.go: promote the raw_key length check from
    assert.Truef to require.Truef. A failing length check followed
    by created.RawKey[:3] would panic instead of failing cleanly.

  - user_live_test.go: after DisableUser/EnableUser, look up the
    created user in the list response first (find-first pattern,
    matched by the other live tests) and require.NotNil before
    asserting on the flag. The previous range-and-skip would silently
    pass if the user was missing from the response.

The companion table-driven + t.Parallel() refactor suggestion for
cleanup_policy_live_test.go is deliberately skipped — same reason as
on PR #99: every other *_live_test.go file in this package uses
ad-hoc subtests and runs serially against a shared backend by design.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
omattsson added a commit that referenced this pull request May 28, 2026
…olicies, cluster CRUD (#100)

* test(live): expand suite — apikey CRUD, user lifecycle, template versions, cleanup policies, cluster CRUD

Adds five new endpoint-group live tests covering the highest-blast-radius
surfaces still missing from cli/test/live/. All tests are wire-shape
focused (no real workloads created), follow the existing helpers/cleanup
conventions in this package, and run cleanly under the CI api-only flow
introduced in #99.

New files:
  apikey_live_test.go
      - Create → list → revoke cycle against the calling user (whoami).
      - Locks the raw_key contract: sk_-prefixed, returned once, never
        in list. Was implicitly relied on by the CI bootstrap but
        never asserted.

  user_live_test.go
      - Register a throwaway user, list (admin-only path), disable,
        enable, reset-password, delete. Never operates on admin —
        locking out the caller would break the rest of the suite.

  template_versions_live_test.go
      - Publishes the same template twice (description-only change in
        between) to materialise two version snapshots, then exercises
        list → get → diff with shape assertions on left/right/chart_diffs.

  cleanup_policy_live_test.go
      - Full admin CRUD plus a dry-run execution. The condition
        "idle_days:9999" deliberately matches nothing so the run never
        mutates a real instance.

  cluster_lifecycle_live_test.go
      - Stub-cluster create → get → update → delete. IsDefault stays
        false so the test never disrupts requireCluster() for other
        tests. Exercises registry_* + image_pull_secret_name fields
        (the registry_password drift in PR #95 is the canonical
        example of why this surface needs a live test).

Bonus findings surfaced during local validation against rancher-desktop
(left as commented follow-ups, not blockers for this PR):

  - stackctl's UpdateTemplateRequest.Name is `omitempty` but the
    backend rejects PUT with "name is required" when omitted. Either
    drop the omitempty or relax the backend.
  - stackctl's CreateTemplateRequest has no `version` field, so the
    template-level Version is unsettable through the CLI — the version
    snapshot's `version` round-trips empty as a result.
  - Backend rejects kubeconfig_data unless KUBECONFIG_ENCRYPTION_KEY
    is configured (the CI compose env doesn't set it); kubeconfig_path
    works without that prerequisite.

Verified locally against rancher-desktop k8s-stack-manager: full live
suite passes (21 passed, 2 skipped by design, 0 failed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(live): address review — guard raw_key slice + find-first user lookups

Two correctness fixes from CodeRabbit on PR #100:

  - apikey_live_test.go: promote the raw_key length check from
    assert.Truef to require.Truef. A failing length check followed
    by created.RawKey[:3] would panic instead of failing cleanly.

  - user_live_test.go: after DisableUser/EnableUser, look up the
    created user in the list response first (find-first pattern,
    matched by the other live tests) and require.NotNil before
    asserting on the flag. The previous range-and-skip would silently
    pass if the user was missing from the response.

The companion table-driven + t.Parallel() refactor suggestion for
cleanup_policy_live_test.go is deliberately skipped — same reason as
on PR #99: every other *_live_test.go file in this package uses
ad-hoc subtests and runs serially against a shared backend by design.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Olof Mattsson <olof.mattsson@klaravik.se>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test(live): run cli/test/live/ in CI against a booted backend

1 participant