chore(release): add Helm chart, Grafana dashboard, GoReleaser + workflows, changelog automation, promotion checklists, evidence harness #1
base: main
Conversation
…lows, changelog automation, promotion checklists, evidence harness
…sts; update root README with release status and evidence pointers
…s (ignore all except settings/extensions/launch)
Summary by CodeRabbit
Walkthrough

Adds CI/release automation (changelog + GoReleaser), upgrades Go to 1.25, introduces a Grafana dashboard and new Prometheus metrics, extends the admin CLI and admin APIs, enforces single-probe circuit-breaker half-open semantics, streams producer enqueueing, hardens worker/reaper Redis handling, and adds tests, docs, hooks, linting, and many documentation/evidence artifacts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor Dev as Developer
  participant GH as GitHub Actions
  participant Repo as Repository
  participant Chg as Changelog WF
  participant Rel as GoReleaser WF
  participant GHCR as GHCR
  Dev->>GH: push tag vX.Y.Z or manual dispatch
  GH->>Chg: trigger changelog job
  GH->>Rel: trigger goreleaser job
  rect rgba(220,240,255,0.12)
    Chg->>Repo: checkout (full history & tags)
    Chg->>Chg: setup Go 1.25 & install git-chglog
    Chg->>Repo: generate CHANGELOG.md (git-chglog)
    Chg->>Repo: git add/commit CHANGELOG.md (|| true)
    Chg->>Repo: git push (non-fatal)
  end
  rect rgba(230,255,220,0.12)
    Rel->>Repo: checkout (fetch-depth 0)
    Rel->>Rel: setup Go 1.25
    Rel->>GHCR: docker login (GITHUB_TOKEN)
    Rel->>Rel: goreleaser release --clean
    Rel->>GHCR: push images & publish artifacts
  end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
@coderabbitai review
✅ Actions performed: Review triggered.
…ations for Go dev
Actionable comments posted: 32
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.goreleaser.yaml (1)
45-46: Remove trailing blank line. YAMLlint flags the extra empty line.

snapshot:
  name_template: SNAPSHOT-{{ .ShortCommit }}
-
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (15)
.github/workflows/changelog.yml (1 hunks)
.github/workflows/goreleaser.yml (1 hunks)
.gitignore (1 hunks)
.goreleaser.yaml (1 hunks)
README.md (1 hunks)
deploy/grafana/dashboards/work-queue.json (1 hunks)
docs/15_promotion_checklists.md (1 hunks)
docs/evidence/README.md (1 hunks)
docs/evidence/bench.json (1 hunks)
docs/evidence/ci_run.json (1 hunks)
docs/evidence/config.alpha.yaml (1 hunks)
docs/evidence/config.used.yaml (1 hunks)
docs/evidence/issues.json (1 hunks)
docs/evidence/metrics_after.txt (1 hunks)
docs/evidence/metrics_before.txt (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/15_promotion_checklists.md
5-5: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
6-6: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
33-33: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
34-34: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
44-44: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
45-45: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
52-52: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
53-53: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above
(MD022, blanks-around-headings)
64-64: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
65-65: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above
(MD022, blanks-around-headings)
README.md
56-56: Bare URL used
(MD034, no-bare-urls)
59-59: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
64-64: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/evidence/README.md
5-5: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
6-6: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
🪛 YAMLlint (1.37.1)
docs/evidence/config.used.yaml
[error] 20-20: too many spaces after colon
(colons)
[error] 22-22: too many spaces after colon
(colons)
docs/evidence/config.alpha.yaml
[error] 20-20: too many spaces after colon
(colons)
[error] 22-22: too many spaces after colon
(colons)
.goreleaser.yaml
[error] 38-38: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 39-39: too many spaces inside braces
(braces)
[error] 39-39: too many spaces inside braces
(braces)
[error] 45-45: too many blank lines (1 > 0)
(empty-lines)
🔇 Additional comments (7)
.gitignore (1)
35-36: Obsidian ignore LGTM.
.goreleaser.yaml (1)
7-13: Confirm version variable path. Ensure your main package exposes `var version string` so `-X main.version={{.Version}}` sets it correctly. Would you like a quick grep script to verify the symbol is present in cmd/job-queue-system?
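As a cross-check, a minimal sketch of the wiring this comment asks to confirm (the package path and variable name come from the review; the "dev" fallback value is an assumption for illustration):

```go
package main

import "fmt"

// version is populated at build time; GoReleaser's ldflags entry
// `-X main.version={{.Version}}` overwrites this default on release builds.
// The "dev" fallback is an assumed convention, not confirmed repo content.
var version = "dev"

func main() {
	fmt.Println("job-queue-system", version)
}
```

Building with `go build -ldflags "-X main.version=v0.4.0-alpha"` would then report that tag instead of the fallback.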
.github/workflows/goreleaser.yml (1)
20-21: Pin the Go toolchain input to the matrix/default branch if needed. Go 1.24.x is fine if intentional; just flagging that CI and evidence show 1.24/1.25 variance. Confirm you want releases built with 1.24.x while evidence artifacts show go1.25.0.
docs/evidence/metrics_after.txt (1)
18-18: Evidence inconsistencies: produced/consumed/queue backlog and Go version.
- go_info shows go1.25.0 while CI uses 1.24.x.
- jobs_consumed_total=480, jobs_completed_total=480, but jobs_produced_total=0.
- queue_length low_priority=2020 suggests a backlog. If producer metrics are disabled, note it.
Please confirm the run context and annotate the README with the expected counter relationships to avoid misinterpretation by reviewers.
Also applies to: 109-112, 121-121, 135-139
docs/evidence/config.used.yaml (1)
21-24: Validate that key patterns match the implementation. Ensure processing/heartbeat keys align with the code to prevent orphaned heartbeats. I can generate a repo scan to find Redis key usage sites and cross-check patterns.
docs/evidence/metrics_before.txt (1)
1-136: Baseline snapshot looks consistent. No issues spotted; a good complement to metrics_after.
docs/evidence/README.md (1)
10-15: Verify the metrics port in the curl command matches the config. The README uses 9191 while config.used.yaml shows 9090; if config.alpha.yaml differs, note it explicitly.
I can scan all configs for metrics_port and propose a single source of truth in docs.
Also applies to: 42-45
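The offered scan could be as small as a line filter over every YAML config; a hypothetical sketch (the `extractPort` helper is illustrative, not repo code):

```go
package main

import (
	"fmt"
	"strings"
)

// extractPort pulls the value out of a "metrics_port:" line,
// returning "" when the line does not set the port.
func extractPort(line string) string {
	const key = "metrics_port:"
	i := strings.Index(line, key)
	if i < 0 {
		return ""
	}
	return strings.TrimSpace(line[i+len(key):])
}

func main() {
	// Feed this lines from each config file (e.g. via bufio.Scanner over
	// docs/evidence/*.yaml) and compare values to flag 9191-vs-9090 drift.
	for _, line := range []string{"  metrics_port: 9191", "log_level: info"} {
		if p := extractPort(line); p != "" {
			fmt.Println("found metrics_port", p)
		}
	}
}
```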
# VS Code: ignore all by default, allow key shared files
.vscode/*
!.vscode/extensions.json
!.vscode/settings.json
!.vscode/launch.json
🧹 Nitpick (assertive)
Selective .vscode tracking is fine; ensure no secrets slip in.
extensions.json/settings.json/launch.json can hold credentials (e.g., debug env). Recommend adding a pre-commit check to block secrets in these files.
🤖 Prompt for AI Agents
In .gitignore around lines 27 to 31, the reviewer asks for a safeguard because
tracked .vscode files (extensions.json, settings.json, launch.json) can contain
secrets; add a pre-commit hook or integrate a repository check (e.g., Husky +
lint-staged or a CI check) that scans those specific files for
secrets/credentials and rejects commits when patterns are detected. Implement
the hook to run a lightweight secret scanner (regex checks for common patterns
like AWS keys, private keys, auth tokens, env var declarations) only on
.vscode/extensions.json, .vscode/settings.json, and .vscode/launch.json, and
ensure it exits non-zero with a clear error message instructing the developer to
remove secrets or move them to environment variables; add documentation in
CONTRIBUTING.md about the check and how to bypass temporarily with
maintainer-approved workflow.
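The check described above could be a small Go program invoked from a pre-commit hook with the three tracked files as arguments; a sketch under stated assumptions (the patterns are illustrative, and a dedicated scanner such as gitleaks would be more thorough):

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// secretPatterns are hypothetical examples of the regex checks suggested
// in the review; extend them for your environment.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),                                 // AWS access key ID
	regexp.MustCompile(`-----BEGIN [A-Z ]*PRIVATE KEY-----`),               // PEM private key
	regexp.MustCompile(`(?i)"?(token|password|secret)"?\s*[:=]\s*"[^"]+"`), // inline credentials
}

// scan reports whether data matches any known secret pattern.
func scan(data []byte) bool {
	for _, re := range secretPatterns {
		if re.Match(data) {
			return true
		}
	}
	return false
}

func main() {
	exit := 0
	for _, path := range os.Args[1:] { // e.g. the tracked .vscode files
		data, err := os.ReadFile(path)
		if err != nil {
			continue
		}
		if scan(data) {
			fmt.Fprintf(os.Stderr, "possible secret in %s; move it to an environment variable\n", path)
			exit = 1
		}
	}
	os.Exit(exit) // non-zero exit blocks the commit
}
```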
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
.vscode/extensions.json (1 hunks)
.vscode/settings.json (1 hunks)
🔇 Additional comments (1)
.vscode/settings.json (1)
2-8: Good Go save-actions setup. Per-file Go formatting and organize-imports on save look appropriate.
{
  "recommendations": [
    "golang.go"
  ],
  "unwantedRecommendations": []
}
🧹 Nitpick (assertive)
Add YAML/Helm tooling to match new chart assets
Recommend adding Kubernetes/Helm and YAML support to align with the new Helm chart and K8s manifests.
{
"recommendations": [
- "golang.go"
+ "golang.go",
+ "redhat.vscode-yaml",
+ "ms-kubernetes-tools.vscode-kubernetes-tools"
],
"unwantedRecommendations": []
}
🤖 Prompt for AI Agents
.vscode/extensions.json around lines 1 to 6: the workspace extensions list only
recommends the Go extension but the repo now contains Helm charts and Kubernetes
YAML manifests; update the recommendations array to include YAML and
Helm/Kubernetes tooling (for example add "redhat.vscode-yaml",
"ms-azuretools.vscode-helm" and "ms-kubernetes-tools.vscode-kubernetes-tools")
so users get proper linting, schema support and Helm assistance when opening the
workspace.
"gopls": {
  "ui.semanticTokens": true,
  "build.experimentalWorkspaceModule": true
},
💡 Verification agent
🧩 Analysis chain
Verify gopls experimental flag; it may be obsolete
gopls’s build.experimentalWorkspaceModule has changed across versions; confirm it’s still valid with your pinned Go extension/gopls. If obsolete, drop it.
"gopls": {
"ui.semanticTokens": true,
- "build.experimentalWorkspaceModule": true
+ // "build.experimentalWorkspaceModule": true
},
To verify with upstream docs:
🌐 Web query:
Is gopls setting "build.experimentalWorkspaceModule" still supported in the latest gopls? If deprecated, what is the recommended default behavior or replacement?
💡 Result:
Short answer: No — it was deprecated and then removed. Use go.work (multi-module workspace support in the go command) instead.
Details / sources:
- Deprecation announced (replaced by go.work; removed in gopls v0.11.0+). [1]
- Release notes / removal and recommended replacement (create a go.work; e.g. `go work use -r .`). [2][3]
Sources:
[1] gopls v0.10.x release notes (deprecates experimentalWorkspaceModule in favor of go.work).
[2] gopls v0.12.0 notes (removes the setting; instructs using go.work / `go work use -r .`).
[3] GitHub issue tracking removal of experimentalWorkspaceModule.
[3] GitHub issue tracking removal of experimentalWorkspaceModule.
Remove deprecated gopls setting "build.experimentalWorkspaceModule"
Deprecated in gopls v0.10.x and removed by v0.12.0; use go.work (e.g. `go work use -r .`) for multi-module workspaces. Action: delete the "build.experimentalWorkspaceModule" entry from .vscode/settings.json (lines 9–12); do not comment it out (JSON has no comments).
🤖 Prompt for AI Agents
In .vscode/settings.json around lines 9 to 12, the gopls setting
"build.experimentalWorkspaceModule" is deprecated and should be removed; edit
the file and delete the "build.experimentalWorkspaceModule": true entry (do not
comment it out), leaving the rest of the "gopls" object intact and ensure the
JSON remains valid.
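For reference, the replacement mechanism is a `go.work` file at the workspace root; a minimal sketch, assuming the single module lives at the repo root:

```
go 1.25

use .
```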
"go.testFlags": ["-race", "-count=1"],
"go.toolsManagement.autoUpdate": true
🧹 Nitpick (assertive)
Test flags trade-off
Running tests with -race and -count=1 by default is great for catching issues but slower. If dev feedback becomes a concern, consider moving -race to a separate profile/task.
🤖 Prompt for AI Agents
In .vscode/settings.json at lines 13-14 the default go.testFlags include
["-race", "-count=1"], which slows local feedback; remove "-race" from the
default go.testFlags and add a separate configuration/task for race-enabled
tests (or a VS Code test profile) so developers can run fast tests by default
while still having an easy opt-in for race detection; update settings.json to
set go.testFlags to ["-count=1"] and add a new task or profile (e.g., in
.vscode/tasks.json or launch configurations) named "Go: test (race)" that
invokes tests with ["-race","-count=1"] and document usage in the project
README.
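One way to keep an opt-in race profile, as the prompt suggests: a hypothetical `.vscode/tasks.json` entry (the label and shape are assumptions, not repo content):

```json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Go: test (race)",
      "type": "shell",
      "command": "go test -race -count=1 ./...",
      "group": "test"
    }
  ]
}
```

With `go.testFlags` trimmed to `["-count=1"]`, the default test UI stays fast while this task remains the easy opt-in for race detection.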
@coderabbitai full review
✅ Actions performed: Full review triggered.
…bump go.mod to 1.25; docs: clarify evidence harness (producer counters, clearing keys, metrics port); update config.used.yaml to match harness
Addressed CodeRabbit requests:

- Aligned Go toolchain to 1.25.x across CI, release, and GoReleaser workflows; bumped go.mod to 1.25.0 (evidence had go_info=go1.25.0).
- Confirmed version variable path: main has 'var version string' and ldflags '-X main.version' is wired.
- Clarified evidence harness: noted admin bench does not increment jobs_produced_total; added instructions to clear Redis keys before running to avoid backlog; standardized metrics port to 9191 and updated docs/evidence/config.used.yaml.
- Verified processing/heartbeat key patterns match implementation.

If you want, I can add a script or admin subcommand to purge all test keys to make the harness one-click.
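A purge subcommand could start from a helper that enumerates the harness keys; a hypothetical sketch (queue names from the evidence README, processing-list shape from `worker.processing_list_pattern`):

```go
package main

import "fmt"

// purgeKeys builds the list of harness keys to clear before a bench run.
// A real purge-all subcommand would pass these to rdb.Del(ctx, keys...).
func purgeKeys(workerIDs []string) []string {
	keys := []string{
		"jobqueue:high_priority",
		"jobqueue:low_priority",
		"jobqueue:completed",
		"jobqueue:dead_letter",
	}
	for _, id := range workerIDs {
		keys = append(keys, fmt.Sprintf("jobqueue:worker:%s:processing", id))
	}
	return keys
}

func main() {
	fmt.Println(purgeKeys([]string{"worker-1"}))
}
```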
…ssing lists (with --yes)
ci: add 5x E2E determinism gate
docs: add CODEOWNERS
Updates
Notes
# Example purge-all
./bin/job-queue-system --role=admin --admin-cmd=purge-all --yes --config=config/config.yaml
Updates (round 2)
If you want additional CODEOWNERS for SRE/Security, please share their GitHub handles.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
.github/workflows/ci.yml (1)
12-22: Service config: drop the unnecessary port mapping; keep the healthcheck. For GitHub Actions services you don't need host port mapping; the runner can reach the service via `localhost:${mapped}` only if you map, but mapping isn't required—use the service name and container port, or keep the healthcheck and remove the host mapping to avoid collisions on self-hosted runners. If you want host access, keep as-is; otherwise:

 services:
   redis:
     image: redis:7-alpine
-    ports:
-      - 6379:6379
     options: >-
       --health-cmd "redis-cli ping"
       --health-interval 5s
       --health-timeout 3s
       --health-retries 5

And set `E2E_REDIS_ADDR: redis:6379`.

.github/workflows/release.yml (3)
20-41: You’ve built two competing release pipelines. Kill one before it kills your releases. This workflow builds/pushes images and creates a GitHub Release on tag push. The new GoReleaser workflow does the same. You'll double-publish images and stomp release notes.
Options:
- Delete this workflow and rely solely on GoReleaser, or
- Restrict triggers (e.g., only prerelease tags here) or set `if: false` until migration is done.
Minimal disable:

 jobs:
   build-and-publish:
+    if: false  # disabled; GoReleaser owns releases/images
     runs-on: ubuntu-latest
21-27: Stop running “make tidy” in a release job. Don't mutate modules during release. Fetch deps only.

-      - name: Build binary
-        run: |
-          make tidy
-          make build VERSION=${GITHUB_REF_NAME}
+      - name: Build binary
+        run: make build VERSION=${GITHUB_REF_NAME}
25-37: Single-arch Docker build is not acceptable for users on arm64. If you insist on keeping this workflow, set up Buildx + QEMU and push multi-arch.

+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+      - name: Set up Buildx
+        uses: docker/setup-buildx-action@v3
       - name: Docker build
-        run: |
-          docker build -t ghcr.io/${{ github.repository }}:${GITHUB_REF_NAME} -t ghcr.io/${{ github.repository }}:latest .
+        run: |
+          docker buildx build \
+            --platform linux/amd64,linux/arm64 \
+            -t ghcr.io/${{ github.repository }}:${GITHUB_REF_NAME} \
+            -t ghcr.io/${{ github.repository }}:latest \
+            --push .

docs/evidence/README.md (1)
16-23: More MD032: surround the bullet lists under “Important notes” and “Notes” with blank lines. Spacing. Do it.

-Important notes
-- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
-- To avoid stale backlog affecting evidence, clear test keys before running a bench: `redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter` and `redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL`.
-- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
+Important notes
+
+- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
+- To avoid stale backlog affecting evidence, clear test keys before running a bench: `redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter` and `redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL`.
+- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
@@
-Notes
-- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
+Notes
+
+- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
♻️ Duplicate comments (8)
.github/workflows/goreleaser.yml (2)
17-20: Fetch full history or GoReleaser will bork your changelog and versioning. You didn't fetch tags. GoReleaser needs them.

-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
25-31: Set up QEMU + Buildx so your multi-arch images aren’t fantasy. Without these, any multi-arch config in .goreleaser.yaml won’t build/push properly.

+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+      - name: Set up Buildx
+        uses: docker/setup-buildx-action@v3
       - name: Login to GHCR
         uses: docker/login-action@v3

README.md (3)
56-56: Fix MD034: don’t dump bare URLs like it’s 1998. Wrap the PR URL.

-Release branch open for v0.4.0-alpha: see PR https://github.com/flyingrobots/go-redis-work-queue/pull/1
+Release branch open for v0.4.0-alpha: see PR <https://github.com/flyingrobots/go-redis-work-queue/pull/1>
63-66: Same MD032 problem under “Evidence artifacts”. Space it out.

-Evidence artifacts (docs/evidence/):
+Evidence artifacts (docs/evidence/):
+
 - ci_run.json (CI URL), bench.json (throughput/latency), metrics_before/after.txt, config.alpha.yaml
-
-To reproduce evidence locally, see docs/evidence/README.md.
+
+To reproduce evidence locally, see docs/evidence/README.md.
58-62: Fix MD032: lists must be surrounded by blank lines. Add spacing so linters stop whining.

-Promotion gates and confidence summary (details in docs/15_promotion_checklists.md):
+Promotion gates and confidence summary (details in docs/15_promotion_checklists.md):
+
 - Alpha → Beta: overall confidence ~0.85 (functional/observability/CI strong; perf and coverage improvements planned)
 - Beta → RC: overall confidence ~0.70 (needs controlled perf run, chaos tests, soak)
 - RC → GA: overall confidence ~0.70 (release flow ready; soak and rollback rehearsal pending)
+

docs/evidence/config.used.yaml (1)
20-22: YAML lint violations: too many spaces after colon. Fix it. This garbage breaks linters.

 high: "jobqueue:high_priority"
-low:  "jobqueue:low_priority"
+low: "jobqueue:low_priority"
 processing_list_pattern: "jobqueue:worker:%s:processing"
-heartbeat_key_pattern:  "jobqueue:processing:worker:%s"
+heartbeat_key_pattern: "jobqueue:processing:worker:%s"

docs/evidence/README.md (2)
8-15: Make “Reproduce locally” a real heading and surround the list. Follow basic Markdown rules.

-Reproduce locally
+## Reproduce locally
+
 1) Ensure Redis is running on `localhost:6379` (e.g., `docker run -p 6379:6379 redis:7-alpine`)
 2) Build binary: `make build`
 3) Start worker: `./bin/job-queue-system --role=worker --config=docs/evidence/config.alpha.yaml`
 4) In another terminal, run bench: `./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml --admin-cmd=bench --bench-count=1000 --bench-rate=500 --bench-priority=low --bench-timeout=60s`
 5) Capture metrics: `curl -sS localhost:9191/metrics | head -n 200 > docs/evidence/metrics_after.txt`
+
1-7: markdownlint: headings and lists need blank lines. Add spacing after the H1.

 # Evidence for v0.4.0-alpha Promotion
+
 - CI run: see `ci_run.json` (contains URL to the successful workflow run)
 - Bench JSON: `bench.json` (admin bench with 1000 jobs at 500 rps)
 - Config used: `config.alpha.yaml`
 - Metrics snapshots: `metrics_before.txt`, `metrics_after.txt`
+
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
.github/workflows/ci.yml (1 hunks)
.github/workflows/goreleaser.yml (1 hunks)
.github/workflows/release.yml (1 hunks)
README.md (2 hunks)
docs/evidence/README.md (1 hunks)
docs/evidence/config.used.yaml (1 hunks)
go.mod (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/evidence/README.md
17-17: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
22-22: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
README.md
56-56: Bare URL used
(MD034, no-bare-urls)
59-59: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
64-64: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
🪛 YAMLlint (1.37.1)
docs/evidence/config.used.yaml
[error] 20-20: too many spaces after colon
(colons)
[error] 22-22: too many spaces after colon
(colons)
🔇 Additional comments (4)
go.mod (1)
50-56: Sanity-check ultra-recent google genproto pins. These pseudo-versions (20250825…) are fine if intentional, but they drag in massive trees. Confirm you really need them; otherwise prune by running `go mod tidy -compat=1.25` locally and re-vetting.
.github/workflows/goreleaser.yml (1)
18-20: Enable Go module cache. Shave minutes off releases.

       - uses: actions/setup-go@v5
         with:
-          go-version: '1.25.x'
+          go-version: '1.25.x'
+          cache: true

Likely an incorrect or invalid review comment.
README.md (1)
21-21: Good: explicit Go 1.25+. Alignment with toolchain and CI looks consistent.
docs/evidence/config.used.yaml (1)
42-44: Port consistency check with docs. This file uses metrics_port 9191. Ensure README and evidence curl commands consistently use 9191 or clearly call out overrides.
      - name: Run GoReleaser
        uses: goreleaser/goreleaser-action@v6
        with:
          distribution: goreleaser
          version: latest
          args: release --clean
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
🧹 Nitpick (assertive)
Scope permissions narrowly and surface attestations.
Consider adding provenance/signing (SLSA/cosign) and setting tighter permissions per step. Not blocking, but you’ll thank me later.
🤖 Prompt for AI Agents
.github/workflows/goreleaser.yml lines 31-38: tighten the step permissions and
enable provenance/signing for releases by (1) adding a top-level permissions
block that limits the job to the minimum rights (e.g., contents: read, id-token:
write for OIDC token exchange, and only add packages or actions permissions if
absolutely required), (2) granting the goreleaser step only the specific
environment variables/secrets it needs instead of full GITHUB_TOKEN scope, and
(3) enabling artifact provenance and signing by calling goreleaser with
provenance flags and/or invoking a cosign/sigstore step after build (or
configure goreleaser to sign) using OIDC (id-token) and the COSIGN_* secrets so
releases are attested and signed. Ensure the workflow creates and exposes
attestations (SLSA provenance) and that any tokens/secrets are minimal and
scoped.
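A minimal sketch of the narrower permissions block the comment suggests (the scopes are standard GitHub Actions permissions; whether `packages: write` is needed depends on this job pushing to GHCR, which is an assumption here):

```yaml
permissions:
  contents: write   # create the GitHub Release
  packages: write   # push images to GHCR
  id-token: write   # OIDC token for keyless cosign signing
```

Everything not listed defaults to `none`, which is the point: the job gets only what release publishing requires.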
module github.com/flyingrobots/go-redis-work-queue

-go 1.24.0
+go 1.25.0
go.mod “go” directive is invalid; use major.minor, not patch. `go 1.25.0` will be rejected; the directive must be `go 1.25`.
-go 1.25.0
+go 1.25
🤖 Prompt for AI Agents
In go.mod around lines 3 to 3, the go directive currently uses a patch version
("go 1.25.0") which is invalid; change it to a major.minor form ("go 1.25").
Update the line to remove the patch component so the file reads the supported
directive format.
…worker ID uniqueness and pause via config; better error handling; cancel-safe timers
feat(metrics): add circuit_breaker_trips_total and reaper_recovered_total
feat(producer): stream file scanning + path validation
ci: tests updated for new breaker behavior
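The single-probe half-open behavior referenced here can be sketched as a tiny state machine; this is an illustrative model, not the repo's actual breaker type (field and state names are assumptions):

```go
package main

import (
	"fmt"
	"sync"
)

// Breaker models single-probe half-open semantics: while half-open, exactly
// one probe request may proceed, and its outcome decides whether the breaker
// closes or re-opens.
type Breaker struct {
	mu        sync.Mutex
	state     string // "closed", "open", or "half-open"
	probeBusy bool
}

// Allow reports whether a request may proceed in the current state.
func (b *Breaker) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	switch b.state {
	case "closed":
		return true
	case "half-open":
		if b.probeBusy {
			return false // a probe is already in flight
		}
		b.probeBusy = true
		return true
	default: // "open"
		return false
	}
}

// Record reports the probe's outcome while half-open.
func (b *Breaker) Record(success bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.state != "half-open" {
		return
	}
	b.probeBusy = false
	if success {
		b.state = "closed"
	} else {
		b.state = "open"
	}
}

func main() {
	b := &Breaker{state: "half-open"}
	fmt.Println(b.Allow(), b.Allow()) // only the first probe is admitted
}
```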
Docs: Did-Not-Agree Decisions

Per request, I documented deviations/deferrals under
These files include rationale, tradeoffs, and revisit criteria. Happy to adjust or add more entries as reviews continue.
… consumption; tune minSamples for determinism with miniredis
…commands; link from README
…uage identifiers (bash)
Actionable comments posted: 36
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (15)
internal/producer/producer.go (2)
95-119: Rate limiter is racy across producers; make INCR+EXPIRE atomic and avoid herd-sleep. Multiple producers can miss setting the TTL. Use a Lua script to INCR and set EX with NX in one shot; then sleep based on the returned TTL once, not per overage. I can provide a small Lua + `EvalSha` helper if you want it wired here.
70-73: Fail fast — do not LPush to an empty Redis key. Producer and reaper both fall back to cfg.Worker.Queues[default] without ensuring it’s set; if missing you’ll LPush to "" and no worker will consume. Consumers skip empty keys (internal/worker/worker.go:89–93), so this silently drops/loses jobs.
- Fix producer (internal/producer/producer.go:69–73): after resolving key, if key == "" return an error. (Your suggested diff is correct.)
- Fix reaper (internal/reaper/reaper.go:66–70): apply the same guard — never LPush to an empty dest (log+metric or skip if you can’t fail the process).
- Add startup config validation: ensure cfg.Worker.Queues contains entries for all priorities and that Producer.DefaultPriority maps to a non-empty queue key.
- Add tests (unit/e2e) to cover missing mappings.
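The guard described in the bullets above can be factored into one helper shared by producer and reaper; a minimal sketch, assuming a `map[string]string` queue mapping (the function name is hypothetical):

```go
package main

import "fmt"

// resolveQueueKey resolves the destination list for a priority, falls back
// to the default priority, and returns an error instead of "" so callers
// never LPUSH to a key no worker consumes.
func resolveQueueKey(queues map[string]string, priority, defaultPriority string) (string, error) {
	key := queues[priority]
	if key == "" {
		key = queues[defaultPriority]
	}
	if key == "" {
		return "", fmt.Errorf("no queue configured for priority %q (default %q)", priority, defaultPriority)
	}
	return key, nil
}

func main() {
	if _, err := resolveQueueKey(map[string]string{}, "high", "low"); err != nil {
		fmt.Println("refusing to enqueue:", err)
	}
}
```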
internal/obs/metrics.go (1)
69-78: Your metrics HTTP server has no timeouts. That’s amateur hour. Add sane timeouts to avoid trivial slowloris issues.

 import (
   "fmt"
   "net/http"
+  "time"
@@
-  srv := &http.Server{Addr: fmt.Sprintf(":%d", cfg.Observability.MetricsPort), Handler: mux}
+  srv := &http.Server{
+    Addr:              fmt.Sprintf(":%d", cfg.Observability.MetricsPort),
+    Handler:           mux,
+    ReadHeaderTimeout: 5 * time.Second,
+    ReadTimeout:       10 * time.Second,
+    WriteTimeout:      10 * time.Second,
+    IdleTimeout:       120 * time.Second,
+  }

internal/reaper/reaper.go (2)
59-75
: Non-atomic RPOP + LPUSH can lose jobs on crash. You’re running a queue, not a shredder. Fix it. A crash between RPOP and LPUSH means the payload disappears. Use a staging list with RPOPLPUSH + LMOVE so recovery is crash-safe.

-    for {
-      payload, err := r.rdb.RPop(ctx, plist).Result()
-      if err == redis.Nil { break }
-      if err != nil { r.log.Warn("reaper rpop error", obs.Err(err)); break }
-      job, err := queue.UnmarshalJob(payload)
-      if err != nil {
-        continue
-      }
-      prio := job.Priority
-      dest := r.cfg.Worker.Queues[prio]
-      if dest == "" { dest = r.cfg.Worker.Queues[r.cfg.Producer.DefaultPriority] }
-      if err := r.rdb.LPush(ctx, dest, payload).Err(); err != nil {
-        r.log.Error("requeue failed", obs.Err(err))
-      } else {
-        obs.ReaperRecovered.Inc()
-        r.log.Warn("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
-      }
-    }
+    staging := fmt.Sprintf("jobqueue:reaper:%s:staging", workerID)
+    for {
+      // Atomic pop from processing -> staging
+      payload, err := r.rdb.RPopLPush(ctx, plist, staging).Result()
+      if err == redis.Nil { break }
+      if err != nil { r.log.Warn("reaper rpoplpush error", obs.Err(err)); break }
+
+      job, err := queue.UnmarshalJob(payload)
+      if err != nil {
+        // leave payload in staging; next scan can handle or we can drop with LREM policy if desired
+        r.log.Warn("bad job payload left in staging", obs.Err(err))
+        continue
+      }
+      prio := job.Priority
+      dest := r.cfg.Worker.Queues[prio]
+      if dest == "" { dest = r.cfg.Worker.Queues[r.cfg.Producer.DefaultPriority] }
+
+      // Atomic move staging -> destination
+      if _, err := r.rdb.LMove(ctx, staging, dest, "RIGHT", "LEFT").Result(); err != nil {
+        r.log.Error("requeue move failed", obs.Err(err))
+        // payload remains in staging; will be retried next cycle
+        continue
+      }
+      obs.ReaperRecovered.Inc()
+      r.log.Info("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
+    }
`54-55`: **You're ignoring EXISTS errors. That's sloppy and dangerous.**

A transient Redis error shouldn't be treated like "worker dead."

```diff
-	exists, _ := r.rdb.Exists(ctx, hbKey).Result()
-	if exists == 1 { continue } // worker healthy
+	exists, err := r.rdb.Exists(ctx, hbKey).Result()
+	if err != nil {
+		r.log.Warn("reaper heartbeat exists check error", obs.Err(err))
+		continue
+	}
+	if exists == 1 { continue } // worker healthy
```

internal/config/config.go (3)
`124-128`: **Your default metrics port conflicts with your docs. Pick one and stick to it.**

Docs say 9191; code ships 9090. Stop gaslighting your operators.

```diff
-	MetricsPort: 9090,
+	MetricsPort: 9191,
```
`177-181`: **Set the viper default to match the new port. Consistency isn't optional.**

```diff
-	v.SetDefault("observability.metrics_port", def.Observability.MetricsPort)
+	v.SetDefault("observability.metrics_port", def.Observability.MetricsPort)
```

(Ensure `def.Observability.MetricsPort` reflects 9191 after the change above.)
`200-226`: **Validate the new breaker_pause and sampling interval. Don't accept garbage.**

Missing constraints invite foot-guns.

```diff
 func Validate(cfg *Config) error {
@@
 	if cfg.Worker.BRPopLPushTimeout <= 0 || cfg.Worker.BRPopLPushTimeout > cfg.Worker.HeartbeatTTL/2 {
 		return fmt.Errorf("worker.brpoplpush_timeout must be >0 and <= heartbeat_ttl/2")
 	}
+	if cfg.Worker.BreakerPause <= 0 {
+		return fmt.Errorf("worker.breaker_pause must be > 0")
+	}
+	if !strings.Contains(cfg.Worker.HeartbeatKeyPattern, "%s") {
+		return fmt.Errorf("worker.heartbeat_key_pattern must contain %%s")
+	}
+	if !strings.Contains(cfg.Worker.ProcessingListPattern, "%s") {
+		return fmt.Errorf("worker.processing_list_pattern must contain %%s")
+	}
 	if cfg.Producer.RateLimitPerSec < 0 {
 		return fmt.Errorf("producer.rate_limit_per_sec must be >= 0")
 	}
 	if cfg.Observability.MetricsPort <= 0 || cfg.Observability.MetricsPort > 65535 {
 		return fmt.Errorf("observability.metrics_port must be 1..65535")
 	}
+	if cfg.Observability.QueueSampleInterval <= 0 {
+		return fmt.Errorf("observability.queue_sample_interval must be > 0")
+	}
 	return nil
 }
```

README.md (1)
`35-41`: **Ports are WRONG. README says 9090; config/evidence use 9191. Pick one.**

You already standardized on 9191 elsewhere. Fix Metrics/Health/Readiness to 9191 here to stop gaslighting your users.

Apply:

```diff
-- Prometheus metrics exposed at http://localhost:9090/metrics by default
+- Prometheus metrics exposed at http://localhost:9191/metrics by default
@@
-- Liveness: http://localhost:9090/healthz returns 200 when the process is up
-- Readiness: http://localhost:9090/readyz returns 200 only when Redis is reachable
+- Liveness: http://localhost:9191/healthz returns 200 when the process is up
+- Readiness: http://localhost:9191/readyz returns 200 only when Redis is reachable
```

docs/evidence/README.md (1)
`24-41`: **More MD032: surround lists and code fences with blank lines.**

Fix spacing around the "Automated harness" bullets and the code block.

````diff
-Automated harness
+## Automated harness
+
 - A convenience script `docs/evidence/run_bench.sh` automates the above steps.
 - Default params: COUNT=1000, RATE=500, PRIORITY=low, CONFIG=docs/evidence/config.alpha.yaml, PURGE=1.
 - Example:

 ```bash
 bash docs/evidence/run_bench.sh
 # or override
 COUNT=2000 RATE=1000 PRIORITY=low bash docs/evidence/run_bench.sh
 ```

 Outputs are written under `docs/evidence/run_YYYYmmdd_HHMMSS/`:
+
 - worker.log, worker.pid
 - metrics_before.txt, metrics_after.txt
 - stats_before.json, stats_after.json
 - bench.json
````

internal/breaker/breaker.go (1)

`108-116`: **Ensure lastTransition is updated before returning in all HalfOpen paths.**

Minor, but be explicit: set lastTransition immediately upon state change before touching flags to avoid future footguns. Also, add a comment clarifying probe-completion semantics.

```diff
-	if ok {
-		cb.state = Closed
-	} else {
-		cb.state = Open
-	}
-	// the single probe completed; allow a future probe after cooldown or next Allow
-	cb.halfOpenInFlight = false
-	cb.lastTransition = now
+	if ok {
+		cb.state = Closed
+	} else {
+		cb.state = Open
+	}
+	cb.lastTransition = now
+	// The single HalfOpen probe completed; clear in-flight to allow a future probe on the next HalfOpen window.
+	cb.halfOpenInFlight = false
```
cmd/job-queue-system/main.go (2)
85-92
: **Do NOT start the HTTP/metrics server in admin mode. It's pointless and can collide with a running worker.**

You're binding to the metrics port for a one-shot CLI. If that port is already in use, your admin command dies before doing anything. Gate the HTTP server (and the queue-length updater) behind role != "admin".

Example fix (outside the changed hunk):

```go
// Before creating httpSrv / starting the queue length updater:
if role != "admin" {
	httpSrv := obs.StartHTTPServer(cfg, readyCheck)
	defer func() { _ = httpSrv.Shutdown(context.Background()) }()
	// Background metrics: queue lengths
	obs.StartQueueLengthUpdater(ctx, cfg, rdb, logger)
}
```
`142-149`: **Stop abusing default; add a proper 'admin' case.**

This default+if dance is sloppy and confuses static analysis. Add a real case and let default handle unknown roles.

```diff
-	default:
-		// admin role
-		if role == "admin" {
-			runAdmin(ctx, cfg, rdb, logger, adminCmd, adminQueue, adminN, adminYes, benchCount, benchRate, benchPriority, benchTimeout)
-			return
-		}
-		logger.Fatal("unknown role", obs.String("role", role))
+	case "admin":
+		runAdmin(ctx, cfg, rdb, logger, adminCmd, adminQueue, adminN, adminYes, benchCount, benchRate, benchPriority, benchTimeout)
+		return
+	default:
+		logger.Fatal("unknown role", obs.String("role", role))
```

internal/worker/worker.go (2)
`51-70`: **Circuit-breaker trip counting is racy. Centralize transition detection in this goroutine.**

Multiple workers read state, then Record, then read state again — you double-count on contention. Track last state here and increment exactly once per transition to Open.

```diff
-	go func() {
-		ticker := time.NewTicker(2 * time.Second)
-		defer ticker.Stop()
-		for {
-			select {
-			case <-ctx.Done():
-				return
-			case <-ticker.C:
-				switch w.cb.State() {
-				case breaker.Closed:
-					obs.CircuitBreakerState.Set(0)
-				case breaker.HalfOpen:
-					obs.CircuitBreakerState.Set(1)
-				case breaker.Open:
-					obs.CircuitBreakerState.Set(2)
-				}
-			}
-		}
-	}()
+	go func() {
+		ticker := time.NewTicker(2 * time.Second)
+		defer ticker.Stop()
+		last := w.cb.State()
+		for {
+			select {
+			case <-ctx.Done():
+				return
+			case <-ticker.C:
+				cur := w.cb.State()
+				if cur != last && cur == breaker.Open {
+					obs.CircuitBreakerTrips.Inc()
+				}
+				last = cur
+				switch cur {
+				case breaker.Closed:
+					obs.CircuitBreakerState.Set(0)
+				case breaker.HalfOpen:
+					obs.CircuitBreakerState.Set(1)
+				case breaker.Open:
+					obs.CircuitBreakerState.Set(2)
+				}
+			}
+		}
+	}()
```
`110-114`: **Why are you stuffing the entire payload into the heartbeat value?**

Unless some code reads that value (show me where), it's wasted memory and bandwidth. Store a tiny token instead.

If unused, change to:

```diff
-	_ = w.rdb.Set(ctx, hbKey, payload, w.cfg.Worker.HeartbeatTTL).Err()
+	_ = w.rdb.Set(ctx, hbKey, "1", w.cfg.Worker.HeartbeatTTL).Err()
```
♻️ Duplicate comments (4)
.github/workflows/ci.yml (1)
`24-49`: **Stop mutating go.mod in CI, pin govulncheck, and enable caching. This is basic hygiene.**

You're still doing the "latest" nonsense and `go mod tidy` in CI. Reproducibility matters.

```diff
       - name: Setup Go
         uses: actions/setup-go@v5
         with:
-          go-version: '1.25.x'
+          go-version: '1.25.x'
+          cache: true

-      - name: Tidy
-        run: go mod tidy
+      - name: Modules (download only; don't mutate)
+        run: go mod download
@@
-      - name: Build
-        run: go build ./...
+      - name: Build
+        run: go build -mod=readonly ./...
@@
-      - name: Test
+      - name: Test
         env:
-          E2E_REDIS_ADDR: localhost:6379
-        run: go test ./... -race -count=1
+          E2E_REDIS_ADDR: 127.0.0.1:6379
+        run: go test -mod=readonly ./... -race -count=1
@@
-      - name: Govulncheck
-        run: |
-          go install golang.org/x/vuln/cmd/govulncheck@latest
-          $(go env GOPATH)/bin/govulncheck ./...
+      - name: Govulncheck
+        uses: golang/govulncheck-action@v1
+        with:
+          go-version-input: '1.25.x'
+          vulncheck-version: 'v1.1.3'
```

README.md (2)
58-59
: **Stop using bare URLs (MD034).**

Wrap the PR URL.

```diff
-Release branch open for v0.4.0-alpha: see PR https://github.com/flyingrobots/go-redis-work-queue/pull/1
+Release branch open for v0.4.0-alpha: see PR <https://github.com/flyingrobots/go-redis-work-queue/pull/1>
```
60-64
: **Blank lines around lists (MD032).**

Add spacing above and below the block.

```diff
-Promotion gates and confidence summary (details in docs/15_promotion_checklists.md):
+Promotion gates and confidence summary (details in docs/15_promotion_checklists.md):
+
 - Alpha → Beta: overall confidence ~0.85 (functional/observability/CI strong; perf and coverage improvements planned)
 - Beta → RC: overall confidence ~0.70 (needs controlled perf run, chaos tests, soak)
 - RC → GA: overall confidence ~0.70 (release flow ready; soak and rollback rehearsal pending)
+
```

docs/evidence/README.md (1)
1-6
: **We already told you: blank lines and proper heading. Do it.**

Same MD022/MD032 mess. Convert "Reproduce locally" into a heading and add spacing. You ignored this once.

Apply:

```diff
 # Evidence for v0.4.0-alpha Promotion
-
-- CI run: see `ci_run.json` (contains URL to the successful workflow run)
-- Bench JSON: `bench.json` (admin bench with 1000 jobs at 500 rps)
-- Config used: `config.alpha.yaml`
-- Metrics snapshots: `metrics_before.txt`, `metrics_after.txt`
+
+- CI run: see `ci_run.json` (contains URL to the successful workflow run)
+- Bench JSON: `bench.json` (admin bench with 1000 jobs at 500 rps)
+- Config used: `config.alpha.yaml`
+- Metrics snapshots: `metrics_before.txt`, `metrics_after.txt`

-Reproduce locally
+## Reproduce locally
+
 1) Ensure Redis is running on `localhost:6379` (e.g., `docker run -p 6379:6379 redis:7-alpine`)
 2) Build binary: `make build`
 3) Start worker: `./bin/job-queue-system --role=worker --config=docs/evidence/config.alpha.yaml`
 4) In another terminal, run bench: `./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml --admin-cmd=bench --bench-count=1000 --bench-rate=500 --bench-priority=low --bench-timeout=60s`
 5) Capture metrics: `curl -sS localhost:9191/metrics | head -n 200 > docs/evidence/metrics_after.txt`
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (21)
- .github/CODEOWNERS (1 hunks)
- .github/workflows/ci.yml (2 hunks)
- README.md (2 hunks)
- cmd/job-queue-system/main.go (3 hunks)
- docs/did-not-agree/README.md (1 hunks)
- docs/did-not-agree/go-toolchain-version.md (1 hunks)
- docs/did-not-agree/metrics-scope.md (1 hunks)
- docs/did-not-agree/priority-fetching-brpoplpush.md (1 hunks)
- docs/did-not-agree/rate-limiter-fixed-window.md (1 hunks)
- docs/did-not-agree/reaper-scan-vs-registry.md (1 hunks)
- docs/evidence/README.md (1 hunks)
- docs/evidence/run_bench.sh (1 hunks)
- internal/admin/admin.go (2 hunks)
- internal/breaker/breaker.go (4 hunks)
- internal/breaker/breaker_load_test.go (1 hunks)
- internal/breaker/breaker_test.go (2 hunks)
- internal/config/config.go (4 hunks)
- internal/obs/metrics.go (2 hunks)
- internal/producer/producer.go (3 hunks)
- internal/reaper/reaper.go (2 hunks)
- internal/worker/worker.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
internal/breaker/breaker_load_test.go (1)

- internal/breaker/breaker.go (3): State (10-10), Closed (13-13), Open (15-15)

internal/admin/admin.go (1)

- internal/config/config.go (3): Config (75-81), Worker (31-44), Producer (46-54)

internal/reaper/reaper.go (2)

- internal/obs/logger.go (2): Err (31-31), String (28-28)
- internal/obs/metrics.go (1): ReaperRecovered (55-58)

internal/config/config.go (1)

- internal/worker/worker.go (1): Worker (20-26)

cmd/job-queue-system/main.go (2)

- internal/admin/admin.go (3): PurgeAll (219-263), Bench (108-156), StatsKeys (169-214)
- internal/obs/logger.go (1): Err (31-31)

internal/producer/producer.go (2)

- internal/config/config.go (1): Producer (46-54)
- internal/queue/job.go (1): NewJob (20-31)

internal/worker/worker.go (4)

- internal/breaker/breaker.go (4): New (36-38), CircuitBreaker (24-34), State (10-10), Open (15-15)
- internal/config/config.go (2): Worker (31-44), CircuitBreaker (56-61)
- internal/obs/metrics.go (2): JobProcessingDuration (38-42), CircuitBreakerTrips (51-54)
- internal/obs/logger.go (3): Err (31-31), String (28-28), Int (29-29)
🪛 Shellcheck (0.10.0)
docs/evidence/run_bench.sh
[warning] 50-50: i appears unused. Verify use (or export if used externally).
(SC2034)
🪛 markdownlint-cli2 (0.17.2)
docs/did-not-agree/go-toolchain-version.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
14-14: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
17-17: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/did-not-agree/reaper-scan-vs-registry.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
14-14: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
17-17: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/did-not-agree/README.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/did-not-agree/priority-fetching-brpoplpush.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
15-15: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
18-18: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/did-not-agree/rate-limiter-fixed-window.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
14-14: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
18-18: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/evidence/README.md
17-17: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
22-22: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
37-37: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
README.md
26-26: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
26-26: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
27-27: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
27-27: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
28-28: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
28-28: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
29-29: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
29-29: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
30-30: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
30-30: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
58-58: Bare URL used
(MD034, no-bare-urls)
61-61: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
66-66: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
🔇 Additional comments (9)
internal/producer/producer.go (2)
56-56
: **Streaming enqueue during WalkDir: good.**

Lower memory, simpler flow. Keep it.
1-1
: **Standardize license headers — add SPDX and a single, correct copyright holder.**

Repo-wide header check shows many files contain "Copyright 2025 James Ross" (e.g. internal/producer/producer.go, internal/worker/worker.go, internal/redisclient/client.go, test/e2e/e2e_test.go). This is sloppy — fix it.
- Action: Replace top-of-file comments across the repo with an SPDX-License-Identifier matching the project license and a single canonical copyright owner (organization or "The go-redis-work-queue Authors"). Do not assume the license; confirm the repo’s license first.
- Verify: run rg -n "SPDX-License-Identifier|Copyright" -C2 and update every file returned.
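A minimal sketch of that verify step, using plain `grep` instead of `rg` and a throwaway fixture so it runs anywhere; in practice, point the greps at the real repo root:

```shell
# Build a tiny fixture: one file with the old personal header, one with SPDX.
dir=$(mktemp -d)
printf '// Copyright 2025 James Ross\npackage a\n' > "$dir/old.go"
printf '// SPDX-License-Identifier: Apache-2.0\npackage b\n' > "$dir/new.go"

echo "files still naming an individual owner:"
grep -rl "Copyright 2025 James Ross" --include='*.go' "$dir"

echo "files missing an SPDX identifier:"
grep -rL "SPDX-License-Identifier" --include='*.go' "$dir"

rm -rf "$dir"
```

Both audit sections should name only `old.go`; every file they flag in the real repo needs its header rewritten.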
internal/config/config.go (1)
101-106
: **Heartbeat/processing patterns look reversed vs. reaper scan. Verify now.**

You scan "jobqueue:worker:*:processing" but your default heartbeat is "jobqueue:processing:worker:%s". If that's intentional, fine—prove it's consistent everywhere.
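One way to "prove it's consistent everywhere" is to dump every occurrence of both patterns and compare them side by side. A throwaway sketch (fixture paths are illustrative; run the greps against `internal/` and `docs/` in the real repo):

```shell
# Fixture standing in for the repo: a config default and a reaper scan glob.
dir=$(mktemp -d)
printf 'heartbeat_key_pattern: "jobqueue:processing:worker:%%s"\n' > "$dir/config.yaml"
printf 'scan := "jobqueue:worker:*:processing"\n' > "$dir/reaper.txt"

# Every occurrence of each pattern, with file and line number for eyeballing.
grep -rn "jobqueue:worker" "$dir"
grep -rn "jobqueue:processing" "$dir"

rm -rf "$dir"
```

If the two listings name different key shapes for the same concept, the defaults and the scan glob have drifted apart.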
README.md (2)
52-54
: **Docker mapping must match the metrics port.**

Map 9191 if that's the default. This is basic.

Apply:

```diff
-- Run: docker run --rm -p 9090:9090 --env-file env.list job-queue-system:latest --role=all
+- Run: docker run --rm -p 9191:9191 --env-file env.list job-queue-system:latest --role=all
```
65-68
: **Add blank line before the "Evidence artifacts" list and update the sample-config link to config/config.example.yaml.**

- MD032: insert a blank line between "Evidence artifacts (docs/evidence/):" and the list (README.md ≈ lines 65–68).
- Update the "sample configuration" mention (≈ line 10) to link to config/config.example.yaml — that file already exists in the repo.

```diff
-Evidence artifacts (docs/evidence/):
+Evidence artifacts (docs/evidence/):
+
 - ci_run.json (CI URL), bench.json (throughput/latency), metrics_before/after.txt, config.alpha.yaml
-
-To reproduce evidence locally, see docs/evidence/README.md.
+
+To reproduce evidence locally, see docs/evidence/README.md.
```

Likely an incorrect or invalid review comment.
internal/breaker/breaker_test.go (1)
1-1
: **License header likely inconsistent with repo.**

"Copyright 2025 James Ross" in a company/repo project? Use the project's SPDX header and owner.

Apply:

```diff
-// Copyright 2025 James Ross
+// SPDX-License-Identifier: Apache-2.0
+// Copyright (c) 2025 The go-redis-work-queue Authors
```

internal/breaker/breaker_load_test.go (1)
1-1
: **License header: align with project.**

Same problem as the other test file.

```diff
-// Copyright 2025 James Ross
+// SPDX-License-Identifier: Apache-2.0
+// Copyright (c) 2025 The go-redis-work-queue Authors
```

cmd/job-queue-system/main.go (2)
42-42
: **Flag help updated correctly. Keep help text and README in sync.**

Looks fine. Ensure README and docs reflect the full set of admin commands.
179-183
: **stats-keys output shape is sane.**

JSON-encoding the result is exactly what we need for evidence harnessing.
```go
case "purge-all":
	if !yes { logger.Fatal("refusing to purge without --yes") }
	n, err := admin.PurgeAll(ctx, cfg, rdb)
	if err != nil { logger.Fatal("admin purge-all error", obs.Err(err)) }
	fmt.Printf("purged %d keys\n", n)
case "bench":
```
🧹 Nitpick (assertive)
Add a dry-run for purge-all and structured log.
Dangerous command. Offer a --dry-run to print the keys that would be removed, and log the deletion count via zap too.
```diff
-	n, err := admin.PurgeAll(ctx, cfg, rdb)
+	n, err := admin.PurgeAll(ctx, cfg, rdb)
 	if err != nil { logger.Fatal("admin purge-all error", obs.Err(err)) }
-	fmt.Printf("purged %d keys\n", n)
+	logger.Info("purge-all completed", obs.Int("deleted_keys", int(n)))
+	fmt.Printf("purged %d keys\n", n)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
case "purge-all":
	if !yes { logger.Fatal("refusing to purge without --yes") }
	n, err := admin.PurgeAll(ctx, cfg, rdb)
	if err != nil { logger.Fatal("admin purge-all error", obs.Err(err)) }
	logger.Info("purge-all completed", obs.Int("deleted_keys", int(n)))
	fmt.Printf("purged %d keys\n", n)
case "bench":
```
🤖 Prompt for AI Agents
In cmd/job-queue-system/main.go around lines 169 to 174, the "purge-all" branch
must support a --dry-run and emit a structured zap log of the deletion count:
add a boolean dryRun flag check (e.g. from CLI flags) and when dryRun is true
call admin.PurgeAll with a mode or new helper that returns the list of keys to
be removed without deleting them, print those keys to stdout (one per line) and
do not perform deletion; when dryRun is false keep the existing delete behavior
but after successful deletion log the count using the zap logger with a
structured field like logger.Info("purged keys", zap.Int("count", n)) instead of
printing only to stdout or using unstructured logger.Fatal; propagate and handle
errors the same way as before.
```go
obs.ReaperRecovered.Inc()
r.log.Warn("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
}
```
🧹 Nitpick (assertive)
“Requeued” isn’t a warning. Use Info.
Don’t pollute warning signal with normal operation.
```diff
-	r.log.Warn("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
+	r.log.Info("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
obs.ReaperRecovered.Inc()
r.log.Info("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
}
```
🤖 Prompt for AI Agents
In internal/reaper/reaper.go around lines 72 to 74, the log call for a
successfully requeued abandoned job is using Warn which is incorrect for normal
operation; change the logging level to Info (e.g., replace r.log.Warn(...) with
r.log.Info(...)) while preserving the same message and obs fields so the event
remains recorded without signalling a warning.
```go
host, _ := os.Hostname()
pid := os.Getpid()
now := time.Now().UnixNano()
randSfx := fmt.Sprintf("%04x", time.Now().UnixNano()&0xffff)
base := fmt.Sprintf("%s-%d-%d-%s", host, pid, now, randSfx)
return &Worker{cfg: cfg, rdb: rdb, log: log, cb: cb, baseID: base}
}
```
🧹 Nitpick (assertive)
Stop pretending time.Now() is randomness. Use crypto/rand for the suffix.
This ID doesn’t need to be cryptographic, but at least don’t generate two nanos and mask bits like it’s 1999.
```diff
+	"crypto/rand"
+	"encoding/hex"
@@
-	now := time.Now().UnixNano()
-	randSfx := fmt.Sprintf("%04x", time.Now().UnixNano()&0xffff)
-	base := fmt.Sprintf("%s-%d-%d-%s", host, pid, now, randSfx)
+	now := time.Now().UnixNano()
+	rb := make([]byte, 2)
+	_, _ = rand.Read(rb)
+	randSfx := hex.EncodeToString(rb)
+	base := fmt.Sprintf("%s-%d-%d-%s", host, pid, now, randSfx)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
host, _ := os.Hostname()
pid := os.Getpid()
now := time.Now().UnixNano()
rb := make([]byte, 2)
_, _ = rand.Read(rb)
randSfx := hex.EncodeToString(rb)
base := fmt.Sprintf("%s-%d-%d-%s", host, pid, now, randSfx)
return &Worker{cfg: cfg, rdb: rdb, log: log, cb: cb, baseID: base}
}
```
🤖 Prompt for AI Agents
In internal/worker/worker.go around lines 30 to 36, the code is using
time.Now().UnixNano() twice and masking bits as a pseudo-random suffix; replace
that with crypto/rand to generate the suffix. Read 2 bytes from crypto/rand (or
an appropriate small byte slice), handle any error (fallback to deterministic
value or return error), format those bytes as a four‑hex-digit suffix, and use a
single captured timestamp for the base ID construction so the ID is stable and
the suffix is truly random.
```diff
 for ctx.Err() == nil {
 	if !w.cb.Allow() {
-		time.Sleep(100 * time.Millisecond)
+		time.Sleep(w.cfg.Worker.BreakerPause)
 		continue
 	}
```
💡 Verification agent
❓ Verification inconclusive
Guard against zero BreakerPause. Don’t spin like a lunatic.
If BreakerPause is 0, you tight-loop under Open breaker.
```diff
-	if !w.cb.Allow() {
-		time.Sleep(w.cfg.Worker.BreakerPause)
+	if !w.cb.Allow() {
+		pause := w.cfg.Worker.BreakerPause
+		if pause <= 0 {
+			pause = 100 * time.Millisecond
+		}
+		time.Sleep(pause)
 		continue
 	}
```
Stop the tight spin — guard against zero BreakerPause.
File: internal/worker/worker.go Lines: 80-84
If BreakerPause <= 0, time.Sleep(0) causes a tight loop when the breaker is open — don't spin like a lunatic. Apply:

```diff
-	if !w.cb.Allow() {
-		time.Sleep(w.cfg.Worker.BreakerPause)
+	if !w.cb.Allow() {
+		pause := w.cfg.Worker.BreakerPause
+		if pause <= 0 {
+			pause = 100 * time.Millisecond
+		}
+		time.Sleep(pause)
 		continue
 	}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
for ctx.Err() == nil {
	if !w.cb.Allow() {
		pause := w.cfg.Worker.BreakerPause
		if pause <= 0 {
			pause = 100 * time.Millisecond
		}
		time.Sleep(pause)
		continue
	}
```
🤖 Prompt for AI Agents
In internal/worker/worker.go around lines 80-84, the loop calls
time.Sleep(w.cfg.Worker.BreakerPause) unguarded so if BreakerPause is zero or
negative the loop spins tightly; change this to enforce a sane minimum pause
(e.g., compute a sleepDur = w.cfg.Worker.BreakerPause and if sleepDur <= 0 set
it to a small default like 1ms or 100ms), then wait using either
time.Sleep(sleepDur) or a context-aware wait (select on time.After(sleepDur) and
ctx.Done()) to avoid busy spinning and still respect cancellation.
```go
// measure state transition around Record() to count trips
start := time.Now()
// process job
ok := w.processJob(ctx, workerID, srcQueue, procList, hbKey, payload)
obs.JobProcessingDuration.Observe(time.Since(start).Seconds())
prev := w.cb.State()
w.cb.Record(ok)
curr := w.cb.State()
if prev != curr && curr == breaker.Open {
	obs.CircuitBreakerTrips.Inc()
}
}
```
🛠️ Refactor suggestion
Delete the per-iteration prev/curr logic. It’s wrong under concurrency and will overcount.
With the above centralized logic, this cruft must go.
```diff
-	// measure state transition around Record() to count trips
 	start := time.Now()
 	// process job
 	ok := w.processJob(ctx, workerID, srcQueue, procList, hbKey, payload)
 	obs.JobProcessingDuration.Observe(time.Since(start).Seconds())
-	prev := w.cb.State()
 	w.cb.Record(ok)
-	curr := w.cb.State()
-	if prev != curr && curr == breaker.Open {
-		obs.CircuitBreakerTrips.Inc()
-	}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
start := time.Now()
// process job
ok := w.processJob(ctx, workerID, srcQueue, procList, hbKey, payload)
obs.JobProcessingDuration.Observe(time.Since(start).Seconds())
w.cb.Record(ok)
}
```
🤖 Prompt for AI Agents
In internal/worker/worker.go around lines 114 to 125, remove the per-iteration
prev/curr state capture and comparison: delete "prev := w.cb.State()", "curr :=
w.cb.State()", and the "if prev != curr && curr == breaker.Open {
obs.CircuitBreakerTrips.Inc() }" block; keep the call to "w.cb.Record(ok)" and
the surrounding timing/processing logic intact so state transitions are no
longer counted per-iteration here (trip counting should be handled centrally
elsewhere).
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
README.md (1)
66-74
: **Your metrics/health ports are wrong. You standardized to 9191; README still says 9090. Fix it everywhere, including Docker run.**

This misleads ops and breaks copy/paste.

```diff
-### Metrics
-
-- Prometheus metrics exposed at http://localhost:9090/metrics by default
+### Metrics
+
+- Prometheus metrics exposed at http://localhost:9191/metrics by default
@@
-### Health and Readiness
-
-- Liveness: http://localhost:9090/healthz returns 200 when the process is up
-- Readiness: http://localhost:9090/readyz returns 200 only when Redis is reachable
+### Health and Readiness
+
+- Liveness: http://localhost:9191/healthz returns 200 when the process is up
+- Readiness: http://localhost:9191/readyz returns 200 only when Redis is reachable
```

And update Docker mapping:

```diff
-docker run --rm -p 9090:9090 --env-file env.list job-queue-system:latest --role=all
+docker run --rm -p 9191:9191 --env-file env.list job-queue-system:latest --role=all
```

docs/evidence/README.md (1)
61-66
: **Add a heading and the missing blank line before the list.**

```diff
-Outputs are written under `docs/evidence/run_YYYYmmdd_HHMMSS/`:
-- worker.log, worker.pid
-- metrics_before.txt, metrics_after.txt
-- stats_before.json, stats_after.json
-- bench.json
+## Outputs
+
+Outputs are written under `docs/evidence/run_YYYYmmdd_HHMMSS/`:
+
+- worker.log, worker.pid
+- metrics_before.txt, metrics_after.txt
+- stats_before.json, stats_after.json
+- bench.json
```
♻️ Duplicate comments (3)
README.md (1)
102-102
: Fix markdownlint MD034 (bare URL) — we already told you this. Wrap the PR URL.
-Release branch open for v0.4.0-alpha: see PR https://github.com/flyingrobots/go-redis-work-queue/pull/1 +Release branch open for v0.4.0-alpha: see PR <https://github.com/flyingrobots/go-redis-work-queue/pull/1>docs/evidence/README.md (2)
37-45
: Do NOT tell users to run KEYS/DEL. Use purge-all or SCAN/UNLINK. Blocking Redis is amateur hour.You literally ship an admin purge-all. Document it. And if someone must use redis-cli, use SCAN + UNLINK with -r. Apply:
-Important notes -- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress. -- To avoid stale backlog affecting evidence, clear test keys before running a bench: -```bash -redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter -redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL -``` -- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port. +## Important notes + +- The admin `bench` command enqueues jobs directly (LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress. +- To avoid stale backlog affecting evidence, prefer the built-in purge (destructive): + +```bash +./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml --admin-cmd=purge-all --yes +``` + +- If you must use redis-cli, avoid blocking the server: + +```bash +# Known small keys: +redis-cli UNLINK jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter +# Large per-worker keys: +redis-cli --scan --pattern 'jobqueue:worker:*:processing' | xargs -r -n 100 redis-cli UNLINK +``` + +- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
46-48
: Make “Notes” a real heading and space the list.
-Notes
-- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
+## Notes
+
+- The simple latency reported in `bench.json` compares current time to each job's `creation_time` after completion sampling and is a coarse approximation. For precise latency distributions, prefer the Prometheus histogram `job_processing_duration_seconds` (use `_bucket` with `histogram_quantile()` to compute quantiles).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
README.md
(2 hunks)docs/08_deployment.md
(1 hunks)docs/12_performance_baseline.md
(1 hunks)docs/13_release_versioning.md
(1 hunks)docs/14_ops_runbook.md
(2 hunks)docs/evidence/README.md
(1 hunks)docs/testing-guide.md
(1 hunks)internal/worker/worker_breaker_integration_test.go
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/worker/worker_breaker_integration_test.go (3)
internal/config/config.go (3)
- Load (133-198)
- Redis (13-24)
- Backoff (26-29)
internal/queue/job.go (1)
- NewJob (20-31)
internal/breaker/breaker.go (1)
- State (10-10)
🪛 markdownlint-cli2 (0.17.2)
docs/testing-guide.md
5-5: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
6-6: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
10-10: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
11-11: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
15-15: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
16-16: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
21-21: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
26-26: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
27-27: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
32-32: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
34-34: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
36-36: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
40-40: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
41-41: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
46-46: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
48-48: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
50-50: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
54-54: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
55-55: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
59-59: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
63-63: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
64-64: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
69-69: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
71-71: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
73-73: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
77-77: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
78-78: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
82-82: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
86-86: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
87-87: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
94-94: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
96-96: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
98-98: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
102-102: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
106-106: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
108-108: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
110-110: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
112-112: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
114-114: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
116-116: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
118-118: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
122-122: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
123-123: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
130-130: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
135-135: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
136-136: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/08_deployment.md
29-29: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
docs/14_ops_runbook.md
42-42: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/13_release_versioning.md
22-22: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
26-26: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
docs/12_performance_baseline.md
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
24-24: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/2/3
(MD029, ol-prefix)
24-24: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
27-27: Ordered list item prefix
Expected: 2; Actual: 3; Style: 1/2/3
(MD029, ol-prefix)
27-27: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
28-28: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
30-30: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
31-31: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
31-31: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
31-31: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
32-32: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
36-36: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
docs/evidence/README.md
10-10: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
15-15: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/1/1
(MD029, ol-prefix)
15-15: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
16-16: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
20-20: Ordered list item prefix
Expected: 1; Actual: 3; Style: 1/1/1
(MD029, ol-prefix)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
25-25: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
25-25: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
26-26: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
32-32: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1
(MD029, ol-prefix)
32-32: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
33-33: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
38-38: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
39-39: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
40-40: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
43-43: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
44-44: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
47-47: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
62-62: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
README.md
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
25-25: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
26-26: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
30-30: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
31-31: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
35-35: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
36-36: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
40-40: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
41-41: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
45-45: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
46-46: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
85-85: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
86-86: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
90-90: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
91-91: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
95-95: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
96-96: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
102-102: Bare URL used
(MD034, no-bare-urls)
105-105: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
110-110: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
🔇 Additional comments (3)
docs/testing-guide.md (1)
10-13
: Verified — 'build' target present (Makefile: line 12). docs/testing-guide.md's `make build` instruction is accurate; no change needed.
internal/worker/worker_breaker_integration_test.go (2)
39-53
: Guard against missing queue config to avoid pushing to the empty key. If defaults ever change, this silently pushes to "". Add a precondition check (as shown above) or set `cfg.Worker.Queues["low"]` explicitly in the test.
4-14
: Resolved — repo consistently uses go-redis v8; no v9 imports found.
go.mod lists github.com/go-redis/redis/v8 v8.11.5 and repository imports all reference github.com/go-redis/redis/v8; no github.com/redis/go-redis/v9 or other v9 references detected.
Config overrides via env vars. Example:
`WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./job-queue-system --role=worker --config=config.yaml`
```bash
WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./job-queue-system --role=worker --config=config.yaml
```
🧹 Nitpick (assertive)
Fix the fenced block spacing and align paths.
Surround the code fence with blank lines (MD031) and make the binary/config paths consistent with the rest of the repo (use ./bin and config/config.yaml).
-Config overrides via env vars. Example:
-```bash
-WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./job-queue-system --role=worker --config=config.yaml
-```
+Config overrides via env vars. Example:
+
+```bash
+WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./bin/job-queue-system --role=worker --config=config/config.yaml
+```
+
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
29-29: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/08_deployment.md around lines 28 to 31, the fenced code block lacks
surrounding blank lines and uses inconsistent paths; add a blank line before and
after the ```bash fence and update the example to use ./bin/job-queue-system and
config/config.yaml so the binary and config paths match the repo convention.
## Baseline Procedure
1) Start Redis (e.g., docker run -p 6379:6379 redis:7-alpine)
1) Start Redis
```bash
docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
```
🧹 Nitpick (assertive)
Call out key cleanup before baseline runs.
Benchmarks lie if stale keys exist. Explicitly instruct to purge Redis keys (or start a fresh container) before running.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
19-19: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/12_performance_baseline.md around lines 19–23, the baseline procedure
doesn't ensure a clean Redis state; add a clear pre-benchmark cleanup step
instructing users to either flush all Redis keys (e.g., via redis-cli FLUSHALL)
or start a fresh Redis container before running the benchmark, place this step
immediately before "Start Redis" (or replace the current start step with a note
recommending a fresh instance), and mention that stale keys will skew results.
1) Start Redis
```bash
docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
```
2) Copy `config/config.example.yaml` to `config/config.yaml` and set:
- `worker.count`: 16 on a 4 vCPU node (adjust as needed)
- `redis.addr`: "localhost:6379"
3) In one shell, run the worker:
- `./bin/job-queue-system --role=worker --config=config/config.yaml`
4) In another shell, run the bench (enqueue and wait for completion):
- `./bin/job-queue-system --role=admin --admin-cmd=bench --bench-count=2000 --bench-rate=1000 --bench-priority=low --bench-timeout=60s`
3) In one shell, run the worker
```bash
./bin/job-queue-system --role=worker --config=config/config.yaml
```
4) In another shell, run the bench (enqueue and wait for completion)
```bash
./bin/job-queue-system --role=admin --admin-cmd=bench \
  --bench-count=2000 --bench-rate=1000 \
  --bench-priority=low --bench-timeout=60s
```
🛠️ Refactor suggestion
The list/fences formatting is borked (MD029/MD031/MD032). Normalize numbering and add blank lines.
Use “1.” auto-numbering and surround code fences with blank lines to keep markdownlint quiet.
-1) Start Redis
-```bash
-docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
-```
-2) Copy `config/config.example.yaml` to `config/config.yaml` and set:
+1. Start Redis
+
+```bash
+docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
+```
+
+1. Copy `config/config.example.yaml` to `config/config.yaml` and set:
- `worker.count`: 16 on a 4 vCPU node (adjust as needed)
- `redis.addr`: "localhost:6379"
-3) In one shell, run the worker
-```bash
-./bin/job-queue-system --role=worker --config=config/config.yaml
-```
-4) In another shell, run the bench (enqueue and wait for completion)
-```bash
-./bin/job-queue-system --role=admin --admin-cmd=bench \
- --bench-count=2000 --bench-rate=1000 \
- --bench-priority=low --bench-timeout=60s
-```
+1. In one shell, run the worker
+
+```bash
+./bin/job-queue-system --role=worker --config=config/config.yaml
+```
+
+1. In another shell, run the bench (enqueue and wait for completion)
+
+```bash
+./bin/job-queue-system --role=admin --admin-cmd=bench \
+ --bench-count=2000 --bench-rate=1000 \
+ --bench-priority=low --bench-timeout=60s
+```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
24-24: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/2/3
(MD029, ol-prefix)
24-24: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
27-27: Ordered list item prefix
Expected: 2; Actual: 3; Style: 1/2/3
(MD029, ol-prefix)
27-27: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
28-28: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
30-30: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
31-31: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
31-31: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
31-31: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
32-32: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
36-36: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/12_performance_baseline.md around lines 20 to 36, the numbered list and
fenced code blocks violate markdownlint rules (MD029/MD031/MD032); normalize all
numbered items to use "1." for auto-numbering, ensure there is a blank line
before and after each triple-backtick code fence, and keep the indented bullet
lines (`- worker.count`, `- redis.addr`) directly under their list item; update
the four steps to use "1." for each top-level item, add blank lines above and
below each fenced block, and preserve the two sub-bullets under the copy-config
step exactly as-is.
4) Tag release
```bash
git tag vX.Y.Z[-pre]
git push --tags
```
🛠️ Refactor suggestion
Stop pushing all tags; use annotated tags and push a single tag. Also fix MD031/MD032.
Push only the intended tag, not your entire tag namespace. Add blank lines around the fence.
-4) Tag release
-```bash
-git tag vX.Y.Z[-pre]
-git push --tags
-```
+4) Tag release (annotated)
+
+```bash
+git tag -a vX.Y.Z[-pre] -m "release vX.Y.Z[-pre]"
+git push origin vX.Y.Z[-pre]
+```
+
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
22-22: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
26-26: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
docs/13_release_versioning.md around lines 22-26: the current snippet pushes all
tags and uses a non-annotated tag without proper blank lines around the fenced
block; change to instruct creating an annotated tag with -a and -m and pushing
only that single tag (git push origin <tag>), and add blank lines before and
after the ```bash fenced block to satisfy MD031/MD032. Ensure the step label
mentions "Tag release (annotated)" and update the commands accordingly.
mr, _ := miniredis.Run()
defer mr.Close()

cfg, _ := config.Load("nonexistent.yaml")
cfg.Redis.Addr = mr.Addr()
Stop ignoring errors. This is a test, not a wish.
Check errors from miniredis.Run and config.Load or fail fast.
- mr, _ := miniredis.Run()
+ mr, err := miniredis.Run()
+ if err != nil {
+ t.Fatalf("miniredis: %v", err)
+ }
@@
- cfg, _ := config.Load("nonexistent.yaml")
+ cfg, err := config.Load("nonexistent.yaml")
+ if err != nil {
+ t.Fatalf("load config: %v", err)
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
mr, err := miniredis.Run()
if err != nil {
	t.Fatalf("miniredis: %v", err)
}
defer mr.Close()

cfg, err := config.Load("nonexistent.yaml")
if err != nil {
	t.Fatalf("load config: %v", err)
}
cfg.Redis.Addr = mr.Addr()
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 19–23, the
code ignores errors from miniredis.Run() and config.Load("nonexistent.yaml");
update the test to capture and check both error returns and fail fast on error
(e.g., mr, err := miniredis.Run(); if err != nil { t.Fatalf("miniredis.Run
failed: %v", err) } and cfg, err := config.Load(...); if err != nil {
t.Fatalf("config.Load failed: %v", err) }), and ensure defer mr.Close() is only
called after mr is successfully created.
// Enqueue failing jobs (filename contains "fail")
for i := 0; i < 5; i++ {
	j := queue.NewJob(
		"id-fail-",
		"/tmp/fail-test.txt", // contains "fail" to force failure
		1,
		"low",
		"",
		"",
	)
	payload, _ := j.Marshal()
	if err := rdb.LPush(context.Background(), cfg.Worker.Queues["low"], payload).Err(); err != nil {
		t.Fatalf("lpush: %v", err)
	}
}
🛠️ Refactor suggestion
Brittle payload and sloppy error handling. Make IDs unique, assert Marshal error, and validate the queue key.
Also, relying on a filename containing “fail” is a hack. If possible, inject a failing processor; if not, keep it but make the rest solid.
- for i := 0; i < 5; i++ {
- j := queue.NewJob(
- "id-fail-",
+ key := cfg.Worker.Queues["low"]
+ if key == "" {
+ t.Fatalf("missing low-priority queue key in config")
+ }
+ for i := 0; i < 5; i++ {
+ j := queue.NewJob(
+ fmt.Sprintf("id-fail-%d", i),
"/tmp/fail-test.txt", // contains "fail" to force failure
1,
"low",
"",
"",
)
- payload, _ := j.Marshal()
- if err := rdb.LPush(context.Background(), cfg.Worker.Queues["low"], payload).Err(); err != nil {
+ payload, err := j.Marshal()
+ if err != nil {
+ t.Fatalf("marshal: %v", err)
+ }
+ if err := rdb.LPush(context.Background(), key, payload).Err(); err != nil {
t.Fatalf("lpush: %v", err)
}
}
Add missing import:
import "fmt"
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 39 to 53, the
test enqueues brittle payloads and mishandles errors: make job IDs unique (e.g.,
include the loop index or GUID), check and handle the error returned by
j.Marshal instead of ignoring it, validate that cfg.Worker.Queues["low"] exists
and is non-empty before using it (fail the test with a clear message if not),
improve LPush error context (include the queue key in the t.Fatalf message), and
if feasible replace the filename-based failure trigger with an injected failing
processor; if injection isn’t possible keep the filename hack but ensure the
rest is robust; also add the required import "fmt" as requested.
log, _ := zap.NewDevelopment()
w := New(cfg, rdb, log)
ctx, cancel := context.WithCancel(context.Background())
done := make(chan struct{})
go func() { defer close(done); _ = w.Run(ctx) }()
Check logger construction and don’t pretend everything succeeds.
`zap.NewDevelopment` returns an error. Handle it.
- log, _ := zap.NewDevelopment()
+ log, err := zap.NewDevelopment()
+ if err != nil {
+ t.Fatalf("logger: %v", err)
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
log, err := zap.NewDevelopment()
if err != nil {
	t.Fatalf("logger: %v", err)
}
w := New(cfg, rdb, log)
ctx, cancel := context.WithCancel(context.Background())
done := make(chan struct{})
go func() { defer close(done); _ = w.Run(ctx) }()
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 55 to 60, the
call to zap.NewDevelopment ignores its returned error; update the test to
capture both returned values (logger, err) and handle the error instead of
assuming success — for example, fail the test immediately with t.Fatalf or
require.NoError(t, err) if creation fails, or alternatively fall back to a safe
logger (e.g., zap.NewNop()) when err != nil; then use the verified logger
variable when constructing w.
// Wait up to 2s for breaker to open
deadline := time.Now().Add(2 * time.Second)
opened := false
for time.Now().Before(deadline) {
	if w.cb.State() == 2 { // Open
		opened = true
		break
	}
	time.Sleep(5 * time.Millisecond)
}
🧹 Nitpick (assertive)
🛠️ Refactor suggestion
Magic number “2” for breaker state? No. Use a constant.
Depend on a public constant from the breaker package instead of hard-coding 2.
- if w.cb.State() == 2 { // Open
+ if w.cb.State() == breaker.StateOpen { // Open
If `breaker.StateOpen` doesn’t exist, expose it and I’ll wire the change.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Wait up to 2s for breaker to open
deadline := time.Now().Add(2 * time.Second)
opened := false
for time.Now().Before(deadline) {
	if w.cb.State() == breaker.StateOpen { // Open
		opened = true
		break
	}
	time.Sleep(5 * time.Millisecond)
}
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 61 to 70,
replace the hard-coded numeric state check (== 2) with the public breaker
package constant (e.g., breaker.StateOpen); update the import if necessary to
reference the breaker package; if breaker.StateOpen does not yet exist, add and
export a named constant in the breaker package (e.g., const StateOpen = 2) and
use that constant in this test so the state comparison is clear and not a magic
number.
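If the breaker package does not yet export named states, a minimal, self-contained sketch of what that could look like is below. The state names and the Closed=0/HalfOpen=1/Open=2 ordering are assumptions inferred from the test's `== 2 // Open` comparison, not the repo's actual API.

```go
// Sketch of exported circuit-breaker states; shown as package main so it
// runs standalone. In the real repo this would live in internal/breaker.
package main

import "fmt"

// State represents the circuit breaker's current mode.
type State int

const (
	StateClosed   State = iota // 0: normal operation, requests flow
	StateHalfOpen              // 1: probing after cooldown
	StateOpen                  // 2: rejecting work until cooldown elapses
)

// String makes states readable in logs and test failure messages.
func (s State) String() string {
	switch s {
	case StateClosed:
		return "closed"
	case StateHalfOpen:
		return "half-open"
	case StateOpen:
		return "open"
	}
	return "unknown"
}

func main() {
	// The test's comparison then becomes self-documenting:
	// if w.cb.State() == StateOpen { ... }
	fmt.Println(StateOpen, int(StateOpen)) // prints: open 2
}
```

With the iota ordering pinned by constants, the magic-number comparison in the test can be replaced without changing behavior.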
// While breaker is Open (cooldown 100ms), queue length should not decrease
n1, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
time.Sleep(50 * time.Millisecond) // less than cooldown
n2, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
if n2 < n1 {
	cancel(); <-done
	t.Fatalf("queue drained during breaker open: before=%d after=%d", n1, n2)
}
You’re still ignoring Redis errors. Stop it.
LLen can fail; assert it.
- n1, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+ n1, err := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+ if err != nil {
+ cancel(); <-done
+ t.Fatalf("llen before: %v", err)
+ }
@@
- n2, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+ n2, err := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+ if err != nil {
+ cancel(); <-done
+ t.Fatalf("llen after: %v", err)
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 	// While breaker is Open (cooldown 100ms), queue length should not decrease
-	n1, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+	n1, err := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+	if err != nil {
+		cancel(); <-done
+		t.Fatalf("llen before: %v", err)
+	}
 	time.Sleep(50 * time.Millisecond) // less than cooldown
-	n2, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+	n2, err := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+	if err != nil {
+		cancel(); <-done
+		t.Fatalf("llen after: %v", err)
+	}
 	if n2 < n1 {
 		cancel(); <-done
 		t.Fatalf("queue drained during breaker open: before=%d after=%d", n1, n2)
 	}
```
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 76 to 83, the
test calls rdb.LLen(...).Result() and ignores the returned errors; change the
code to capture and assert the errors (e.g., n1, err := rdb.LLen(...).Result();
if err != nil { cancel(); <-done; t.Fatalf("LLen failed before sleep: %v", err)
}) for both n1 and n2 so Redis failures are surfaced and the test fails fast
with the error details.
Batch 1: Addressed CodeRabbit nits (workflows)

I pulled all review comments and started knocking them down. First batch focused on CI/workflows suggestions.

I will continue in batches (config/docs/nits next). If there are any high‑priority comments you want me to tackle first, let me know the IDs or paste them and I’ll jump them to the front.
…utofix hook; make hooks target to enable
…029/MD033 and keep fenced blocks; workflows unchanged semantically
Actionable comments posted: 51
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (28)
docs/06_technical_spec.md (3)
19-64: Config schema is incomplete: no TLS/ACL, no Cluster/Sentinel knobs, no heartbeat refresh, no defaults. Fix it. You’re shipping a production-adjacent system and forgot basic knobs. Add TLS and HA options, make heartbeat refresh explicit, and document sane defaults.
Apply:
```diff
 redis:
   addr: string
   username: string
   password: string
   db: int
+  tls:
+    enabled: bool
+    ca_file: string
+    cert_file: string
+    key_file: string
+    insecure_skip_verify: bool
+  sentinel:
+    enabled: bool
+    master_name: string
+    addrs: [string]
+  cluster:
+    enabled: bool
+    addrs: [string]
+    route_by_latency: bool
   pool_size_multiplier: int
   min_idle_conns: int
   dial_timeout: duration
   read_timeout: duration
   write_timeout: duration
   max_retries: int
 worker:
   count: int
   heartbeat_ttl: duration
+  heartbeat_refresh_interval: duration
   max_retries: int
-  backoff: { base: duration, max: duration }
+  backoff: { base: duration, max: duration, jitter_pct: float }
   priorities: [string]
   queues: { <priority>: string }
   processing_list_pattern: string # printf format, %s workerID
   heartbeat_key_pattern: string # printf format, %s workerID
   completed_list: string
   dead_letter_list: string
   brpoplpush_timeout: duration
 producer:
   scan_dir: string
   include_globs: [string]
   exclude_globs: [string]
   default_priority: string
   high_priority_exts: [string]
   rate_limit_per_sec: int
   rate_limit_key: string
 circuit_breaker:
   failure_threshold: float
   window: duration
   cooldown_period: duration
   min_samples: int
 observability:
-  metrics_port: int
+  metrics_port: int # default 9191
   log_level: string
   queue_sample_interval: duration
   tracing: { enabled: bool, endpoint: string }
+env_override_prefix: string # e.g., JOBQ_; mapping uses UPPER_SNAKE for nested keys, e.g., JOBQ_WORKER_HEARTBEAT_TTL
```
131-134: Logging/tracing: forbid payload/PII, specify OTLP, and define sampling. You don’t want secrets in logs, and tracing needs protocol clarity.
Apply:
```diff
-- Logs are JSON with keys: `level`, `ts`, `msg`, `trace_id`, `span_id`, `job_id`, `queue`, `worker_id`.
+- Logs are JSON with keys: `level`, `ts`, `msg`, `trace_id`, `span_id`, `job_id`, `queue`, `worker_id`. Do NOT log payload bodies or secrets.
 - Tracing: create a span per job processing; propagate `trace_id/span_id` if present; otherwise create new.
+Tracing exporter: OTLP (gRPC) by default; `observability.tracing.endpoint` accepts OTLP endpoint (e.g., http(s):// or grpc target). Configure sampling rate (default 1.0) and baggage propagation.
```
81-92: Job schema nit: types, not example literals. `filesize: 0` is misleading. Also make `priority` enum align with `worker.priorities`.

Apply:

```diff
-  "filesize": 0,
-  "priority": "high|low",
+  "filesize": "<int-bytes>",
+  "priority": "<one-of worker.priorities>",
```

docs/05_architecture.md (4)
10-17: TOC is missing Security. Add it. Yesterday. You ship release infra and public images but have zero “Security Considerations” in the ToC. Unacceptable for production-ish docs.
Apply:
```diff
   - [Performance Targets](#performance-targets)
+  - [Security Considerations](#security-considerations)
```
65-72: Non-atomic “Reap” flow is wrong and will lose messages. Fix the flow. “RPOP processing -> LPUSH orig queue” is a two-step. Crash in between and you’ve dropped work. Use LMOVE or RPOPLPUSH.

```diff
-6) Reap: if heartbeat missing, RPOP processing list items back to originating priority queue.
+6) Reap: if heartbeat missing, atomically move items back to the originating priority queue using LMOVE {processing} {queue} RIGHT LEFT (Redis ≥6.2) or RPOPLPUSH for older versions.
```

Also specify how you discover the “originating” queue (embedded metadata vs key mapping). If it’s embedded, state the payload field name.
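To make the atomic-move argument concrete, here is a minimal in-memory model of `LMOVE {processing} {queue} RIGHT LEFT`: each step pops from the tail of the processing list and pushes to the head of the origin queue as one operation, so a crash between steps never strands an item in neither list. This is an illustrative sketch, not the repo's reaper code.

```go
package main

import "fmt"

// lmoveRightLeft models Redis LMOVE src dst RIGHT LEFT on in-memory lists:
// pop the tail of src and push it onto the head of dst in one step.
func lmoveRightLeft(src, dst []string) (newSrc, newDst []string, moved string, ok bool) {
	if len(src) == 0 {
		return src, dst, "", false
	}
	moved = src[len(src)-1]
	newSrc = src[:len(src)-1]
	newDst = append([]string{moved}, dst...)
	return newSrc, newDst, moved, true
}

func main() {
	processing := []string{"job1", "job2", "job3"} // dead worker's list
	queue := []string{}
	// Drain the processing list back to the origin queue, one atomic move at a time.
	for {
		var v string
		var ok bool
		processing, queue, v, ok = lmoveRightLeft(processing, queue)
		if !ok {
			break
		}
		fmt.Println("requeued", v)
	}
	fmt.Println(len(processing), queue) // 0 [job1 job2 job3]
}
```

Note how RIGHT/LEFT preserves the original FIFO order in the destination queue; with a two-step RPOP + LPUSH, a crash after the pop loses the item entirely.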
73-82: Add “Security Considerations” section with bare minimum hygiene. You’re pushing images and Helm charts; document auth/TLS or you’ll train users to deploy Redis wide open.

```diff
 @@
 ## Technology Stack
 @@
 - Container: Distroless base image
+
+## Security Considerations
+
+- Redis: Require AUTH and TLS; document `rediss://` usage and certificate mounting in Helm values.
+- Secrets: Never pass credentials via flags; use env/secret mounts (Kubernetes Secrets), and mark them as `sensitive` in GoReleaser if templated.
+- Network: Isolate Redis with network policies/security groups; disallow Internet exposure.
+- RBAC: Limit the service account; read-only for metrics endpoints.
+- Images: Use distroless nonroot and drop capabilities; enable rootless where possible.
+- Supply chain: Pin base images by digest; enable `govulncheck` in CI (already present) and sign images (cosign) as a follow-up.
```

Also applies to: 10-17
90-94: Cite evidence or remove these performance bullets now.
- docs/evidence/bench.json records throughput_jobs_per_sec=16.096426 ⇒ ≈965.79 jobs/min (admin bench: COUNT=1000, RATE=500, duration=62.1256s). Do not claim "≥1k jobs/min per 4 vCPU" — no hardware/vCPU/RAM/cloud SKU recorded. (see docs/evidence/bench.json)
- docs/evidence/config.used.yaml shows worker.count=8, redis: localhost:6379, heartbeat_ttl: 30s, rate_limit_per_sec: 1000 — Redis version/topology, network, payload-size distribution, and machine specs are missing. (see docs/evidence/config.used.yaml)
- docs/evidence/metrics_after.txt shows jobs_completed_total=480 and job_processing_duration_seconds_count=480 — this conflicts with bench.json and provides insufficient samples for a reliable p95 claim. Do not assert "p95 < 2s" without the full Prometheus histogram export.
- Recovery claim "< 10s" directly contradicts heartbeat_ttl=30s in config.used.yaml. Either state the exact TTL + reaper/scan interval that produce <10s, or correct the recovery number.
- Action (one-line): replace the three bullets in docs/05_architecture.md with precise, evidence-backed statements that reference docs/evidence/{bench.json,config.used.yaml,metrics_before.txt,metrics_after.txt,run_bench.sh} and explicitly list hardware (CPU/RAM/cloud SKU), Redis version/topology, network conditions, payload size distribution, bench params (COUNT,RATE), and heartbeat/reaper config — or remove the unsupported claims.
docs/04_sprint_plans.md (3)
39-47: Admin stories are stale vs code. You added purge-all and stats-keys elsewhere. Mirror them here or you’ll drift instantly.

```diff
-1) As an operator, I can run admin commands to inspect and manage queues. (8)
-   - Acceptance: `stats`, `peek`, `purge-dlq` subcommands; safe operations with confirmation; tests.
+1) As an operator, I can run admin commands to inspect and manage queues. (8)
+   - Acceptance: `stats`, `stats-keys`, `peek`, `purge-dlq`, `purge-all` (with `--yes`) subcommands; safe operations with confirmation; tests.
```
23-26: Config knob already exists — add a test that proves it.
- Reality: interval is configurable via Observability.QueueSampleInterval (mapstructure key "queue_sample_interval") and defaults to 2s (internal/config/config.go; config/config.example.yaml; docs/evidence/config.used.yaml). StartQueueLengthUpdater already uses cfg.Observability.QueueSampleInterval (internal/obs/queue_length.go).
- Action (required): add a test (e.g. internal/obs/queue_length_test.go or extend an existing miniredis test) that sets Observability.QueueSampleInterval to a non-default value and verifies StartQueueLengthUpdater respects that interval — do not merge without this.
20-31: Don’t promise what you didn’t implement — either remove “metrics server” from readiness or implement gating. readyCheck in cmd/job-queue-system/main.go (≈line 86) only pings Redis; StartHTTPServer (internal/obs/http.go) registers /metrics and calls the provided readiness callback — there is no check that a separate metrics server is running. Update docs/04_sprint_plans.md (lines 20–31) to match reality or modify the readiness check to also verify the metrics endpoint/listener (change cmd/job-queue-system/main.go and/or internal/obs/http.go accordingly).
docs/10_risk_register.md (1)
15-26: Assign owners and triggers or this matrix is dead weight. Add “Owner” and “Trigger/Threshold” columns; otherwise nobody moves when a risk flips.

```diff
-| # | Risk | Prob. | Impact (1-5) | Mitigation | Contingency |
-|---|------|-------|--------------|------------|-------------|
+| # | Risk | Prob. | Impact (1-5) | Owner | Trigger/Threshold | Mitigation | Contingency |
+|---|------|-------|--------------|-------|-------------------|------------|-------------|
```
91-99: Rollback steps miss the obvious: stop producers first and pin digests. Prevent refilling queues during rollback; use digests, not tags.

```diff
 - Rollback:
-  1) Select previous good tag (e.g., `vX.Y.(Z-1)`).
-  2) Deploy image `ghcr.io/<owner>/<repo>:vX.Y.(Z-1)`.
+  0) Temporarily scale producers to zero to stop inflow.
+  1) Select previous good release (e.g., `vX.Y.(Z-1)`).
+  2) Deploy pinned image `ghcr.io/<owner>/<repo>@sha256:<digest>`.
   3) Verify `/healthz` and `/readyz` return 200.
```

docs/09_requirements.md (1)
19-27: Scope creep and mismatch. Keep requirements aligned with the codebase.

- “Producer scans directory” is oddly specific for a generic queue. Make it an example, not a requirement.
- Admin tooling list is incomplete (missing `stats-keys`, `purge-all`).
- Breaker semantics: call out single-probe HalfOpen if that’s the design.

```diff
-- Producer scans directory, prioritizes by extension, and enqueues JSON jobs. (complexity: ~80 LoC; O(1) `LPUSH` per job)
+- Producer API to enqueue jobs; file-scanning producer provided as an example tool (not core). (O(1) `LPUSH` per job)
 @@
-- Circuit breaker with Closed/Open/HalfOpen states; cooldown. (~120 LoC)
+- Circuit breaker with Closed/Open/HalfOpen (single-probe) states; cooldown. (~120 LoC)
 @@
-- Admin tooling: `stats`, `peek`, `purge-dlq`. (~120 LoC)
+- Admin tooling: `stats`, `stats-keys`, `peek`, `purge-dlq`, `purge-all` (requires `--yes`). (~150 LoC)
```

docs/07_test_plan.md (2)
35-38: Define how you’ll measure the “±10%” limiter claim. Otherwise it’s hand-wavy. Spell out PromQL so humans and robots agree on pass/fail.

```diff
 4) Producer rate limit: per-second cap respected within ±10% under burst.
+   Measurement:
+   - PromQL: sum(rate(jobs_produced_total[1m])) by (producer) compared to configured rps.
+   - Pass if 0.9*rps ≤ 1m-rate ≤ 1.1*rps for ≥ 55s of any 60s window during test.
```
48-51: Stop calling lint “optional”. Make it mandatory and scan images, too. Ship with guardrails or regret it later.

```diff
 - `govulncheck` in CI; fail on critical CVEs.
-- Static checks: `go vet` and `golangci-lint` (optional) for code issues.
+- Static checks: `go vet` and `golangci-lint` (required) for code issues; CI fails on new warnings.
 - Secrets: ensure no secrets in logs; validate config does not dump secrets.
+- Supply-chain: image scan (e.g., Trivy/Grype) on built images; fail on Critical/High.
```

Want me to add a golangci-lint + Trivy job to CI right now?
docs/03_milestones.md (2)
21-23: “TraceID/SpanID used for spans and logs” — vague. Which tracer, which fields, where? Specify the instrumentation stack, env flags, and acceptance check.

```diff
-  - Deliverables: queue length gauges updated every 2s; TraceID/SpanID used for spans and logs
+  - Deliverables: queue length gauges updated every 2s; OpenTelemetry tracing with OTLP exporter; logs include trace_id/span_id
+  - Acceptance: OTEL_EXPORTER_OTLP_ENDPOINT set; traces visible in backend; logs show trace_id/span_id fields under load
```
17-37: Who owns each milestone? Without owners and exit tests, dates are fantasy. Add DRI(s) and a concrete “exit test” per milestone.

```diff
-1. Health + Validation (due 2025-09-20)
+1. Health + Validation (due 2025-09-20, Owner: @flyingrobots)
    - Deliverables: `/healthz` and `/readyz` endpoints; strict config validation with errors; docs updated
-   - Success: endpoints return 200; malformed configs fail with descriptive messages
+   - Success: endpoints return 200; malformed configs fail with descriptive messages
+   - Exit test: `curl -sf localhost:9191/readyz`; CI test asserts non-zero exit on bad config
```

docs/02_release_plan.md (2)
16-22: Dates without cutoffs are worthless — add cut times and tag names. Define exact cut times (UTC) and the tag format per milestone. Otherwise people will miss the window.

```diff
 ## Release Cadence
-- v0.4.0-alpha — 2025-09-26
-- v0.7.0-beta — 2025-10-10
-- v0.9.0-rc — 2025-10-24
-- v1.0.0 — 2025-11-07
+- v0.4.0-alpha — 2025-09-26 18:00 UTC (tag: v0.4.0-alpha)
+- v0.7.0-beta — 2025-10-10 18:00 UTC (tag: v0.7.0-beta)
+- v0.9.0-rc — 2025-10-24 18:00 UTC (tag: v0.9.0-rc)
+- v1.0.0 — 2025-11-07 18:00 UTC (tag: v1.0.0)
```
62-69: Acceptance criteria need measurable checks, not vibes. Add explicit queries and pass/fail checks. Tie chaos to “no lost jobs” with evidence.

```diff
 ## Acceptance Criteria
 - All acceptance tests for the release pass in CI (unit, integration, race).
-- Metrics present: job counters, histograms, queue length gauges, breaker state.
+- Metrics present: job counters, histograms, queue length gauges, breaker state. Verified by scraping /metrics and asserting the following series exist: jobs_completed_total, jobs_failed_total, job_processing_duration_seconds_bucket, queue_length, circuit_breaker_state.
 - Health endpoints return HTTP 200 and surface readiness correctly.
-- On chaos tests: automatic recovery without manual intervention; no lost jobs.
+- On chaos tests: automatic recovery without manual intervention; no lost jobs (jobs_produced_total == jobs_completed_total + jobs_failed_total + dlq_length).
 - Documentation updated (README, PRD, ops guide) for each release.
+- govulncheck reports 0 vulnerabilities at default severity threshold.
+- Release artifacts: SBOM and cosign signatures published alongside images.
```

docs/11_project_charter.md (2)
16-21: Define “small files” precisely. Quantify payload size so perf targets are testable.

```diff
-- Performance: 1k jobs/min per 4 vCPU with p95 < 2s for small files.
+- Performance: 1k jobs/min per 4 vCPU with p95 < 2s for “small files” (payload ≤ 1 MiB).
```
34-41: Tie metrics to actual series/queries. Name the metrics and how you’ll measure coverage; otherwise this is just marketing.

```diff
 ## Success Metrics
-- Availability: Ready > 99.9% over rolling 30 days.
+- Availability: Ready > 99.9% over rolling 30 days (measured via /readyz SLO burn).
 - Reliability: Zero lost jobs; DLQ rate < 0.5% of consumed jobs.
-- Performance: p95 processing duration < 2s for small files.
+- Performance: p95 processing duration < 2s for small files (PromQL: histogram_quantile(0.95, sum by (le) (rate(job_processing_duration_seconds_bucket[5m])))).
 - Observability: 100% of operations emit metrics; tracing coverage ≥ 80% of job processing.
-- Quality: Unit coverage ≥ 80% for core packages; CI green on main.
+- Quality: Unit coverage ≥ 80% for core packages; CI green on main (race-enabled).
```

docs/08_deployment.md (3)
18-27: Document full-history checkout and Go cache. Your CI will randomly break changelog/release without full history; also cache your Go build to not waste CI minutes.

```diff
 Stages (GitHub Actions):
 1) Lint & Vet: run `go vet` (optionally `golangci-lint`)
 2) Build: compile all packages
 3) Unit + Race: `go test -race ./...`
 4) Integration/E2E: Redis service container; full flow tests
 5) Security: `govulncheck`
-6) Package: Docker image build (main branch tags and releases)
-7) Release: tag + changelog on releases
+6) Package: Docker image build (main branch tags and releases) with Go module/build cache
+7) Release: tag + changelog on releases (git fetch --tags and full history)
```
39-47: Rollback doc must mention DLQ snapshot and purge-all safety. Don’t encourage destructive actions without backup and explicit confirmation.

```diff
 5) If needed, drain DLQ using `purge-dlq` (with backup/export first).
 6) Document incident and root cause.
+7) As a last resort, `purge-all --yes` (only after snapshotting all queues/keys).
```
49-55: Add alerts for reaper anomalies and tag immutability. You miss obvious outages without these.

```diff
 - Alerts (suggested thresholds):
   - `circuit_breaker_state > 0` for > 60s → WARN
   - `rate(jobs_failed_total[5m]) > 0.1 * rate(jobs_consumed_total[5m])` → CRITICAL
   - `queue_length{queue="jobqueue:dead_letter"} > 100` → WARN
   - `/readyz` non-200 for > 30s → CRITICAL
+  - `increase(reaper_recovered_total[5m]) < 1 and sum(queue_length) > 0` → WARN (reaper stuck)
+  - Deployments not based on immutable tag (image:latest) detected → CRITICAL
```

docs/12_performance_baseline.md (1)
58-61: Record environment details or your numbers are meaningless. CPU model, RAM, Redis version, and local/remote deployment must be stated.

```diff
 - On a 4 vCPU node, `bench-count=2000`, `bench-rate=1000` should achieve ≥1k jobs/min throughput, with p95 latency < 2s for small files (<1MB).
-- If results fall short, see tuning levers and ensure host/Redis are not CPU or I/O bound.
+- If results fall short, see tuning levers and ensure host/Redis are not CPU or I/O bound. Always record: CPU model, RAM, Redis version, network locality.
```

docs/00_assessment.md (3)
19-23: Name the “all” mode consistently. Pick “all-in-one” or “all” and use it everywhere (readme, flags, Helm).

```diff
-- Modes: `--role=producer|worker|all` with YAML config and env overrides.
+- Modes: `--role=producer|worker|all-in-one` with YAML config and env overrides.
```
36-46: This is outdated — admin CLI exists now. Update the gaps. You added `purge-all` and `stats-keys`. Stop claiming the CLI doesn’t exist; list what’s still missing.

```diff
-- Operational tooling: admin CLI (peek queue, purge DLQ, show stats) not yet implemented.
+- Operational tooling: admin CLI exists (`purge-dlq`, `purge-all --yes`, `stats-keys`). Missing: `peek`, fine-grained purge, DLQ export tooling.
```
56-61: Priorities should reflect reality. Health/readiness and queue gauges are already slated for alpha; remove duplication or tag with milestone owners.

```diff
 1. Add health/readiness probe and queue length updater.
 2. Use TraceID/SpanID to start spans and enrich logs.
 3. Strengthen rate limiter timing and jitter; document guarantees.
 4. Add config validation and error reporting.
-5. Write e2e tests with real Redis (service container) and performance benchmarks.
+5. Write e2e tests with real Redis (service container) and performance benchmarks (owner: @flyingrobots, target: v0.7.0-beta).
```
♻️ Duplicate comments (15)
docs/15_promotion_checklists.md (2)
36-45: Same “Evidence links as above” hand-wave. Link the actual artifacts here too. Don’t make reviewers hunt.
15-20: Replace placeholder “Evidence links” with actual files. Placeholders are useless. You already added evidence artifacts—link them.

```diff
-  - CI run URL: …
-  - Bench JSON: …
-  - Metrics snapshot(s): …
-  - Issues list: …
+  - CI run JSON: docs/evidence/ci_run.json
+  - Bench JSON: docs/evidence/bench.json
+  - Metrics snapshots: docs/evidence/metrics_before.txt, docs/evidence/metrics_after.txt
+  - Config used: docs/evidence/config.used.yaml
+  - Repro steps: docs/evidence/README.md
```

.github/workflows/goreleaser.yml (2)
9-12: Permissions are broad and unsigned artifacts are table stakes now. Tighten and attest. Least privilege + provenance.

```diff
 permissions:
-  contents: write
-  packages: write
+  contents: write
+  packages: write
+  id-token: write # for keyless signing/attestations
```

Then enable provenance/signing in GoReleaser (cosign/SLSA) or add a post-step to sign images with OIDC. I can wire this in if you want.
28-33: You promised multi-arch images but didn’t set up QEMU/Buildx. That will blow up. Enable emulation and a buildx builder before release.

```diff
       - name: Login to GHCR
         uses: docker/login-action@v3
         with:
           registry: ghcr.io
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+      - name: Set up Buildx
+        uses: docker/setup-buildx-action@v3
```

docs/did-not-agree/rate-limiter-fixed-window.md (1)
5-5: You forgot the only safe EXPIRE pattern. This omission misleads implementers. Set expiry only when INCR creates the key, otherwise you extend the window under load.
```diff
 Decision
 
 - Keep fixed-window limiter (INCR + 1s EXPIRE + TTL-based sleep with jitter) for v0.4.0-alpha.
+
+Implementation Note
+
+- Set EXPIRE/PEXPIRE only when INCR returns 1 (first hit in the window). Use PTTL to sleep.
+- Never update expiry on subsequent increments; otherwise you slide the window by accident.
+- Consider a Lua wrapper (atomically): INCR; if count==1 then PEXPIRE key 1000; return count.
```

docs/did-not-agree/reaper-scan-vs-registry.md (1)
1-24: Fix the headings and stop hand-waving SCAN budgets. Now. This still violates MD032 and reads like a napkin. Use proper headings, correct complexity, and document COUNT/cadence.

```diff
-# Reaper: SCAN-Based Recovery vs Worker Registry
-
-Decision
-
-- Keep SCAN-based discovery of processing lists for v0.4.0-alpha, instead of maintaining a registry of active workers or relying on keyspace notifications.
-
-Rationale
-
-- Simplicity and robustness: SCAN requires no extra moving parts or configuration and tolerates sudden worker exits.
-- Predictable load: bounded SCAN page sizes and periodic cadence maintain manageable overhead.
-
-Tradeoffs
-
-- SCAN is O(keys); at very large worker fleets, registry/notifications can reduce overhead.
-
-Revisit Criteria
-
-- If reaper CPU or Redis time spent on SCAN becomes material (observed via profiling/metrics) under expected fleet sizes.
-
-Future Work
-
-- Add optional worker registry with TTL stored in Redis; reaper would iterate registry members and target per-worker keys directly.
-- Consider Redis keyspace notifications where operationally acceptable.
+# Reaper: SCAN-Based Recovery vs Worker Registry
+
+## Decision
+
+- Keep SCAN-based discovery of processing lists for v0.4.0-alpha, instead of maintaining a registry of active workers or relying on keyspace notifications.
+
+## Rationale
+
+- Simplicity and robustness: SCAN requires no extra moving parts or configuration and tolerates sudden worker exits.
+- Predictable load: bounded SCAN page sizes and periodic cadence maintain manageable overhead.
+
+## Tradeoffs
+
+- SCAN is O(N); at very large worker fleets, registry/notifications can reduce overhead.
+
+## Operational Defaults
+
+- Key pattern: `jobqueue:worker:*:processing`
+- SCAN COUNT: 500–1000 per call
+- Cadence: every 250ms with a 50ms sleep between pages
+- Max pass time budget: 200ms per sweep
+
+## Revisit Criteria
+
+- If reaper CPU or Redis time spent on SCAN becomes material (observed via profiling/metrics) under expected fleet sizes.
+
+## Future Work
+
+- Add optional worker registry with TTL stored in Redis; reaper would iterate registry members and target per-worker keys directly.
+- Consider Redis keyspace notifications where operationally acceptable.
```

docs/08_deployment.md (1)
34-37: Fix MD031 and align paths. This was already flagged. Add a blank line around the fence and use the repo’s ./bin and config/config.yaml paths.

````diff
-Config overrides via env vars. Example:
-```bash
-WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./job-queue-system --role=worker --config=config.yaml
-```
+Config overrides via env vars. Example:
+
+```bash
+WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./bin/job-queue-system --role=worker --config=config/config.yaml
+```
````

docs/13_release_versioning.md (1)
26-31: Do not push all tags. Use annotated tags and push the one you mean. We already told you. Fix it.

````diff
-4) Tag release
+4) Tag release (annotated)
 
-```bash
-git tag vX.Y.Z[-pre]
-git push --tags
-```
+```bash
+git tag -a vX.Y.Z[-pre] -m "release vX.Y.Z[-pre]"
+git push origin vX.Y.Z[-pre]
+```
````

docs/did-not-agree/priority-fetching-brpoplpush.md (1)
1-24: Fix headings, cite the client behavior, and bound the timeout. Still MD032-ugly and vague on client semantics. Document the go-redis version and recommend a 200ms per-queue timeout to cap worst-case added latency.
```diff
-# Priority Fetching and BRPOPLPUSH Semantics
-
-Decision
-
-- Use per-queue BRPOPLPUSH with short timeout to emulate multi-queue priority rather than a single command returning queue name and value.
-
-Rationale
-
-- Redis does not provide multi-source BRPOPLPUSH. Looping priorities with a short timeout preserves atomic move semantics per queue and delivers predictable prioritization.
-- go-redis returns only the value for BRPopLPush. We record the source queue implicitly by the loop order and use the known `srcQueue` when processing.
-
-Tradeoffs
-
-- Lower-priority jobs may incur up to the per-queue timeout in latency when higher-priority queues are empty.
-- We do not rely on returned queue name; this is documented and tested.
-
-Revisit Criteria
-
-- If sub-second latency for low priority becomes unacceptable or we need multi-queue fairness beyond simple priority preference.
-
-Future Work
-
-- Explore a Lua-assisted sweep to pick the first non-empty queue without waiting the full timeout per queue in sequence.
+# Priority Fetching and BRPOPLPUSH Semantics
+
+## Decision
+
+- Use per-queue BRPOPLPUSH with short timeout to emulate multi-queue priority rather than a single command returning queue name and value.
+
+## Rationale
+
+- Redis does not provide multi-source BRPOPLPUSH. Looping priorities with a short timeout preserves atomic move semantics per queue and delivers predictable prioritization.
+- go-redis (BRPopLPush) returns only the value (document the exact version from go.mod). We record the source queue by loop order and use the known `srcQueue` when processing.
+
+## Tradeoffs
+
+- Lower-priority jobs may incur up to the per-queue timeout in latency when higher-priority queues are empty.
+- We do not rely on returned queue name; this is documented and tested.
+
+## Operational Defaults
+
+- Recommended per-queue BRPOPLPUSH timeout: 200ms (caps worst-case added latency while keeping ops cost reasonable).
+
+## Revisit Criteria
+
+- If sub-second latency for low priority becomes unacceptable or we need multi-queue fairness beyond simple priority.
+
+## Future Work
+
+- Explore a Lua-assisted sweep to pick the first non-empty queue without waiting the full timeout per queue in sequence.
```

To lock in the actual client version, run:

```bash
#!/bin/bash
# Print go-redis dependency to cite exact behavior in docs
rg -nP 'github\.com/(go-redis|redis)/redis' go.mod || true
rg -n 'BRPopLPush' -g '!**/vendor/**' -S .
```

docs/12_performance_baseline.md (1)
30-47: Fix list/fence formatting and use auto-numbering. Standardize to “1.” and keep blank lines around fences. This keeps markdownlint quiet and humans sane.

````diff
-2) Copy `config/config.example.yaml` to `config/config.yaml` and set:
+1. Copy `config/config.example.yaml` to `config/config.yaml` and set:
    - `worker.count`: 16 on a 4 vCPU node (adjust as needed)
    - `redis.addr`: "localhost:6379"
-3) In one shell, run the worker
+1. In one shell, run the worker
+
 ```bash
 ./bin/job-queue-system --role=worker --config=config/config.yaml
 ```
-4) In another shell, run the bench (enqueue and wait for completion)
+1. In another shell, run the bench (enqueue and wait for completion)
+
 ```bash
 ./bin/job-queue-system --role=admin --admin-cmd=bench \
   --bench-count=2000 --bench-rate=1000 \
   --bench-priority=low --bench-timeout=60s
 ```
````

docs/evidence/README.md (3)

1-7: Fix the Markdown basics. Add blank lines around the heading and the list (MD022/MD032). Your very first block violates the most basic markdownlint rules.

```diff
 # Evidence for v0.4.0-alpha Promotion
+
 - CI run: see `ci_run.json` (contains URL to the successful workflow run)
 - Bench JSON: `bench.json` (admin bench with 1000 jobs at 500 rps)
 - Config used: `config.alpha.yaml`
 - Metrics snapshots: `metrics_before.txt`, `metrics_after.txt`
+
```
8-41: Make this a proper section and fix list/fence spacing (MD029/MD031/MD032). Stop abusing “1)” and jammed fences. Use a heading, “1.” style, and blank lines.

````diff
-Reproduce locally
-
-1) Ensure Redis is running on localhost:6379
-
-```bash
-docker run -p 6379:6379 --rm --name jobq-redis redis:7-alpine
-```
-
-2) Build the binary
-
-```bash
-make build
-```
-
-3) Start worker
-
-```bash
-./bin/job-queue-system --role=worker --config=docs/evidence/config.alpha.yaml
-```
-
-4) In another terminal, run bench
-
-```bash
-./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml \
-  --admin-cmd=bench --bench-count=1000 --bench-rate=500 \
-  --bench-priority=low --bench-timeout=60s
-```
-
-5) Capture metrics
-
-```bash
-curl -sS localhost:9191/metrics | head -n 200 > docs/evidence/metrics_after.txt
-```
+## Reproduce locally
+
+1. Ensure Redis is running on `localhost:6379`.
+
+```bash
+docker run -p 6379:6379 --rm --name jobq-redis redis:7-alpine
+```
+
+2. Build the binary.
+
+```bash
+make build
+```
+
+3. Start worker.
+
+```bash
+./bin/job-queue-system --role=worker --config=docs/evidence/config.alpha.yaml
+```
+
+4. In another terminal, run bench.
+
+```bash
+./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml \
+  --admin-cmd=bench --bench-count=1000 --bench-rate=500 \
+  --bench-priority=low --bench-timeout=60s
+```
+
+5. Capture metrics.
+
+```bash
+curl -sS localhost:9191/metrics | head -n 200 > docs/evidence/metrics_after.txt
+```
````
58-69: Make “Automated harness” a real section and fix spacing/fences. Basic formatting, please.

````diff
-Automated harness
-
-- A convenience script `docs/evidence/run_bench.sh` automates the above steps.
-- Default params: COUNT=1000, RATE=500, PRIORITY=low, CONFIG=docs/evidence/config.alpha.yaml, PURGE=1.
-- Example:
-
-```bash
-bash docs/evidence/run_bench.sh
-# or override
-COUNT=2000 RATE=1000 PRIORITY=low bash docs/evidence/run_bench.sh
-```
+## Automated harness
+
+- A convenience script `docs/evidence/run_bench.sh` automates the above steps.
+- Default params: `COUNT=1000`, `RATE=500`, `PRIORITY=low`, `CONFIG=docs/evidence/config.alpha.yaml`, `PURGE=1`.
+- Example:
+
+```bash
+bash docs/evidence/run_bench.sh
+# or override
+COUNT=2000 RATE=1000 PRIORITY=low bash docs/evidence/run_bench.sh
+```
````

README.md (2)
Lines 99-107: Consider adding a note that the port mapping must match config. People will copy/paste and then stare at a 404.

```diff
-docker run --rm -p 9090:9090 --env-file env.list job-queue-system:latest --role=all
+docker run --rm -p 9090:9090 --env-file env.list job-queue-system:latest --role=all
+# If using the evidence harness config (metrics_port=9191), map 9191 instead:
+# docker run --rm -p 9191:9191 --env-file env.list job-queue-system:latest --role=all --config=docs/evidence/config.alpha.yaml
```

Lines 74-79: Document the metrics port reality. Default is 9090, evidence harness uses 9191—say it here. Don't force readers to diff configs to figure it out.

```diff
-  - Prometheus metrics exposed at <http://localhost:9090/metrics> by default
+  - Prometheus metrics exposed at <http://localhost:9090/metrics> by default.
+    Evidence harness in docs uses 9191 via config override; adjust port mappings accordingly.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (33)
- `.githooks/pre-commit` (1 hunks)
- `.github/workflows/changelog.yml` (1 hunks)
- `.github/workflows/goreleaser.yml` (1 hunks)
- `.github/workflows/markdownlint.yml` (1 hunks)
- `.markdownlint.yaml` (1 hunks)
- `CHANGELOG.md` (1 hunks)
- `Makefile` (1 hunks)
- `README.md` (2 hunks)
- `docs/00_assessment.md` (3 hunks)
- `docs/01_product_roadmap.md` (1 hunks)
- `docs/02_release_plan.md` (3 hunks)
- `docs/03_milestones.md` (2 hunks)
- `docs/04_sprint_plans.md` (5 hunks)
- `docs/05_architecture.md` (4 hunks)
- `docs/06_technical_spec.md` (4 hunks)
- `docs/07_test_plan.md` (2 hunks)
- `docs/08_deployment.md` (3 hunks)
- `docs/09_requirements.md` (2 hunks)
- `docs/10_risk_register.md` (1 hunks)
- `docs/11_project_charter.md` (2 hunks)
- `docs/12_performance_baseline.md` (1 hunks)
- `docs/13_release_versioning.md` (1 hunks)
- `docs/14_ops_runbook.md` (3 hunks)
- `docs/15_promotion_checklists.md` (1 hunks)
- `docs/PRD.md` (5 hunks)
- `docs/did-not-agree/README.md` (1 hunks)
- `docs/did-not-agree/go-toolchain-version.md` (1 hunks)
- `docs/did-not-agree/metrics-scope.md` (1 hunks)
- `docs/did-not-agree/priority-fetching-brpoplpush.md` (1 hunks)
- `docs/did-not-agree/rate-limiter-fixed-window.md` (1 hunks)
- `docs/did-not-agree/reaper-scan-vs-registry.md` (1 hunks)
- `docs/evidence/README.md` (1 hunks)
- `docs/testing-guide.md` (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/markdownlint.yml
[warning] 3-3: truthy value should be one of [false, true]
(truthy)
[error] 6-6: too many spaces inside brackets
(brackets)
[error] 6-6: too many spaces inside brackets
(brackets)
🪛 checkmake (0.2.2)
Makefile
[warning] 33-33: Missing required phony target "clean"
(minphony)
🔇 Additional comments (13)
docs/06_technical_spec.md (2)
Lines 119-122: Define breaker scope and enforce single-probe HalfOpen — implementation not found.

File: docs/06_technical_spec.md (lines 119–122). The docs list the states but omit scope. Decide whether breakers are per-worker (in-process) or cluster-wide, and lock HalfOpen so only a single in-flight probe runs (one goroutine/process; use per-process lock or distributed coordination if cluster-wide). Your earlier search returned no matches — cannot verify implementation.
Apply:
```diff
 - States: Closed → Open on threshold; Open → HalfOpen on cooldown; HalfOpen → Closed on success, else Open.
+Scope: breaker is in-process (per worker instance). Half-open allows a single in-flight probe globally per breaker instance; additional calls fail-fast.
```
Lines 74-78: Flag naming inconsistency — standardize the `--role` value (`all` vs `all-in-one`).

README mixes forms: prose uses "all-in-one" (README.md:3) while examples use `--role=all` (README.md:35,47,100). docs/06_technical_spec.md shows `--role={producer|worker|all}` (lines 74–78). deploy/chart not found in repo — cannot verify Helm usage.

Action: pick a single canonical value and make it consistent across CLI help, code, Helm chart, and all docs. Also add this doc line: "Precedence: flags > env > config file. Missing file loads defaults; invalid values cause non-zero exit." Search & replace all `--role` / "all-in-one" occurrences and update examples to match the chosen form.

docs/05_architecture.md (2)
Lines 20-54: Make completion atomic — this is a data-loss risk.

The diagram shows "LPUSH completed + LREM proc + DEL HB" as three separate ops (non‑atomic). A crash between them can mark a job completed while leaving processing entries/heartbeat behind. Either state these are executed atomically (Lua script or MULTI/EXEC) or justify why the non‑atomic ordering is safe and how failures are handled.

File: docs/05_architecture.md (lines 20–54)

```diff
-  Exec-- success -->Done[LPUSH completed + LREM proc + DEL HB]
+  Exec-- success -->Done[Complete atomically (Lua/Tx): LPUSH completed; LREM proc; DEL HB]
```

Add a short note under "Data Flows" describing the atomic completion mechanism or the failure/recovery guarantees.
Lines 74-82: docs/05_architecture.md — sync stack details with the repo (do it now).
- Go: go.mod declares "go 1.25.0" (go.mod:3) — replace "Language: Go 1.21+" with "Language: Go 1.25.x".
- Container: Dockerfile (root) shows builder "FROM golang:1.21 AS build" (line 3) and runtime "FROM gcr.io/distroless/base-debian12:nonroot" (line 10) — update to the explicit distroless image and reconcile the builder vs go.mod.
- Redis client: no explicit github.com/redis/go-redis entry found in go.mod — do not hardcode "go-redis v8"; point readers to go.mod for the exact module path/version.
- Tracing: grep for otlptrace(http|grpc) in internal returned nothing — do not assert "OTLP HTTP exporter" until you confirm; change to "OpenTelemetry (OTLP; HTTP or gRPC — see internal/obs/tracing.go)" or update with the exact exporter after checking the implementation.
Suggested replacement diff:
```diff
-Language: Go 1.21+
-Redis Client: go-redis v8
-Tracing: OpenTelemetry (OTLP HTTP exporter)
-Container: Distroless base image
+Language: Go 1.25.x
+Redis Client: see go.mod for exact module path & version
+Tracing: OpenTelemetry (OTLP; HTTP or gRPC — align with internal/obs/tracing.go)
+Container: gcr.io/distroless/base-debian12:nonroot (runtime); builder currently golang:1.21 (reconcile with go.mod)
```

CHANGELOG.md (1)
Lines 5-16: Stop hand-waving — make Unreleased actionable and linkable.
- Group entries under Added / Changed / Fixed (Keep a Changelog) and add a compare link for [Unreleased]; add a dated v0.4.0 section when you cut the release.
- Verified present in this PR:
- Config validation — internal/config/config.go (+ tests: internal/config/config_test.go)
- Worker active gauge ("worker_active") — internal/obs/metrics.go
- Health/readiness endpoints (/healthz, /readyz) — internal/obs/http.go
- govulncheck in CI — .github/workflows/ci.yml
- Not verified by the above checks (drop or add precise links/files/commits):
- Admin CLI: stats, peek, purge-dlq
- Queue length gauges updater
- Tracing propagation from job IDs
- Smarter rate limiting (TTL-based sleep + jitter)
- E2E tests with Redis service in CI
- Action: either remove unverifiable items from Unreleased or replace each with a link to the exact commit/file/PR that implements it.
docs/did-not-agree/README.md (1)
Lines 5-11: Formatting fixed. Keep it that way. MD032 addressed; the list now has proper blank lines. Good.
docs/15_promotion_checklists.md (1)
Lines 7-9: /readyz implemented — resolved.

Handler is registered in internal/obs/http.go (`mux.HandleFunc("/readyz", ...)`); README.md, docs/15_promotion_checklists.md, docs/14_ops_runbook.md, docs/PRD.md reference it; docs/evidence/run_bench.sh probes it; cmd/job-queue-system/main.go starts the HTTP server. No change required.
docs/evidence/README.md (1)
Lines 70-75: LGTM: outputs list reads cleanly.

README.md (4)
Lines 20-31: LGTM: example config + build flow are sane and concise.

Lines 50-70: LGTM: admin commands block is clear. Keep it in a single fenced block like this.

Lines 119-124: LGTM: evidence artifacts section reads fine now.

Lines 111-118: LGTM: status and promotion gates block formatted correctly.

docs/PRD.md (1)
Lines 87-93: Metrics port default mismatch — pick one port and document it.

docs/PRD.md (lines 87–93) sets observability.metrics_port: 9090. My verification run failed (ripgrep: "No files were searched") — confirm whether README/examples/CI use 9191 or 9090 and make them consistent.

- Run: `rg -nP --hidden -S 'metrics_port:\s*(9090|9191)|localhost:(9090|9191)/(metrics|healthz|readyz)' -g '!**/node_modules/**'` and align docs/configs.
- Either change PRD to 9191 to match evidence, or keep 9090 and add one explicit sentence in PRD: "Default: 9090; examples/evidence use 9191."
```yaml
        uses: goreleaser/goreleaser-action@v6
        with:
          distribution: goreleaser
          version: latest
          args: release --clean
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
🛠️ Refactor suggestion
“version: latest” is non-deterministic. Pin your toolchain.
Releases must be reproducible.
```diff
 - name: Run GoReleaser
   uses: goreleaser/goreleaser-action@v6
   with:
     distribution: goreleaser
-    version: latest
+    version: v2.6.1 # pin a known-good Goreleaser version
     args: release --clean
```
Also consider guarding workflow_dispatch to snapshots to avoid accidental publishes:
```diff
-          args: release --clean
+          args: ${{ github.event_name == 'workflow_dispatch' && 'release --clean --skip=publish --snapshot' || 'release --clean' }}
```
🤖 Prompt for AI Agents
.github/workflows/goreleaser.yml lines 35-41: the action uses a
non-deterministic "version: latest" which prevents reproducible releases and
risks accidental publishes; replace "latest" with a pinned Goreleaser version
(e.g. a specific tag or semver like v1.22.0 or the exact commit/tag you
validated) so the toolchain is fixed, and add a guard to workflow_dispatch (for
example require a specific input like "type=snapshot" or restrict dispatch to a
snapshot branch or protect with a conditional that only runs on tagged or main
branch) to prevent accidental publishes.
- Use Go 1.25.x across CI and release workflows; go.mod set to 1.25.0.

Rationale

- Addresses govulncheck-reported stdlib CVEs affecting earlier versions.
- CI and local builds succeed with 1.25.x in this environment; evidence (`go_info`) confirms runtime version.

Tradeoffs

- Requires builders to have Go 1.25 installed; older toolchains will not match CI.

Revisit Criteria

- If compatibility issues arise for consumers on earlier Go versions.

Future Work

- Consider matrix-testing across supported Go versions once stability goals are met.
💡 Verification agent
🧩 Analysis chain
“1.25.x” is vague. Pin floor and verify CI actually uses it everywhere.
State minimum patch (e.g., 1.25.1+) and audit workflows.
```diff
-Decision
-
-- Use Go 1.25.x across CI and release workflows; go.mod set to 1.25.0.
+Decision
+
+- Use Go >=1.25.1 across CI and release workflows; go.mod set to 1.25.0 (module), toolchain pinned via workflows.
```
🏁 Script executed:
```bash
#!/bin/bash
rg -nP 'go-version:.*1\.25' .github/workflows -C1
rg -nP '^go\s+1\.25' go.mod
rg -nP 'setup-go|actions/setup-go' .github/workflows -n
```

Length of output: 970
Length of output: 970
Pin Go toolchain to ≥1.25.1 — stop using 1.25.x.

- Found: .github/workflows/{ci.yml, release.yml, goreleaser.yml, changelog.yml} use `go-version: '1.25.x'`; go.mod contains `go 1.25.0`.
- Action: Replace `1.25.x` with a pinned minimum patched release (e.g., `1.25.1` or an exact patch) in those workflow files and update docs/did-not-agree/go-toolchain-version.md to state the minimum; align or justify `go.mod` accordingly.
- Verify: Ensure actions/setup-go@v5 resolves the pinned version in CI and re-run vuln checks (govulncheck).
🤖 Prompt for AI Agents
In docs/did-not-agree/go-toolchain-version.md around lines 5-22, the document
currently recommends "Use Go 1.25.x" but the review requires pinning to a
minimum patched release; update the text to state a pinned minimum (e.g., "Go >=
1.25.1" or a specific patch like "1.25.1"), and instruct the author to change
.github/workflows/{ci.yml, release.yml, goreleaser.yml, changelog.yml} to use
that pinned version (eg. go-version: '1.25.1'), then align go.mod (change `go
1.25.0` to `go 1.25.1` or add a short justification if you must keep 1.25.0),
ensure actions/setup-go@v5 resolves the pinned patch in CI, and re-run
govulncheck to verify the fix.
Decision

- Added essential metrics (job counters, duration histogram, queue length, breaker state, trips, reaper recovered, worker_active). Defer additional metrics like worker restart count and Redis pool internals for now.
🛠️ Refactor suggestion
Name the actual metrics or they will drift from reality.
Readers need the exact metric names exported by the binary.
```diff
 - Added essential metrics (job counters, duration histogram, queue length, breaker state, trips, reaper recovered, worker_active). Defer additional metrics like worker restart count and Redis pool internals for now.
+
+Metric names (current)
+- jobs_produced_total, jobs_consumed_total, jobs_completed_total, jobs_dlq_total
+- job_duration_seconds (Histogram)
+- queue_length{queue=...} (Gauge)
+- circuit_breaker_state{component=...} (Gauge: 0=Closed,1=Open,2=HalfOpen)
+- circuit_breaker_trips_total
+- reaper_recovered_total
+- worker_active (Gauge)
```
🤖 Prompt for AI Agents
In docs/did-not-agree/metrics-scope.md at line 5, the doc currently lists
generic metric categories but lacks the exact metric names exported by the
binary; update this line to enumerate the precise metric identifiers (e.g., job
counters like job_processed_total, job_failed_total; duration histogram like
job_duration_seconds_bucket, job_duration_seconds_sum,
job_duration_seconds_count; queue length like queue_length_gauge; breaker state
and trips like circuit_breaker_state, circuit_breaker_trips_total; reaper
recovered like reaper_recovered_total; worker_active like workers_active_gauge)
by pulling the names directly from the code/metrics registry, replace the vague
sentence with that exact list, and add a short note to keep metric names
synchronized with the binary (or reference the source file/version) so they
don't drift.
```bash
docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
```

Run the e2e suite:

```bash
E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -race -count=1 -v
```
🛠️ Refactor suggestion
Pin Redis and add a healthcheck. Don’t rely on whatever “latest-alpine” happens to be today.
Flaky e2e is self-inflicted if you don’t control the env.
```diff
-docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
+docker run --rm -d --name jobq-redis -p 6379:6379 \
+  --health-cmd='redis-cli ping || exit 1' \
+  --health-interval=5s --health-timeout=3s --health-retries=10 \
+  redis:7.2-alpine
```
🤖 Prompt for AI Agents
In docs/testing-guide.md around lines 132–140, the guidance currently runs Redis
with an unpinned image and no healthcheck; update the example to pin a stable
Redis image (e.g., redis:7.0.11-alpine) and add Docker healthcheck flags
(health-cmd, health-interval, health-retries, health-timeout) to ensure Redis is
actually ready before starting tests, and add a short note to wait for the
container to report healthy (or poll docker inspect) prior to running the e2e
command.
Run the e2e suite:

```bash
E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -race -count=1 -v
```

Run the single e2e test:

```bash
E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -run '^TestE2E_WorkerCompletesJobWithRealRedis$' -race -count=1 -v
```
🧹 Nitpick (assertive)
Document state cleanup or your “determinism” is a lie.
Backlog keys from prior runs skew results. Call your own admin tool.
````diff
-Run the e2e suite:
+Run the e2e suite (clean state):
 ```bash
-E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -race -count=1 -v
+E2E_REDIS_ADDR=localhost:6379 ./bin/admin purge-all --yes
+E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -race -count=1 -v
 ```
````

🤖 Prompt for AI Agents

In docs/testing-guide.md around lines 136 to 146, the e2e run instructions don't clear leftover Redis state between runs causing flaky results; update the doc to run the project's admin purge command before executing the tests (using the same E2E_REDIS_ADDR env var) so prior backlog keys are removed, e.g. call ./bin/admin purge-all --yes with E2E_REDIS_ADDR set immediately before the go test invocation to ensure a clean state for deterministic tests.
```make
.PHONY: mdlint
mdlint:
	@if ! command -v npx >/dev/null 2>&1; then \
		echo "npx not found. Please install Node.js to run markdownlint."; \
		exit 1; \
	fi
	@npx -y markdownlint-cli2 "**/*.md" "!**/node_modules/**"
```
🧹 Nitpick (assertive)
Add a clean target. Don’t ship a Makefile without it.
Satisfies checkmake and basic developer hygiene.
```diff
 .PHONY: mdlint
 mdlint:
 	@if ! command -v npx >/dev/null 2>&1; then \
 		echo "npx not found. Please install Node.js to run markdownlint."; \
 		exit 1; \
 	fi
 	@npx -y markdownlint-cli2 "**/*.md" "!**/node_modules/**"
+
+.PHONY: clean
+clean:
+	rm -rf bin
```
🧰 Tools
🪛 checkmake (0.2.2)
[warning] 33-33: Missing required phony target "clean"
(minphony)
🤖 Prompt for AI Agents
In Makefile around lines 33 to 39, there's no clean target; add a .PHONY: clean
and implement a clean target that removes common generated artifacts (e.g.,
build/dist directories, temporary and cache files like .cache, .venv, *.pyc,
node_modules/.cache, and any markdownlint or other tool caches) so developers
can restore a pristine repo state; keep the recipe idempotent and safe (use rm
-rf for the known artifact paths only).
Review continued from previous batch...
- Redis pool: `redis.pool_size_multiplier` (default 10*NumCPU). Increase for higher concurrency; monitor Redis CPU.
- Timeouts: `redis.read_timeout`/`write_timeout` (default 3s). Too low yields errors under load; too high slows failure detection.
- Worker concurrency: `worker.count`. Increase up to CPU saturation; watch goroutine scheduling and Redis ops.
- Backoff parameters: for retry behavior; not relevant for the baseline success path.
- Priority timeout: `worker.brpoplpush_timeout` (default 1s). Smaller values reduce low-priority latency but add Redis ops.
🧹 Nitpick (assertive)
Recommend a sane default for priority timeout.
1s default is high; call out 200ms for better priority fairness.
```diff
-- Priority timeout: `worker.brpoplpush_timeout` (default 1s). Smaller values reduce low-priority latency but add Redis ops.
+- Priority timeout: `worker.brpoplpush_timeout` (default 1s; recommended 200ms in production to bound added latency).
```
🤖 Prompt for AI Agents
In docs/12_performance_baseline.md around lines 50–56, the priority timeout item
uses a 1s default which is too high for fairness; update the text to recommend a
sane default of 200ms (with a short note that smaller values improve priority
fairness but increase Redis ops), replace the current line mentioning
`worker.brpoplpush_timeout` default 1s with the new 200ms recommendation, and
add a one-line guidance to monitor Redis CPU/operation rate when reducing this
value.
- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
- To avoid stale backlog affecting evidence, clear test keys before running a bench:

```bash
redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter
redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL
```
Stop telling users to run KEYS and mass DEL. That’s operationally reckless. Use SCAN and UNLINK.
Blocking KEYS and huge DEL will nuke Redis latency. Fix it.
-- To avoid stale backlog affecting evidence, clear test keys before running a bench:
-
-```bash
-redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter
-redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL
-```
+- To avoid stale backlog affecting evidence, clear test keys before running a bench (non-blocking):
+
+```bash
+# Known small lists can be unlinked directly (async delete):
+redis-cli UNLINK jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter
+# Never use KEYS in prod. Scan and unlink in sane batches:
+redis-cli --scan --pattern 'jobqueue:worker:*:processing' | xargs -r -n 100 redis-cli UNLINK
+```
+
+WARNING: Do not run purges against production prefixes. Use isolated test namespaces and double-check the key prefix.
🤖 Prompt for AI Agents
In docs/evidence/README.md around lines 44-50, replace the instructions that use
KEYS and DEL with guidance to use UNLINK for known small lists (async delete)
and to use SCAN in batched iterations with UNLINK for matching keys instead of
blocking KEYS/DEL; update the example commands to show using redis-cli --scan
piped to xargs with a reasonable batch size (and -r to handle no matches) and
add a clear WARNING to never run purges against production prefixes—use isolated
test namespaces and double-check key prefixes before deleting.
- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
- To avoid stale backlog affecting evidence, clear test keys before running a bench:

```bash
redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter
redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL
```

- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.

Notes

- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
🧹 Nitpick (assertive)
One more thing: call out that admin bench bypasses producer metrics and how to validate throughput.
Add a sentence to save future confusion.
```diff
 - The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
+- Corollary: to validate end-to-end producer metrics, run a producer process instead of `admin bench` and watch `jobs_produced_total`.
```
🤖 Prompt for AI Agents
In docs/evidence/README.md around lines 44 to 57, the text doesn't explicitly
state that the admin `bench` command enqueues jobs directly and therefore
bypasses producer-side metrics; update the paragraph to add a clear sentence
explaining that the admin `bench` uses direct LPUSH (bypassing producer metrics
like jobs_produced_total) and instruct readers to validate throughput using
jobs_consumed_total or jobs_completed_total and queue length checks (and to use
the metrics port 9191 as noted); place this sentence near the existing note
about jobs_produced_total so it’s immediately visible when discussing which
metrics to use.
- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.

Notes

- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
🧹 Nitpick (assertive)
Be explicit about the 9090 vs 9191 split. Readers shouldn’t have to guess.
State that default is 9090, but evidence harness uses 9191 via config.
-- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
+- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config.

Notes

- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
🤖 Prompt for AI Agents
In docs/evidence/README.md around lines 52 to 57, the README currently mentions
metrics port 9191 but doesn't explicitly state the common default; update the
text to clearly say that the Prometheus default port is 9090 while this evidence
harness overrides it to 9191 (as set by observability.metrics_port in
config.alpha.yaml), and adjust the curl example text to explicitly instruct
readers to use port 9191 for this harness (and note where to change it if using
the default 9090).
```shell
redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL
```

- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
🧹 Nitpick (assertive)
Add a cross-reference to README metrics section.
Reduce duplication and drift.
-- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config.
+- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config. See README “Metrics” for defaults and overrides.
📝 Committable suggestion

- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config. See README "Metrics" for defaults and overrides.
🤖 Prompt for AI Agents
In docs/evidence/README.md around line 52, the sentence about the metrics port
is duplicated; add a cross-reference to the README metrics section instead of
repeating the port number to reduce duplication and drift. Replace the explicit
port note with a short pointer like "See the metrics section in this README for
the configured port (observability.metrics_port in config.alpha.yaml)" or link
to the metrics subsection header so readers are directed to the canonical
location and only that section maintains the actual port detail.
### Producer

- Scan directory recursively using include/exclude globs.
- Determine priority by extension list.
- Rate limiting: INCR rate_limit_key; if first increment, set EX=1; if value > rate_limit_per_sec, `TTL`-based precise sleep (with jitter) until window reset before enqueueing more.
- LPUSH job JSON to priority queue.
🛠️ Refactor suggestion
Rate limiter: use PTTL for ms precision; TTL is coarse and off-by-one-prone.
Your “precise sleep” isn’t precise if you read TTL in seconds.
-- Rate limiting: INCR rate_limit_key; if first increment, set EX=1; if value > rate_limit_per_sec, `TTL`-based precise sleep (with jitter) until window reset before enqueueing more.
+- Rate limiting: INCR rate_limit_key; if first increment, set EX=1; if value > rate_limit_per_sec, use `PTTL` to sleep precisely (with jitter) until window reset before enqueueing more.
📝 Committable suggestion

### Producer

- Scan directory recursively using include/exclude globs.
- Determine priority by extension list.
- Rate limiting: INCR rate_limit_key; if first increment, set EX=1; if value > rate_limit_per_sec, use `PTTL` to sleep precisely (with jitter) until window reset before enqueueing more.
- LPUSH job JSON to priority queue.
🤖 Prompt for AI Agents
In docs/PRD.md around lines 125 to 131, the rate-limiter description currently
reads that the producer should use TTL to decide a "precise sleep" which is
coarse and can be off-by-one; change the design to use Redis PTTL (millisecond
precision) when computing remaining window time, compute sleep = max(PTTL, 0) +
jitter_ms (random small ms) and sleep that many milliseconds before enqueueing,
and document handling for missing/negative PTTL (treat as 0 and
recheck/increment atomically) so the loop uses millisecond-accurate waits rather
than second-granularity TTL.
### Worker Fetch

- Unique worker ID: "hostname-PID-idx" for each goroutine.
- Prioritized fetch: loop priorities in order (e.g., high then low) and call `BRPOPLPUSH` per-queue with a short timeout (default 1s). Guarantees atomic move per-queue, priority preference within timeout granularity, and no job loss. Tradeoff: lower-priority jobs may wait up to the timeout when higher-priority queues are empty.
- On receipt, SET heartbeat key to job JSON with EX=heartbeat_ttl.
🛠️ Refactor suggestion
Use BLMOVE where supported; BRPOPLPUSH is the fossil.
Modern Redis has BLMOVE. Call it out and prefer it when available.
-- Prioritized fetch: loop priorities in order (e.g., high then low) and call `BRPOPLPUSH` per-queue with a short timeout (default 1s). Guarantees atomic move per-queue, priority preference within timeout granularity, and no job loss. Tradeoff: lower-priority jobs may wait up to the timeout when higher-priority queues are empty.
+- Prioritized fetch: loop priorities in order (e.g., high then low) and call `BLMOVE` (Redis ≥6.2) or fallback to `BRPOPLPUSH` per-queue with a short timeout (default 1s). This guarantees atomic move per-queue, priority preference within timeout granularity, and no job loss. Tradeoff: lower-priority jobs may wait up to the timeout when higher-priority queues are empty.
📝 Committable suggestion

### Worker Fetch

- Unique worker ID: "hostname-PID-idx" for each goroutine.
- Prioritized fetch: loop priorities in order (e.g., high then low) and call `BLMOVE` (Redis ≥6.2) or fallback to `BRPOPLPUSH` per-queue with a short timeout (default 1s). This guarantees atomic move per-queue, priority preference within timeout granularity, and no job loss. Tradeoff: lower-priority jobs may wait up to the timeout when higher-priority queues are empty.
- On receipt, SET heartbeat key to job JSON with EX=heartbeat_ttl.
🤖 Prompt for AI Agents
In docs/PRD.md around lines 132 to 137, the fetch design currently prescribes
BRPOPLPUSH; update the text to prefer BLMOVE (blocking variant of LMOVE) where
Redis supports it, and mention falling back to BRPOPLPUSH on older servers.
State the advantages of BLMOVE (atomic blocking move without timeout granularity
tradeoffs) and add a short note about feature-detection at startup or
client-level capability checks to choose BLMOVE first and only use BRPOPLPUSH as
a compatibility fallback.
- On success: LPUSH completed_list job JSON; LREM processing_list 1 job; DEL heartbeat key.
- On failure: increment Retries in payload; exponential backoff; if retries <= max_retries LPUSH back to original priority queue; else LPUSH dead_letter_list; in both cases LREM from processing_list and DEL heartbeat.
🛠️ Refactor suggestion
You’re going to leak memory into “completed” forever. Define retention.
Add trimming or archival guidance. Don’t hand-wave it.
- On success: LPUSH completed_list job JSON; LREM processing_list 1 job; DEL heartbeat key.
+ - On success: LPUSH completed_list job JSON; LREM processing_list 1 job; DEL heartbeat key.
+ Implement retention: e.g., `LTRIM completed_list 0 <N-1>` or move completions to a capped stream; document operational caps.
📝 Committable suggestion

- On success: LPUSH completed_list job JSON; LREM processing_list 1 job; DEL heartbeat key.
  Implement retention: e.g., `LTRIM completed_list 0 <N-1>` or move completions to a capped stream; document operational caps.
- On failure: increment Retries in payload; exponential backoff; if retries <= max_retries LPUSH back to original priority queue; else LPUSH dead_letter_list; in both cases LREM from processing_list and DEL heartbeat.
🤖 Prompt for AI Agents
In docs/PRD.md around lines 142–144, the process currently LPUSHes every
successful job into the completed_list with no retention or archival, which will
grow unbounded; add a concrete retention policy and actionable trimming/archival
steps: specify a max completed_list length (e.g. COMPLETED_LIST_MAX_ITEMS) and
recommend trimming after each LPUSH with LTRIM to that length, or alternatively
export/rotate older entries to an archival store (S3/Blob/DB) using a background
hopper job that moves items older than X days (CONFIG_COMPLETED_RETENTION_DAYS)
and then LTRIM/DEL them from Redis; document defaults, config names, and the
recommended background/cron frequency and safety steps (atomic RPOPLPUSH or Lua
script to move batches) so implementers can add periodic archival and size-based
trimming instead of letting completed_list grow forever.
- Periodically scan all heartbeat keys matching pattern. For each missing/expired heartbeat, recover jobs lingering in processing lists:
  - For each worker processing list, if list has elements and the corresponding heartbeat key is absent, pop jobs (LPOP) one by one, inspect priority within payload, and LPUSH back to the appropriate priority queue.
🧹 Nitpick (assertive)
Reaper needs bounds and pacing, or it will DOS your Redis. Document it.
State scan cadence, batch sizes, and backoff, not just “periodically scan.”
- Periodically scan all heartbeat keys matching pattern. For each missing/expired heartbeat, recover jobs lingering in processing lists:
+ - Reaper cadence/bounds: limit SCAN per cycle, cap recovered jobs per iteration, and sleep between batches to avoid saturating Redis. Expose these as config.
- For each worker processing list, if list has elements and the corresponding heartbeat key is absent, pop jobs (LPOP) one by one, inspect priority within payload, and LPUSH back to the appropriate priority queue.
📝 Committable suggestion

- Reaper cadence/bounds: limit SCAN per cycle, cap recovered jobs per iteration, and sleep between batches to avoid saturating Redis. Expose these as config.
- For each worker processing list, if list has elements and the corresponding heartbeat key is absent, pop jobs (LPOP) one by one, inspect priority within payload, and LPUSH back to the appropriate priority queue.
🤖 Prompt for AI Agents
In docs/PRD.md around lines 150 to 153, the reaper description currently says
only "periodically scan" and pop jobs without bounds; update the doc to specify
concrete pacing and limits: define a scan cadence (e.g., every N
seconds/minutes), a max number of heartbeat keys inspected per scan (batch
size), a max number of jobs reaped per worker per batch, and an exponential or
fixed backoff when no stale heartbeats are found or when Redis errors occur;
also document inter-batch sleep to throttle LPOP/LPUSH operations and recommend
configurable defaults and knobs (scan_interval, batch_size, max_reap_per_worker,
backoff_base) so consumers can tune to avoid DOSing Redis.
### Metrics

- Prometheus metrics exposed at <http://localhost:9090/metrics> by default

### Health and Readiness

- Liveness: <http://localhost:9090/healthz> returns 200 when the process is up
- Readiness: <http://localhost:9090/readyz> returns 200 only when Redis is reachable
💡 Verification agent
🧩 Analysis chain
Verify ports across docs/configs; fix any stragglers to avoid support tickets.
Automate the check and kill the drift now.
Also applies to: 90-107, 119-124
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "== Find metrics ports mentioned in configs and docs =="
rg -nP 'metrics?_?port.*(9090|9191)' -g '!**/node_modules/**' -C2 || true
echo
echo "== Find literal localhost metrics URLs =="
rg -nP 'localhost:(9090|9191)/metrics' -g '!**/node_modules/**' -C2 || true
echo
echo "== Show all occurrences of 9090/9191 in README and evidence docs =="
rg -nP '9090|9191' README.md docs/evidence -n -C2 || true
echo
echo "Expected:"
echo "- README: default 9090 noted; explicit note that evidence uses 9191."
echo "- docs/evidence/README.md: clearly states 9191 override and cross-links README."
```

Length of output: 2649
Stop port drift: README = 9090, evidence harness = 9191 — document the intent and add a CI guard.
- Findings: README.md references localhost:9090 (metrics/health and docker run — README.md lines ~74–79, 100–101). Evidence harness intentionally uses 9191 (docs/evidence/config.alpha.yaml & config.used.yaml: observability.metrics_port: 9191; docs/evidence/README.md: curl localhost:9191/metrics; docs/evidence/run_bench.sh: fallback 9191).
- Required changes: (1) Update docs/evidence/README.md to explicitly state the 9191 override and add a cross‑link to the top-level README; (2) Add a CI lint (simple ripgrep check) that fails on undocumented mixed metrics ports to prevent future drift.
🤖 Prompt for AI Agents
In README.md lines ~72–80 and related evidence files, the top-level README still
documents Prometheus/health endpoints on :9090 while the evidence harness
intentionally uses :9191; update docs/evidence/README.md to explicitly state
that the evidence harness overrides observability ports to 9191 and add a clear
cross-link back to the top-level README so readers understand the override, and
add a CI guard (e.g., a small ripgrep-based lint job in the repo CI config) that
fails when files mention mixed metrics ports without documentation to prevent
future drift.
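The suggested CI guard can be as small as a pure check over file contents. The Go sketch below is illustrative only: a ripgrep one-liner in CI would work just as well, and the rule used here (an override port mention must appear on a line that also names `observability.metrics_port`) is an assumed convention, not something the repo defines.

```go
package main

import (
	"fmt"
	"strings"
)

// portDrift returns the 1-based line numbers that mention an override
// metrics port without also referencing the config key that documents it.
func portDrift(content string, overridePorts []string) []int {
	var bad []int
	for i, line := range strings.Split(content, "\n") {
		for _, p := range overridePorts {
			if strings.Contains(line, p) && !strings.Contains(line, "observability.metrics_port") {
				bad = append(bad, i+1)
			}
		}
	}
	return bad
}

func main() {
	doc := "curl localhost:9191/metrics\n" +
		"- This harness uses 9191 via observability.metrics_port in config.alpha.yaml\n" +
		"default is 9090"
	fmt.Println(portDrift(doc, []string{"9191"})) // [1]: line 1 mentions 9191 undocumented
}
```

A CI job would run this (or the equivalent `rg` invocation) over README.md and docs/evidence and fail on a non-empty result.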
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: James Ross <james@flyingrobots.dev>
Actionable comments posted: 2
♻️ Duplicate comments (8)
.vscode/settings.json (3)
Lines 12-15: Kill the deprecated gopls flag. JSON is not a graveyard for removed settings. `"build.experimentalWorkspaceModule"` has been deprecated/removed for ages. Delete it.

```diff
 "gopls": {
   "ui.semanticTokens": true,
-  "build.experimentalWorkspaceModule": true
 },
```
Line 16: Defaulting to `-race` on every test run is self-inflicted pain. Keep it for CI; make local default fast (`["-count=1"]`) and add a separate "race" profile/task.

```diff
- "go.testFlags": ["-race", "-count=1"],
+ "go.testFlags": ["-count=1"],
```

Add a VS Code Test Profile or task for race runs invoking `["-race","-count=1"]`.
Lines 1-18: This JSON is broken. Duplicate "[go]" blocks and mismatched braces; it won't even parse. Fix it. You have two "[go]" sections (lines 3 and 5) and an extra "}" (line 11). Clean this up and merge the settings into a single block.

Apply this diff to produce valid JSON:

```diff
-{
-  "editor.formatOnSave": false,
-  "[go]": {
-    "editor.formatOnSave": true,
-"[go]": {
-  "editor.formatOnSave": true,
-  "editor.codeActionsOnSave": {
-    "source.organizeImports": "explicit"
-  }
-},
-  },
-  "gopls": {
-    "ui.semanticTokens": true,
-    "build.experimentalWorkspaceModule": true
-  },
-  "go.testFlags": ["-race", "-count=1"],
-  "go.toolsManagement.autoUpdate": true
-}
+{
+  "editor.formatOnSave": false,
+  "[go]": {
+    "editor.formatOnSave": true,
+    "editor.codeActionsOnSave": {
+      "source.organizeImports": "explicit"
+    }
+  },
+  "gopls": {
+    "ui.semanticTokens": true
+  },
+  "go.testFlags": ["-race", "-count=1"],
+  "go.toolsManagement.autoUpdate": true
+}
```
Lines 24-27: Keep this step; it feeds .goreleaser.yaml. You're exporting GITHUB_REPOSITORY_OWNER/NAME used by release.github.owner/name. The earlier advice to delete this is obsolete.

Lines 38-45: Pin your damn GoReleaser version and guard manual runs. "latest" is non-deterministic. Also, workflow_dispatch should default to snapshot to avoid accidental publishes.

```diff
 - name: Run GoReleaser
   uses: goreleaser/goreleaser-action@v6
   with:
     distribution: goreleaser
-    version: latest
-    args: release --clean
+    version: v2.6.1
+    args: ${{ github.event_name == 'workflow_dispatch' && 'release --clean --snapshot --skip=publish' || 'release --clean' }}
   env:
     GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Optionally harden the job to only publish on tags:

```diff
 jobs:
   release:
     runs-on: ubuntu-latest
+    if: startsWith(github.ref, 'refs/tags/')
```

Lines 9-12: Tighten permissions and enable OIDC for provenance. Stop handing the workflow a bazooka. Scope it:

- Add id-token: write for signing/provenance.
- Keep only what's needed.

```diff
 permissions:
-  contents: write
-  packages: write
+  contents: write  # needed to create GitHub Releases
+  packages: write  # needed to push to GHCR
+  id-token: write  # needed for keyless signing/provenance (cosign/SLSA)
```
Lines 12-13: Harden builds: strip paths and VCS noise for reproducibility. Note that `-trimpath` and `-buildvcs=false` are `go build` flags, not linker flags, so they belong under `flags:`, not `ldflags:`:

```diff
+flags:
+  - -trimpath
+  - -buildvcs=false
 ldflags:
   - -s -w -X main.version={{.Version}}
```
Lines 30-37: You say "multi-arch", but this config won't build it. Fix the Docker section. Without buildx + platforms here, you'll ship whatever arch the runner happens to be. Add platforms and basic OCI labels.

```diff
 dockers:
   - image_templates:
       - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:{{ .Version }}
       - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest
     dockerfile: Dockerfile
+    use: buildx
+    platforms:
+      - linux/amd64
+      - linux/arm64
     build_flag_templates:
       - "--pull"
+    labels:
+      - "org.opencontainers.image.source={{.GitURL}}"
+      - "org.opencontainers.image.revision={{.FullCommit}}"
+      - "org.opencontainers.image.version={{.Version}}"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/goreleaser.yml (1 hunks)
- .goreleaser.yaml (1 hunks)
- .vscode/settings.json (1 hunks)
🧰 Additional context used
🪛 Biome (2.1.2)
.vscode/settings.json
[error] 11-11: Expected a property but instead found '}'.
Expected a property here.
(parse)
🪛 YAMLlint (1.37.1)
.goreleaser.yaml
[error] 47-47: too many blank lines (1 > 0)
(empty-lines)
🔇 Additional comments (5)
.github/workflows/goreleaser.yml (3)
Lines 17-21: Checkout config looks correct. Full history + tags are fetched. GoReleaser and changelog won't choke.

Lines 28-37: QEMU + Buildx: finally something done right. Multi-arch builds won't magically happen without these. Good.

Lines 21-23: Cache Go modules or enjoy slow builds. Your call. Enable caching in setup-go (note: setup-go v4+ enables module caching by default, so this may already be covered).

```diff
 - uses: actions/setup-go@v5
   with:
     go-version: '1.25.x'
+    cache: true
```

Likely an incorrect or invalid review comment.
.goreleaser.yaml (2)
Lines 15-23: Windows zip override: good. Much better UX than tarballs on Windows.

Lines 38-43: Release owner/name quoting: fixed. This now passes linters and works. Ship it.
- image_templates:
    - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:{{ .Version }}
    - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest
💡 Verification agent
🧩 Analysis chain
Don’t slap :latest on alpha builds.
Tagging “latest” for pre-releases is how you torch users. Gate “latest” to stable releases only (e.g., via templating/conditions or a separate docker entry that runs only when not prerelease). Verify the exact GoReleaser field to key off (e.g., .Prerelease or similar) and wire it accordingly.
🌐 Web query:
How to conditionally add the "latest" Docker tag only for non-prerelease builds in GoReleaser v2? Provide the recommended config snippet.
💡 Result:
Use GoReleaser template conditionals in image_templates to emit :latest only when .Prerelease is empty (i.e., not a prerelease). Example config snippet:
```yaml
# .goreleaser.yaml
dockers:
  - image_templates:
      - "myuser/myimage:{{ .Tag }}"
      - '{{ if eq .Prerelease "" }}myuser/myimage:latest{{ end }}'
```
References: GoReleaser Docker docs [1], Template variables (.Prerelease) [2].
DO NOT tag prereleases with :latest — gate :latest to stable releases.
File: .goreleaser.yaml (lines 31–33) — replace the unconditional :latest with a template conditional that only emits :latest when .Prerelease is empty:
```yaml
  - image_templates:
      - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:{{ .Version }}
      - '{{ if eq .Prerelease "" }}ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest{{ end }}'
```
This uses .Prerelease to ensure only stable releases get :latest.
🤖 Prompt for AI Agents
.goreleaser.yaml around lines 31 to 33: the image_templates list unconditionally
emits a :latest tag which will tag prereleases; modify the template to only emit
the :latest entry when .Prerelease is empty by replacing the unconditional
:latest line with a conditional template that checks eq .Prerelease "" and only
outputs the ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest entry in that case,
keeping the version-tag line unchanged.
prerelease: auto

snapshot:
  name_template: SNAPSHOT-{{ .ShortCommit }}
🧹 Nitpick (assertive)
Kill the stray trailing blank line.
YAMLlint already complained. Remove it.
- name_template: SNAPSHOT-{{ .ShortCommit }}
-
+ name_template: SNAPSHOT-{{ .ShortCommit }}
📝 Committable suggestion

  name_template: SNAPSHOT-{{ .ShortCommit }}
🤖 Prompt for AI Agents
In .goreleaser.yaml around line 46, there is a stray trailing blank line after
the "name_template: SNAPSHOT-{{ .ShortCommit }}" entry; remove that empty line
so the file has no extra blank line at the end (save the file without the
trailing newline) to satisfy yamllint.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: James Ross <james@flyingrobots.dev>
Actionable comments posted: 3
♻️ Duplicate comments (1)
.githooks/pre-commit (1)
Lines 25-27: Don't "warn and continue" when the linter runtime is missing. Block the commit. Skipping the gate because npx isn't installed is useless theater. Fail hard.

Apply this diff:

```diff
 else
-  echo "npx not found. Skipping markdownlint autofix. Install Node.js to enable autofix." >&2
+  echo "npx not found. Markdown lint is required. Install Node.js (>=18) to proceed (or wire a pnpm/yarn/bun fallback)." >&2
+  exit 1
 fi
```
mapfile -d '' -t _staged < <(git diff --cached --name-only --diff-filter=ACM -z)
md_files=()
for f in "${_staged[@]}"; do
  [[ "$f" =~ \.(md|MD|mdx|MDX)$ ]] && md_files+=("$f")
done
🛠️ Refactor suggestion
This breaks on macOS’ default Bash (3.2). Ditch mapfile-only path and cover renames.
Bash 3.2 lacks mapfile; your hook dies on vanilla macs. Also you ignore renamed/type-changed files.
Apply this diff:
-mapfile -d '' -t _staged < <(git diff --cached --name-only --diff-filter=ACM -z)
+_staged=()
+if type -t mapfile >/dev/null 2>&1; then
+ mapfile -d '' -t _staged < <(git diff --cached --name-only --diff-filter=ACMRT -z)
+else
+ # Fallback compatible with Bash 3.2
+ while IFS= read -r -d $'\0' f; do _staged+=("$f"); done < <(git diff --cached --name-only --diff-filter=ACMRT -z)
+fi
md_files=()
for f in "${_staged[@]}"; do
[[ "$f" =~ \.(md|MD|mdx|MDX)$ ]] && md_files+=("$f")
done
📝 Committable suggestion

```bash
_staged=()
if type -t mapfile >/dev/null 2>&1; then
  mapfile -d '' -t _staged < <(git diff --cached --name-only --diff-filter=ACMRT -z)
else
  # Fallback compatible with Bash 3.2
  while IFS= read -r -d $'\0' f; do
    _staged+=("$f")
  done < <(git diff --cached --name-only --diff-filter=ACMRT -z)
fi
md_files=()
for f in "${_staged[@]}"; do
  [[ "$f" =~ \.(md|MD|mdx|MDX)$ ]] && md_files+=("$f")
done
```
🤖 Prompt for AI Agents
.githooks/pre-commit around lines 5 to 9: the hook uses Bash-only mapfile which
breaks on macOS Bash 3.2 and also omits renamed/type-changed files; replace the
mapfile pipeline with a portable while IFS= read -r -d '' loop that reads git
diff --cached --name-only --diff-filter=ACMR -z output into an array (preserving
filenames with spaces/newlines), and change the git diff --diff-filter flags to
include R and T (ACMR) so renamed/type-changed files are captured; ensure the
resulting md_files detection loop remains the same and works with the populated
array.
echo "[pre-commit] markdownlint-cli2 --fix on staged Markdown files"
if command -v npx >/dev/null 2>&1; then
  # Run with --fix so minor issues are auto-corrected
  npx -y markdownlint-cli2 --fix "${md_files[@]}" || true
  # Re-stage any modified files
  git add -- "${md_files[@]}" || true
  # Verify no errors remain; block commit if they do
  if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
🧹 Nitpick (assertive)
Optional: use lint-staged and stop reinventing half a pre-commit framework.
Let a battle-tested tool handle staged-only semantics and partial hunks. Keep this hook as a thin wrapper.
Minimal change inside this hook:
-echo "[pre-commit] markdownlint-cli2 --fix on staged Markdown files"
+echo "[pre-commit] lint-staged: markdownlint on staged Markdown files"
if command -v npx >/dev/null 2>&1; then
- # Run with --fix so minor issues are auto-corrected
- npx -y markdownlint-cli2 --fix "${md_files[@]}" || true
- # Re-stage any modified files
- git add -- "${md_files[@]}" || true
- # Verify no errors remain; block commit if they do
- if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
- echo "Markdownlint errors remain after autofix. Aborting commit." >&2
- exit 1
- fi
+ npx -y lint-staged
else
Then configure lint-staged (package.json or .lintstagedrc) to run:
"*.{md,mdx}": "markdownlint-cli2 --fix"
and it will re-stage only the fixed hunks.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
echo "[pre-commit] markdownlint-cli2 --fix on staged Markdown files" | |
if command -v npx >/dev/null 2>&1; then | |
# Run with --fix so minor issues are auto-corrected | |
npx -y markdownlint-cli2 --fix "${md_files[@]}" || true | |
# Re-stage any modified files | |
git add -- "${md_files[@]}" || true | |
# Verify no errors remain; block commit if they do | |
if ! npx -y markdownlint-cli2 "${md_files[@]}"; then | |
echo "[pre-commit] lint-staged: markdownlint on staged Markdown files" | |
if command -v npx >/dev/null 2>&1; then | |
npx -y lint-staged | |
else | |
echo "npx not found. Skipping markdownlint autofix. Install Node.js to enable autofix." >&2 | |
fi |
🤖 Prompt for AI Agents
.githooks/pre-commit lines 14-21: the hook reimplements staged-file handling for
Markdown linting; replace this ad-hoc logic with lint-staged. Add a lint-staged
config (package.json "lint-staged" or .lintstagedrc) mapping "*.{md,mdx}":
"markdownlint-cli2 --fix" and ensure markdownlint-cli2 is a devDependency, then
simplify the hook to invoke npx -y lint-staged (so only staged hunks are fixed
and re-staged) and propagate its exit code to block the commit on failures.
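A sketch of the config step the prompt describes, written as a shell snippet that generates the file. The `.lintstagedrc.json` filename is one of several lint-staged recognizes; the glob and command follow the suggestion above, not files present in this repository:

```shell
# Hypothetical lint-staged config; lint-staged also accepts a
# "lint-staged" key in package.json instead of a standalone file.
cat > .lintstagedrc.json <<'EOF'
{
  "*.{md,mdx}": "markdownlint-cli2 --fix"
}
EOF
grep -c 'markdownlint-cli2' .lintstagedrc.json
```

With this in place, `npx -y lint-staged` in the hook runs the mapped command against staged Markdown files only and re-stages the fixed hunks itself.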
if command -v npx >/dev/null 2>&1; then
  # Run with --fix so minor issues are auto-corrected
  npx -y markdownlint-cli2 --fix "${md_files[@]}" || true
  # Re-stage any modified files
  git add -- "${md_files[@]}" || true
  # Verify no errors remain; block commit if they do
  if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
    echo "Markdownlint errors remain after autofix. Aborting commit." >&2
    exit 1
  fi
🛠️ Refactor suggestion
Stop auto-staging unintended junk; lint the INDEX, not the working tree.
Right now the hook blindly re-adds files after mutating the working tree. If a dev staged only some hunks, the hook will slurp in the rest. Hard fail when there are unstaged changes in targeted files, and don't swallow failures with `|| true`.
Apply this diff:
if command -v npx >/dev/null 2>&1; then
+ # Refuse to proceed if any targeted file has unstaged changes.
+ for f in "${md_files[@]}"; do
+ if ! git diff --quiet -- "$f"; then
+ echo "Unstaged changes detected in $f; aborting to avoid staging unintended hunks. Stash or fully stage Markdown changes and retry." >&2
+ exit 1
+ fi
+ done
# Run with --fix so minor issues are auto-corrected
- npx -y markdownlint-cli2 --fix "${md_files[@]}" || true
+ npx -y markdownlint-cli2 --fix "${md_files[@]}"
# Re-stage any modified files
- git add -- "${md_files[@]}" || true
+ git add -- "${md_files[@]}"
# Verify no errors remain; block commit if they do
if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
echo "Markdownlint errors remain after autofix. Aborting commit." >&2
exit 1
fi
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if command -v npx >/dev/null 2>&1; then
  # Refuse to proceed if any targeted file has unstaged changes.
  for f in "${md_files[@]}"; do
    if ! git diff --quiet -- "$f"; then
      echo "Unstaged changes detected in $f; aborting to avoid staging unintended hunks. Stash or fully stage Markdown changes and retry." >&2
      exit 1
    fi
  done
  # Run with --fix so minor issues are auto-corrected
  npx -y markdownlint-cli2 --fix "${md_files[@]}"
  # Re-stage any modified files
  git add -- "${md_files[@]}"
  # Verify no errors remain; block commit if they do
  if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
    echo "Markdownlint errors remain after autofix. Aborting commit." >&2
    exit 1
  fi
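The guard above hinges on `git diff --quiet` exit codes. A minimal sketch demonstrating that behavior in a throwaway repository (all paths and identities below are invented for the demo):

```shell
# git diff --quiet exits 0 when the working tree matches the index
# for the given path, and non-zero when there are unstaged edits.
tmp="$(mktemp -d)"
cd "$tmp"
git init -q .
echo one > f.md
git add f.md
git -c user.email=t@e.st -c user.name=t commit -qm add
echo two >> f.md   # unstaged edit, not added to the index
if git diff --quiet -- f.md; then state=clean; else state=dirty; fi
echo "$state"
```

Because the check compares working tree to index, a fully staged file passes even when it differs from HEAD, which is exactly the semantics the hook wants.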
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: James Ross <james@flyingrobots.dev>
Alpha Release: v0.4.0
Summary
Promotion gates (excerpt; full in docs/15_promotion_checklists.md)
Evidence
Release automation
Notes