chore(release): add Helm chart, Grafana dashboard, GoReleaser + workflows, changelog automation, promotion checklists, evidence harness #1
base: main
Conversation
…lows, changelog automation, promotion checklists, evidence harness
…sts; update root README with release status and evidence pointers
…s (ignore all except settings/extensions/launch)
Summary by CodeRabbit
Walkthrough

Adds CI/release automation (changelog + GoReleaser), upgrades Go to 1.25, introduces a Grafana dashboard and new Prometheus metrics, extends the admin CLI and admin APIs, enforces single-probe circuit-breaker half-open semantics, streams producer enqueueing, hardens worker/reaper Redis handling, and adds tests, docs, hooks, linting, and many documentation/evidence artifacts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor Dev as Developer
  participant GH as GitHub Actions
  participant Repo as Repository
  participant Chg as Changelog WF
  participant Rel as GoReleaser WF
  participant GHCR as GHCR
  Dev->>GH: push tag vX.Y.Z or manual dispatch
  GH->>Chg: trigger changelog job
  GH->>Rel: trigger goreleaser job
  rect rgba(220,240,255,0.12)
    Chg->>Repo: checkout (full history & tags)
    Chg->>Chg: setup Go 1.25 & install git-chglog
    Chg->>Repo: generate CHANGELOG.md (git-chglog)
    Chg->>Repo: git add/commit CHANGELOG.md (|| true)
    Chg->>Repo: git push (non-fatal)
  end
  rect rgba(230,255,220,0.12)
    Rel->>Repo: checkout (fetch-depth 0)
    Rel->>Rel: setup Go 1.25
    Rel->>GHCR: docker login (GITHUB_TOKEN)
    Rel->>Rel: goreleaser release --clean
    Rel->>GHCR: push images & publish artifacts
  end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
@coderabbitai review
✅ Actions performed: Review triggered.
…ations for Go dev
Actionable comments posted: 32
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.goreleaser.yaml (1)
45-46: Remove trailing blank line. YAMLlint flags the extra empty line.

snapshot:
  name_template: SNAPSHOT-{{ .ShortCommit }}
-
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (15)
.github/workflows/changelog.yml (1 hunks)
.github/workflows/goreleaser.yml (1 hunks)
.gitignore (1 hunks)
.goreleaser.yaml (1 hunks)
README.md (1 hunks)
deploy/grafana/dashboards/work-queue.json (1 hunks)
docs/15_promotion_checklists.md (1 hunks)
docs/evidence/README.md (1 hunks)
docs/evidence/bench.json (1 hunks)
docs/evidence/ci_run.json (1 hunks)
docs/evidence/config.alpha.yaml (1 hunks)
docs/evidence/config.used.yaml (1 hunks)
docs/evidence/issues.json (1 hunks)
docs/evidence/metrics_after.txt (1 hunks)
docs/evidence/metrics_before.txt (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/15_promotion_checklists.md
5-5: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
6-6: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
33-33: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
34-34: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
44-44: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
45-45: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
52-52: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
53-53: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above
(MD022, blanks-around-headings)
64-64: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
65-65: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above
(MD022, blanks-around-headings)
README.md
56-56: Bare URL used
(MD034, no-bare-urls)
59-59: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
64-64: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/evidence/README.md
5-5: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
6-6: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
🪛 YAMLlint (1.37.1)
docs/evidence/config.used.yaml
[error] 20-20: too many spaces after colon
(colons)
[error] 22-22: too many spaces after colon
(colons)
docs/evidence/config.alpha.yaml
[error] 20-20: too many spaces after colon
(colons)
[error] 22-22: too many spaces after colon
(colons)
.goreleaser.yaml
[error] 38-38: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 39-39: too many spaces inside braces
(braces)
[error] 39-39: too many spaces inside braces
(braces)
[error] 45-45: too many blank lines (1 > 0)
(empty-lines)
🔇 Additional comments (7)
.gitignore (1)
35-36: Obsidian ignore LGTM.
.goreleaser.yaml (1)
7-13: Confirm version variable path. Ensure your main package exposes `var version string` so `-X main.version={{.Version}}` sets it correctly. Would you like a quick grep script to verify the symbol is present in cmd/job-queue-system?
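As a cross-check, a minimal sketch of the wiring this comment asks to confirm (the package path and variable name come from the review; the "dev" fallback value is an assumption for illustration):

```go
package main

import "fmt"

// version is populated at build time; GoReleaser's ldflags entry
// `-X main.version={{.Version}}` overwrites this default on release builds.
// The "dev" fallback is an assumed convention, not confirmed repo content.
var version = "dev"

func main() {
	fmt.Println("job-queue-system", version)
}
```

Building with `go build -ldflags "-X main.version=v0.4.0-alpha"` would then report that tag instead of the fallback.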
.github/workflows/goreleaser.yml (1)
20-21: Pin the Go toolchain input to the matrix/default branch if needed. Go 1.24.x is fine if intentional; just flagging that CI and evidence show 1.24/1.25 variance. Confirm you want releases built with 1.24.x while evidence artifacts show go1.25.0.
docs/evidence/metrics_after.txt (1)
18-18: Evidence inconsistencies: produced/consumed/queue backlog and Go version.
- go_info shows go1.25.0 while CI uses 1.24.x.
- jobs_consumed_total=480, jobs_completed_total=480, but jobs_produced_total=0.
- queue_length low_priority=2020 suggests a backlog. If producer metrics are disabled, note it.
Please confirm the run context and annotate the README with the expected counter relationships to avoid misinterpretation by reviewers.
Also applies to: 109-112, 121-121, 135-139
docs/evidence/config.used.yaml (1)
21-24: Validate that key patterns match the implementation. Ensure processing/heartbeat keys align with the code to prevent orphaned heartbeats. I can generate a repo scan to find Redis key usage sites and cross-check patterns.
docs/evidence/metrics_before.txt (1)
1-136: Baseline snapshot looks consistent. No issues spotted; a good complement to metrics_after.
docs/evidence/README.md (1)
10-15: Verify the metrics port in the curl command matches the config. The README uses 9191 while config.used.yaml shows 9090; if config.alpha.yaml differs, note it explicitly.
I can scan all configs for metrics_port and propose a single source of truth in docs.
Also applies to: 42-45
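The offered scan could be as small as a line filter over every YAML config; a hypothetical sketch (the `extractPort` helper is illustrative, not repo code):

```go
package main

import (
	"fmt"
	"strings"
)

// extractPort pulls the value out of a "metrics_port:" line,
// returning "" when the line does not set the port.
func extractPort(line string) string {
	const key = "metrics_port:"
	i := strings.Index(line, key)
	if i < 0 {
		return ""
	}
	return strings.TrimSpace(line[i+len(key):])
}

func main() {
	// Feed this lines from each config file (e.g. via bufio.Scanner over
	// docs/evidence/*.yaml) and compare values to flag 9191-vs-9090 drift.
	for _, line := range []string{"  metrics_port: 9191", "log_level: info"} {
		if p := extractPort(line); p != "" {
			fmt.Println("found metrics_port", p)
		}
	}
}
```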
# VS Code: ignore all by default, allow key shared files
.vscode/*
!.vscode/extensions.json
!.vscode/settings.json
!.vscode/launch.json
🧹 Nitpick (assertive)
Selective .vscode tracking is fine; ensure no secrets slip in.
extensions.json/settings.json/launch.json can hold credentials (e.g., debug env). Recommend adding a pre-commit check to block secrets in these files.
🤖 Prompt for AI Agents
In .gitignore around lines 27 to 31, the reviewer asks for a safeguard because
tracked .vscode files (extensions.json, settings.json, launch.json) can contain
secrets; add a pre-commit hook or integrate a repository check (e.g., Husky +
lint-staged or a CI check) that scans those specific files for
secrets/credentials and rejects commits when patterns are detected. Implement
the hook to run a lightweight secret scanner (regex checks for common patterns
like AWS keys, private keys, auth tokens, env var declarations) only on
.vscode/extensions.json, .vscode/settings.json, and .vscode/launch.json, and
ensure it exits non-zero with a clear error message instructing the developer to
remove secrets or move them to environment variables; add documentation in
CONTRIBUTING.md about the check and how to bypass temporarily with
maintainer-approved workflow.
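The check described above could be a small Go program invoked from a pre-commit hook with the three tracked files as arguments; a sketch under stated assumptions (the patterns are illustrative, and a dedicated scanner such as gitleaks would be more thorough):

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// secretPatterns are hypothetical examples of the regex checks suggested
// in the review; extend them for your environment.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),                                 // AWS access key ID
	regexp.MustCompile(`-----BEGIN [A-Z ]*PRIVATE KEY-----`),               // PEM private key
	regexp.MustCompile(`(?i)"?(token|password|secret)"?\s*[:=]\s*"[^"]+"`), // inline credentials
}

// scan reports whether data matches any known secret pattern.
func scan(data []byte) bool {
	for _, re := range secretPatterns {
		if re.Match(data) {
			return true
		}
	}
	return false
}

func main() {
	exit := 0
	for _, path := range os.Args[1:] { // e.g. the tracked .vscode files
		data, err := os.ReadFile(path)
		if err != nil {
			continue
		}
		if scan(data) {
			fmt.Fprintf(os.Stderr, "possible secret in %s; move it to an environment variable\n", path)
			exit = 1
		}
	}
	os.Exit(exit) // non-zero exit blocks the commit
}
```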
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
.vscode/extensions.json (1 hunks)
.vscode/settings.json (1 hunks)
🔇 Additional comments (1)
.vscode/settings.json (1)
2-8: Good Go save-actions setup. Per-file Go formatting and organize-imports on save look appropriate.
{
  "recommendations": [
    "golang.go"
  ],
  "unwantedRecommendations": []
}
🧹 Nitpick (assertive)
Add YAML/Helm tooling to match new chart assets
Recommend adding Kubernetes/Helm and YAML support to align with the new Helm chart and K8s manifests.
{
"recommendations": [
- "golang.go"
+ "golang.go",
+ "redhat.vscode-yaml",
+ "ms-kubernetes-tools.vscode-kubernetes-tools"
],
"unwantedRecommendations": []
}
🤖 Prompt for AI Agents
.vscode/extensions.json around lines 1 to 6: the workspace extensions list only
recommends the Go extension but the repo now contains Helm charts and Kubernetes
YAML manifests; update the recommendations array to include YAML and
Helm/Kubernetes tooling (for example add "redhat.vscode-yaml",
"ms-azuretools.vscode-helm" and "ms-kubernetes-tools.vscode-kubernetes-tools")
so users get proper linting, schema support and Helm assistance when opening the
workspace.
"gopls": {
  "ui.semanticTokens": true,
  "build.experimentalWorkspaceModule": true
},
💡 Verification agent
🧩 Analysis chain
Verify gopls experimental flag; it may be obsolete
gopls’s build.experimentalWorkspaceModule has changed across versions; confirm it’s still valid with your pinned Go extension/gopls. If obsolete, drop it.
"gopls": {
"ui.semanticTokens": true,
- "build.experimentalWorkspaceModule": true
+ // "build.experimentalWorkspaceModule": true
},
To verify with upstream docs:
🌐 Web query:
Is gopls setting "build.experimentalWorkspaceModule" still supported in the latest gopls? If deprecated, what is the recommended default behavior or replacement?
💡 Result:
Short answer: No — it was deprecated and then removed. Use go.work (multi-module workspace support in the go command) instead.
Details / sources:
- Deprecation announced (replaced by go.work; removed in gopls v0.11.0+). [1]
- Release notes / removal and recommended replacement (create a go.work; e.g. `go work use -r .`). [2][3]
Sources:
[1] gopls v0.10.x release notes (deprecates experimentalWorkspaceModule in favor of go.work).
[2] gopls v0.12.0 notes (removes the setting; instructs using go.work / `go work use -r .`).
[3] GitHub issue tracking removal of experimentalWorkspaceModule.
[3] GitHub issue tracking removal of experimentalWorkspaceModule.
Remove deprecated gopls setting "build.experimentalWorkspaceModule"
Deprecated in gopls v0.10.x and removed by v0.12.0; use go.work (e.g. `go work use -r .`) for multi-module workspaces. Action: delete the "build.experimentalWorkspaceModule" entry from .vscode/settings.json (lines 9–12); do not comment it out (JSON has no comments).
🤖 Prompt for AI Agents
In .vscode/settings.json around lines 9 to 12, the gopls setting
"build.experimentalWorkspaceModule" is deprecated and should be removed; edit
the file and delete the "build.experimentalWorkspaceModule": true entry (do not
comment it out), leaving the rest of the "gopls" object intact and ensure the
JSON remains valid.
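For reference, the replacement mechanism is a `go.work` file at the workspace root; a minimal sketch, assuming the single module lives at the repo root:

```
go 1.25

use .
```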
"go.testFlags": ["-race", "-count=1"],
"go.toolsManagement.autoUpdate": true
🧹 Nitpick (assertive)
Test flags trade-off
Running tests with -race and -count=1 by default is great for catching issues but slower. If dev feedback becomes a concern, consider moving -race to a separate profile/task.
🤖 Prompt for AI Agents
In .vscode/settings.json at lines 13-14 the default go.testFlags include
["-race", "-count=1"], which slows local feedback; remove "-race" from the
default go.testFlags and add a separate configuration/task for race-enabled
tests (or a VS Code test profile) so developers can run fast tests by default
while still having an easy opt-in for race detection; update settings.json to
set go.testFlags to ["-count=1"] and add a new task or profile (e.g., in
.vscode/tasks.json or launch configurations) named "Go: test (race)" that
invokes tests with ["-race","-count=1"] and document usage in the project
README.
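One way to keep an opt-in race profile, as the prompt suggests: a hypothetical `.vscode/tasks.json` entry (the label and shape are assumptions, not repo content):

```json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Go: test (race)",
      "type": "shell",
      "command": "go test -race -count=1 ./...",
      "group": "test"
    }
  ]
}
```

With `go.testFlags` trimmed to `["-count=1"]`, the default test UI stays fast while this task remains the easy opt-in for race detection.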
@coderabbitai full review
✅ Actions performed: Full review triggered.
…bump go.mod to 1.25; docs: clarify evidence harness (producer counters, clearing keys, metrics port); update config.used.yaml to match harness
Addressed CodeRabbit requests:

- Aligned Go toolchain to 1.25.x across CI, release, and GoReleaser workflows; bumped go.mod to 1.25.0 (evidence had go_info=go1.25.0).
- Confirmed version variable path: main has 'var version string' and ldflags '-X main.version' is wired.
- Clarified evidence harness: noted admin bench does not increment jobs_produced_total; added instructions to clear Redis keys before running to avoid backlog; standardized metrics port to 9191 and updated docs/evidence/config.used.yaml.
- Verified processing/heartbeat key patterns match implementation.

If you want, I can add a script or admin subcommand to purge all test keys to make the harness one-click.
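A purge subcommand could start from a helper that enumerates the harness keys; a hypothetical sketch (queue names from the evidence README, processing-list shape from `worker.processing_list_pattern`):

```go
package main

import "fmt"

// purgeKeys builds the list of harness keys to clear before a bench run.
// A real purge-all subcommand would pass these to rdb.Del(ctx, keys...).
func purgeKeys(workerIDs []string) []string {
	keys := []string{
		"jobqueue:high_priority",
		"jobqueue:low_priority",
		"jobqueue:completed",
		"jobqueue:dead_letter",
	}
	for _, id := range workerIDs {
		keys = append(keys, fmt.Sprintf("jobqueue:worker:%s:processing", id))
	}
	return keys
}

func main() {
	fmt.Println(purgeKeys([]string{"worker-1"}))
}
```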
…ssing lists (with --yes)
ci: add 5x E2E determinism gate
docs: add CODEOWNERS
Updates
Notes
# Example purge-all
./bin/job-queue-system --role=admin --admin-cmd=purge-all --yes --config=config/config.yaml
Updates (round 2)
If you want additional CODEOWNERS for SRE/Security, please share their GitHub handles.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
.github/workflows/ci.yml (1)
12-22: Service config: drop the unnecessary port mapping; keep the healthcheck. For GitHub Actions services you don't need host port mapping; the runner can reach the service via `localhost:${mapped}` only if you map, but mapping isn't required—use the service name and container port, or keep the healthcheck and remove the host mapping to avoid collisions on self-hosted runners. If you want host access, keep as-is; otherwise:

 services:
   redis:
     image: redis:7-alpine
-    ports:
-      - 6379:6379
     options: >-
       --health-cmd "redis-cli ping"
       --health-interval 5s
       --health-timeout 3s
       --health-retries 5

And set `E2E_REDIS_ADDR: redis:6379`.

.github/workflows/release.yml (3)
20-41: You’ve built two competing release pipelines. Kill one before it kills your releases. This workflow builds/pushes images and creates a GitHub Release on tag push. The new GoReleaser workflow does the same. You'll double-publish images and stomp release notes.
Options:
- Delete this workflow and rely solely on GoReleaser, or
- Restrict triggers (e.g., only prerelease tags here) or set `if: false` until migration is done.
Minimal disable:

 jobs:
   build-and-publish:
+    if: false  # disabled; GoReleaser owns releases/images
     runs-on: ubuntu-latest
21-27: Stop running “make tidy” in a release job. Don't mutate modules during release. Fetch deps only.

-      - name: Build binary
-        run: |
-          make tidy
-          make build VERSION=${GITHUB_REF_NAME}
+      - name: Build binary
+        run: make build VERSION=${GITHUB_REF_NAME}
25-37: Single-arch Docker build is not acceptable for users on arm64. If you insist on keeping this workflow, set up Buildx + QEMU and push multi-arch.

+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+      - name: Set up Buildx
+        uses: docker/setup-buildx-action@v3
       - name: Docker build
-        run: |
-          docker build -t ghcr.io/${{ github.repository }}:${GITHUB_REF_NAME} -t ghcr.io/${{ github.repository }}:latest .
+        run: |
+          docker buildx build \
+            --platform linux/amd64,linux/arm64 \
+            -t ghcr.io/${{ github.repository }}:${GITHUB_REF_NAME} \
+            -t ghcr.io/${{ github.repository }}:latest \
+            --push .

docs/evidence/README.md (1)
16-23: More MD032: surround the bullet lists under “Important notes” and “Notes” with blank lines. Spacing. Do it.

-Important notes
-- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
-- To avoid stale backlog affecting evidence, clear test keys before running a bench: `redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter` and `redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL`.
-- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
+Important notes
+
+- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
+- To avoid stale backlog affecting evidence, clear test keys before running a bench: `redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter` and `redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL`.
+- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
@@
-Notes
-- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
+Notes
+
+- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
♻️ Duplicate comments (8)
.github/workflows/goreleaser.yml (2)
17-20: Fetch full history or GoReleaser will bork your changelog and versioning. You didn't fetch tags. GoReleaser needs them.

-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
25-31: Set up QEMU + Buildx so your multi-arch images aren’t fantasy. Without these, any multi-arch config in .goreleaser.yaml won’t build/push properly.

+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+      - name: Set up Buildx
+        uses: docker/setup-buildx-action@v3
       - name: Login to GHCR
         uses: docker/login-action@v3

README.md (3)
56-56: Fix MD034: don’t dump bare URLs like it’s 1998. Wrap the PR URL.

-Release branch open for v0.4.0-alpha: see PR https://github.com/flyingrobots/go-redis-work-queue/pull/1
+Release branch open for v0.4.0-alpha: see PR <https://github.com/flyingrobots/go-redis-work-queue/pull/1>
63-66: Same MD032 problem under “Evidence artifacts”. Space it out.

-Evidence artifacts (docs/evidence/):
+Evidence artifacts (docs/evidence/):
+
 - ci_run.json (CI URL), bench.json (throughput/latency), metrics_before/after.txt, config.alpha.yaml
-
-To reproduce evidence locally, see docs/evidence/README.md.
+
+To reproduce evidence locally, see docs/evidence/README.md.
58-62: Fix MD032: lists must be surrounded by blank lines. Add spacing so linters stop whining.

-Promotion gates and confidence summary (details in docs/15_promotion_checklists.md):
+Promotion gates and confidence summary (details in docs/15_promotion_checklists.md):
+
 - Alpha → Beta: overall confidence ~0.85 (functional/observability/CI strong; perf and coverage improvements planned)
 - Beta → RC: overall confidence ~0.70 (needs controlled perf run, chaos tests, soak)
 - RC → GA: overall confidence ~0.70 (release flow ready; soak and rollback rehearsal pending)
+

docs/evidence/config.used.yaml (1)
20-22: YAML lint violations: too many spaces after colon. Fix it. This garbage breaks linters.

 high: "jobqueue:high_priority"
-low:  "jobqueue:low_priority"
+low: "jobqueue:low_priority"
 processing_list_pattern: "jobqueue:worker:%s:processing"
-heartbeat_key_pattern:  "jobqueue:processing:worker:%s"
+heartbeat_key_pattern: "jobqueue:processing:worker:%s"

docs/evidence/README.md (2)
8-15: Make “Reproduce locally” a real heading and surround the list. Follow basic Markdown rules.

-Reproduce locally
+## Reproduce locally
+
 1) Ensure Redis is running on `localhost:6379` (e.g., `docker run -p 6379:6379 redis:7-alpine`)
 2) Build binary: `make build`
 3) Start worker: `./bin/job-queue-system --role=worker --config=docs/evidence/config.alpha.yaml`
 4) In another terminal, run bench: `./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml --admin-cmd=bench --bench-count=1000 --bench-rate=500 --bench-priority=low --bench-timeout=60s`
 5) Capture metrics: `curl -sS localhost:9191/metrics | head -n 200 > docs/evidence/metrics_after.txt`
+
1-7: markdownlint: headings and lists need blank lines. Add spacing after the H1.

 # Evidence for v0.4.0-alpha Promotion
+
 - CI run: see `ci_run.json` (contains URL to the successful workflow run)
 - Bench JSON: `bench.json` (admin bench with 1000 jobs at 500 rps)
 - Config used: `config.alpha.yaml`
 - Metrics snapshots: `metrics_before.txt`, `metrics_after.txt`
+
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
.github/workflows/ci.yml (1 hunks)
.github/workflows/goreleaser.yml (1 hunks)
.github/workflows/release.yml (1 hunks)
README.md (2 hunks)
docs/evidence/README.md (1 hunks)
docs/evidence/config.used.yaml (1 hunks)
go.mod (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/evidence/README.md
17-17: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
22-22: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
README.md
56-56: Bare URL used
(MD034, no-bare-urls)
59-59: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
64-64: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
🪛 YAMLlint (1.37.1)
docs/evidence/config.used.yaml
[error] 20-20: too many spaces after colon
(colons)
[error] 22-22: too many spaces after colon
(colons)
🔇 Additional comments (4)
go.mod (1)
50-56: Sanity-check ultra-recent google genproto pins. These pseudo-versions (20250825…) are fine if intentional, but they drag in massive trees. Confirm you really need them; otherwise prune by running `go mod tidy -compat=1.25` locally and re-vetting.
.github/workflows/goreleaser.yml (1)
18-20: Enable Go module cache. Shave minutes off releases.

       - uses: actions/setup-go@v5
         with:
-          go-version: '1.25.x'
+          go-version: '1.25.x'
+          cache: true

Likely an incorrect or invalid review comment.
README.md (1)
21-21: Good: explicit Go 1.25+. Alignment with toolchain and CI looks consistent.
docs/evidence/config.used.yaml (1)
42-44: Port consistency check with docs. This file uses metrics_port 9191. Ensure README and evidence curl commands consistently use 9191 or clearly call out overrides.
      - name: Run GoReleaser
        uses: goreleaser/goreleaser-action@v6
        with:
          distribution: goreleaser
          version: latest
          args: release --clean
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
🧹 Nitpick (assertive)
Scope permissions narrowly and surface attestations.
Consider adding provenance/signing (SLSA/cosign) and setting tighter permissions per step. Not blocking, but you’ll thank me later.
🤖 Prompt for AI Agents
.github/workflows/goreleaser.yml lines 31-38: tighten the step permissions and
enable provenance/signing for releases by (1) adding a top-level permissions
block that limits the job to the minimum rights (e.g., contents: read, id-token:
write for OIDC token exchange, and only add packages or actions permissions if
absolutely required), (2) granting the goreleaser step only the specific
environment variables/secrets it needs instead of full GITHUB_TOKEN scope, and
(3) enabling artifact provenance and signing by calling goreleaser with
provenance flags and/or invoking a cosign/sigstore step after build (or
configure goreleaser to sign) using OIDC (id-token) and the COSIGN_* secrets so
releases are attested and signed. Ensure the workflow creates and exposes
attestations (SLSA provenance) and that any tokens/secrets are minimal and
scoped.
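A minimal sketch of the narrower permissions block the comment suggests (the scopes are standard GitHub Actions permissions; whether `packages: write` is needed depends on this job pushing to GHCR, which is an assumption here):

```yaml
permissions:
  contents: write   # create the GitHub Release
  packages: write   # push images to GHCR
  id-token: write   # OIDC token for keyless cosign signing
```

Everything not listed defaults to `none`, which is the point: the job gets only what release publishing requires.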
module github.com/flyingrobots/go-redis-work-queue

-go 1.24.0
+go 1.25.0
go.mod “go” directive is invalid; use major.minor, not patch. `go 1.25.0` will be rejected; the directive must be `go 1.25`.
-go 1.25.0
+go 1.25
🤖 Prompt for AI Agents
In go.mod around lines 3 to 3, the go directive currently uses a patch version
("go 1.25.0") which is invalid; change it to a major.minor form ("go 1.25").
Update the line to remove the patch component so the file reads the supported
directive format.
…worker ID uniqueness and pause via config; better error handling; cancel-safe timers
feat(metrics): add circuit_breaker_trips_total and reaper_recovered_total
feat(producer): stream file scanning + path validation
ci: tests updated for new breaker behavior
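The single-probe half-open behavior referenced here can be sketched as a tiny state machine; this is an illustrative model, not the repo's actual breaker type (field and state names are assumptions):

```go
package main

import (
	"fmt"
	"sync"
)

// Breaker models single-probe half-open semantics: while half-open, exactly
// one probe request may proceed, and its outcome decides whether the breaker
// closes or re-opens.
type Breaker struct {
	mu        sync.Mutex
	state     string // "closed", "open", or "half-open"
	probeBusy bool
}

// Allow reports whether a request may proceed in the current state.
func (b *Breaker) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	switch b.state {
	case "closed":
		return true
	case "half-open":
		if b.probeBusy {
			return false // a probe is already in flight
		}
		b.probeBusy = true
		return true
	default: // "open"
		return false
	}
}

// Record reports the probe's outcome while half-open.
func (b *Breaker) Record(success bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.state != "half-open" {
		return
	}
	b.probeBusy = false
	if success {
		b.state = "closed"
	} else {
		b.state = "open"
	}
}

func main() {
	b := &Breaker{state: "half-open"}
	fmt.Println(b.Allow(), b.Allow()) // only the first probe is admitted
}
```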
Docs: Did-Not-Agree Decisions

Per request, I documented deviations/deferrals under
These files include rationale, tradeoffs, and revisit criteria. Happy to adjust or add more entries as reviews continue.
… consumption; tune minSamples for determinism with miniredis
…commands; link from README
…uage identifiers (bash)
Actionable comments posted: 36
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (15)
internal/producer/producer.go (2)
95-119: Rate limiter is racy across producers; make INCR+EXPIRE atomic and avoid herd-sleep. Multiple producers can miss setting the TTL. Use a Lua script to INCR and set EX with NX in one shot; then sleep based on the returned TTL once, not per overage. I can provide a small Lua + `EvalSha` helper if you want it wired here.
70-73: Fail fast — do not LPush to an empty Redis key. Producer and reaper both fall back to cfg.Worker.Queues[default] without ensuring it’s set; if missing you’ll LPush to "" and no worker will consume. Consumers skip empty keys (internal/worker/worker.go:89–93), so this silently drops/loses jobs.
- Fix producer (internal/producer/producer.go:69–73): after resolving key, if key == "" return an error. (Your suggested diff is correct.)
- Fix reaper (internal/reaper/reaper.go:66–70): apply the same guard — never LPush to an empty dest (log+metric or skip if you can’t fail the process).
- Add startup config validation: ensure cfg.Worker.Queues contains entries for all priorities and that Producer.DefaultPriority maps to a non-empty queue key.
- Add tests (unit/e2e) to cover missing mappings.
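The guard described in the bullets above can be factored into one helper shared by producer and reaper; a minimal sketch, assuming a `map[string]string` queue mapping (the function name is hypothetical):

```go
package main

import "fmt"

// resolveQueueKey resolves the destination list for a priority, falls back
// to the default priority, and returns an error instead of "" so callers
// never LPUSH to a key no worker consumes.
func resolveQueueKey(queues map[string]string, priority, defaultPriority string) (string, error) {
	key := queues[priority]
	if key == "" {
		key = queues[defaultPriority]
	}
	if key == "" {
		return "", fmt.Errorf("no queue configured for priority %q (default %q)", priority, defaultPriority)
	}
	return key, nil
}

func main() {
	if _, err := resolveQueueKey(map[string]string{}, "high", "low"); err != nil {
		fmt.Println("refusing to enqueue:", err)
	}
}
```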
internal/obs/metrics.go (1)
69-78: Your metrics HTTP server has no timeouts. That’s amateur hour. Add sane timeouts to avoid trivial slowloris issues.

 import (
   "fmt"
   "net/http"
+  "time"
@@
-  srv := &http.Server{Addr: fmt.Sprintf(":%d", cfg.Observability.MetricsPort), Handler: mux}
+  srv := &http.Server{
+    Addr:              fmt.Sprintf(":%d", cfg.Observability.MetricsPort),
+    Handler:           mux,
+    ReadHeaderTimeout: 5 * time.Second,
+    ReadTimeout:       10 * time.Second,
+    WriteTimeout:      10 * time.Second,
+    IdleTimeout:       120 * time.Second,
+  }

internal/reaper/reaper.go (2)
59-75
: Non-atomic RPOP + LPUSH can lose jobs on crash. You’re running a queue, not a shredder. Fix it. A crash between RPOP and LPUSH means the payload disappears. Use a staging list with RPOPLPUSH + LMOVE so recovery is crash-safe.

-    for {
-      payload, err := r.rdb.RPop(ctx, plist).Result()
-      if err == redis.Nil { break }
-      if err != nil { r.log.Warn("reaper rpop error", obs.Err(err)); break }
-      job, err := queue.UnmarshalJob(payload)
-      if err != nil {
-        continue
-      }
-      prio := job.Priority
-      dest := r.cfg.Worker.Queues[prio]
-      if dest == "" { dest = r.cfg.Worker.Queues[r.cfg.Producer.DefaultPriority] }
-      if err := r.rdb.LPush(ctx, dest, payload).Err(); err != nil {
-        r.log.Error("requeue failed", obs.Err(err))
-      } else {
-        obs.ReaperRecovered.Inc()
-        r.log.Warn("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
-      }
-    }
+    staging := fmt.Sprintf("jobqueue:reaper:%s:staging", workerID)
+    for {
+      // Atomic pop from processing -> staging
+      payload, err := r.rdb.RPopLPush(ctx, plist, staging).Result()
+      if err == redis.Nil { break }
+      if err != nil { r.log.Warn("reaper rpoplpush error", obs.Err(err)); break }
+
+      job, err := queue.UnmarshalJob(payload)
+      if err != nil {
+        // leave payload in staging; next scan can handle or we can drop with LREM policy if desired
+        r.log.Warn("bad job payload left in staging", obs.Err(err))
+        continue
+      }
+      prio := job.Priority
+      dest := r.cfg.Worker.Queues[prio]
+      if dest == "" { dest = r.cfg.Worker.Queues[r.cfg.Producer.DefaultPriority] }
+
+      // Atomic move staging -> destination
+      if _, err := r.rdb.LMove(ctx, staging, dest, "RIGHT", "LEFT").Result(); err != nil {
+        r.log.Error("requeue move failed", obs.Err(err))
+        // payload remains in staging; will be retried next cycle
+        continue
+      }
+      obs.ReaperRecovered.Inc()
+      r.log.Info("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
+    }
`54-55`: **You're ignoring EXISTS errors. That's sloppy and dangerous.**

A transient Redis error shouldn't be treated like "worker dead."

```diff
-	exists, _ := r.rdb.Exists(ctx, hbKey).Result()
-	if exists == 1 { continue } // worker healthy
+	exists, err := r.rdb.Exists(ctx, hbKey).Result()
+	if err != nil {
+		r.log.Warn("reaper heartbeat exists check error", obs.Err(err))
+		continue
+	}
+	if exists == 1 { continue } // worker healthy
```

internal/config/config.go (3)
`124-128`: **Your default metrics port conflicts with your docs. Pick one and stick to it.**

Docs say 9191; code ships 9090. Stop gaslighting your operators.

```diff
-	MetricsPort: 9090,
+	MetricsPort: 9191,
```
`177-181`: **Set the viper default to match the new port. Consistency isn't optional.**

```diff
-	v.SetDefault("observability.metrics_port", def.Observability.MetricsPort)
+	v.SetDefault("observability.metrics_port", def.Observability.MetricsPort)
```

(Ensure `def.Observability.MetricsPort` reflects 9191 after the change above.)
`200-226`: **Validate the new breaker_pause and sampling interval. Don't accept garbage.**

Missing constraints invite foot-guns.

```diff
 func Validate(cfg *Config) error {
@@
 	if cfg.Worker.BRPopLPushTimeout <= 0 || cfg.Worker.BRPopLPushTimeout > cfg.Worker.HeartbeatTTL/2 {
 		return fmt.Errorf("worker.brpoplpush_timeout must be >0 and <= heartbeat_ttl/2")
 	}
+	if cfg.Worker.BreakerPause <= 0 {
+		return fmt.Errorf("worker.breaker_pause must be > 0")
+	}
+	if !strings.Contains(cfg.Worker.HeartbeatKeyPattern, "%s") {
+		return fmt.Errorf("worker.heartbeat_key_pattern must contain %%s")
+	}
+	if !strings.Contains(cfg.Worker.ProcessingListPattern, "%s") {
+		return fmt.Errorf("worker.processing_list_pattern must contain %%s")
+	}
 	if cfg.Producer.RateLimitPerSec < 0 {
 		return fmt.Errorf("producer.rate_limit_per_sec must be >= 0")
 	}
 	if cfg.Observability.MetricsPort <= 0 || cfg.Observability.MetricsPort > 65535 {
 		return fmt.Errorf("observability.metrics_port must be 1..65535")
 	}
+	if cfg.Observability.QueueSampleInterval <= 0 {
+		return fmt.Errorf("observability.queue_sample_interval must be > 0")
+	}
 	return nil
 }
```

README.md (1)
`35-41`: **Ports are WRONG. README says 9090; config/evidence use 9191. Pick one.**

You already standardized on 9191 elsewhere. Fix Metrics/Health/Readiness to 9191 here to stop gaslighting your users.

Apply:

```diff
-- Prometheus metrics exposed at http://localhost:9090/metrics by default
+- Prometheus metrics exposed at http://localhost:9191/metrics by default
@@
-- Liveness: http://localhost:9090/healthz returns 200 when the process is up
-- Readiness: http://localhost:9090/readyz returns 200 only when Redis is reachable
+- Liveness: http://localhost:9191/healthz returns 200 when the process is up
+- Readiness: http://localhost:9191/readyz returns 200 only when Redis is reachable
```

docs/evidence/README.md (1)
`24-41`: **More MD032: surround lists and code fences with blank lines.**

Fix spacing around the "Automated harness" bullets and the code block.

````diff
-Automated harness
+## Automated harness
+
 - A convenience script `docs/evidence/run_bench.sh` automates the above steps.
 - Default params: COUNT=1000, RATE=500, PRIORITY=low, CONFIG=docs/evidence/config.alpha.yaml, PURGE=1.
 - Example:

 ```bash
 bash docs/evidence/run_bench.sh
 # or override
 COUNT=2000 RATE=1000 PRIORITY=low bash docs/evidence/run_bench.sh
 ```

 Outputs are written under `docs/evidence/run_YYYYmmdd_HHMMSS/`:
+
 - worker.log, worker.pid
 - metrics_before.txt, metrics_after.txt
 - stats_before.json, stats_after.json
 - bench.json
````

internal/breaker/breaker.go (1)

`108-116`: **Ensure lastTransition is updated before returning in all HalfOpen paths.**

Minor, but be explicit: set lastTransition immediately upon state change before touching flags to avoid future footguns. Also, add a comment clarifying probe-completion semantics.

```diff
-	if ok {
-		cb.state = Closed
-	} else {
-		cb.state = Open
-	}
-	// the single probe completed; allow a future probe after cooldown or next Allow
-	cb.halfOpenInFlight = false
-	cb.lastTransition = now
+	if ok {
+		cb.state = Closed
+	} else {
+		cb.state = Open
+	}
+	cb.lastTransition = now
+	// The single HalfOpen probe completed; clear in-flight to allow a future probe on the next HalfOpen window.
+	cb.halfOpenInFlight = false
```
cmd/job-queue-system/main.go (2)
85-92
: **Do NOT start the HTTP/metrics server in admin mode. It's pointless and can collide with a running worker.**

You're binding to the metrics port for a one-shot CLI. If that port is already in use, your admin command dies before doing anything. Gate the HTTP server (and the queue-length updater) behind role != "admin".

Example fix (outside the changed hunk):

```go
// Before creating httpSrv / starting the queue length updater:
if role != "admin" {
	httpSrv := obs.StartHTTPServer(cfg, readyCheck)
	defer func() { _ = httpSrv.Shutdown(context.Background()) }()
	// Background metrics: queue lengths
	obs.StartQueueLengthUpdater(ctx, cfg, rdb, logger)
}
```
`142-149`: **Stop abusing default; add a proper 'admin' case.**

This default+if dance is sloppy and confuses static analysis. Add a real case and let default handle unknown roles.

```diff
-	default:
-		// admin role
-		if role == "admin" {
-			runAdmin(ctx, cfg, rdb, logger, adminCmd, adminQueue, adminN, adminYes, benchCount, benchRate, benchPriority, benchTimeout)
-			return
-		}
-		logger.Fatal("unknown role", obs.String("role", role))
+	case "admin":
+		runAdmin(ctx, cfg, rdb, logger, adminCmd, adminQueue, adminN, adminYes, benchCount, benchRate, benchPriority, benchTimeout)
+		return
+	default:
+		logger.Fatal("unknown role", obs.String("role", role))
```

internal/worker/worker.go (2)
`51-70`: **Circuit-breaker trip counting is racy. Centralize transition detection in this goroutine.**

Multiple workers read state, then Record, then read state again — you double-count on contention. Track last state here and increment exactly once per transition to Open.

```diff
-	go func() {
-		ticker := time.NewTicker(2 * time.Second)
-		defer ticker.Stop()
-		for {
-			select {
-			case <-ctx.Done():
-				return
-			case <-ticker.C:
-				switch w.cb.State() {
-				case breaker.Closed:
-					obs.CircuitBreakerState.Set(0)
-				case breaker.HalfOpen:
-					obs.CircuitBreakerState.Set(1)
-				case breaker.Open:
-					obs.CircuitBreakerState.Set(2)
-				}
-			}
-		}
-	}()
+	go func() {
+		ticker := time.NewTicker(2 * time.Second)
+		defer ticker.Stop()
+		last := w.cb.State()
+		for {
+			select {
+			case <-ctx.Done():
+				return
+			case <-ticker.C:
+				cur := w.cb.State()
+				if cur != last && cur == breaker.Open {
+					obs.CircuitBreakerTrips.Inc()
+				}
+				last = cur
+				switch cur {
+				case breaker.Closed:
+					obs.CircuitBreakerState.Set(0)
+				case breaker.HalfOpen:
+					obs.CircuitBreakerState.Set(1)
+				case breaker.Open:
+					obs.CircuitBreakerState.Set(2)
+				}
+			}
+		}
+	}()
```
`110-114`: **Why are you stuffing the entire payload into the heartbeat value?**

Unless some code reads that value (show me where), it's wasted memory and bandwidth. Store a tiny token instead.

If unused, change to:

```diff
-	_ = w.rdb.Set(ctx, hbKey, payload, w.cfg.Worker.HeartbeatTTL).Err()
+	_ = w.rdb.Set(ctx, hbKey, "1", w.cfg.Worker.HeartbeatTTL).Err()
```
♻️ Duplicate comments (4)
.github/workflows/ci.yml (1)
`24-49`: **Stop mutating go.mod in CI, pin govulncheck, and enable caching. This is basic hygiene.**

You're still doing the "latest" nonsense and `go mod tidy` in CI. Reproducibility matters.

```diff
       - name: Setup Go
         uses: actions/setup-go@v5
         with:
-          go-version: '1.25.x'
+          go-version: '1.25.x'
+          cache: true

-      - name: Tidy
-        run: go mod tidy
+      - name: Modules (download only; don't mutate)
+        run: go mod download
@@
-      - name: Build
-        run: go build ./...
+      - name: Build
+        run: go build -mod=readonly ./...
@@
-      - name: Test
+      - name: Test
         env:
-          E2E_REDIS_ADDR: localhost:6379
-        run: go test ./... -race -count=1
+          E2E_REDIS_ADDR: 127.0.0.1:6379
+        run: go test -mod=readonly ./... -race -count=1
@@
-      - name: Govulncheck
-        run: |
-          go install golang.org/x/vuln/cmd/govulncheck@latest
-          $(go env GOPATH)/bin/govulncheck ./...
+      - name: Govulncheck
+        uses: golang/govulncheck-action@v1
+        with:
+          go-version-input: '1.25.x'
+          vulncheck-version: 'v1.1.3'
```

README.md (2)
58-59
: **Stop using bare URLs (MD034).**

Wrap the PR URL.

```diff
-Release branch open for v0.4.0-alpha: see PR https://github.com/flyingrobots/go-redis-work-queue/pull/1
+Release branch open for v0.4.0-alpha: see PR <https://github.com/flyingrobots/go-redis-work-queue/pull/1>
```
60-64
: **Blank lines around lists (MD032).**

Add spacing above and below the block.

```diff
-Promotion gates and confidence summary (details in docs/15_promotion_checklists.md):
+Promotion gates and confidence summary (details in docs/15_promotion_checklists.md):
+
 - Alpha → Beta: overall confidence ~0.85 (functional/observability/CI strong; perf and coverage improvements planned)
 - Beta → RC: overall confidence ~0.70 (needs controlled perf run, chaos tests, soak)
 - RC → GA: overall confidence ~0.70 (release flow ready; soak and rollback rehearsal pending)
+
```

docs/evidence/README.md (1)
1-6
: **We already told you: blank lines and proper heading. Do it.**

Same MD022/MD032 mess. Convert "Reproduce locally" into a heading and add spacing. You ignored this once.

Apply:

```diff
 # Evidence for v0.4.0-alpha Promotion
-
-- CI run: see `ci_run.json` (contains URL to the successful workflow run)
-- Bench JSON: `bench.json` (admin bench with 1000 jobs at 500 rps)
-- Config used: `config.alpha.yaml`
-- Metrics snapshots: `metrics_before.txt`, `metrics_after.txt`
+
+- CI run: see `ci_run.json` (contains URL to the successful workflow run)
+- Bench JSON: `bench.json` (admin bench with 1000 jobs at 500 rps)
+- Config used: `config.alpha.yaml`
+- Metrics snapshots: `metrics_before.txt`, `metrics_after.txt`

-Reproduce locally
+## Reproduce locally
+
 1) Ensure Redis is running on `localhost:6379` (e.g., `docker run -p 6379:6379 redis:7-alpine`)
 2) Build binary: `make build`
 3) Start worker: `./bin/job-queue-system --role=worker --config=docs/evidence/config.alpha.yaml`
 4) In another terminal, run bench: `./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml --admin-cmd=bench --bench-count=1000 --bench-rate=500 --bench-priority=low --bench-timeout=60s`
 5) Capture metrics: `curl -sS localhost:9191/metrics | head -n 200 > docs/evidence/metrics_after.txt`
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (21)
- .github/CODEOWNERS (1 hunks)
- .github/workflows/ci.yml (2 hunks)
- README.md (2 hunks)
- cmd/job-queue-system/main.go (3 hunks)
- docs/did-not-agree/README.md (1 hunks)
- docs/did-not-agree/go-toolchain-version.md (1 hunks)
- docs/did-not-agree/metrics-scope.md (1 hunks)
- docs/did-not-agree/priority-fetching-brpoplpush.md (1 hunks)
- docs/did-not-agree/rate-limiter-fixed-window.md (1 hunks)
- docs/did-not-agree/reaper-scan-vs-registry.md (1 hunks)
- docs/evidence/README.md (1 hunks)
- docs/evidence/run_bench.sh (1 hunks)
- internal/admin/admin.go (2 hunks)
- internal/breaker/breaker.go (4 hunks)
- internal/breaker/breaker_load_test.go (1 hunks)
- internal/breaker/breaker_test.go (2 hunks)
- internal/config/config.go (4 hunks)
- internal/obs/metrics.go (2 hunks)
- internal/producer/producer.go (3 hunks)
- internal/reaper/reaper.go (2 hunks)
- internal/worker/worker.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
internal/breaker/breaker_load_test.go (1)

- internal/breaker/breaker.go (3): State (10-10), Closed (13-13), Open (15-15)

internal/admin/admin.go (1)

- internal/config/config.go (3): Config (75-81), Worker (31-44), Producer (46-54)

internal/reaper/reaper.go (2)

- internal/obs/logger.go (2): Err (31-31), String (28-28)
- internal/obs/metrics.go (1): ReaperRecovered (55-58)

internal/config/config.go (1)

- internal/worker/worker.go (1): Worker (20-26)

cmd/job-queue-system/main.go (2)

- internal/admin/admin.go (3): PurgeAll (219-263), Bench (108-156), StatsKeys (169-214)
- internal/obs/logger.go (1): Err (31-31)

internal/producer/producer.go (2)

- internal/config/config.go (1): Producer (46-54)
- internal/queue/job.go (1): NewJob (20-31)

internal/worker/worker.go (4)

- internal/breaker/breaker.go (4): New (36-38), CircuitBreaker (24-34), State (10-10), Open (15-15)
- internal/config/config.go (2): Worker (31-44), CircuitBreaker (56-61)
- internal/obs/metrics.go (2): JobProcessingDuration (38-42), CircuitBreakerTrips (51-54)
- internal/obs/logger.go (3): Err (31-31), String (28-28), Int (29-29)
🪛 Shellcheck (0.10.0)
docs/evidence/run_bench.sh
[warning] 50-50: i appears unused. Verify use (or export if used externally).
(SC2034)
🪛 markdownlint-cli2 (0.17.2)
docs/did-not-agree/go-toolchain-version.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
14-14: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
17-17: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/did-not-agree/reaper-scan-vs-registry.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
14-14: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
17-17: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/did-not-agree/README.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/did-not-agree/priority-fetching-brpoplpush.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
15-15: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
18-18: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/did-not-agree/rate-limiter-fixed-window.md
4-4: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
7-7: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
14-14: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
18-18: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/evidence/README.md
17-17: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
22-22: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
37-37: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
README.md
26-26: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
26-26: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
27-27: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
27-27: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
28-28: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
28-28: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
29-29: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
29-29: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
30-30: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
30-30: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
58-58: Bare URL used
(MD034, no-bare-urls)
61-61: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
66-66: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
🔇 Additional comments (9)
internal/producer/producer.go (2)
56-56
: **Streaming enqueue during WalkDir: good.**

Lower memory, simpler flow. Keep it.
1-1
: **Standardize license headers — add SPDX and a single, correct copyright holder.**

Repo-wide header check shows many files contain "Copyright 2025 James Ross" (e.g. internal/producer/producer.go, internal/worker/worker.go, internal/redisclient/client.go, test/e2e/e2e_test.go). This is sloppy — fix it.
- Action: Replace top-of-file comments across the repo with an SPDX-License-Identifier matching the project license and a single canonical copyright owner (organization or "The go-redis-work-queue Authors"). Do not assume the license; confirm the repo’s license first.
- Verify: run rg -n "SPDX-License-Identifier|Copyright" -C2 and update every file returned.
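A minimal sketch of that verify step, using plain `grep` instead of `rg` and a throwaway fixture so it runs anywhere; in practice, point the greps at the real repo root:

```shell
# Build a tiny fixture: one file with the old personal header, one with SPDX.
dir=$(mktemp -d)
printf '// Copyright 2025 James Ross\npackage a\n' > "$dir/old.go"
printf '// SPDX-License-Identifier: Apache-2.0\npackage b\n' > "$dir/new.go"

echo "files still naming an individual owner:"
grep -rl "Copyright 2025 James Ross" --include='*.go' "$dir"

echo "files missing an SPDX identifier:"
grep -rL "SPDX-License-Identifier" --include='*.go' "$dir"

rm -rf "$dir"
```

Both audit sections should name only `old.go`; every file they flag in the real repo needs its header rewritten.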
internal/config/config.go (1)
101-106
: **Heartbeat/processing patterns look reversed vs. reaper scan. Verify now.**

You scan "jobqueue:worker:*:processing" but your default heartbeat is "jobqueue:processing:worker:%s". If that's intentional, fine—prove it's consistent everywhere.
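One way to "prove it's consistent everywhere" is to dump every occurrence of both patterns and compare them side by side. A throwaway sketch (fixture paths are illustrative; run the greps against `internal/` and `docs/` in the real repo):

```shell
# Fixture standing in for the repo: a config default and a reaper scan glob.
dir=$(mktemp -d)
printf 'heartbeat_key_pattern: "jobqueue:processing:worker:%%s"\n' > "$dir/config.yaml"
printf 'scan := "jobqueue:worker:*:processing"\n' > "$dir/reaper.txt"

# Every occurrence of each pattern, with file and line number for eyeballing.
grep -rn "jobqueue:worker" "$dir"
grep -rn "jobqueue:processing" "$dir"

rm -rf "$dir"
```

If the two listings name different key shapes for the same concept, the defaults and the scan glob have drifted apart.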
README.md (2)
52-54
: **Docker mapping must match the metrics port.**

Map 9191 if that's the default. This is basic.

Apply:

```diff
-- Run: docker run --rm -p 9090:9090 --env-file env.list job-queue-system:latest --role=all
+- Run: docker run --rm -p 9191:9191 --env-file env.list job-queue-system:latest --role=all
```
65-68
: **Add blank line before the "Evidence artifacts" list and update the sample-config link to config/config.example.yaml.**

- MD032: insert a blank line between "Evidence artifacts (docs/evidence/):" and the list (README.md ≈ lines 65–68).
- Update the "sample configuration" mention (≈ line 10) to link to config/config.example.yaml — that file already exists in the repo.

```diff
-Evidence artifacts (docs/evidence/):
+Evidence artifacts (docs/evidence/):
+
 - ci_run.json (CI URL), bench.json (throughput/latency), metrics_before/after.txt, config.alpha.yaml
-
-To reproduce evidence locally, see docs/evidence/README.md.
+
+To reproduce evidence locally, see docs/evidence/README.md.
```

Likely an incorrect or invalid review comment.
internal/breaker/breaker_test.go (1)
1-1
: **License header likely inconsistent with repo.**

"Copyright 2025 James Ross" in a company/repo project? Use the project's SPDX header and owner.

Apply:

```diff
-// Copyright 2025 James Ross
+// SPDX-License-Identifier: Apache-2.0
+// Copyright (c) 2025 The go-redis-work-queue Authors
```

internal/breaker/breaker_load_test.go (1)
1-1
: **License header: align with project.**

Same problem as the other test file.

```diff
-// Copyright 2025 James Ross
+// SPDX-License-Identifier: Apache-2.0
+// Copyright (c) 2025 The go-redis-work-queue Authors
```

cmd/job-queue-system/main.go (2)
42-42
: **Flag help updated correctly. Keep help text and README in sync.**

Looks fine. Ensure README and docs reflect the full set of admin commands.
179-183
: **stats-keys output shape is sane.**

JSON-encoding the result is exactly what we need for evidence harnessing.
```go
case "purge-all":
	if !yes { logger.Fatal("refusing to purge without --yes") }
	n, err := admin.PurgeAll(ctx, cfg, rdb)
	if err != nil { logger.Fatal("admin purge-all error", obs.Err(err)) }
	fmt.Printf("purged %d keys\n", n)
case "bench":
```
🧹 Nitpick (assertive)
Add a dry-run for purge-all and structured log.
Dangerous command. Offer a --dry-run to print the keys that would be removed, and log the deletion count via zap too.
```diff
-	n, err := admin.PurgeAll(ctx, cfg, rdb)
+	n, err := admin.PurgeAll(ctx, cfg, rdb)
 	if err != nil { logger.Fatal("admin purge-all error", obs.Err(err)) }
-	fmt.Printf("purged %d keys\n", n)
+	logger.Info("purge-all completed", obs.Int("deleted_keys", int(n)))
+	fmt.Printf("purged %d keys\n", n)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
case "purge-all":
	if !yes { logger.Fatal("refusing to purge without --yes") }
	n, err := admin.PurgeAll(ctx, cfg, rdb)
	if err != nil { logger.Fatal("admin purge-all error", obs.Err(err)) }
	logger.Info("purge-all completed", obs.Int("deleted_keys", int(n)))
	fmt.Printf("purged %d keys\n", n)
case "bench":
```
🤖 Prompt for AI Agents
In cmd/job-queue-system/main.go around lines 169 to 174, the "purge-all" branch
must support a --dry-run and emit a structured zap log of the deletion count:
add a boolean dryRun flag check (e.g. from CLI flags) and when dryRun is true
call admin.PurgeAll with a mode or new helper that returns the list of keys to
be removed without deleting them, print those keys to stdout (one per line) and
do not perform deletion; when dryRun is false keep the existing delete behavior
but after successful deletion log the count using the zap logger with a
structured field like logger.Info("purged keys", zap.Int("count", n)) instead of
printing only to stdout or using unstructured logger.Fatal; propagate and handle
errors the same way as before.
```go
obs.ReaperRecovered.Inc()
r.log.Warn("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
}
```
🧹 Nitpick (assertive)
“Requeued” isn’t a warning. Use Info.
Don’t pollute warning signal with normal operation.
```diff
-	r.log.Warn("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
+	r.log.Info("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
obs.ReaperRecovered.Inc()
r.log.Info("requeued abandoned job", obs.String("id", job.ID), obs.String("to", dest), obs.String("trace_id", job.TraceID), obs.String("span_id", job.SpanID))
}
```
🤖 Prompt for AI Agents
In internal/reaper/reaper.go around lines 72 to 74, the log call for a
successfully requeued abandoned job is using Warn which is incorrect for normal
operation; change the logging level to Info (e.g., replace r.log.Warn(...) with
r.log.Info(...)) while preserving the same message and obs fields so the event
remains recorded without signalling a warning.
```go
host, _ := os.Hostname()
pid := os.Getpid()
now := time.Now().UnixNano()
randSfx := fmt.Sprintf("%04x", time.Now().UnixNano()&0xffff)
base := fmt.Sprintf("%s-%d-%d-%s", host, pid, now, randSfx)
return &Worker{cfg: cfg, rdb: rdb, log: log, cb: cb, baseID: base}
}
```
🧹 Nitpick (assertive)
Stop pretending time.Now() is randomness. Use crypto/rand for the suffix.
This ID doesn’t need to be cryptographic, but at least don’t generate two nanos and mask bits like it’s 1999.
```diff
+	"crypto/rand"
+	"encoding/hex"
@@
-	now := time.Now().UnixNano()
-	randSfx := fmt.Sprintf("%04x", time.Now().UnixNano()&0xffff)
-	base := fmt.Sprintf("%s-%d-%d-%s", host, pid, now, randSfx)
+	now := time.Now().UnixNano()
+	rb := make([]byte, 2)
+	_, _ = rand.Read(rb)
+	randSfx := hex.EncodeToString(rb)
+	base := fmt.Sprintf("%s-%d-%d-%s", host, pid, now, randSfx)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
host, _ := os.Hostname()
pid := os.Getpid()
now := time.Now().UnixNano()
rb := make([]byte, 2)
_, _ = rand.Read(rb)
randSfx := hex.EncodeToString(rb)
base := fmt.Sprintf("%s-%d-%d-%s", host, pid, now, randSfx)
return &Worker{cfg: cfg, rdb: rdb, log: log, cb: cb, baseID: base}
}
```
🤖 Prompt for AI Agents
In internal/worker/worker.go around lines 30 to 36, the code is using
time.Now().UnixNano() twice and masking bits as a pseudo-random suffix; replace
that with crypto/rand to generate the suffix. Read 2 bytes from crypto/rand (or
an appropriate small byte slice), handle any error (fallback to deterministic
value or return error), format those bytes as a four‑hex-digit suffix, and use a
single captured timestamp for the base ID construction so the ID is stable and
the suffix is truly random.
```diff
 for ctx.Err() == nil {
 	if !w.cb.Allow() {
-		time.Sleep(100 * time.Millisecond)
+		time.Sleep(w.cfg.Worker.BreakerPause)
 		continue
 	}
```
💡 Verification agent
❓ Verification inconclusive
Guard against zero BreakerPause. Don’t spin like a lunatic.
If BreakerPause is 0, you tight-loop under Open breaker.
```diff
-	if !w.cb.Allow() {
-		time.Sleep(w.cfg.Worker.BreakerPause)
+	if !w.cb.Allow() {
+		pause := w.cfg.Worker.BreakerPause
+		if pause <= 0 {
+			pause = 100 * time.Millisecond
+		}
+		time.Sleep(pause)
 		continue
 	}
```
Stop the tight spin — guard against zero BreakerPause.
File: internal/worker/worker.go Lines: 80-84
If BreakerPause <= 0, time.Sleep(0) causes a tight loop when the breaker is open — don't spin like a lunatic. Apply:

```diff
-	if !w.cb.Allow() {
-		time.Sleep(w.cfg.Worker.BreakerPause)
+	if !w.cb.Allow() {
+		pause := w.cfg.Worker.BreakerPause
+		if pause <= 0 {
+			pause = 100 * time.Millisecond
+		}
+		time.Sleep(pause)
 		continue
 	}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
for ctx.Err() == nil {
	if !w.cb.Allow() {
		pause := w.cfg.Worker.BreakerPause
		if pause <= 0 {
			pause = 100 * time.Millisecond
		}
		time.Sleep(pause)
		continue
	}
```
🤖 Prompt for AI Agents
In internal/worker/worker.go around lines 80-84, the loop calls
time.Sleep(w.cfg.Worker.BreakerPause) unguarded so if BreakerPause is zero or
negative the loop spins tightly; change this to enforce a sane minimum pause
(e.g., compute a sleepDur = w.cfg.Worker.BreakerPause and if sleepDur <= 0 set
it to a small default like 1ms or 100ms), then wait using either
time.Sleep(sleepDur) or a context-aware wait (select on time.After(sleepDur) and
ctx.Done()) to avoid busy spinning and still respect cancellation.
```go
// measure state transition around Record() to count trips
start := time.Now()
// process job
ok := w.processJob(ctx, workerID, srcQueue, procList, hbKey, payload)
obs.JobProcessingDuration.Observe(time.Since(start).Seconds())
prev := w.cb.State()
w.cb.Record(ok)
curr := w.cb.State()
if prev != curr && curr == breaker.Open {
	obs.CircuitBreakerTrips.Inc()
}
}
```
🛠️ Refactor suggestion
Delete the per-iteration prev/curr logic. It’s wrong under concurrency and will overcount.
With the above centralized logic, this cruft must go.
```diff
-	// measure state transition around Record() to count trips
 	start := time.Now()
 	// process job
 	ok := w.processJob(ctx, workerID, srcQueue, procList, hbKey, payload)
 	obs.JobProcessingDuration.Observe(time.Since(start).Seconds())
-	prev := w.cb.State()
 	w.cb.Record(ok)
-	curr := w.cb.State()
-	if prev != curr && curr == breaker.Open {
-		obs.CircuitBreakerTrips.Inc()
-	}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
start := time.Now()
// process job
ok := w.processJob(ctx, workerID, srcQueue, procList, hbKey, payload)
obs.JobProcessingDuration.Observe(time.Since(start).Seconds())
w.cb.Record(ok)
}
```
🤖 Prompt for AI Agents
In internal/worker/worker.go around lines 114 to 125, remove the per-iteration
prev/curr state capture and comparison: delete "prev := w.cb.State()", "curr :=
w.cb.State()", and the "if prev != curr && curr == breaker.Open {
obs.CircuitBreakerTrips.Inc() }" block; keep the call to "w.cb.Record(ok)" and
the surrounding timing/processing logic intact so state transitions are no
longer counted per-iteration here (trip counting should be handled centrally
elsewhere).
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
README.md (1)
66-74
: **Your metrics/health ports are wrong. You standardized to 9191; README still says 9090. Fix it everywhere, including Docker run.**

This misleads ops and breaks copy/paste.

```diff
-### Metrics
-
-- Prometheus metrics exposed at http://localhost:9090/metrics by default
+### Metrics
+
+- Prometheus metrics exposed at http://localhost:9191/metrics by default
@@
-### Health and Readiness
-
-- Liveness: http://localhost:9090/healthz returns 200 when the process is up
-- Readiness: http://localhost:9090/readyz returns 200 only when Redis is reachable
+### Health and Readiness
+
+- Liveness: http://localhost:9191/healthz returns 200 when the process is up
+- Readiness: http://localhost:9191/readyz returns 200 only when Redis is reachable
```

And update Docker mapping:

```diff
-docker run --rm -p 9090:9090 --env-file env.list job-queue-system:latest --role=all
+docker run --rm -p 9191:9191 --env-file env.list job-queue-system:latest --role=all
```

docs/evidence/README.md (1)
61-66
: **Add a heading and the missing blank line before the list.**

```diff
-Outputs are written under `docs/evidence/run_YYYYmmdd_HHMMSS/`:
-- worker.log, worker.pid
-- metrics_before.txt, metrics_after.txt
-- stats_before.json, stats_after.json
-- bench.json
+## Outputs
+
+Outputs are written under `docs/evidence/run_YYYYmmdd_HHMMSS/`:
+
+- worker.log, worker.pid
+- metrics_before.txt, metrics_after.txt
+- stats_before.json, stats_after.json
+- bench.json
```
♻️ Duplicate comments (3)
README.md (1)
102-102
: Fix markdownlint MD034 (bare URL) — we already told you this. Wrap the PR URL.
-Release branch open for v0.4.0-alpha: see PR https://github.com/flyingrobots/go-redis-work-queue/pull/1 +Release branch open for v0.4.0-alpha: see PR <https://github.com/flyingrobots/go-redis-work-queue/pull/1>docs/evidence/README.md (2)
37-45
: Do NOT tell users to run KEYS/DEL. Use purge-all or SCAN/UNLINK. Blocking Redis is amateur hour.You literally ship an admin purge-all. Document it. And if someone must use redis-cli, use SCAN + UNLINK with -r. Apply:
-Important notes -- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress. -- To avoid stale backlog affecting evidence, clear test keys before running a bench: -```bash -redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter -redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL -``` -- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port. +## Important notes + +- The admin `bench` command enqueues jobs directly (LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress. +- To avoid stale backlog affecting evidence, prefer the built-in purge (destructive): + +```bash +./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml --admin-cmd=purge-all --yes +``` + +- If you must use redis-cli, avoid blocking the server: + +```bash +# Known small keys: +redis-cli UNLINK jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter +# Large per-worker keys: +redis-cli --scan --pattern 'jobqueue:worker:*:processing' | xargs -r -n 100 redis-cli UNLINK +``` + +- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
46-48
: Make “Notes” a real heading and space the list.
-Notes
-- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
+## Notes
+
+- The simple latency reported in `bench.json` compares current time to each job's `creation_time` after completion sampling and is a coarse approximation. For precise latency distributions, prefer the Prometheus histogram `job_processing_duration_seconds` (use `_bucket` with `histogram_quantile()` to compute quantiles).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
README.md
(2 hunks)docs/08_deployment.md
(1 hunks)docs/12_performance_baseline.md
(1 hunks)docs/13_release_versioning.md
(1 hunks)docs/14_ops_runbook.md
(2 hunks)docs/evidence/README.md
(1 hunks)docs/testing-guide.md
(1 hunks)internal/worker/worker_breaker_integration_test.go
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/worker/worker_breaker_integration_test.go (3)
internal/config/config.go (3)
- Load (133-198)
- Redis (13-24)
- Backoff (26-29)
internal/queue/job.go (1)
- NewJob (20-31)
internal/breaker/breaker.go (1)
- State (10-10)
🪛 markdownlint-cli2 (0.17.2)
docs/testing-guide.md
5-5: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
6-6: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
10-10: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
11-11: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
15-15: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
16-16: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
21-21: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
26-26: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
27-27: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
32-32: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
34-34: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
36-36: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
40-40: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
41-41: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
46-46: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
48-48: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
50-50: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
54-54: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
55-55: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
59-59: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
63-63: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
64-64: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
69-69: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
71-71: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
73-73: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
77-77: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
78-78: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
82-82: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
86-86: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
87-87: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
94-94: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
96-96: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
98-98: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
102-102: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
106-106: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
108-108: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
110-110: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
112-112: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
114-114: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
116-116: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
118-118: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
122-122: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
123-123: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
130-130: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
135-135: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
136-136: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/08_deployment.md
29-29: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
docs/14_ops_runbook.md
42-42: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
docs/13_release_versioning.md
22-22: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
26-26: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
docs/12_performance_baseline.md
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
24-24: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/2/3
(MD029, ol-prefix)
24-24: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
27-27: Ordered list item prefix
Expected: 2; Actual: 3; Style: 1/2/3
(MD029, ol-prefix)
27-27: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
28-28: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
30-30: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
31-31: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
31-31: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
31-31: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
32-32: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
36-36: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
docs/evidence/README.md
10-10: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
11-11: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
15-15: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/1/1
(MD029, ol-prefix)
15-15: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
16-16: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
20-20: Ordered list item prefix
Expected: 1; Actual: 3; Style: 1/1/1
(MD029, ol-prefix)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
25-25: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
25-25: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
26-26: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
32-32: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1
(MD029, ol-prefix)
32-32: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
33-33: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
38-38: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
39-39: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
40-40: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
43-43: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
44-44: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
47-47: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
62-62: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
README.md
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
25-25: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
26-26: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
30-30: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
31-31: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
35-35: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
36-36: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
40-40: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
41-41: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
45-45: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
46-46: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
85-85: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
86-86: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
90-90: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
91-91: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
95-95: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
96-96: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
102-102: Bare URL used
(MD034, no-bare-urls)
105-105: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
110-110: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
🔇 Additional comments (3)
docs/testing-guide.md (1)
10-13
: Verified — 'build' target present (Makefile: line 12). docs/testing-guide.md's `make build` instruction is accurate; no change needed.
internal/worker/worker_breaker_integration_test.go (2)
39-53
: Guard against missing queue config to avoid pushing to the empty key. If defaults ever change, this silently pushes to "". Add a precondition check (as shown above) or set `cfg.Worker.Queues["low"]` explicitly in the test.
4-14
: Resolved — repo consistently uses go-redis v8; no v9 imports found.
go.mod lists github.com/go-redis/redis/v8 v8.11.5 and repository imports all reference github.com/go-redis/redis/v8; no github.com/redis/go-redis/v9 or other v9 references detected.
Config overrides via env vars. Example:
`WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./job-queue-system --role=worker --config=config.yaml`
```bash
WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./job-queue-system --role=worker --config=config.yaml
```
🧹 Nitpick (assertive)
Fix the fenced block spacing and align paths.
Surround the code fence with blank lines (MD031) and make the binary/config paths consistent with the rest of the repo (use ./bin and config/config.yaml).
-Config overrides via env vars. Example:
-```bash
-WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./job-queue-system --role=worker --config=config.yaml
-```
+Config overrides via env vars. Example:
+
+```bash
+WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./bin/job-queue-system --role=worker --config=config/config.yaml
+```
+
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
29-29: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/08_deployment.md around lines 28 to 31, the fenced code block lacks
surrounding blank lines and uses inconsistent paths; add a blank line before and
after the ```bash fence and update the example to use ./bin/job-queue-system and
config/config.yaml so the binary and config paths match the repo convention.
## Baseline Procedure
1) Start Redis (e.g., docker run -p 6379:6379 redis:7-alpine)
1) Start Redis
```bash
docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
```
🧹 Nitpick (assertive)
Call out key cleanup before baseline runs.
Benchmarks lie if stale keys exist. Explicitly instruct to purge Redis keys (or start a fresh container) before running.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
19-19: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/12_performance_baseline.md around lines 19–23, the baseline procedure
doesn't ensure a clean Redis state; add a clear pre-benchmark cleanup step
instructing users to either flush all Redis keys (e.g., via redis-cli FLUSHALL)
or start a fresh Redis container before running the benchmark, place this step
immediately before "Start Redis" (or replace the current start step with a note
recommending a fresh instance), and mention that stale keys will skew results.
1) Start Redis
```bash
docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
```
2) Copy `config/config.example.yaml` to `config/config.yaml` and set:
- `worker.count`: 16 on a 4 vCPU node (adjust as needed)
- `redis.addr`: "localhost:6379"
3) In one shell, run the worker:
- `./bin/job-queue-system --role=worker --config=config/config.yaml`
4) In another shell, run the bench (enqueue and wait for completion):
- `./bin/job-queue-system --role=admin --admin-cmd=bench --bench-count=2000 --bench-rate=1000 --bench-priority=low --bench-timeout=60s`
3) In one shell, run the worker
```bash
./bin/job-queue-system --role=worker --config=config/config.yaml
```
4) In another shell, run the bench (enqueue and wait for completion)
```bash
./bin/job-queue-system --role=admin --admin-cmd=bench \
  --bench-count=2000 --bench-rate=1000 \
  --bench-priority=low --bench-timeout=60s
```
🛠️ Refactor suggestion
The list/fences formatting is borked (MD029/MD031/MD032). Normalize numbering and add blank lines.
Use “1.” auto-numbering and surround code fences with blank lines to keep markdownlint quiet.
-1) Start Redis
-```bash
-docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
-```
-2) Copy `config/config.example.yaml` to `config/config.yaml` and set:
+1. Start Redis
+
+```bash
+docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
+```
+
+1. Copy `config/config.example.yaml` to `config/config.yaml` and set:
- `worker.count`: 16 on a 4 vCPU node (adjust as needed)
- `redis.addr`: "localhost:6379"
-3) In one shell, run the worker
-```bash
-./bin/job-queue-system --role=worker --config=config/config.yaml
-```
-4) In another shell, run the bench (enqueue and wait for completion)
-```bash
-./bin/job-queue-system --role=admin --admin-cmd=bench \
- --bench-count=2000 --bench-rate=1000 \
- --bench-priority=low --bench-timeout=60s
-```
+1. In one shell, run the worker
+
+```bash
+./bin/job-queue-system --role=worker --config=config/config.yaml
+```
+
+1. In another shell, run the bench (enqueue and wait for completion)
+
+```bash
+./bin/job-queue-system --role=admin --admin-cmd=bench \
+ --bench-count=2000 --bench-rate=1000 \
+ --bench-priority=low --bench-timeout=60s
+```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
20-20: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
21-21: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
24-24: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/2/3
(MD029, ol-prefix)
24-24: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
27-27: Ordered list item prefix
Expected: 2; Actual: 3; Style: 1/2/3
(MD029, ol-prefix)
27-27: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
28-28: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
30-30: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
31-31: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1
(MD029, ol-prefix)
31-31: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
31-31: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
32-32: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
36-36: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/12_performance_baseline.md around lines 20 to 36, the numbered list and
fenced code blocks violate markdownlint rules (MD029/MD031/MD032); normalize all
numbered items to use "1." for auto-numbering, ensure there is a blank line
before and after each triple-backtick code fence, and keep the indented bullet
lines (`- worker.count`, `- redis.addr`) directly under their list item; update
the four steps to use "1." for each top-level item, add blank lines above and
below each fenced block, and preserve the two sub-bullets under the copy-config
step exactly as-is.
4) Tag release
```bash
git tag vX.Y.Z[-pre]
git push --tags
```
🛠️ Refactor suggestion
Stop pushing all tags; use annotated tags and push a single tag. Also fix MD031/MD032.
Push only the intended tag, not your entire tag namespace. Add blank lines around the fence.
-4) Tag release
-```bash
-git tag vX.Y.Z[-pre]
-git push --tags
-```
+4) Tag release (annotated)
+
+```bash
+git tag -a vX.Y.Z[-pre] -m "release vX.Y.Z[-pre]"
+git push origin vX.Y.Z[-pre]
+```
+
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
22-22: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
23-23: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
26-26: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
docs/13_release_versioning.md around lines 22-26: the current snippet pushes all
tags and uses a non-annotated tag without proper blank lines around the fenced
block; change to instruct creating an annotated tag with -a and -m and pushing
only that single tag (git push origin <tag>), and add blank lines before and
after the ```bash fenced block to satisfy MD031/MD032. Ensure the step label
mentions "Tag release (annotated)" and update the commands accordingly.
mr, _ := miniredis.Run()
defer mr.Close()

cfg, _ := config.Load("nonexistent.yaml")
cfg.Redis.Addr = mr.Addr()
Stop ignoring errors. This is a test, not a wish.
Check errors from miniredis.Run and config.Load or fail fast.
- mr, _ := miniredis.Run()
+ mr, err := miniredis.Run()
+ if err != nil {
+ t.Fatalf("miniredis: %v", err)
+ }
@@
- cfg, _ := config.Load("nonexistent.yaml")
+ cfg, err := config.Load("nonexistent.yaml")
+ if err != nil {
+ t.Fatalf("load config: %v", err)
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
mr, err := miniredis.Run()
if err != nil {
	t.Fatalf("miniredis: %v", err)
}
defer mr.Close()

cfg, err := config.Load("nonexistent.yaml")
if err != nil {
	t.Fatalf("load config: %v", err)
}
cfg.Redis.Addr = mr.Addr()
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 19–23, the
code ignores errors from miniredis.Run() and config.Load("nonexistent.yaml");
update the test to capture and check both error returns and fail fast on error
(e.g., mr, err := miniredis.Run(); if err != nil { t.Fatalf("miniredis.Run
failed: %v", err) } and cfg, err := config.Load(...); if err != nil {
t.Fatalf("config.Load failed: %v", err) }), and ensure defer mr.Close() is only
called after mr is successfully created.
// Enqueue failing jobs (filename contains "fail")
for i := 0; i < 5; i++ {
	j := queue.NewJob(
		"id-fail-",
		"/tmp/fail-test.txt", // contains "fail" to force failure
		1,
		"low",
		"",
		"",
	)
	payload, _ := j.Marshal()
	if err := rdb.LPush(context.Background(), cfg.Worker.Queues["low"], payload).Err(); err != nil {
		t.Fatalf("lpush: %v", err)
	}
}
🛠️ Refactor suggestion
Brittle payload and sloppy error handling. Make IDs unique, assert Marshal error, and validate the queue key.
Also, relying on a filename containing “fail” is a hack. If possible, inject a failing processor; if not, keep it but make the rest solid.
- for i := 0; i < 5; i++ {
- j := queue.NewJob(
- "id-fail-",
+ key := cfg.Worker.Queues["low"]
+ if key == "" {
+ t.Fatalf("missing low-priority queue key in config")
+ }
+ for i := 0; i < 5; i++ {
+ j := queue.NewJob(
+ fmt.Sprintf("id-fail-%d", i),
"/tmp/fail-test.txt", // contains "fail" to force failure
1,
"low",
"",
"",
)
- payload, _ := j.Marshal()
- if err := rdb.LPush(context.Background(), cfg.Worker.Queues["low"], payload).Err(); err != nil {
+ payload, err := j.Marshal()
+ if err != nil {
+ t.Fatalf("marshal: %v", err)
+ }
+ if err := rdb.LPush(context.Background(), key, payload).Err(); err != nil {
t.Fatalf("lpush: %v", err)
}
}
Add missing import:
import "fmt"
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 39 to 53, the
test enqueues brittle payloads and mishandles errors: make job IDs unique (e.g.,
include the loop index or GUID), check and handle the error returned by
j.Marshal instead of ignoring it, validate that cfg.Worker.Queues["low"] exists
and is non-empty before using it (fail the test with a clear message if not),
improve LPush error context (include the queue key in the t.Fatalf message), and
if feasible replace the filename-based failure trigger with an injected failing
processor; if injection isn’t possible keep the filename hack but ensure the
rest is robust; also add the required import "fmt" as requested.
log, _ := zap.NewDevelopment()
w := New(cfg, rdb, log)
ctx, cancel := context.WithCancel(context.Background())
done := make(chan struct{})
go func() { defer close(done); _ = w.Run(ctx) }()
Check logger construction and don’t pretend everything succeeds.
`zap.NewDevelopment` returns an error. Handle it.
- log, _ := zap.NewDevelopment()
+ log, err := zap.NewDevelopment()
+ if err != nil {
+ t.Fatalf("logger: %v", err)
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
log, err := zap.NewDevelopment()
if err != nil {
	t.Fatalf("logger: %v", err)
}
w := New(cfg, rdb, log)
ctx, cancel := context.WithCancel(context.Background())
done := make(chan struct{})
go func() { defer close(done); _ = w.Run(ctx) }()
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 55 to 60, the
call to zap.NewDevelopment ignores its returned error; update the test to
capture both returned values (logger, err) and handle the error instead of
assuming success — for example, fail the test immediately with t.Fatalf or
require.NoError(t, err) if creation fails, or alternatively fall back to a safe
logger (e.g., zap.NewNop()) when err != nil; then use the verified logger
variable when constructing w.
// Wait up to 2s for breaker to open
deadline := time.Now().Add(2 * time.Second)
opened := false
for time.Now().Before(deadline) {
	if w.cb.State() == 2 { // Open
		opened = true
		break
	}
	time.Sleep(5 * time.Millisecond)
}
🧹 Nitpick (assertive)
🛠️ Refactor suggestion
Magic number “2” for breaker state? No. Use a constant.
Depend on a public constant from the breaker package instead of hard-coding 2.
- if w.cb.State() == 2 { // Open
+ if w.cb.State() == breaker.StateOpen { // Open
If `breaker.StateOpen` doesn’t exist, expose it and I’ll wire the change.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Wait up to 2s for breaker to open
deadline := time.Now().Add(2 * time.Second)
opened := false
for time.Now().Before(deadline) {
	if w.cb.State() == breaker.StateOpen { // Open
		opened = true
		break
	}
	time.Sleep(5 * time.Millisecond)
}
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 61 to 70,
replace the hard-coded numeric state check (== 2) with the public breaker
package constant (e.g., breaker.StateOpen); update the import if necessary to
reference the breaker package; if breaker.StateOpen does not yet exist, add and
export a named constant in the breaker package (e.g., const StateOpen = 2) and
use that constant in this test so the state comparison is clear and not a magic
number.
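If the breaker package does not yet export named states, a minimal, self-contained sketch of what that could look like is below. The state names and the Closed=0/HalfOpen=1/Open=2 ordering are assumptions inferred from the test's `== 2 // Open` comparison, not the repo's actual API.

```go
// Sketch of exported circuit-breaker states; shown as package main so it
// runs standalone. In the real repo this would live in internal/breaker.
package main

import "fmt"

// State represents the circuit breaker's current mode.
type State int

const (
	StateClosed   State = iota // 0: normal operation, requests flow
	StateHalfOpen              // 1: probing after cooldown
	StateOpen                  // 2: rejecting work until cooldown elapses
)

// String makes states readable in logs and test failure messages.
func (s State) String() string {
	switch s {
	case StateClosed:
		return "closed"
	case StateHalfOpen:
		return "half-open"
	case StateOpen:
		return "open"
	}
	return "unknown"
}

func main() {
	// The test's comparison then becomes self-documenting:
	// if w.cb.State() == StateOpen { ... }
	fmt.Println(StateOpen, int(StateOpen)) // prints: open 2
}
```

With the iota ordering pinned by constants, the magic-number comparison in the test can be replaced without changing behavior.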
// While breaker is Open (cooldown 100ms), queue length should not decrease
n1, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
time.Sleep(50 * time.Millisecond) // less than cooldown
n2, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
if n2 < n1 {
	cancel(); <-done
	t.Fatalf("queue drained during breaker open: before=%d after=%d", n1, n2)
}
You’re still ignoring Redis errors. Stop it.
LLen can fail; assert it.
- n1, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+ n1, err := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+ if err != nil {
+ cancel(); <-done
+ t.Fatalf("llen before: %v", err)
+ }
@@
- n2, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+ n2, err := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+ if err != nil {
+ cancel(); <-done
+ t.Fatalf("llen after: %v", err)
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 	// While breaker is Open (cooldown 100ms), queue length should not decrease
-	n1, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+	n1, err := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+	if err != nil {
+		cancel(); <-done
+		t.Fatalf("llen before: %v", err)
+	}
 	time.Sleep(50 * time.Millisecond) // less than cooldown
-	n2, _ := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+	n2, err := rdb.LLen(context.Background(), cfg.Worker.Queues["low"]).Result()
+	if err != nil {
+		cancel(); <-done
+		t.Fatalf("llen after: %v", err)
+	}
 	if n2 < n1 {
 		cancel(); <-done
 		t.Fatalf("queue drained during breaker open: before=%d after=%d", n1, n2)
 	}
```
🤖 Prompt for AI Agents
In internal/worker/worker_breaker_integration_test.go around lines 76 to 83, the
test calls rdb.LLen(...).Result() and ignores the returned errors; change the
code to capture and assert the errors (e.g., n1, err := rdb.LLen(...).Result();
if err != nil { cancel(); <-done; t.Fatalf("LLen failed before sleep: %v", err)
}) for both n1 and n2 so Redis failures are surfaced and the test fails fast
with the error details.
Batch 1: Addressed CodeRabbit nits (workflows)

I pulled all review comments and started knocking them down. First batch focused on CI/workflows suggestions.

I will continue in batches (config/docs/nits next). If there are any high‑priority comments you want me to tackle first, let me know the IDs or paste them and I’ll jump them to the front.
…utofix hook; make hooks target to enable
…029/MD033 and keep fenced blocks; workflows unchanged semantically
Actionable comments posted: 51
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (28)
docs/06_technical_spec.md (3)
19-64: Config schema is incomplete: no TLS/ACL, no Cluster/Sentinel knobs, no heartbeat refresh, no defaults. Fix it. You’re shipping a production-adjacent system and forgot basic knobs. Add TLS and HA options, make heartbeat refresh explicit, and document sane defaults.
Apply:
```diff
 redis:
   addr: string
   username: string
   password: string
   db: int
+  tls:
+    enabled: bool
+    ca_file: string
+    cert_file: string
+    key_file: string
+    insecure_skip_verify: bool
+  sentinel:
+    enabled: bool
+    master_name: string
+    addrs: [string]
+  cluster:
+    enabled: bool
+    addrs: [string]
+    route_by_latency: bool
   pool_size_multiplier: int
   min_idle_conns: int
   dial_timeout: duration
   read_timeout: duration
   write_timeout: duration
   max_retries: int
 worker:
   count: int
   heartbeat_ttl: duration
+  heartbeat_refresh_interval: duration
   max_retries: int
-  backoff: { base: duration, max: duration }
+  backoff: { base: duration, max: duration, jitter_pct: float }
   priorities: [string]
   queues: { <priority>: string }
   processing_list_pattern: string # printf format, %s workerID
   heartbeat_key_pattern: string # printf format, %s workerID
   completed_list: string
   dead_letter_list: string
   brpoplpush_timeout: duration
 producer:
   scan_dir: string
   include_globs: [string]
   exclude_globs: [string]
   default_priority: string
   high_priority_exts: [string]
   rate_limit_per_sec: int
   rate_limit_key: string
 circuit_breaker:
   failure_threshold: float
   window: duration
   cooldown_period: duration
   min_samples: int
 observability:
-  metrics_port: int
+  metrics_port: int # default 9191
   log_level: string
   queue_sample_interval: duration
   tracing: { enabled: bool, endpoint: string }
+env_override_prefix: string # e.g., JOBQ_; mapping uses UPPER_SNAKE for nested keys, e.g., JOBQ_WORKER_HEARTBEAT_TTL
```
131-134: Logging/tracing: forbid payload/PII, specify OTLP, and define sampling. You don’t want secrets in logs, and tracing needs protocol clarity.
Apply:
```diff
-- Logs are JSON with keys: `level`, `ts`, `msg`, `trace_id`, `span_id`, `job_id`, `queue`, `worker_id`.
+- Logs are JSON with keys: `level`, `ts`, `msg`, `trace_id`, `span_id`, `job_id`, `queue`, `worker_id`. Do NOT log payload bodies or secrets.
 - Tracing: create a span per job processing; propagate `trace_id/span_id` if present; otherwise create new.
+Tracing exporter: OTLP (gRPC) by default; `observability.tracing.endpoint` accepts OTLP endpoint (e.g., http(s):// or grpc target). Configure sampling rate (default 1.0) and baggage propagation.
```
81-92: Job schema nit: types, not example literals. `filesize: 0` is misleading. Also make `priority` enum align with `worker.priorities`.

Apply:

```diff
-  "filesize": 0,
-  "priority": "high|low",
+  "filesize": "<int-bytes>",
+  "priority": "<one-of worker.priorities>",
```

docs/05_architecture.md (4)
10-17: TOC is missing Security. Add it. Yesterday. You ship release infra and public images but have zero “Security Considerations” in the ToC. Unacceptable for production-ish docs.
Apply:
```diff
   - [Performance Targets](#performance-targets)
+  - [Security Considerations](#security-considerations)
```
65-72: Non-atomic “Reap” flow is wrong and will lose messages. Fix the flow. “RPOP processing -> LPUSH orig queue” is a two-step. Crash in between and you’ve dropped work. Use LMOVE or RPOPLPUSH.

```diff
-6) Reap: if heartbeat missing, RPOP processing list items back to originating priority queue.
+6) Reap: if heartbeat missing, atomically move items back to the originating priority queue using LMOVE {processing} {queue} RIGHT LEFT (Redis ≥6.2) or RPOPLPUSH for older versions.
```

Also specify how you discover the “originating” queue (embedded metadata vs key mapping). If it’s embedded, state the payload field name.
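To make the atomic-move argument concrete, here is a minimal in-memory model of `LMOVE {processing} {queue} RIGHT LEFT`: each step pops from the tail of the processing list and pushes to the head of the origin queue as one operation, so a crash between steps never strands an item in neither list. This is an illustrative sketch, not the repo's reaper code.

```go
package main

import "fmt"

// lmoveRightLeft models Redis LMOVE src dst RIGHT LEFT on in-memory lists:
// pop the tail of src and push it onto the head of dst in one step.
func lmoveRightLeft(src, dst []string) (newSrc, newDst []string, moved string, ok bool) {
	if len(src) == 0 {
		return src, dst, "", false
	}
	moved = src[len(src)-1]
	newSrc = src[:len(src)-1]
	newDst = append([]string{moved}, dst...)
	return newSrc, newDst, moved, true
}

func main() {
	processing := []string{"job1", "job2", "job3"} // dead worker's list
	queue := []string{}
	// Drain the processing list back to the origin queue, one atomic move at a time.
	for {
		var v string
		var ok bool
		processing, queue, v, ok = lmoveRightLeft(processing, queue)
		if !ok {
			break
		}
		fmt.Println("requeued", v)
	}
	fmt.Println(len(processing), queue) // 0 [job1 job2 job3]
}
```

Note how RIGHT/LEFT preserves the original FIFO order in the destination queue; with a two-step RPOP + LPUSH, a crash after the pop loses the item entirely.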
73-82: Add “Security Considerations” section with bare minimum hygiene. You’re pushing images and Helm charts; document auth/TLS or you’ll train users to deploy Redis wide open.

```diff
 @@
 ## Technology Stack
 @@
 - Container: Distroless base image
+
+## Security Considerations
+
+- Redis: Require AUTH and TLS; document `rediss://` usage and certificate mounting in Helm values.
+- Secrets: Never pass credentials via flags; use env/secret mounts (Kubernetes Secrets), and mark them as `sensitive` in GoReleaser if templated.
+- Network: Isolate Redis with network policies/security groups; disallow Internet exposure.
+- RBAC: Limit the service account; read-only for metrics endpoints.
+- Images: Use distroless nonroot and drop capabilities; enable rootless where possible.
+- Supply chain: Pin base images by digest; enable `govulncheck` in CI (already present) and sign images (cosign) as a follow-up.
```

Also applies to: 10-17
90-94: Cite evidence or remove these performance bullets now.
- docs/evidence/bench.json records throughput_jobs_per_sec=16.096426 ⇒ ≈965.79 jobs/min (admin bench: COUNT=1000, RATE=500, duration=62.1256s). Do not claim "≥1k jobs/min per 4 vCPU" — no hardware/vCPU/RAM/cloud SKU recorded. (see docs/evidence/bench.json)
- docs/evidence/config.used.yaml shows worker.count=8, redis: localhost:6379, heartbeat_ttl: 30s, rate_limit_per_sec: 1000 — Redis version/topology, network, payload-size distribution, and machine specs are missing. (see docs/evidence/config.used.yaml)
- docs/evidence/metrics_after.txt shows jobs_completed_total=480 and job_processing_duration_seconds_count=480 — this conflicts with bench.json and provides insufficient samples for a reliable p95 claim. Do not assert "p95 < 2s" without the full Prometheus histogram export.
- Recovery claim "< 10s" directly contradicts heartbeat_ttl=30s in config.used.yaml. Either state the exact TTL + reaper/scan interval that produce <10s, or correct the recovery number.
- Action (one-line): replace the three bullets in docs/05_architecture.md with precise, evidence-backed statements that reference docs/evidence/{bench.json,config.used.yaml,metrics_before.txt,metrics_after.txt,run_bench.sh} and explicitly list hardware (CPU/RAM/cloud SKU), Redis version/topology, network conditions, payload size distribution, bench params (COUNT,RATE), and heartbeat/reaper config — or remove the unsupported claims.
docs/04_sprint_plans.md (3)
39-47: Admin stories are stale vs code. You added purge-all and stats-keys elsewhere. Mirror them here or you’ll drift instantly.

```diff
-1) As an operator, I can run admin commands to inspect and manage queues. (8)
-   - Acceptance: `stats`, `peek`, `purge-dlq` subcommands; safe operations with confirmation; tests.
+1) As an operator, I can run admin commands to inspect and manage queues. (8)
+   - Acceptance: `stats`, `stats-keys`, `peek`, `purge-dlq`, `purge-all` (with `--yes`) subcommands; safe operations with confirmation; tests.
```
23-26: Config knob already exists — add a test that proves it.
- Reality: interval is configurable via Observability.QueueSampleInterval (mapstructure key "queue_sample_interval") and defaults to 2s (internal/config/config.go; config/config.example.yaml; docs/evidence/config.used.yaml). StartQueueLengthUpdater already uses cfg.Observability.QueueSampleInterval (internal/obs/queue_length.go).
- Action (required): add a test (e.g. internal/obs/queue_length_test.go or extend an existing miniredis test) that sets Observability.QueueSampleInterval to a non-default value and verifies StartQueueLengthUpdater respects that interval — do not merge without this.
20-31: Don’t promise what you didn’t implement — either remove “metrics server” from readiness or implement gating. readyCheck in cmd/job-queue-system/main.go (≈line 86) only pings Redis; StartHTTPServer (internal/obs/http.go) registers /metrics and calls the provided readiness callback — there is no check that a separate metrics server is running. Update docs/04_sprint_plans.md (lines 20–31) to match reality or modify the readiness check to also verify the metrics endpoint/listener (change cmd/job-queue-system/main.go and/or internal/obs/http.go accordingly).
docs/10_risk_register.md (1)
15-26: Assign owners and triggers or this matrix is dead weight. Add “Owner” and “Trigger/Threshold” columns; otherwise nobody moves when a risk flips.

```diff
-| # | Risk | Prob. | Impact (1-5) | Mitigation | Contingency |
-|---|------|-------|--------------|------------|-------------|
+| # | Risk | Prob. | Impact (1-5) | Owner | Trigger/Threshold | Mitigation | Contingency |
+|---|------|-------|--------------|-------|-------------------|------------|-------------|
```
91-99: Rollback steps miss the obvious: stop producers first and pin digests. Prevent refilling queues during rollback; use digests, not tags.

```diff
 - Rollback:
-  1) Select previous good tag (e.g., `vX.Y.(Z-1)`).
-  2) Deploy image `ghcr.io/<owner>/<repo>:vX.Y.(Z-1)`.
+  0) Temporarily scale producers to zero to stop inflow.
+  1) Select previous good release (e.g., `vX.Y.(Z-1)`).
+  2) Deploy pinned image `ghcr.io/<owner>/<repo>@sha256:<digest>`.
   3) Verify `/healthz` and `/readyz` return 200.
```

docs/09_requirements.md (1)
19-27: Scope creep and mismatch. Keep requirements aligned with the codebase.

- “Producer scans directory” is oddly specific for a generic queue. Make it an example, not a requirement.
- Admin tooling list is incomplete (missing `stats-keys`, `purge-all`).
- Breaker semantics: call out single-probe HalfOpen if that’s the design.

```diff
-- Producer scans directory, prioritizes by extension, and enqueues JSON jobs. (complexity: ~80 LoC; O(1) `LPUSH` per job)
+- Producer API to enqueue jobs; file-scanning producer provided as an example tool (not core). (O(1) `LPUSH` per job)
 @@
-- Circuit breaker with Closed/Open/HalfOpen states; cooldown. (~120 LoC)
+- Circuit breaker with Closed/Open/HalfOpen (single-probe) states; cooldown. (~120 LoC)
 @@
-- Admin tooling: `stats`, `peek`, `purge-dlq`. (~120 LoC)
+- Admin tooling: `stats`, `stats-keys`, `peek`, `purge-dlq`, `purge-all` (requires `--yes`). (~150 LoC)
```

docs/07_test_plan.md (2)
35-38: Define how you’ll measure the “±10%” limiter claim. Otherwise it’s hand-wavy. Spell out PromQL so humans and robots agree on pass/fail.

```diff
 4) Producer rate limit: per-second cap respected within ±10% under burst.
+   Measurement:
+   - PromQL: sum(rate(jobs_produced_total[1m])) by (producer) compared to configured rps.
+   - Pass if 0.9*rps ≤ 1m-rate ≤ 1.1*rps for ≥ 55s of any 60s window during test.
```
48-51: Stop calling lint “optional”. Make it mandatory and scan images, too. Ship with guardrails or regret it later.

```diff
 - `govulncheck` in CI; fail on critical CVEs.
-- Static checks: `go vet` and `golangci-lint` (optional) for code issues.
+- Static checks: `go vet` and `golangci-lint` (required) for code issues; CI fails on new warnings.
 - Secrets: ensure no secrets in logs; validate config does not dump secrets.
+- Supply-chain: image scan (e.g., Trivy/Grype) on built images; fail on Critical/High.
```

Want me to add a golangci-lint + Trivy job to CI right now?
docs/03_milestones.md (2)
21-23: “TraceID/SpanID used for spans and logs” — vague. Which tracer, which fields, where? Specify the instrumentation stack, env flags, and acceptance check.

```diff
-  - Deliverables: queue length gauges updated every 2s; TraceID/SpanID used for spans and logs
+  - Deliverables: queue length gauges updated every 2s; OpenTelemetry tracing with OTLP exporter; logs include trace_id/span_id
+  - Acceptance: OTEL_EXPORTER_OTLP_ENDPOINT set; traces visible in backend; logs show trace_id/span_id fields under load
```
17-37: Who owns each milestone? Without owners and exit tests, dates are fantasy. Add DRI(s) and a concrete “exit test” per milestone.

```diff
-1. Health + Validation (due 2025-09-20)
+1. Health + Validation (due 2025-09-20, Owner: @flyingrobots)
    - Deliverables: `/healthz` and `/readyz` endpoints; strict config validation with errors; docs updated
-   - Success: endpoints return 200; malformed configs fail with descriptive messages
+   - Success: endpoints return 200; malformed configs fail with descriptive messages
+   - Exit test: `curl -sf localhost:9191/readyz`; CI test asserts non-zero exit on bad config
```

docs/02_release_plan.md (2)
16-22: Dates without cutoffs are worthless — add cut times and tag names. Define exact cut times (UTC) and the tag format per milestone. Otherwise people will miss the window.

```diff
 ## Release Cadence
-- v0.4.0-alpha — 2025-09-26
-- v0.7.0-beta — 2025-10-10
-- v0.9.0-rc — 2025-10-24
-- v1.0.0 — 2025-11-07
+- v0.4.0-alpha — 2025-09-26 18:00 UTC (tag: v0.4.0-alpha)
+- v0.7.0-beta — 2025-10-10 18:00 UTC (tag: v0.7.0-beta)
+- v0.9.0-rc — 2025-10-24 18:00 UTC (tag: v0.9.0-rc)
+- v1.0.0 — 2025-11-07 18:00 UTC (tag: v1.0.0)
```
62-69: Acceptance criteria need measurable checks, not vibes. Add explicit queries and pass/fail checks. Tie chaos to “no lost jobs” with evidence.

```diff
 ## Acceptance Criteria
 - All acceptance tests for the release pass in CI (unit, integration, race).
-- Metrics present: job counters, histograms, queue length gauges, breaker state.
+- Metrics present: job counters, histograms, queue length gauges, breaker state. Verified by scraping /metrics and asserting the following series exist: jobs_completed_total, jobs_failed_total, job_processing_duration_seconds_bucket, queue_length, circuit_breaker_state.
 - Health endpoints return HTTP 200 and surface readiness correctly.
-- On chaos tests: automatic recovery without manual intervention; no lost jobs.
+- On chaos tests: automatic recovery without manual intervention; no lost jobs (jobs_produced_total == jobs_completed_total + jobs_failed_total + dlq_length).
 - Documentation updated (README, PRD, ops guide) for each release.
+- govulncheck reports 0 vulnerabilities at default severity threshold.
+- Release artifacts: SBOM and cosign signatures published alongside images.
```

docs/11_project_charter.md (2)
16-21: Define “small files” precisely. Quantify payload size so perf targets are testable.

```diff
-- Performance: 1k jobs/min per 4 vCPU with p95 < 2s for small files.
+- Performance: 1k jobs/min per 4 vCPU with p95 < 2s for “small files” (payload ≤ 1 MiB).
```
34-41: Tie metrics to actual series/queries. Name the metrics and how you’ll measure coverage; otherwise this is just marketing.

```diff
 ## Success Metrics
-- Availability: Ready > 99.9% over rolling 30 days.
+- Availability: Ready > 99.9% over rolling 30 days (measured via /readyz SLO burn).
 - Reliability: Zero lost jobs; DLQ rate < 0.5% of consumed jobs.
-- Performance: p95 processing duration < 2s for small files.
+- Performance: p95 processing duration < 2s for small files (PromQL: histogram_quantile(0.95, sum by (le) (rate(job_processing_duration_seconds_bucket[5m])))).
 - Observability: 100% of operations emit metrics; tracing coverage ≥ 80% of job processing.
-- Quality: Unit coverage ≥ 80% for core packages; CI green on main.
+- Quality: Unit coverage ≥ 80% for core packages; CI green on main (race-enabled).
```

docs/08_deployment.md (3)
18-27: Document full-history checkout and Go cache. Your CI will randomly break changelog/release without full history; also cache your Go build to not waste CI minutes.

```diff
 Stages (GitHub Actions):
 1) Lint & Vet: run `go vet` (optionally `golangci-lint`)
 2) Build: compile all packages
 3) Unit + Race: `go test -race ./...`
 4) Integration/E2E: Redis service container; full flow tests
 5) Security: `govulncheck`
-6) Package: Docker image build (main branch tags and releases)
-7) Release: tag + changelog on releases
+6) Package: Docker image build (main branch tags and releases) with Go module/build cache
+7) Release: tag + changelog on releases (git fetch --tags and full history)
```
39-47: Rollback doc must mention DLQ snapshot and purge-all safety. Don’t encourage destructive actions without backup and explicit confirmation.

```diff
 5) If needed, drain DLQ using `purge-dlq` (with backup/export first).
 6) Document incident and root cause.
+7) As a last resort, `purge-all --yes` (only after snapshotting all queues/keys).
```
49-55: Add alerts for reaper anomalies and tag immutability. You miss obvious outages without these.

```diff
 - Alerts (suggested thresholds):
   - `circuit_breaker_state > 0` for > 60s → WARN
   - `rate(jobs_failed_total[5m]) > 0.1 * rate(jobs_consumed_total[5m])` → CRITICAL
   - `queue_length{queue="jobqueue:dead_letter"} > 100` → WARN
   - `/readyz` non-200 for > 30s → CRITICAL
+  - `increase(reaper_recovered_total[5m]) < 1 and sum(queue_length) > 0` → WARN (reaper stuck)
+  - Deployments not based on immutable tag (image:latest) detected → CRITICAL
```

docs/12_performance_baseline.md (1)
58-61: Record environment details or your numbers are meaningless. CPU model, RAM, Redis version, and local/remote deployment must be stated.

```diff
 - On a 4 vCPU node, `bench-count=2000`, `bench-rate=1000` should achieve ≥1k jobs/min throughput, with p95 latency < 2s for small files (<1MB).
-- If results fall short, see tuning levers and ensure host/Redis are not CPU or I/O bound.
+- If results fall short, see tuning levers and ensure host/Redis are not CPU or I/O bound. Always record: CPU model, RAM, Redis version, network locality.
```

docs/00_assessment.md (3)
19-23: Name the “all” mode consistently. Pick “all-in-one” or “all” and use it everywhere (readme, flags, Helm).

```diff
-- Modes: `--role=producer|worker|all` with YAML config and env overrides.
+- Modes: `--role=producer|worker|all-in-one` with YAML config and env overrides.
```
36-46: This is outdated — admin CLI exists now. Update the gaps. You added `purge-all` and `stats-keys`. Stop claiming the CLI doesn’t exist; list what’s still missing.

```diff
-- Operational tooling: admin CLI (peek queue, purge DLQ, show stats) not yet implemented.
+- Operational tooling: admin CLI exists (`purge-dlq`, `purge-all --yes`, `stats-keys`). Missing: `peek`, fine-grained purge, DLQ export tooling.
```
56-61: Priorities should reflect reality. Health/readiness and queue gauges are already slated for alpha; remove duplication or tag with milestone owners.

```diff
 1. Add health/readiness probe and queue length updater.
 2. Use TraceID/SpanID to start spans and enrich logs.
 3. Strengthen rate limiter timing and jitter; document guarantees.
 4. Add config validation and error reporting.
-5. Write e2e tests with real Redis (service container) and performance benchmarks.
+5. Write e2e tests with real Redis (service container) and performance benchmarks (owner: @flyingrobots, target: v0.7.0-beta).
```
♻️ Duplicate comments (15)
docs/15_promotion_checklists.md (2)
36-45: Same “Evidence links as above” hand-wave. Link the actual artifacts here too. Don’t make reviewers hunt.
15-20: Replace placeholder “Evidence links” with actual files. Placeholders are useless. You already added evidence artifacts—link them.

```diff
-  - CI run URL: …
-  - Bench JSON: …
-  - Metrics snapshot(s): …
-  - Issues list: …
+  - CI run JSON: docs/evidence/ci_run.json
+  - Bench JSON: docs/evidence/bench.json
+  - Metrics snapshots: docs/evidence/metrics_before.txt, docs/evidence/metrics_after.txt
+  - Config used: docs/evidence/config.used.yaml
+  - Repro steps: docs/evidence/README.md
```

.github/workflows/goreleaser.yml (2)
9-12: Permissions are broad and unsigned artifacts are table stakes now. Tighten and attest. Least privilege + provenance.

```diff
 permissions:
-  contents: write
-  packages: write
+  contents: write
+  packages: write
+  id-token: write # for keyless signing/attestations
```

Then enable provenance/signing in GoReleaser (cosign/SLSA) or add a post-step to sign images with OIDC. I can wire this in if you want.
28-33: You promised multi-arch images but didn’t set up QEMU/Buildx. That will blow up. Enable emulation and a buildx builder before release.

```diff
       - name: Login to GHCR
         uses: docker/login-action@v3
         with:
           registry: ghcr.io
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+      - name: Set up Buildx
+        uses: docker/setup-buildx-action@v3
```

docs/did-not-agree/rate-limiter-fixed-window.md (1)
5-5: You forgot the only safe EXPIRE pattern. This omission misleads implementers. Set expiry only when INCR creates the key, otherwise you extend the window under load.
```diff
 Decision
 
 - Keep fixed-window limiter (INCR + 1s EXPIRE + TTL-based sleep with jitter) for v0.4.0-alpha.
+
+Implementation Note
+
+- Set EXPIRE/PEXPIRE only when INCR returns 1 (first hit in the window). Use PTTL to sleep.
+- Never update expiry on subsequent increments; otherwise you slide the window by accident.
+- Consider a Lua wrapper (atomically): INCR; if count==1 then PEXPIRE key 1000; return count.
```

docs/did-not-agree/reaper-scan-vs-registry.md (1)
1-24: Fix the headings and stop hand-waving SCAN budgets. Now. This still violates MD032 and reads like a napkin. Use proper headings, correct complexity, and document COUNT/cadence.

```diff
-# Reaper: SCAN-Based Recovery vs Worker Registry
-
-Decision
-
-- Keep SCAN-based discovery of processing lists for v0.4.0-alpha, instead of maintaining a registry of active workers or relying on keyspace notifications.
-
-Rationale
-
-- Simplicity and robustness: SCAN requires no extra moving parts or configuration and tolerates sudden worker exits.
-- Predictable load: bounded SCAN page sizes and periodic cadence maintain manageable overhead.
-
-Tradeoffs
-
-- SCAN is O(keys); at very large worker fleets, registry/notifications can reduce overhead.
-
-Revisit Criteria
-
-- If reaper CPU or Redis time spent on SCAN becomes material (observed via profiling/metrics) under expected fleet sizes.
-
-Future Work
-
-- Add optional worker registry with TTL stored in Redis; reaper would iterate registry members and target per-worker keys directly.
-- Consider Redis keyspace notifications where operationally acceptable.
+# Reaper: SCAN-Based Recovery vs Worker Registry
+
+## Decision
+
+- Keep SCAN-based discovery of processing lists for v0.4.0-alpha, instead of maintaining a registry of active workers or relying on keyspace notifications.
+
+## Rationale
+
+- Simplicity and robustness: SCAN requires no extra moving parts or configuration and tolerates sudden worker exits.
+- Predictable load: bounded SCAN page sizes and periodic cadence maintain manageable overhead.
+
+## Tradeoffs
+
+- SCAN is O(N); at very large worker fleets, registry/notifications can reduce overhead.
+
+## Operational Defaults
+
+- Key pattern: `jobqueue:worker:*:processing`
+- SCAN COUNT: 500–1000 per call
+- Cadence: every 250ms with a 50ms sleep between pages
+- Max pass time budget: 200ms per sweep
+
+## Revisit Criteria
+
+- If reaper CPU or Redis time spent on SCAN becomes material (observed via profiling/metrics) under expected fleet sizes.
+
+## Future Work
+
+- Add optional worker registry with TTL stored in Redis; reaper would iterate registry members and target per-worker keys directly.
+- Consider Redis keyspace notifications where operationally acceptable.
```

docs/08_deployment.md (1)
34-37: Fix MD031 and align paths. This was already flagged. Add a blank line around the fence and use the repo’s ./bin and config/config.yaml paths.

````diff
-Config overrides via env vars. Example:
-```bash
-WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./job-queue-system --role=worker --config=config.yaml
-```
+Config overrides via env vars. Example:
+
+```bash
+WORKER_COUNT=32 REDIS_ADDR=redis:6379 ./bin/job-queue-system --role=worker --config=config/config.yaml
+```
````

docs/13_release_versioning.md (1)
26-31: Do not push all tags. Use annotated tags and push the one you mean. We already told you. Fix it.

````diff
-4) Tag release
+4) Tag release (annotated)
 
-```bash
-git tag vX.Y.Z[-pre]
-git push --tags
-```
+```bash
+git tag -a vX.Y.Z[-pre] -m "release vX.Y.Z[-pre]"
+git push origin vX.Y.Z[-pre]
+```
````

docs/did-not-agree/priority-fetching-brpoplpush.md (1)
1-24: Fix headings, cite the client behavior, and bound the timeout. Still MD032-ugly and vague on client semantics. Document the go-redis version and recommend a 200ms per-queue timeout to cap worst-case added latency.
```diff
-# Priority Fetching and BRPOPLPUSH Semantics
-
-Decision
-
-- Use per-queue BRPOPLPUSH with short timeout to emulate multi-queue priority rather than a single command returning queue name and value.
-
-Rationale
-
-- Redis does not provide multi-source BRPOPLPUSH. Looping priorities with a short timeout preserves atomic move semantics per queue and delivers predictable prioritization.
-- go-redis returns only the value for BRPopLPush. We record the source queue implicitly by the loop order and use the known `srcQueue` when processing.
-
-Tradeoffs
-
-- Lower-priority jobs may incur up to the per-queue timeout in latency when higher-priority queues are empty.
-- We do not rely on returned queue name; this is documented and tested.
-
-Revisit Criteria
-
-- If sub-second latency for low priority becomes unacceptable or we need multi-queue fairness beyond simple priority preference.
-
-Future Work
-
-- Explore a Lua-assisted sweep to pick the first non-empty queue without waiting the full timeout per queue in sequence.
+# Priority Fetching and BRPOPLPUSH Semantics
+
+## Decision
+
+- Use per-queue BRPOPLPUSH with short timeout to emulate multi-queue priority rather than a single command returning queue name and value.
+
+## Rationale
+
+- Redis does not provide multi-source BRPOPLPUSH. Looping priorities with a short timeout preserves atomic move semantics per queue and delivers predictable prioritization.
+- go-redis (BRPopLPush) returns only the value (document the exact version from go.mod). We record the source queue by loop order and use the known `srcQueue` when processing.
+
+## Tradeoffs
+
+- Lower-priority jobs may incur up to the per-queue timeout in latency when higher-priority queues are empty.
+- We do not rely on returned queue name; this is documented and tested.
+
+## Operational Defaults
+
+- Recommended per-queue BRPOPLPUSH timeout: 200ms (caps worst-case added latency while keeping ops cost reasonable).
+
+## Revisit Criteria
+
+- If sub-second latency for low priority becomes unacceptable or we need multi-queue fairness beyond simple priority.
+
+## Future Work
+
+- Explore a Lua-assisted sweep to pick the first non-empty queue without waiting the full timeout per queue in sequence.
```

To lock in the actual client version, run:

```bash
#!/bin/bash
# Print go-redis dependency to cite exact behavior in docs
rg -nP 'github\.com/(go-redis|redis)/redis' go.mod || true
rg -n 'BRPopLPush' -g '!**/vendor/**' -S .
```

docs/12_performance_baseline.md (1)
30-47: Fix list/fence formatting and use auto-numbering. Standardize to “1.” and keep blank lines around fences. This keeps markdownlint quiet and humans sane.

````diff
-2) Copy `config/config.example.yaml` to `config/config.yaml` and set:
+1. Copy `config/config.example.yaml` to `config/config.yaml` and set:
    - `worker.count`: 16 on a 4 vCPU node (adjust as needed)
    - `redis.addr`: "localhost:6379"
-3) In one shell, run the worker
+1. In one shell, run the worker
+
 ```bash
 ./bin/job-queue-system --role=worker --config=config/config.yaml
 ```
-4) In another shell, run the bench (enqueue and wait for completion)
+1. In another shell, run the bench (enqueue and wait for completion)
+
 ```bash
 ./bin/job-queue-system --role=admin --admin-cmd=bench \
   --bench-count=2000 --bench-rate=1000 \
   --bench-priority=low --bench-timeout=60s
 ```
````

docs/evidence/README.md (3)

1-7: Fix the Markdown basics. Add blank lines around the heading and the list (MD022/MD032). Your very first block violates the most basic markdownlint rules.

```diff
 # Evidence for v0.4.0-alpha Promotion
+
 - CI run: see `ci_run.json` (contains URL to the successful workflow run)
 - Bench JSON: `bench.json` (admin bench with 1000 jobs at 500 rps)
 - Config used: `config.alpha.yaml`
 - Metrics snapshots: `metrics_before.txt`, `metrics_after.txt`
+
```
8-41: Make this a proper section and fix list/fence spacing (MD029/MD031/MD032). Stop abusing “1)” and jammed fences. Use a heading, “1.” style, and blank lines.

````diff
-Reproduce locally
-
-1) Ensure Redis is running on localhost:6379
-
-```bash
-docker run -p 6379:6379 --rm --name jobq-redis redis:7-alpine
-```
-
-2) Build the binary
-
-```bash
-make build
-```
-
-3) Start worker
-
-```bash
-./bin/job-queue-system --role=worker --config=docs/evidence/config.alpha.yaml
-```
-
-4) In another terminal, run bench
-
-```bash
-./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml \
-  --admin-cmd=bench --bench-count=1000 --bench-rate=500 \
-  --bench-priority=low --bench-timeout=60s
-```
-
-5) Capture metrics
-
-```bash
-curl -sS localhost:9191/metrics | head -n 200 > docs/evidence/metrics_after.txt
-```
+## Reproduce locally
+
+1. Ensure Redis is running on `localhost:6379`.
+
+```bash
+docker run -p 6379:6379 --rm --name jobq-redis redis:7-alpine
+```
+
+2. Build the binary.
+
+```bash
+make build
+```
+
+3. Start worker.
+
+```bash
+./bin/job-queue-system --role=worker --config=docs/evidence/config.alpha.yaml
+```
+
+4. In another terminal, run bench.
+
+```bash
+./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml \
+  --admin-cmd=bench --bench-count=1000 --bench-rate=500 \
+  --bench-priority=low --bench-timeout=60s
+```
+
+5. Capture metrics.
+
+```bash
+curl -sS localhost:9191/metrics | head -n 200 > docs/evidence/metrics_after.txt
+```
````
58-69: Make “Automated harness” a real section and fix spacing/fences. Basic formatting, please.

````diff
-Automated harness
-
-- A convenience script `docs/evidence/run_bench.sh` automates the above steps.
-- Default params: COUNT=1000, RATE=500, PRIORITY=low, CONFIG=docs/evidence/config.alpha.yaml, PURGE=1.
-- Example:
-
-```bash
-bash docs/evidence/run_bench.sh
-# or override
-COUNT=2000 RATE=1000 PRIORITY=low bash docs/evidence/run_bench.sh
-```
+## Automated harness
+
+- A convenience script `docs/evidence/run_bench.sh` automates the above steps.
+- Default params: `COUNT=1000`, `RATE=500`, `PRIORITY=low`, `CONFIG=docs/evidence/config.alpha.yaml`, `PURGE=1`.
+- Example:
+
+```bash
+bash docs/evidence/run_bench.sh
+# or override
+COUNT=2000 RATE=1000 PRIORITY=low bash docs/evidence/run_bench.sh
+```
````

README.md (2)
Lines 99-107: Consider adding a note that the port mapping must match config. People will copy/paste and then stare at a 404.

```diff
-docker run --rm -p 9090:9090 --env-file env.list job-queue-system:latest --role=all
+docker run --rm -p 9090:9090 --env-file env.list job-queue-system:latest --role=all
+# If using the evidence harness config (metrics_port=9191), map 9191 instead:
+# docker run --rm -p 9191:9191 --env-file env.list job-queue-system:latest --role=all --config=docs/evidence/config.alpha.yaml
```

Lines 74-79: Document the metrics port reality. Default is 9090, evidence harness uses 9191—say it here. Don't force readers to diff configs to figure it out.

```diff
-  - Prometheus metrics exposed at <http://localhost:9090/metrics> by default
+  - Prometheus metrics exposed at <http://localhost:9090/metrics> by default.
+    Evidence harness in docs uses 9191 via config override; adjust port mappings accordingly.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (33)
- `.githooks/pre-commit` (1 hunks)
- `.github/workflows/changelog.yml` (1 hunks)
- `.github/workflows/goreleaser.yml` (1 hunks)
- `.github/workflows/markdownlint.yml` (1 hunks)
- `.markdownlint.yaml` (1 hunks)
- `CHANGELOG.md` (1 hunks)
- `Makefile` (1 hunks)
- `README.md` (2 hunks)
- `docs/00_assessment.md` (3 hunks)
- `docs/01_product_roadmap.md` (1 hunks)
- `docs/02_release_plan.md` (3 hunks)
- `docs/03_milestones.md` (2 hunks)
- `docs/04_sprint_plans.md` (5 hunks)
- `docs/05_architecture.md` (4 hunks)
- `docs/06_technical_spec.md` (4 hunks)
- `docs/07_test_plan.md` (2 hunks)
- `docs/08_deployment.md` (3 hunks)
- `docs/09_requirements.md` (2 hunks)
- `docs/10_risk_register.md` (1 hunks)
- `docs/11_project_charter.md` (2 hunks)
- `docs/12_performance_baseline.md` (1 hunks)
- `docs/13_release_versioning.md` (1 hunks)
- `docs/14_ops_runbook.md` (3 hunks)
- `docs/15_promotion_checklists.md` (1 hunks)
- `docs/PRD.md` (5 hunks)
- `docs/did-not-agree/README.md` (1 hunks)
- `docs/did-not-agree/go-toolchain-version.md` (1 hunks)
- `docs/did-not-agree/metrics-scope.md` (1 hunks)
- `docs/did-not-agree/priority-fetching-brpoplpush.md` (1 hunks)
- `docs/did-not-agree/rate-limiter-fixed-window.md` (1 hunks)
- `docs/did-not-agree/reaper-scan-vs-registry.md` (1 hunks)
- `docs/evidence/README.md` (1 hunks)
- `docs/testing-guide.md` (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/markdownlint.yml
[warning] 3-3: truthy value should be one of [false, true]
(truthy)
[error] 6-6: too many spaces inside brackets
(brackets)
[error] 6-6: too many spaces inside brackets
(brackets)
🪛 checkmake (0.2.2)
Makefile
[warning] 33-33: Missing required phony target "clean"
(minphony)
🔇 Additional comments (13)
docs/06_technical_spec.md (2)
Lines 119-122: Define breaker scope and enforce single-probe HalfOpen — implementation not found.

File: docs/06_technical_spec.md (lines 119–122). The docs list the states but omit scope. Decide whether breakers are per-worker (in-process) or cluster-wide, and lock HalfOpen so only a single in-flight probe runs (one goroutine/process; use per-process lock or distributed coordination if cluster-wide). Your earlier search returned no matches — cannot verify implementation.
Apply:
```diff
 - States: Closed → Open on threshold; Open → HalfOpen on cooldown; HalfOpen → Closed on success, else Open.
+Scope: breaker is in-process (per worker instance). Half-open allows a single in-flight probe globally per breaker instance; additional calls fail-fast.
```
Lines 74-78: Flag naming inconsistency — standardize the `--role` value (`all` vs `all-in-one`).

README mixes forms: prose uses "all-in-one" (README.md:3) while examples use `--role=all` (README.md:35,47,100). docs/06_technical_spec.md shows `--role={producer|worker|all}` (lines 74–78). deploy/chart not found in repo — cannot verify Helm usage.

Action: pick a single canonical value and make it consistent across CLI help, code, Helm chart, and all docs. Also add this doc line: "Precedence: flags > env > config file. Missing file loads defaults; invalid values cause non-zero exit." Search & replace all `--role` / "all-in-one" occurrences and update examples to match the chosen form.

docs/05_architecture.md (2)
Lines 20-54: Make completion atomic — this is a data-loss risk.

The diagram shows "LPUSH completed + LREM proc + DEL HB" as three separate ops (non‑atomic). A crash between them can mark a job completed while leaving processing entries/heartbeat behind. Either state these are executed atomically (Lua script or MULTI/EXEC) or justify why the non‑atomic ordering is safe and how failures are handled.

File: docs/05_architecture.md (lines 20–54)

```diff
-  Exec-- success -->Done[LPUSH completed + LREM proc + DEL HB]
+  Exec-- success -->Done[Complete atomically (Lua/Tx): LPUSH completed; LREM proc; DEL HB]
```

Add a short note under "Data Flows" describing the atomic completion mechanism or the failure/recovery guarantees.
Lines 74-82: docs/05_architecture.md — sync stack details with the repo (do it now).
- Go: go.mod declares "go 1.25.0" (go.mod:3) — replace "Language: Go 1.21+" with "Language: Go 1.25.x".
- Container: Dockerfile (root) shows builder "FROM golang:1.21 AS build" (line 3) and runtime "FROM gcr.io/distroless/base-debian12:nonroot" (line 10) — update to the explicit distroless image and reconcile the builder vs go.mod.
- Redis client: no explicit github.com/redis/go-redis entry found in go.mod — do not hardcode "go-redis v8"; point readers to go.mod for the exact module path/version.
- Tracing: grep for otlptrace(http|grpc) in internal returned nothing — do not assert "OTLP HTTP exporter" until you confirm; change to "OpenTelemetry (OTLP; HTTP or gRPC — see internal/obs/tracing.go)" or update with the exact exporter after checking the implementation.
Suggested replacement diff:
```diff
-Language: Go 1.21+
-Redis Client: go-redis v8
-Tracing: OpenTelemetry (OTLP HTTP exporter)
-Container: Distroless base image
+Language: Go 1.25.x
+Redis Client: see go.mod for exact module path & version
+Tracing: OpenTelemetry (OTLP; HTTP or gRPC — align with internal/obs/tracing.go)
+Container: gcr.io/distroless/base-debian12:nonroot (runtime); builder currently golang:1.21 (reconcile with go.mod)
```

CHANGELOG.md (1)
Lines 5-16: Stop hand-waving — make Unreleased actionable and linkable.
- Group entries under Added / Changed / Fixed (Keep a Changelog) and add a compare link for [Unreleased]; add a dated v0.4.0 section when you cut the release.
- Verified present in this PR:
- Config validation — internal/config/config.go (+ tests: internal/config/config_test.go)
- Worker active gauge ("worker_active") — internal/obs/metrics.go
- Health/readiness endpoints (/healthz, /readyz) — internal/obs/http.go
- govulncheck in CI — .github/workflows/ci.yml
- Not verified by the above checks (drop or add precise links/files/commits):
- Admin CLI: stats, peek, purge-dlq
- Queue length gauges updater
- Tracing propagation from job IDs
- Smarter rate limiting (TTL-based sleep + jitter)
- E2E tests with Redis service in CI
- Action: either remove unverifiable items from Unreleased or replace each with a link to the exact commit/file/PR that implements it.
docs/did-not-agree/README.md (1)
Lines 5-11: Formatting fixed. Keep it that way. MD032 addressed; the list now has proper blank lines. Good.
docs/15_promotion_checklists.md (1)
Lines 7-9: /readyz implemented — resolved.

Handler is registered in internal/obs/http.go (`mux.HandleFunc("/readyz", ...)`); README.md, docs/15_promotion_checklists.md, docs/14_ops_runbook.md, docs/PRD.md reference it; docs/evidence/run_bench.sh probes it; cmd/job-queue-system/main.go starts the HTTP server. No change required.
docs/evidence/README.md (1)
Lines 70-75: LGTM: outputs list reads cleanly.

README.md (4)
Lines 20-31: LGTM: example config + build flow are sane and concise.

Lines 50-70: LGTM: admin commands block is clear. Keep it in a single fenced block like this.

Lines 119-124: LGTM: evidence artifacts section reads fine now.

Lines 111-118: LGTM: status and promotion gates block formatted correctly.

docs/PRD.md (1)
Lines 87-93: Metrics port default mismatch — pick one port and document it.

docs/PRD.md (lines 87–93) sets observability.metrics_port: 9090. My verification run failed (ripgrep: "No files were searched") — confirm whether README/examples/CI use 9191 or 9090 and make them consistent.

- Run: `rg -nP --hidden -S 'metrics_port:\s*(9090|9191)|localhost:(9090|9191)/(metrics|healthz|readyz)' -g '!**/node_modules/**'` and align docs/configs.
- Either change PRD to 9191 to match evidence, or keep 9090 and add one explicit sentence in PRD: "Default: 9090; examples/evidence use 9191."
```yaml
        uses: goreleaser/goreleaser-action@v6
        with:
          distribution: goreleaser
          version: latest
          args: release --clean
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
🛠️ Refactor suggestion
“version: latest” is non-deterministic. Pin your toolchain.
Releases must be reproducible.
```diff
 - name: Run GoReleaser
   uses: goreleaser/goreleaser-action@v6
   with:
     distribution: goreleaser
-    version: latest
+    version: v2.6.1 # pin a known-good Goreleaser version
     args: release --clean
```
Also consider guarding workflow_dispatch to snapshots to avoid accidental publishes:
```diff
-          args: release --clean
+          args: ${{ github.event_name == 'workflow_dispatch' && 'release --clean --skip=publish --snapshot' || 'release --clean' }}
```
🤖 Prompt for AI Agents
.github/workflows/goreleaser.yml lines 35-41: the action uses a
non-deterministic "version: latest" which prevents reproducible releases and
risks accidental publishes; replace "latest" with a pinned Goreleaser version
(e.g. a specific tag or semver like v1.22.0 or the exact commit/tag you
validated) so the toolchain is fixed, and add a guard to workflow_dispatch (for
example require a specific input like "type=snapshot" or restrict dispatch to a
snapshot branch or protect with a conditional that only runs on tagged or main
branch) to prevent accidental publishes.
- Use Go 1.25.x across CI and release workflows; go.mod set to 1.25.0.

Rationale

- Addresses govulncheck-reported stdlib CVEs affecting earlier versions.
- CI and local builds succeed with 1.25.x in this environment; evidence (`go_info`) confirms runtime version.

Tradeoffs

- Requires builders to have Go 1.25 installed; older toolchains will not match CI.

Revisit Criteria

- If compatibility issues arise for consumers on earlier Go versions.

Future Work

- Consider matrix-testing across supported Go versions once stability goals are met.
💡 Verification agent
🧩 Analysis chain
“1.25.x” is vague. Pin floor and verify CI actually uses it everywhere.
State minimum patch (e.g., 1.25.1+) and audit workflows.
```diff
-Decision
-
-- Use Go 1.25.x across CI and release workflows; go.mod set to 1.25.0.
+Decision
+
+- Use Go >=1.25.1 across CI and release workflows; go.mod set to 1.25.0 (module), toolchain pinned via workflows.
```
🏁 Script executed:
```bash
#!/bin/bash
rg -nP 'go-version:.*1\.25' .github/workflows -C1
rg -nP '^go\s+1\.25' go.mod
rg -nP 'setup-go|actions/setup-go' .github/workflows -n
```

Length of output: 970
Length of output: 970
Pin Go toolchain to ≥1.25.1 — stop using 1.25.x.

- Found: .github/workflows/{ci.yml, release.yml, goreleaser.yml, changelog.yml} use `go-version: '1.25.x'`; go.mod contains `go 1.25.0`.
- Action: Replace `1.25.x` with a pinned minimum patched release (e.g., `1.25.1` or an exact patch) in those workflow files and update docs/did-not-agree/go-toolchain-version.md to state the minimum; align or justify `go.mod` accordingly.
- Verify: Ensure actions/setup-go@v5 resolves the pinned version in CI and re-run vuln checks (govulncheck).
🤖 Prompt for AI Agents
In docs/did-not-agree/go-toolchain-version.md around lines 5-22, the document
currently recommends "Use Go 1.25.x" but the review requires pinning to a
minimum patched release; update the text to state a pinned minimum (e.g., "Go >=
1.25.1" or a specific patch like "1.25.1"), and instruct the author to change
.github/workflows/{ci.yml, release.yml, goreleaser.yml, changelog.yml} to use
that pinned version (eg. go-version: '1.25.1'), then align go.mod (change `go
1.25.0` to `go 1.25.1` or add a short justification if you must keep 1.25.0),
ensure actions/setup-go@v5 resolves the pinned patch in CI, and re-run
govulncheck to verify the fix.
Decision

- Added essential metrics (job counters, duration histogram, queue length, breaker state, trips, reaper recovered, worker_active). Defer additional metrics like worker restart count and Redis pool internals for now.
🛠️ Refactor suggestion
Name the actual metrics or they will drift from reality.
Readers need the exact metric names exported by the binary.
```diff
 - Added essential metrics (job counters, duration histogram, queue length, breaker state, trips, reaper recovered, worker_active). Defer additional metrics like worker restart count and Redis pool internals for now.
+
+Metric names (current)
+- jobs_produced_total, jobs_consumed_total, jobs_completed_total, jobs_dlq_total
+- job_duration_seconds (Histogram)
+- queue_length{queue=...} (Gauge)
+- circuit_breaker_state{component=...} (Gauge: 0=Closed,1=Open,2=HalfOpen)
+- circuit_breaker_trips_total
+- reaper_recovered_total
+- worker_active (Gauge)
```
🤖 Prompt for AI Agents
In docs/did-not-agree/metrics-scope.md at line 5, the doc currently lists
generic metric categories but lacks the exact metric names exported by the
binary; update this line to enumerate the precise metric identifiers (e.g., job
counters like job_processed_total, job_failed_total; duration histogram like
job_duration_seconds_bucket, job_duration_seconds_sum,
job_duration_seconds_count; queue length like queue_length_gauge; breaker state
and trips like circuit_breaker_state, circuit_breaker_trips_total; reaper
recovered like reaper_recovered_total; worker_active like workers_active_gauge)
by pulling the names directly from the code/metrics registry, replace the vague
sentence with that exact list, and add a short note to keep metric names
synchronized with the binary (or reference the source file/version) so they
don't drift.
```bash
docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
```

Run the e2e suite:

```bash
E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -race -count=1 -v
```
🛠️ Refactor suggestion
Pin Redis and add a healthcheck. Don’t rely on whatever “latest-alpine” happens to be today.
Flaky e2e is self-inflicted if you don’t control the env.
```diff
-docker run --rm -d --name jobq-redis -p 6379:6379 redis:7-alpine
+docker run --rm -d --name jobq-redis -p 6379:6379 \
+  --health-cmd='redis-cli ping || exit 1' \
+  --health-interval=5s --health-timeout=3s --health-retries=10 \
+  redis:7.2-alpine
```
🤖 Prompt for AI Agents
In docs/testing-guide.md around lines 132–140, the guidance currently runs Redis
with an unpinned image and no healthcheck; update the example to pin a stable
Redis image (e.g., redis:7.0.11-alpine) and add Docker healthcheck flags
(health-cmd, health-interval, health-retries, health-timeout) to ensure Redis is
actually ready before starting tests, and add a short note to wait for the
container to report healthy (or poll docker inspect) prior to running the e2e
command.
Run the e2e suite:

```bash
E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -race -count=1 -v
```

Run the single e2e test:

```bash
E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -run '^TestE2E_WorkerCompletesJobWithRealRedis$' -race -count=1 -v
```
🧹 Nitpick (assertive)
Document state cleanup or your “determinism” is a lie.
Backlog keys from prior runs skew results. Call your own admin tool.
````diff
-Run the e2e suite:
+Run the e2e suite (clean state):
 ```bash
-E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -race -count=1 -v
+E2E_REDIS_ADDR=localhost:6379 ./bin/admin purge-all --yes
+E2E_REDIS_ADDR=localhost:6379 go test ./test/e2e -race -count=1 -v
 ```
````

🤖 Prompt for AI Agents

In docs/testing-guide.md around lines 136 to 146, the e2e run instructions don't clear leftover Redis state between runs causing flaky results; update the doc to run the project's admin purge command before executing the tests (using the same E2E_REDIS_ADDR env var) so prior backlog keys are removed, e.g. call ./bin/admin purge-all --yes with E2E_REDIS_ADDR set immediately before the go test invocation to ensure a clean state for deterministic tests.
```make
.PHONY: mdlint
mdlint:
	@if ! command -v npx >/dev/null 2>&1; then \
		echo "npx not found. Please install Node.js to run markdownlint."; \
		exit 1; \
	fi
	@npx -y markdownlint-cli2 "**/*.md" "!**/node_modules/**"
```
🧹 Nitpick (assertive)
Add a clean target. Don’t ship a Makefile without it.
Satisfies checkmake and basic developer hygiene.
```diff
 .PHONY: mdlint
 mdlint:
 	@if ! command -v npx >/dev/null 2>&1; then \
 		echo "npx not found. Please install Node.js to run markdownlint."; \
 		exit 1; \
 	fi
 	@npx -y markdownlint-cli2 "**/*.md" "!**/node_modules/**"
+
+.PHONY: clean
+clean:
+	rm -rf bin
```
🧰 Tools
🪛 checkmake (0.2.2)
[warning] 33-33: Missing required phony target "clean"
(minphony)
🤖 Prompt for AI Agents
In Makefile around lines 33 to 39, there's no clean target; add a .PHONY: clean
and implement a clean target that removes common generated artifacts (e.g.,
build/dist directories, temporary and cache files like .cache, .venv, *.pyc,
node_modules/.cache, and any markdownlint or other tool caches) so developers
can restore a pristine repo state; keep the recipe idempotent and safe (use rm
-rf for the known artifact paths only).
Review continued from previous batch...
- Redis pool: `redis.pool_size_multiplier` (default 10*NumCPU). Increase for higher concurrency; monitor Redis CPU.
- Timeouts: `redis.read_timeout`/`write_timeout` (default 3s). Too low yields errors under load; too high slows failure detection.
- Worker concurrency: `worker.count`. Increase up to CPU saturation; watch goroutine scheduling and Redis ops.
- Backoff parameters: for retry behavior; not relevant for the baseline success path.
- Priority timeout: `worker.brpoplpush_timeout` (default 1s). Smaller values reduce low-priority latency but add Redis ops.
🧹 Nitpick (assertive)
Recommend a sane default for priority timeout.
1s default is high; call out 200ms for better priority fairness.
```diff
-- Priority timeout: `worker.brpoplpush_timeout` (default 1s). Smaller values reduce low-priority latency but add Redis ops.
+- Priority timeout: `worker.brpoplpush_timeout` (default 1s; recommended 200ms in production to bound added latency).
```
🤖 Prompt for AI Agents
In docs/12_performance_baseline.md around lines 50–56, the priority timeout item
uses a 1s default which is too high for fairness; update the text to recommend a
sane default of 200ms (with a short note that smaller values improve priority
fairness but increase Redis ops), replace the current line mentioning
`worker.brpoplpush_timeout` default 1s with the new 200ms recommendation, and
add a one-line guidance to monitor Redis CPU/operation rate when reducing this
value.
- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
- To avoid stale backlog affecting evidence, clear test keys before running a bench:

```bash
redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter
redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL
```
Stop telling users to run KEYS and mass DEL. That’s operationally reckless. Use SCAN and UNLINK.
Blocking KEYS and huge DEL will nuke Redis latency. Fix it.
-- To avoid stale backlog affecting evidence, clear test keys before running a bench:
-
-```bash
-redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter
-redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL
-```
+- To avoid stale backlog affecting evidence, clear test keys before running a bench (non-blocking):
+
+```bash
+# Known small lists can be unlinked directly (async delete):
+redis-cli UNLINK jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter
+# Never use KEYS in prod. Scan and unlink in sane batches:
+redis-cli --scan --pattern 'jobqueue:worker:*:processing' | xargs -r -n 100 redis-cli UNLINK
+```
+
+WARNING: Do not run purges against production prefixes. Use isolated test namespaces and double-check the key prefix.
🤖 Prompt for AI Agents
In docs/evidence/README.md around lines 44-50, replace the instructions that use
KEYS and DEL with guidance to use UNLINK for known small lists (async delete)
and to use SCAN in batched iterations with UNLINK for matching keys instead of
blocking KEYS/DEL; update the example commands to show using redis-cli --scan
piped to xargs with a reasonable batch size (and -r to handle no matches) and
add a clear WARNING to never run purges against production prefixes—use isolated
test namespaces and double-check key prefixes before deleting.
- The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
- To avoid stale backlog affecting evidence, clear test keys before running a bench:

```bash
redis-cli DEL jobqueue:high_priority jobqueue:low_priority jobqueue:completed jobqueue:dead_letter
redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL
```

- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.

Notes

- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
🧹 Nitpick (assertive)
One more thing: call out that admin bench bypasses producer metrics and how to validate throughput.
Add a sentence to save future confusion.
```diff
 - The admin `bench` command enqueues jobs directly (it does LPUSH), so `jobs_produced_total` will remain 0 in this harness; use `jobs_consumed_total`/`jobs_completed_total` and queue lengths to assess throughput and progress.
+- Corollary: to validate end-to-end producer metrics, run a producer process instead of `admin bench` and watch `jobs_produced_total`.
```
🤖 Prompt for AI Agents
In docs/evidence/README.md around lines 44 to 57, the text doesn't explicitly
state that the admin `bench` command enqueues jobs directly and therefore
bypasses producer-side metrics; update the paragraph to add a clear sentence
explaining that the admin `bench` uses direct LPUSH (bypassing producer metrics
like jobs_produced_total) and instruct readers to validate throughput using
jobs_consumed_total or jobs_completed_total and queue length checks (and to use
the metrics port 9191 as noted); place this sentence near the existing note
about jobs_produced_total so it’s immediately visible when discussing which
metrics to use.
- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.

Notes

- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
🧹 Nitpick (assertive)
Be explicit about the 9090 vs 9191 split. Readers shouldn’t have to guess.
State that default is 9090, but evidence harness uses 9191 via config.
-- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
+- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config.

Notes

- The simple latency reported in `bench.json` is measured by comparing current time to each job's creation_time after completion sampling and is a coarse approximation. For precise latency distributions, prefer Prometheus histogram `job_processing_duration_seconds` and compute quantiles there.
🤖 Prompt for AI Agents
In docs/evidence/README.md around lines 52 to 57, the README currently mentions
metrics port 9191 but doesn't explicitly state the common default; update the
text to clearly say that the Prometheus default port is 9090 while this evidence
harness overrides it to 9191 (as set by observability.metrics_port in
config.alpha.yaml), and adjust the curl example text to explicitly instruct
readers to use port 9191 for this harness (and note where to change it if using
the default 9090).
```shell
redis-cli KEYS 'jobqueue:worker:*:processing' | xargs -n 50 redis-cli DEL
```

- The metrics port in this harness is `9191` (see `observability.metrics_port` in config.alpha.yaml). Ensure your curl commands match this port.
🧹 Nitpick (assertive)
Add a cross-reference to README metrics section.
Reduce duplication and drift.
-- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config.
+- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config. See README “Metrics” for defaults and overrides.
📝 Committable suggestion

- Default metrics port is `9090`. This harness uses `9191` via `observability.metrics_port` in `config.alpha.yaml`. Ensure your curl/port mappings match the chosen config. See README "Metrics" for defaults and overrides.
🤖 Prompt for AI Agents
In docs/evidence/README.md around line 52, the sentence about the metrics port
is duplicated; add a cross-reference to the README metrics section instead of
repeating the port number to reduce duplication and drift. Replace the explicit
port note with a short pointer like "See the metrics section in this README for
the configured port (observability.metrics_port in config.alpha.yaml)" or link
to the metrics subsection header so readers are directed to the canonical
location and only that section maintains the actual port detail.
### Producer

- Scan directory recursively using include/exclude globs.
- Determine priority by extension list.
- Rate limiting: INCR rate_limit_key; if first increment, set EX=1; if value > rate_limit_per_sec, `TTL`-based precise sleep (with jitter) until window reset before enqueueing more.
- LPUSH job JSON to priority queue.
🛠️ Refactor suggestion
Rate limiter: use PTTL for ms precision; TTL is coarse and off-by-one-prone.
Your “precise sleep” isn’t precise if you read TTL in seconds.
-- Rate limiting: INCR rate_limit_key; if first increment, set EX=1; if value > rate_limit_per_sec, `TTL`-based precise sleep (with jitter) until window reset before enqueueing more.
+- Rate limiting: INCR rate_limit_key; if first increment, set EX=1; if value > rate_limit_per_sec, use `PTTL` to sleep precisely (with jitter) until window reset before enqueueing more.
📝 Committable suggestion

### Producer

- Scan directory recursively using include/exclude globs.
- Determine priority by extension list.
- Rate limiting: INCR rate_limit_key; if first increment, set EX=1; if value > rate_limit_per_sec, use `PTTL` to sleep precisely (with jitter) until window reset before enqueueing more.
- LPUSH job JSON to priority queue.
🤖 Prompt for AI Agents
In docs/PRD.md around lines 125 to 131, the rate-limiter description currently
reads that the producer should use TTL to decide a "precise sleep" which is
coarse and can be off-by-one; change the design to use Redis PTTL (millisecond
precision) when computing remaining window time, compute sleep = max(PTTL, 0) +
jitter_ms (random small ms) and sleep that many milliseconds before enqueueing,
and document handling for missing/negative PTTL (treat as 0 and
recheck/increment atomically) so the loop uses millisecond-accurate waits rather
than second-granularity TTL.
### Worker Fetch

- Unique worker ID: "hostname-PID-idx" for each goroutine.
- Prioritized fetch: loop priorities in order (e.g., high then low) and call `BRPOPLPUSH` per-queue with a short timeout (default 1s). Guarantees atomic move per-queue, priority preference within timeout granularity, and no job loss. Tradeoff: lower-priority jobs may wait up to the timeout when higher-priority queues are empty.
- On receipt, SET heartbeat key to job JSON with EX=heartbeat_ttl.
🛠️ Refactor suggestion
Use BLMOVE where supported; BRPOPLPUSH is the fossil.
Modern Redis has BLMOVE. Call it out and prefer it when available.
-- Prioritized fetch: loop priorities in order (e.g., high then low) and call `BRPOPLPUSH` per-queue with a short timeout (default 1s). Guarantees atomic move per-queue, priority preference within timeout granularity, and no job loss. Tradeoff: lower-priority jobs may wait up to the timeout when higher-priority queues are empty.
+- Prioritized fetch: loop priorities in order (e.g., high then low) and call `BLMOVE` (Redis ≥6.2) or fallback to `BRPOPLPUSH` per-queue with a short timeout (default 1s). This guarantees atomic move per-queue, priority preference within timeout granularity, and no job loss. Tradeoff: lower-priority jobs may wait up to the timeout when higher-priority queues are empty.
📝 Committable suggestion

### Worker Fetch

- Unique worker ID: "hostname-PID-idx" for each goroutine.
- Prioritized fetch: loop priorities in order (e.g., high then low) and call `BLMOVE` (Redis ≥6.2) or fallback to `BRPOPLPUSH` per-queue with a short timeout (default 1s). This guarantees atomic move per-queue, priority preference within timeout granularity, and no job loss. Tradeoff: lower-priority jobs may wait up to the timeout when higher-priority queues are empty.
- On receipt, SET heartbeat key to job JSON with EX=heartbeat_ttl.
🤖 Prompt for AI Agents
In docs/PRD.md around lines 132 to 137, the fetch design currently prescribes
BRPOPLPUSH; update the text to prefer BLMOVE (blocking variant of LMOVE) where
Redis supports it, and mention falling back to BRPOPLPUSH on older servers.
State the advantages of BLMOVE (atomic blocking move without timeout granularity
tradeoffs) and add a short note about feature-detection at startup or
client-level capability checks to choose BLMOVE first and only use BRPOPLPUSH as
a compatibility fallback.
- On success: LPUSH completed_list job JSON; LREM processing_list 1 job; DEL heartbeat key.
- On failure: increment Retries in payload; exponential backoff; if retries <= max_retries LPUSH back to original priority queue; else LPUSH dead_letter_list; in both cases LREM from processing_list and DEL heartbeat.
🛠️ Refactor suggestion
You’re going to leak memory into “completed” forever. Define retention.
Add trimming or archival guidance. Don’t hand-wave it.
- On success: LPUSH completed_list job JSON; LREM processing_list 1 job; DEL heartbeat key.
+ - On success: LPUSH completed_list job JSON; LREM processing_list 1 job; DEL heartbeat key.
+ Implement retention: e.g., `LTRIM completed_list 0 <N-1>` or move completions to a capped stream; document operational caps.
📝 Committable suggestion

- On success: LPUSH completed_list job JSON; LREM processing_list 1 job; DEL heartbeat key.
  Implement retention: e.g., `LTRIM completed_list 0 <N-1>` or move completions to a capped stream; document operational caps.
- On failure: increment Retries in payload; exponential backoff; if retries <= max_retries LPUSH back to original priority queue; else LPUSH dead_letter_list; in both cases LREM from processing_list and DEL heartbeat.
🤖 Prompt for AI Agents
In docs/PRD.md around lines 142–144, the process currently LPUSHes every
successful job into the completed_list with no retention or archival, which will
grow unbounded; add a concrete retention policy and actionable trimming/archival
steps: specify a max completed_list length (e.g. COMPLETED_LIST_MAX_ITEMS) and
recommend trimming after each LPUSH with LTRIM to that length, or alternatively
export/rotate older entries to an archival store (S3/Blob/DB) using a background
hopper job that moves items older than X days (CONFIG_COMPLETED_RETENTION_DAYS)
and then LTRIM/DEL them from Redis; document defaults, config names, and the
recommended background/cron frequency and safety steps (atomic RPOPLPUSH or Lua
script to move batches) so implementers can add periodic archival and size-based
trimming instead of letting completed_list grow forever.
- Periodically scan all heartbeat keys matching pattern. For each missing/expired heartbeat, recover jobs lingering in processing lists:
  - For each worker processing list, if list has elements and the corresponding heartbeat key is absent, pop jobs (LPOP) one by one, inspect priority within payload, and LPUSH back to the appropriate priority queue.
🧹 Nitpick (assertive)
Reaper needs bounds and pacing, or it will DOS your Redis. Document it.
State scan cadence, batch sizes, and backoff, not just “periodically scan.”
- Periodically scan all heartbeat keys matching pattern. For each missing/expired heartbeat, recover jobs lingering in processing lists:
+ - Reaper cadence/bounds: limit SCAN per cycle, cap recovered jobs per iteration, and sleep between batches to avoid saturating Redis. Expose these as config.
- For each worker processing list, if list has elements and the corresponding heartbeat key is absent, pop jobs (LPOP) one by one, inspect priority within payload, and LPUSH back to the appropriate priority queue.
📝 Committable suggestion

- Reaper cadence/bounds: limit SCAN per cycle, cap recovered jobs per iteration, and sleep between batches to avoid saturating Redis. Expose these as config.
- For each worker processing list, if list has elements and the corresponding heartbeat key is absent, pop jobs (LPOP) one by one, inspect priority within payload, and LPUSH back to the appropriate priority queue.
🤖 Prompt for AI Agents
In docs/PRD.md around lines 150 to 153, the reaper description currently says
only "periodically scan" and pop jobs without bounds; update the doc to specify
concrete pacing and limits: define a scan cadence (e.g., every N
seconds/minutes), a max number of heartbeat keys inspected per scan (batch
size), a max number of jobs reaped per worker per batch, and an exponential or
fixed backoff when no stale heartbeats are found or when Redis errors occur;
also document inter-batch sleep to throttle LPOP/LPUSH operations and recommend
configurable defaults and knobs (scan_interval, batch_size, max_reap_per_worker,
backoff_base) so consumers can tune to avoid DOSing Redis.
### Metrics

- Prometheus metrics exposed at <http://localhost:9090/metrics> by default

### Health and Readiness

- Liveness: <http://localhost:9090/healthz> returns 200 when the process is up
- Readiness: <http://localhost:9090/readyz> returns 200 only when Redis is reachable
💡 Verification agent
🧩 Analysis chain
Verify ports across docs/configs; fix any stragglers to avoid support tickets.
Automate the check and kill the drift now.
Also applies to: 90-107, 119-124
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "== Find metrics ports mentioned in configs and docs =="
rg -nP 'metrics?_?port.*(9090|9191)' -g '!**/node_modules/**' -C2 || true
echo
echo "== Find literal localhost metrics URLs =="
rg -nP 'localhost:(9090|9191)/metrics' -g '!**/node_modules/**' -C2 || true
echo
echo "== Show all occurrences of 9090/9191 in README and evidence docs =="
rg -nP '9090|9191' README.md docs/evidence -n -C2 || true
echo
echo "Expected:"
echo "- README: default 9090 noted; explicit note that evidence uses 9191."
echo "- docs/evidence/README.md: clearly states 9191 override and cross-links README."
```

Length of output: 2649
Stop port drift: README = 9090, evidence harness = 9191 — document the intent and add a CI guard.
- Findings: README.md references localhost:9090 (metrics/health and docker run — README.md lines ~74–79, 100–101). Evidence harness intentionally uses 9191 (docs/evidence/config.alpha.yaml & config.used.yaml: observability.metrics_port: 9191; docs/evidence/README.md: curl localhost:9191/metrics; docs/evidence/run_bench.sh: fallback 9191).
- Required changes: (1) Update docs/evidence/README.md to explicitly state the 9191 override and add a cross‑link to the top-level README; (2) Add a CI lint (simple ripgrep check) that fails on undocumented mixed metrics ports to prevent future drift.
🤖 Prompt for AI Agents
In README.md lines ~72–80 and related evidence files, the top-level README still
documents Prometheus/health endpoints on :9090 while the evidence harness
intentionally uses :9191; update docs/evidence/README.md to explicitly state
that the evidence harness overrides observability ports to 9191 and add a clear
cross-link back to the top-level README so readers understand the override, and
add a CI guard (e.g., a small ripgrep-based lint job in the repo CI config) that
fails when files mention mixed metrics ports without documentation to prevent
future drift.
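The suggested CI guard can be as small as a pure check over file contents. The Go sketch below is illustrative only: a ripgrep one-liner in CI would work just as well, and the rule used here (an override port mention must appear on a line that also names `observability.metrics_port`) is an assumed convention, not something the repo defines.

```go
package main

import (
	"fmt"
	"strings"
)

// portDrift returns the 1-based line numbers that mention an override
// metrics port without also referencing the config key that documents it.
func portDrift(content string, overridePorts []string) []int {
	var bad []int
	for i, line := range strings.Split(content, "\n") {
		for _, p := range overridePorts {
			if strings.Contains(line, p) && !strings.Contains(line, "observability.metrics_port") {
				bad = append(bad, i+1)
			}
		}
	}
	return bad
}

func main() {
	doc := "curl localhost:9191/metrics\n" +
		"- This harness uses 9191 via observability.metrics_port in config.alpha.yaml\n" +
		"default is 9090"
	fmt.Println(portDrift(doc, []string{"9191"})) // [1]: line 1 mentions 9191 undocumented
}
```

A CI job would run this (or the equivalent `rg` invocation) over README.md and docs/evidence and fail on a non-empty result.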
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: James Ross <james@flyingrobots.dev>
Actionable comments posted: 2
♻️ Duplicate comments (8)
.vscode/settings.json (3)
Lines 12-15: Kill the deprecated gopls flag. JSON is not a graveyard for removed settings. `"build.experimentalWorkspaceModule"` has been deprecated/removed for ages. Delete it.

```diff
 "gopls": {
   "ui.semanticTokens": true,
-  "build.experimentalWorkspaceModule": true
 },
```
Line 16: Defaulting to `-race` on every test run is self-inflicted pain. Keep it for CI; make local default fast (`["-count=1"]`) and add a separate "race" profile/task.

```diff
- "go.testFlags": ["-race", "-count=1"],
+ "go.testFlags": ["-count=1"],
```

Add a VS Code Test Profile or task for race runs invoking `["-race","-count=1"]`.
Lines 1-18: This JSON is broken. Duplicate "[go]" blocks and mismatched braces; it won't even parse. Fix it. You have two "[go]" sections (lines 3 and 5) and an extra "}" (line 11). Clean this up and merge the settings into a single block.

Apply this diff to produce valid JSON:

```diff
-{
-  "editor.formatOnSave": false,
-  "[go]": {
-    "editor.formatOnSave": true,
-"[go]": {
-  "editor.formatOnSave": true,
-  "editor.codeActionsOnSave": {
-    "source.organizeImports": "explicit"
-  }
-},
-  },
-  "gopls": {
-    "ui.semanticTokens": true,
-    "build.experimentalWorkspaceModule": true
-  },
-  "go.testFlags": ["-race", "-count=1"],
-  "go.toolsManagement.autoUpdate": true
-}
+{
+  "editor.formatOnSave": false,
+  "[go]": {
+    "editor.formatOnSave": true,
+    "editor.codeActionsOnSave": {
+      "source.organizeImports": "explicit"
+    }
+  },
+  "gopls": {
+    "ui.semanticTokens": true
+  },
+  "go.testFlags": ["-race", "-count=1"],
+  "go.toolsManagement.autoUpdate": true
+}
```
Lines 24-27: Keep this step; it feeds .goreleaser.yaml. You're exporting GITHUB_REPOSITORY_OWNER/NAME used by release.github.owner/name. The earlier advice to delete this is obsolete.

Lines 38-45: Pin your damn GoReleaser version and guard manual runs. "latest" is non-deterministic. Also, workflow_dispatch should default to snapshot to avoid accidental publishes.

```diff
 - name: Run GoReleaser
   uses: goreleaser/goreleaser-action@v6
   with:
     distribution: goreleaser
-    version: latest
-    args: release --clean
+    version: v2.6.1
+    args: ${{ github.event_name == 'workflow_dispatch' && 'release --clean --snapshot --skip=publish' || 'release --clean' }}
   env:
     GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Optionally harden the job to only publish on tags:

```diff
 jobs:
   release:
     runs-on: ubuntu-latest
+    if: startsWith(github.ref, 'refs/tags/')
```

Lines 9-12: Tighten permissions and enable OIDC for provenance. Stop handing the workflow a bazooka. Scope it:

- Add id-token: write for signing/provenance.
- Keep only what's needed.

```diff
 permissions:
-  contents: write
-  packages: write
+  contents: write  # needed to create GitHub Releases
+  packages: write  # needed to push to GHCR
+  id-token: write  # needed for keyless signing/provenance (cosign/SLSA)
```
Lines 12-13: Harden builds: strip paths and VCS noise for reproducibility. Note that `-trimpath` and `-buildvcs=false` are `go build` flags, not linker flags, so they belong under `flags:`, not `ldflags:`:

```diff
+flags:
+  - -trimpath
+  - -buildvcs=false
 ldflags:
   - -s -w -X main.version={{.Version}}
```
Lines 30-37: You say "multi-arch", but this config won't build it. Fix the Docker section. Without buildx + platforms here, you'll ship whatever arch the runner happens to be. Add platforms and basic OCI labels.

```diff
 dockers:
   - image_templates:
       - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:{{ .Version }}
       - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest
     dockerfile: Dockerfile
+    use: buildx
+    platforms:
+      - linux/amd64
+      - linux/arm64
     build_flag_templates:
       - "--pull"
+    labels:
+      - "org.opencontainers.image.source={{.GitURL}}"
+      - "org.opencontainers.image.revision={{.FullCommit}}"
+      - "org.opencontainers.image.version={{.Version}}"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/goreleaser.yml (1 hunks)
- .goreleaser.yaml (1 hunks)
- .vscode/settings.json (1 hunks)
🧰 Additional context used
🪛 Biome (2.1.2)
.vscode/settings.json
[error] 11-11: Expected a property but instead found '}'.
Expected a property here.
(parse)
🪛 YAMLlint (1.37.1)
.goreleaser.yaml
[error] 47-47: too many blank lines (1 > 0)
(empty-lines)
🔇 Additional comments (5)
.github/workflows/goreleaser.yml (3)
Lines 17-21: Checkout config looks correct. Full history + tags are fetched. GoReleaser and changelog won't choke.

Lines 28-37: QEMU + Buildx: finally something done right. Multi-arch builds won't magically happen without these. Good.

Lines 21-23: Cache Go modules or enjoy slow builds. Your call. Enable caching in setup-go (note: setup-go v4+ enables module caching by default, so this may already be covered).

```diff
 - uses: actions/setup-go@v5
   with:
     go-version: '1.25.x'
+    cache: true
```

Likely an incorrect or invalid review comment.
.goreleaser.yaml (2)
Lines 15-23: Windows zip override: good. Much better UX than tarballs on Windows.

Lines 38-43: Release owner/name quoting: fixed. This now passes linters and works. Ship it.
- image_templates:
    - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:{{ .Version }}
    - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest
💡 Verification agent
🧩 Analysis chain
Don’t slap :latest on alpha builds.
Tagging “latest” for pre-releases is how you torch users. Gate “latest” to stable releases only (e.g., via templating/conditions or a separate docker entry that runs only when not prerelease). Verify the exact GoReleaser field to key off (e.g., .Prerelease or similar) and wire it accordingly.
🌐 Web query:
How to conditionally add the "latest" Docker tag only for non-prerelease builds in GoReleaser v2? Provide the recommended config snippet.
💡 Result:
Use GoReleaser template conditionals in image_templates to emit :latest only when .Prerelease is empty (i.e., not a prerelease). Example config snippet:
```yaml
# .goreleaser.yaml
dockers:
  - image_templates:
      - "myuser/myimage:{{ .Tag }}"
      - '{{ if eq .Prerelease "" }}myuser/myimage:latest{{ end }}'
```
References: GoReleaser Docker docs [1], Template variables (.Prerelease) [2].
DO NOT tag prereleases with :latest — gate :latest to stable releases.
File: .goreleaser.yaml (lines 31–33) — replace the unconditional :latest with a template conditional that only emits :latest when .Prerelease is empty:
```yaml
  - image_templates:
      - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:{{ .Version }}
      - '{{ if eq .Prerelease "" }}ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest{{ end }}'
```
This uses .Prerelease to ensure only stable releases get :latest.
🤖 Prompt for AI Agents
.goreleaser.yaml around lines 31 to 33: the image_templates list unconditionally
emits a :latest tag which will tag prereleases; modify the template to only emit
the :latest entry when .Prerelease is empty by replacing the unconditional
:latest line with a conditional template that checks eq .Prerelease "" and only
outputs the ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest entry in that case,
keeping the version-tag line unchanged.
prerelease: auto

snapshot:
  name_template: SNAPSHOT-{{ .ShortCommit }}
🧹 Nitpick (assertive)
Kill the stray trailing blank line.
YAMLlint already complained. Remove it.
- name_template: SNAPSHOT-{{ .ShortCommit }}
-
+ name_template: SNAPSHOT-{{ .ShortCommit }}
📝 Committable suggestion

  name_template: SNAPSHOT-{{ .ShortCommit }}
🤖 Prompt for AI Agents
In .goreleaser.yaml around line 46, there is a stray trailing blank line after
the "name_template: SNAPSHOT-{{ .ShortCommit }}" entry; remove that empty line
so the file has no extra blank line at the end (save the file without the
trailing newline) to satisfy yamllint.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: James Ross <james@flyingrobots.dev>
Actionable comments posted: 3
♻️ Duplicate comments (1)
.githooks/pre-commit (1)
Lines 25-27: Don't "warn and continue" when the linter runtime is missing. Block the commit. Skipping the gate because npx isn't installed is useless theater. Fail hard.

Apply this diff:

```diff
 else
-  echo "npx not found. Skipping markdownlint autofix. Install Node.js to enable autofix." >&2
+  echo "npx not found. Markdown lint is required. Install Node.js (>=18) to proceed (or wire a pnpm/yarn/bun fallback)." >&2
+  exit 1
 fi
```
mapfile -d '' -t _staged < <(git diff --cached --name-only --diff-filter=ACM -z)
md_files=()
for f in "${_staged[@]}"; do
  [[ "$f" =~ \.(md|MD|mdx|MDX)$ ]] && md_files+=("$f")
done
🛠️ Refactor suggestion
This breaks on macOS’ default Bash (3.2). Ditch mapfile-only path and cover renames.
Bash 3.2 lacks mapfile; your hook dies on vanilla macs. Also you ignore renamed/type-changed files.
Apply this diff:
-mapfile -d '' -t _staged < <(git diff --cached --name-only --diff-filter=ACM -z)
+_staged=()
+if type -t mapfile >/dev/null 2>&1; then
+ mapfile -d '' -t _staged < <(git diff --cached --name-only --diff-filter=ACMRT -z)
+else
+ # Fallback compatible with Bash 3.2
+ while IFS= read -r -d $'\0' f; do _staged+=("$f"); done < <(git diff --cached --name-only --diff-filter=ACMRT -z)
+fi
md_files=()
for f in "${_staged[@]}"; do
[[ "$f" =~ \.(md|MD|mdx|MDX)$ ]] && md_files+=("$f")
done
📝 Committable suggestion

```bash
_staged=()
if type -t mapfile >/dev/null 2>&1; then
  mapfile -d '' -t _staged < <(git diff --cached --name-only --diff-filter=ACMRT -z)
else
  # Fallback compatible with Bash 3.2
  while IFS= read -r -d $'\0' f; do
    _staged+=("$f")
  done < <(git diff --cached --name-only --diff-filter=ACMRT -z)
fi
md_files=()
for f in "${_staged[@]}"; do
  [[ "$f" =~ \.(md|MD|mdx|MDX)$ ]] && md_files+=("$f")
done
```
🤖 Prompt for AI Agents
.githooks/pre-commit around lines 5 to 9: the hook uses Bash-only mapfile which
breaks on macOS Bash 3.2 and also omits renamed/type-changed files; replace the
mapfile pipeline with a portable while IFS= read -r -d '' loop that reads git
diff --cached --name-only --diff-filter=ACMR -z output into an array (preserving
filenames with spaces/newlines), and change the git diff --diff-filter flags to
include R and T (ACMR) so renamed/type-changed files are captured; ensure the
resulting md_files detection loop remains the same and works with the populated
array.
echo "[pre-commit] markdownlint-cli2 --fix on staged Markdown files"
if command -v npx >/dev/null 2>&1; then
  # Run with --fix so minor issues are auto-corrected
  npx -y markdownlint-cli2 --fix "${md_files[@]}" || true
  # Re-stage any modified files
  git add -- "${md_files[@]}" || true
  # Verify no errors remain; block commit if they do
  if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
🧹 Nitpick (assertive)
Optional: use lint-staged and stop reinventing half a pre-commit framework.
Let a battle-tested tool handle staged-only semantics and partial hunks. Keep this hook as a thin wrapper.
Minimal change inside this hook:
-echo "[pre-commit] markdownlint-cli2 --fix on staged Markdown files"
+echo "[pre-commit] lint-staged: markdownlint on staged Markdown files"
if command -v npx >/dev/null 2>&1; then
- # Run with --fix so minor issues are auto-corrected
- npx -y markdownlint-cli2 --fix "${md_files[@]}" || true
- # Re-stage any modified files
- git add -- "${md_files[@]}" || true
- # Verify no errors remain; block commit if they do
- if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
- echo "Markdownlint errors remain after autofix. Aborting commit." >&2
- exit 1
- fi
+ npx -y lint-staged
else
Then configure lint-staged (package.json or .lintstagedrc) to run:
"*.{md,mdx}": "markdownlint-cli2 --fix"
and it will re-stage only the fixed hunks.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
echo "[pre-commit] markdownlint-cli2 --fix on staged Markdown files" | |
if command -v npx >/dev/null 2>&1; then | |
# Run with --fix so minor issues are auto-corrected | |
npx -y markdownlint-cli2 --fix "${md_files[@]}" || true | |
# Re-stage any modified files | |
git add -- "${md_files[@]}" || true | |
# Verify no errors remain; block commit if they do | |
if ! npx -y markdownlint-cli2 "${md_files[@]}"; then | |
echo "[pre-commit] lint-staged: markdownlint on staged Markdown files" | |
if command -v npx >/dev/null 2>&1; then | |
npx -y lint-staged | |
else | |
echo "npx not found. Skipping markdownlint autofix. Install Node.js to enable autofix." >&2 | |
fi |
🤖 Prompt for AI Agents
.githooks/pre-commit lines 14-21: the hook reimplements staged-file handling for
Markdown linting; replace this ad-hoc logic with lint-staged. Add a lint-staged
config (package.json "lint-staged" or .lintstagedrc) mapping "*.{md,mdx}":
"markdownlint-cli2 --fix" and ensure markdownlint-cli2 is a devDependency, then
simplify the hook to invoke npx -y lint-staged (so only staged hunks are fixed
and re-staged) and propagate its exit code to block the commit on failures.
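A sketch of the config step the prompt describes, written as a shell snippet that generates the file. The `.lintstagedrc.json` filename is one of several lint-staged recognizes; the glob and command follow the suggestion above, not files present in this repository:

```shell
# Hypothetical lint-staged config; lint-staged also accepts a
# "lint-staged" key in package.json instead of a standalone file.
cat > .lintstagedrc.json <<'EOF'
{
  "*.{md,mdx}": "markdownlint-cli2 --fix"
}
EOF
grep -c 'markdownlint-cli2' .lintstagedrc.json
```

With this in place, `npx -y lint-staged` in the hook runs the mapped command against staged Markdown files only and re-stages the fixed hunks itself.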
if command -v npx >/dev/null 2>&1; then
  # Run with --fix so minor issues are auto-corrected
  npx -y markdownlint-cli2 --fix "${md_files[@]}" || true
  # Re-stage any modified files
  git add -- "${md_files[@]}" || true
  # Verify no errors remain; block commit if they do
  if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
    echo "Markdownlint errors remain after autofix. Aborting commit." >&2
    exit 1
  fi
🛠️ Refactor suggestion
Stop auto-staging unintended junk; lint the INDEX, not the working tree.
Right now the hook blindly re-adds files after mutating the working tree. If a dev staged only some hunks, the hook will slurp in the rest. Hard fail when there are unstaged changes in targeted files, and don't swallow failures with `|| true`.
Apply this diff:
if command -v npx >/dev/null 2>&1; then
+ # Refuse to proceed if any targeted file has unstaged changes.
+ for f in "${md_files[@]}"; do
+ if ! git diff --quiet -- "$f"; then
+ echo "Unstaged changes detected in $f; aborting to avoid staging unintended hunks. Stash or fully stage Markdown changes and retry." >&2
+ exit 1
+ fi
+ done
# Run with --fix so minor issues are auto-corrected
- npx -y markdownlint-cli2 --fix "${md_files[@]}" || true
+ npx -y markdownlint-cli2 --fix "${md_files[@]}"
# Re-stage any modified files
- git add -- "${md_files[@]}" || true
+ git add -- "${md_files[@]}"
# Verify no errors remain; block commit if they do
if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
echo "Markdownlint errors remain after autofix. Aborting commit." >&2
exit 1
fi
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if command -v npx >/dev/null 2>&1; then
  # Refuse to proceed if any targeted file has unstaged changes.
  for f in "${md_files[@]}"; do
    if ! git diff --quiet -- "$f"; then
      echo "Unstaged changes detected in $f; aborting to avoid staging unintended hunks. Stash or fully stage Markdown changes and retry." >&2
      exit 1
    fi
  done
  # Run with --fix so minor issues are auto-corrected
  npx -y markdownlint-cli2 --fix "${md_files[@]}"
  # Re-stage any modified files
  git add -- "${md_files[@]}"
  # Verify no errors remain; block commit if they do
  if ! npx -y markdownlint-cli2 "${md_files[@]}"; then
    echo "Markdownlint errors remain after autofix. Aborting commit." >&2
    exit 1
  fi
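The guard above hinges on `git diff --quiet` exit codes. A minimal sketch demonstrating that behavior in a throwaway repository (all paths and identities below are invented for the demo):

```shell
# git diff --quiet exits 0 when the working tree matches the index
# for the given path, and non-zero when there are unstaged edits.
tmp="$(mktemp -d)"
cd "$tmp"
git init -q .
echo one > f.md
git add f.md
git -c user.email=t@e.st -c user.name=t commit -qm add
echo two >> f.md   # unstaged edit, not added to the index
if git diff --quiet -- f.md; then state=clean; else state=dirty; fi
echo "$state"
```

Because the check compares working tree to index, a fully staged file passes even when it differs from HEAD, which is exactly the semantics the hook wants.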
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: James Ross <james@flyingrobots.dev>
Alpha Release: v0.4.0
Summary
Promotion gates (excerpt; full in docs/15_promotion_checklists.md)
Evidence
Release automation
Notes