Skip to content

feat: bugfix workflow redesign with new /fix command#136

Merged
maxritter merged 2 commits intomainfrom
dev
Apr 29, 2026
Merged

feat: bugfix workflow redesign with new /fix command#136
maxritter merged 2 commits intomainfrom
dev

Conversation

@maxritter
Copy link
Copy Markdown
Owner

@maxritter maxritter commented Apr 29, 2026

Summary

Adds /fix as the standard bugfix command — investigate, RED test, fix, audit, done — with mandatory end-to-end verification. Slims the existing /spec bugfix lane substantially. Addresses the issue where two simple bugs took 21+ minutes and 400k tokens halfway through.

/fix — new user-invocable command

  • New skill at pilot/skills/fix/ with six steps (investigate → RED → fix → audit → quality → finalise)
  • Always quick: stops cleanly and tells the user to re-invoke with /spec when complexity exceeds the quick lane (no --full flag, no auto-routing between commands — honour the user's command choice)
  • Iron Law docs: reorder setup with less context switching #4: never claim done from unit tests alone. Step 4.3 mandates running the actual program with concrete evidence
    • UI: Claude Code Chrome → Chrome DevTools MCP → playwright-cli → agent-browser
    • CLI / API / library / background job: real invocation with captured output
  • Step 1.4 adds explicit Instrument-when-needed guidance with SPEC-DEBUG: markers (cleanup grep enforced at audit)
  • Registered as user-invocable, model: opus

Slimmed full bugfix lane (/spec)

  • Codex dropped from spec-bugfix-plan (was step 7) — adversarial review on a plan with no code yet was overhead. Codex still runs at spec-verify time.
  • Codex-once rule across spec-plan + spec-verify: a session-scoped sentinel file prevents re-runs across plan iterations within one /spec invocation
  • spec-bugfix-verify: 11 → 8 steps. Dropped redundant plan-verify, runtime-verification, post-merge-verify
  • Step 2 (Behavior Contract audit): tightened from ~6.4KB to ~3.8KB. Step 2.6 now enforces mandatory E2E verification with concrete evidence — no claim-VERIFIED-on-tests-alone
  • spec-bugfix-plan: tightened red-flags step with Common Rationalizations and User Signals tables (from superpowers/systematic-debugging). Step 3.3 adds concrete multi-layer instrumentation example. Step 4 fix-approach defaults to picking the obvious fix.
  • spec-implement Task 2 (bugfix lane): drops the per-task full-suite run — single biggest token sink in bundled bugfix plans. Full suite now runs once at the Quality Gate.

Iteration cap

spec-bugfix-verify Step 8 caps fix iterations at 3. After three failed verify cycles, AskUserQuestion presents Continue/Pivot/Abandon — stops the fix-on-fix-on-fix death spiral.

Dispatcher / model defaults / Console

  • /spec dispatcher: clean split — /spec routes bugs to spec-bugfix-plan, /fix is a separate command for the quick lane
  • launcher/model_config.py: /fix and /benchmark added to default skill models (opus). Console useSettings.ts mirrors.
  • Console Settings page: General Rows now lists /fix and /benchmark with descriptive labels + sub-text. SPEC_PHASE_ROWS rendered with sub-text.
  • launcher/banner.py + installer/steps/finalize.py: /fix and /benchmark added to quick-start tips and post-install workflow list
  • pilot/settings.json: companyAnnouncements line lists /fix; spinner tip updated

Docs

  • New Docusaurus page docs/workflows/fix.md with workflow diagram, Common Issues table, mandatory E2E section, /spec vs /fix decision table
  • /spec page trimmed of bugfix-only content (now points at /fix)
  • README: new /fix section with How /fix works + How /spec handles bugs details blocks
  • Marketing site WhatsInside.tsx updated to mention /fix

Other-session changes incorporated

  • Console: parsePlanContent.ts + tests for spec section rendering and plan annotator persistence (Bug A + Bug B fixes from a parallel session)
  • pilot/hooks/tool_redirect.py + tests
  • pilot/.mcp.json + hooks/hooks.json + scripts/*.cjs minor updates
  • pilot/rules/development-practices.md + mcp-servers.md updates
  • installer/downloads.py minor
  • pilot/ui/* compiled bundles rebuilt
  • .devcontainer/* tweaks

Verification

  • ✅ 1690 Python tests pass (launcher + hooks + installer)
  • ✅ Console: 1438 tests pass + tsc --noEmit clean
  • ruff check clean on all files I touched

Summary by CodeRabbit

  • New Features

    • Introduced a dedicated /fix bugfix workflow with a RED-before-GREEN TDD flow.
    • Dev Container now provides persistent CLAUDE config storage and pre-installed sandbox tooling for running untrusted code.
  • Documentation

    • Added a /fix guide, reorganized /spec guidance, and updated install/sandboxing docs and site copy.
  • Bug Fixes

    • Improved download reliability with expanded retries and smarter error handling.
  • Chores

    • UI updates: added /fix and /benchmark to next-steps, made workflow tiles clickable, simplified hero content.

Adds /fix as the standard bugfix command — investigate, RED test, fix,
audit, done — with mandatory end-to-end verification. Slims the existing
/spec bugfix lane substantially. The workflow change addresses the issue
where two simple bugs took 21+ minutes and 400k tokens halfway through.

## /fix — new user-invocable command

- New skill at pilot/skills/fix/ with six steps (investigate → RED → fix →
  audit → quality → finalise). Always quick: stops cleanly and tells the
  user to re-invoke with /spec when complexity exceeds the quick lane.
- Iron Law #4: never claim done from unit tests alone. Step 4.3 is
  mandatory end-to-end verification — UI bugs must run browser automation
  via Claude Code Chrome / Chrome DevTools MCP / playwright-cli /
  agent-browser; CLI/API/library bugs must be exercised with real
  invocations and concrete evidence captured in the completion report.
- Step 1.4 adds explicit Instrument-when-needed guidance with SPEC-DEBUG
  markers (cleanup grep enforced at audit).
- Registered as user-invocable, model: opus.

## Slimmed full bugfix lane (/spec)

- Dropped Codex from spec-bugfix-plan entirely (was step 7) — adversarial
  review on a plan with no code yet was overhead. Codex still runs at
  spec-verify time.
- Codex-once rule across spec-plan + spec-verify: a session-scoped
  sentinel file prevents re-runs across plan iterations within one /spec
  invocation (was re-running on every annotate/verify cycle).
- spec-bugfix-verify: 11 steps → 8. Dropped redundant plan-verify (no-op),
  runtime-verification (foldable into quality), post-merge-verify
  (squash merges with no conflicts are safe).
- spec-bugfix-verify Step 2 (Behavior Contract audit): tightened from
  ~6.4KB to ~3.8KB. Sub-steps 2.5/2.6 merged into 2.4. Step 2.6 now
  enforces mandatory E2E verification with concrete evidence — no
  claim-VERIFIED-on-tests-alone.
- spec-bugfix-plan: tightened red-flags step with Common Rationalizations
  and User Signals tables (inspired by superpowers/systematic-debugging).
  Step 3.3 adds a concrete multi-layer instrumentation example. Step 4
  fix-approach-selection defaults to picking the obvious fix rather than
  manufacturing fake alternatives.
- spec-implement Task 2 (bugfix lane): drops the per-task full-suite run.
  For bundled bugfix plans this was the single biggest token sink — full
  suite now runs once at the Quality Gate, targeted module test runs
  between fix iterations.

## Iteration cap

- spec-bugfix-verify Step 8 caps fix iterations at 3. After three failed
  verify cycles, AskUserQuestion presents Continue/Pivot/Abandon — stops
  the fix-on-fix-on-fix death spiral that previously had no bound.

## Dispatcher / model defaults / Console

- /spec dispatcher restored to clean split: /spec routes bugs to
  spec-bugfix-plan (full lane), /fix is a separate user-facing command
  for the quick lane. No --full flag, no auto-routing between commands.
- launcher/model_config.py: /fix and /benchmark added to default skill
  models (opus). Console useSettings.ts mirrors.
- Console Settings page: General Rows now lists /fix and /benchmark with
  descriptive labels + sub-text. SPEC_PHASE_ROWS rendered with sub-text.
- launcher/banner.py + installer/steps/finalize.py: /fix and /benchmark
  added to quick-start tips and post-install workflow list. Spec-Driven
  feature description updated to mention both /spec and /fix.
- pilot/settings.json: companyAnnouncements line now lists /fix; spinner
  tip updated.

## Docs

- New Docusaurus page docs/workflows/fix.md with workflow diagram,
  Common Issues table, mandatory E2E section, /spec vs /fix decision
  table. Sidebars updated.
- /spec page trimmed of bugfix-only content (now points at /fix).
- README: new /fix section with How /fix works + How /spec handles bugs
  details blocks.
- Marketing site WhatsInside.tsx updated to mention /fix.

## Other-session changes incorporated

- Console: parsePlanContent.ts + tests for spec section rendering and
  plan annotator persistence (Bug A + Bug B fixes).
- pilot/hooks/tool_redirect.py + tests: extended.
- pilot/.mcp.json + hooks/hooks.json + scripts/*.cjs: minor updates.
- pilot/rules/development-practices.md + mcp-servers.md: updates.
- installer/downloads.py: minor.
- pilot/ui/* compiled bundles: rebuilt.
- .devcontainer/*: minor adjustments.
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
pilot-shell Ignored Ignored Preview Apr 29, 2026 9:32am

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7cdc81ce-1cd9-4d5e-8c54-8860ace0a315

📥 Commits

Reviewing files that changed from the base of the PR and between 5147307 and 9f222e0.

⛔ Files ignored due to path filters (12)
  • launcher/banner.py is excluded by !launcher/**
  • launcher/codegraph.py is excluded by !launcher/**
  • launcher/tests/unit/test_banner.py is excluded by !launcher/**
  • launcher/tests/unit/test_codegraph.py is excluded by !launcher/**
  • pilot/settings.json is excluded by !pilot/**
  • pilot/skills/fix/manifest.json is excluded by !pilot/**
  • pilot/skills/fix/orchestrator.md is excluded by !pilot/**
  • pilot/skills/fix/steps/01-investigate.md is excluded by !pilot/**
  • pilot/skills/fix/steps/03-fix.md is excluded by !pilot/**
  • pilot/skills/fix/steps/04-verify-e2e.md is excluded by !pilot/**
  • pilot/skills/fix/steps/05-quality.md is excluded by !pilot/**
  • pilot/skills/fix/steps/06-finalise.md is excluded by !pilot/**
📒 Files selected for processing (10)
  • README.md
  • docs/docusaurus/docs/workflows/fix.md
  • docs/site/src/components/HeroSection.tsx
  • docs/site/src/components/InstallSection.tsx
  • docs/site/src/components/WhatsInside.tsx
  • docs/site/src/components/WorkflowSteps.tsx
  • docs/site/src/types/uint8array-base64.d.ts
  • installer/steps/finalize.py
  • installer/tests/unit/steps/test_finalize.py
  • installer/tests/unit/test_downloads.py

Walkthrough

This PR introduces a new /fix bugfix workflow and docs, adds sandboxing packages and a persistent devcontainer volume, updates installer UI and tests to surface /fix and /benchmark, expands download retry logic with selective HTTP error handling and reduced parallelism, and adds related docs/site/type changes.

Changes

Cohort / File(s) Summary
Dev Container & Sandboxing
\.devcontainer/Dockerfile, \.devcontainer/devcontainer.json, docs/docusaurus/docs/getting-started/installation.md
Adds bubblewrap, iptables, ipset to the container image; mounts persistent volume claude-code-config-${devcontainerId} at /root/.claude; documents sandbox recommendation and preinstalled components.
Workflow Docs & README
README.md, docs/docusaurus/docs/workflows/fix.md, docs/docusaurus/docs/workflows/spec.md, docs/docusaurus/sidebars.ts
Adds new /fix workflow doc; refactors /spec to focus on planning/design and clarify bugfix routing/iteration caps; updates README and sidebar to surface /fix.
Site UI Components
docs/site/src/components/WhatsInside.tsx, docs/site/src/components/HeroSection.tsx, docs/site/src/components/InstallSection.tsx, docs/site/src/components/WorkflowSteps.tsx
Updates copy for feature summaries; removes hero feature/badge blocks; inserts /fix into install and workflow UI, turns mode tiles into links, and adjusts layout/descriptions.
Installer Code & Tests
installer/downloads.py, installer/tests/unit/test_downloads.py, installer/steps/finalize.py, installer/tests/unit/steps/test_finalize.py
Increases single-file retry budget and backoff; retry only on 5xx HTTP errors (fail immediately on 4xx); lowers parallel workers (16→8) and adds a sequential retry pass after threadpool; adds tests for sequential retry behavior; updates finalize next-steps and tests to include /fix and /benchmark.
Type Declarations
docs/site/src/types/uint8array-base64.d.ts
Adds ambient TypeScript declarations extending Uint8Array/constructor with base64/base64url and hex encode/decode helpers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested labels

released

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: introduction of a new /fix command and redesign of the bugfix workflow, which is the primary focus reflected across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 29, 2026

Claude finished @maxritter's task in 3m 5s —— View job


PR Review: feat: bugfix workflow redesign with new /fix command

  • Gather context and read CLAUDE.md
  • Review changed files (installer, docs, pilot skills, devcontainer)
  • Analyze pilot skill steps and workflow logic
  • Provide comprehensive review feedback

Review Summary

This is a well-designed PR that addresses a genuine efficiency problem — simple bugs routing through the full /spec lane wasting tokens and time. The two-lane architecture is clean and the boundaries are explicit. Here's what I found across the reviewable files.


/fix skill — new workflow

The overall design is sound. A few specific observations:

pilot/skills/fix/orchestrator.md

The bail-out trigger section is clear and actionable. Iron Law #4 (never claim done from unit tests alone) is well-placed and the enforcement in Step 4.3 is explicit. The "stopping is success" framing is a good UX choice to prevent the natural bias toward continuing in the quick lane.

One edge case worth considering: the orchestrator says "No Iterations: counter. If your fix doesn't work after one re-attempt, stop." But Step 3.5 says "If the second attempt also fails: stop." This is consistent — but it's worth confirming these two descriptions won't be read as contradictory (one re-attempt = two total attempts). They agree, just stated differently from different vantage points.

pilot/skills/fix/steps/02-red.md — 2.4 No commit yet

The rationale for bundling test + fix in one commit (quick lane skips the revert-test proof) is correct and well-explained. The contrast with the full lane's cp+trap proof is a helpful callout for readers familiar with /spec.

pilot/skills/fix/steps/04-audit.md — 4.2 Symptom-patching grep

The pattern grep -E "^\+.*\b(try:|except|catch \(|return None|return \[\]|return \{\}|console\.log|print\()" will match intentional additions too. The step correctly says "Inspect every match" rather than "fail on every match" — this is the right approach, but worth calling out that the grep is a review aid, not a binary gate.

pilot/skills/fix/steps/06-finalise.md

The ARGUMENTS: $ARGUMENTS line at the end of the step (line 60) is the standard skill argument placeholder — correct, not an issue.

The approval gate in 6.2 correctly blocks on Step 4.3 evidence before allowing the approval summary. The check "Tests pass" is not enough at the gate is consistent with Iron Law #4.


/spec bugfix lane — changes

spec-bugfix-plan/steps/00-toggles.md

Removing CODEX_SPEC from the env read here is consistent with the Codex-once rule (Codex only runs once in spec-verify, not spec-bugfix-plan). Clean change.

spec-bugfix-plan/steps/07-codex-review.md — deleted

Dropping adversarial review on a plan-with-no-code is the right call. Codex challenging an empty plan produces overhead without signal. The Codex-once rule in spec-verify is the correct placement.

spec-bugfix-verify/steps/02-verify-fix.md

The tightening from ~83 to ~62 lines is net positive. The cp+trap explanation is cleaner. One observation: the original version spelled out the interpretation more explicitly (three bullet outcomes). The replacement is more terse:

Outcomes:
- **Test failed with the documented `Currently (bug)` error** → RED proven.
- **Test passed without fix** → test doesn't encode the bug...
- **Test errored unrelated** → not a valid signal...

This is fine — it preserves the substance, just more compact.

spec-bugfix-verify/steps/08-update-status.md — iteration cap

The cap logic is correct: Iterations >= 3 BEFORE incrementing means the AskUserQuestion fires after the 3rd failed verify cycle (not the 4th). This matches the stated "cap at 3." The three options (Continue/Pivot/Abandon) are well-chosen — "Continue" is the escape hatch for cases where the user genuinely wants one more shot.

Codex-once sentinel file (spec-verify/steps/04-launch-review.md + 07-collect-results.md)

The flag is keyed by plan-slug within the session directory ($HOME/.pilot/sessions/$SESS_ID/codex-ran-<plan-slug>.flag). This means different plans in the same session each get their own sentinel, which is correct. The sentinel also correctly prevents re-runs on plan re-iterations (annotation edits, finding-fix cycles) without blocking new plans.

One gap: the bash snippet in 04-launch-review.md shows the check but doesn't show the early-return path explicitly:

if [ -f "$CODEX_FLAG" ]; then
  echo "Codex already reviewed this plan in this session — skipping (codex-once)."
  # Skip the launch below and the Codex collection sub-step in Step 7.
fi

The comment says "skip the launch below" but there's no exit or return — this is in a prompt file read by the model, not executed as a script, so the model is expected to understand the comment. That's fine for this context but slightly ambiguous vs. the spec-plan/steps/11-plan-verification.md version which is more explicit about the skip path.


installer/downloads.py

The changes are well-targeted:

  • MAX_RETRIES 3→5 and extended backoff (1→2→5→10→20s): good for burst CDN errors
  • Explicit HTTPError catch before the general URLError catch: correctly fast-fails 4xx (file genuinely missing) while retrying 5xx. Previously 4xx would retry unnecessarily up to 3 times.
  • max_workers 16→8: reduces concurrent pressure on CDN (likely motivated by rate limiting)
  • Sequential retry pass after parallel: handles CDN 502s that don't recover within a single file's retry budget
# installer/downloads.py:229-233
failed_indices = [i for i, ok in enumerate(results) if not ok]
if failed_indices:
    time.sleep(5.0)
    for i in failed_indices:
        results[i] = download_file(file_infos[i], dest_paths[i], config)

Observation: The 5.0 second sleep is fixed regardless of how many files failed. For typical use (few failures) this is fine. There's no test covering this sequential retry path — the changed test file (test_finalize.py) doesn't test download_files_parallel. Worth adding a test for the "some files fail → sequential retry succeeds" path. Fix this →


installer/tests/unit/steps/test_finalize.py

Tests are solid. One minor observation:

# test_finalize.py:204-212
labels = [item[0] for _, items in sections for item in items]
assert "/prd" in labels
assert "/spec" in labels
prd_idx = labels.index("/prd")
spec_idx = labels.index("/spec")
assert prd_idx < spec_idx

The test_prd_appears_before_spec_in_next_steps test encodes the ordering assumption directly. If the workflows list in finalize.py is reordered in the future, this test fails without an obvious error message. Consider using a clearer assertion message: assert prd_idx < spec_idx, f"/prd (idx {prd_idx}) should appear before /spec (idx {spec_idx})". Minor.

The test_next_steps_has_two_sections hardcodes {"Getting Started": 4, "Workflows": 6}. The comment # Getting Started has 4 items, Workflows has 6 (incl. /fix and /benchmark) is helpful for future maintainers.


pilot/hooks/tool_redirect.py

The additions are substantial and well-tested. Key observations:

Dangerous git pattern detection

The pattern list at lines 104-120 is comprehensive. The _normalize_git_command function (strips git -C <path> / git -c <key=val>) has a documented limitation for quoted values with spaces — acceptable given the comment and the fallback still catches most force push forms.

# tool_redirect.py:115-116
(re.compile(r"\bgit\s+checkout\s+\.(?:\s|$)"), "git checkout ."),
(re.compile(r"\bgit\s+checkout\s+(?:\S+\s+)?--\s+\S"), "git checkout -- <file>"),

The git checkout . pattern is correctly differentiated from git checkout <branch>. The (?:\s|$) terminator prevents false matches on git checkout .git.

Nuance in git restore . pattern (line 117):

re.compile(r"\bgit\s+restore\s+(?!--staged\b)\.")

This correctly allows git restore --staged . (unstaging only) while blocking git restore . (discarding working-tree changes). The negative lookahead is precise. git restore --patch . would also be blocked — that's an interactive partial-restore, which is arguably not destructive, but acceptable to block for safety.

Search nudge throttling

The per-(category, session) throttling via sentinel JSON is a practical choice. The _nudge_already_sent / _mark_nudge_sent pattern is correct. Tests via the fresh_throttle pytest fixture (monkeypatching _throttle_sentinel_path) are the right approach.

_rg_targets_single_file (line 162-169):

def _rg_targets_single_file(segment: str) -> bool:
    tokens = segment.split()
    for token in reversed(tokens[1:]):
        if token.startswith("-"):
            continue
        return _is_single_file_path(token)
    return False

This returns False when rg has no positional args (flags only, e.g. rg --json 'pattern'), which correctly triggers the nudge. Edge case: rg with a pattern but no path (rg 'foo') — the pattern string would be checked by _is_single_file_path. Since patterns typically don't have file extensions, this returns False correctly and the nudge fires.


Documentation

docs/docusaurus/docs/workflows/fix.md

Well-written. The workflow diagram, Common Issues table, and /spec vs /fix decision table are all useful. The E2E section correctly mirrors the Iron Law with the same 4-tier browser tool resolution order.

One minor inconsistency: the doc says "Worktree isolation is not honoured — use /spec if you want a worktree" under Configurable Toggles, but the orchestrator mentions /fix takes worktree context (no plan file, "all state lives in this conversation"). The doc is accurate — /fix doesn't use worktrees, which is fine, just worth confirming the docs match the actual behavior.

docs/docusaurus/docs/workflows/spec.md (line 43)

For a bugfix workflow without a plan file, use /fix.

Clean forward reference. The bugfix section is now trimmed to essentials and delegates to /fix appropriately.


Minor / Nits

  • .devcontainer/Dockerfile: added chromium to the apt install list — consistent with the browser automation stack.
  • pilot/settings.json: spinner tip updated, company announcements line lists /fix. Fine.
  • pilot/rules/task-and-workflow.md: the /spec workflow section now shows /fix → Skill('fix') → ... in the workflow diagram. The description "Always quick. If bug exceeds quick-lane scope, STOPS..." is accurate and consistent with the orchestrator.

Verdict

Good to merge. The two-lane design is clean, the bail-out conditions are explicit and consistent, and the E2E verification mandate is enforced in both lanes. The Codex-once rule and iteration cap are solid additions.

One actionable item before merge: add a test for download_files_parallel's sequential retry pass (installer/downloads.py:229-233) — currently this code path has no coverage.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
installer/tests/unit/steps/test_finalize.py (1)

237-240: Strengthen workflow coverage with label-level assertions.

Section-length checks help, but they can still pass if expected workflow entries are replaced. Please also assert /fix and /benchmark are present in the Workflows items.

🧪 Suggested test enhancement
                 expected_lengths = {"Getting Started": 4, "Workflows": 6}
                 for title, items in sections:
                     assert len(items) == expected_lengths[title]
+                workflows = dict(sections)["Workflows"]
+                workflow_labels = [label for label, _ in workflows]
+                assert "/fix" in workflow_labels
+                assert "/benchmark" in workflow_labels
As per coding guidelines `**/tests/**`: Review test code briefly. Focus on: Test coverage for the feature being tested.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@installer/tests/unit/steps/test_finalize.py` around lines 237 - 240, The test
currently only checks section lengths; enhance it by adding explicit label-level
assertions for the Workflows section: inside the loop that iterates "for title,
items in sections" (and after the existing length assertion using
expected_lengths), when title == "Workflows" assert that "/fix" and "/benchmark"
are present in the items collection (use the same items variable), so the test
fails if those specific workflow entries are missing or replaced.
.devcontainer/devcontainer.json (1)

11-13: Consider pinning CLAUDE_CONFIG_DIR explicitly for mount-path stability.

The mount is good, but setting containerEnv.CLAUDE_CONFIG_DIR=/root/.claude makes path resolution robust if the container user/home changes later.

♻️ Suggested hardening
   "mounts": [
     "source=claude-code-config-${devcontainerId},target=/root/.claude,type=volume"
   ],
+  "containerEnv": {
+    "CLAUDE_CONFIG_DIR": "/root/.claude"
+  },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.devcontainer/devcontainer.json around lines 11 - 13, Add an explicit
container environment variable for the Claude config directory so mounts remain
stable: set containerEnv.CLAUDE_CONFIG_DIR to "/root/.claude" in the
devcontainer.json (alongside the existing "mounts" entry) so code and tools can
reliably resolve the mount path even if the container user/home changes later.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/docusaurus/docs/workflows/fix.md`:
- Line 86: The document currently contradicts itself by mentioning both "Two
quick-lane fix attempts have already failed." and a separate reference to
"three" as the bail-out threshold; pick one threshold (e.g., two or three) and
update every occurrence to that single value — specifically replace the sentence
fragment "Two quick-lane fix attempts have already failed." and any other
mentions of "three" or the bail-out threshold (including the referenced "99-99"
occurrence) so the text consistently uses the chosen numeric threshold
throughout.
- Around line 24-26: The fenced workflow code block that contains the string
"Investigate  →  RED  →  Fix  →  Audit  →  Quality Gate  →  Done" is missing a
language identifier which triggers markdownlint MD040; update the opening fence
to use a language tag (e.g., change ``` to ```text) so the block becomes ```text
... ```, ensuring the closing fence remains intact.

In `@README.md`:
- Around line 180-182: The fenced flow block containing "Investigate  →  RED  → 
Fix  →  Audit  →  Quality Gate  →  Done" is missing a language tag and triggers
markdownlint MD040; update the fenced code block that contains that arrow
sequence to include a language identifier (e.g., add "text" after the opening
triple backticks) so the block becomes a properly tagged fenced code block and
MD040 will be resolved.

---

Nitpick comments:
In @.devcontainer/devcontainer.json:
- Around line 11-13: Add an explicit container environment variable for the
Claude config directory so mounts remain stable: set
containerEnv.CLAUDE_CONFIG_DIR to "/root/.claude" in the devcontainer.json
(alongside the existing "mounts" entry) so code and tools can reliably resolve
the mount path even if the container user/home changes later.

In `@installer/tests/unit/steps/test_finalize.py`:
- Around line 237-240: The test currently only checks section lengths; enhance
it by adding explicit label-level assertions for the Workflows section: inside
the loop that iterates "for title, items in sections" (and after the existing
length assertion using expected_lengths), when title == "Workflows" assert that
"/fix" and "/benchmark" are present in the items collection (use the same items
variable), so the test fails if those specific workflow entries are missing or
replaced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 86f66904-59a0-4dd3-97c9-36ebacff4c5f

📥 Commits

Reviewing files that changed from the base of the PR and between 0e11129 and 5147307.

⛔ Files ignored due to path filters (65)
  • console/package.json is excluded by !console/**
  • console/src/ui/viewer/hooks/useSettings.ts is excluded by !console/**
  • console/src/ui/viewer/views/Settings/index.tsx is excluded by !console/**
  • console/src/ui/viewer/views/Spec/annotation/PlanAnnotator.tsx is excluded by !console/**
  • console/src/ui/viewer/views/Spec/annotation/useAnnotation.ts is excluded by !console/**
  • console/src/ui/viewer/views/Spec/index.tsx is excluded by !console/**
  • console/src/ui/viewer/views/Spec/parsePlanContent.ts is excluded by !console/**
  • console/tests/annotation/plan-annotator-persistence.test.tsx is excluded by !console/**
  • console/tests/ui/spec-section-rendering.test.ts is excluded by !console/**
  • console/tsconfig.json is excluded by !console/**
  • launcher/banner.py is excluded by !launcher/**
  • launcher/model_config.py is excluded by !launcher/**
  • launcher/tests/unit/test_model_config.py is excluded by !launcher/**
  • pilot/.mcp.json is excluded by !pilot/**
  • pilot/hooks/hooks.json is excluded by !pilot/**
  • pilot/hooks/tests/test_spec_stop_guard.py is excluded by !pilot/**
  • pilot/hooks/tests/test_tool_redirect.py is excluded by !pilot/**
  • pilot/hooks/tool_redirect.py is excluded by !pilot/**
  • pilot/rules/development-practices.md is excluded by !pilot/**
  • pilot/rules/mcp-servers.md is excluded by !pilot/**
  • pilot/rules/task-and-workflow.md is excluded by !pilot/**
  • pilot/scripts/mcp-server.cjs is excluded by !pilot/**
  • pilot/scripts/worker-service.cjs is excluded by !pilot/**
  • pilot/settings.json is excluded by !pilot/**
  • pilot/skills/fix/manifest.json is excluded by !pilot/**
  • pilot/skills/fix/orchestrator.md is excluded by !pilot/**
  • pilot/skills/fix/steps/01-investigate.md is excluded by !pilot/**
  • pilot/skills/fix/steps/02-red.md is excluded by !pilot/**
  • pilot/skills/fix/steps/03-fix.md is excluded by !pilot/**
  • pilot/skills/fix/steps/04-audit.md is excluded by !pilot/**
  • pilot/skills/fix/steps/05-quality.md is excluded by !pilot/**
  • pilot/skills/fix/steps/06-finalise.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/manifest.json is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/orchestrator.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/steps/00-toggles.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/steps/01-red-flags.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/steps/03-investigation.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/steps/04-plan-fix.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/steps/06-annotation-check.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/steps/07-approval.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/steps/07-codex-review.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-plan/steps/08-continuing.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/manifest.json is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/orchestrator.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/02-verify-fix.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/03-quality-checks.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/04-plan-verify.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/04-verification-scenario.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/05-final-worktree-sync.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/05-runtime-verification.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/06-check-feedback.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/07-code-review-gate.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/08-post-merge-verify.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/08-update-status.md is excluded by !pilot/**
  • pilot/skills/spec-bugfix-verify/steps/11-update-status.md is excluded by !pilot/**
  • pilot/skills/spec-implement/steps/04-tdd-loop.md is excluded by !pilot/**
  • pilot/skills/spec-plan/steps/11-plan-verification.md is excluded by !pilot/**
  • pilot/skills/spec-verify/steps/04-launch-review.md is excluded by !pilot/**
  • pilot/skills/spec-verify/steps/07-collect-results.md is excluded by !pilot/**
  • pilot/skills/spec/orchestrator.md is excluded by !pilot/**
  • pilot/skills/spec/steps/01-parse-route.md is excluded by !pilot/**
  • pilot/ui/PlanAnnotator.js is excluded by !pilot/**
  • pilot/ui/index3.js is excluded by !pilot/**
  • pilot/ui/index5.js is excluded by !pilot/**
  • pilot/ui/viewer-bundle.js is excluded by !pilot/**
📒 Files selected for processing (11)
  • .devcontainer/Dockerfile
  • .devcontainer/devcontainer.json
  • README.md
  • docs/docusaurus/docs/getting-started/installation.md
  • docs/docusaurus/docs/workflows/fix.md
  • docs/docusaurus/docs/workflows/spec.md
  • docs/docusaurus/sidebars.ts
  • docs/site/src/components/WhatsInside.tsx
  • installer/downloads.py
  • installer/steps/finalize.py
  • installer/tests/unit/steps/test_finalize.py

Comment thread docs/docusaurus/docs/workflows/fix.md Outdated
Comment thread docs/docusaurus/docs/workflows/fix.md Outdated
- Root cause is architectural, not a single line.
- Fix needs defense-in-depth at multiple layers.
- Confidence stays Low — root cause can't be pinned to file:line.
- Two quick-lane fix attempts have already failed.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Unify the failed-attempt threshold (currently contradictory).

The page states both two and three failed attempts as the bail-out threshold. Please pick one value and use it consistently.

Also applies to: 99-99

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/docusaurus/docs/workflows/fix.md` at line 86, The document currently
contradicts itself by mentioning both "Two quick-lane fix attempts have already
failed." and a separate reference to "three" as the bail-out threshold; pick one
threshold (e.g., two or three) and update every occurrence to that single value
— specifically replace the sentence fragment "Two quick-lane fix attempts have
already failed." and any other mentions of "three" or the bail-out threshold
(including the referenced "99-99" occurrence) so the text consistently uses the
chosen numeric threshold throughout.

Comment thread README.md Outdated
@maxritter maxritter merged commit 5bdfbbc into main Apr 29, 2026
11 of 12 checks passed
@claude
Copy link
Copy Markdown

claude Bot commented Apr 29, 2026

Claude finished @maxritter's task in 2m 11s —— View job


Incremental Review — Push 2 (9f222e0)

Files reviewed in this push (non-encrypted): README.md, docs/docusaurus/docs/workflows/fix.md, installer/steps/finalize.py, installer/tests/unit/steps/test_finalize.py, installer/tests/unit/test_downloads.py


✅ Resolved from prior review

Sequential retry test coverage (installer/tests/unit/test_downloads.py) — the gap flagged in the first review is fully addressed. Two new tests cover the path:

  • test_download_files_parallel_sequential_retry_recovers_failed_files: a flaky file that fails the parallel pass succeeds on sequential retry; call counts are verified per-file
  • test_download_files_parallel_sequential_retry_still_fails: verifies the result stays False when the sequential retry also fails

Both correctly monkeypatch downloads.time.sleep to skip the 5-second cool-down, which is the right patching target when the module does import time; time.sleep(...).


✅ Resolved from CodeRabbit

  1. MD040 fenced code blocks```text language tag added in both README.md:178 and docs/docusaurus/docs/workflows/fix.md:26.
  2. Threshold inconsistency — "Two quick-lane fix attempts have already failed." unified to "Two failed fix attempts." across the bail-out list and the Common Issues table. No more contradiction with the orchestrator's one-re-attempt rule.
  3. Label-level assertions in test_finalize.py/fix and /benchmark are now asserted present in the Workflows section labels (lines 543–545).

New findings

Ordering test removed without replacement (installer/tests/unit/steps/test_finalize.py)

test_prd_appears_before_spec_in_next_steps was deleted because the workflows list reordering puts /spec before /prd — the old assertion would now fail. The deletion is correct. However, the new ordering (/spec, /fix, /prd, /setup-rules, /create-skill, /benchmark) has no test asserting it. The existing test_next_steps_has_two_sections only checks counts and that /fix//benchmark are present. If someone swaps /spec and /fix again, no test catches it. Minor — coverage is acceptable for a UI string list.

dict(sections) in new test assertion (test_finalize.py:543)

workflow_labels = [label for label, _ in dict(sections)["Workflows"]]

dict(sections) works because section titles are unique, but if a second "Workflows" section were ever added the dict conversion would silently drop the first one. Not a real risk with the current code; just a note for future maintainers. Would be slightly more robust as:

workflow_labels = next(items for title, items in sections if "Workflows")
workflow_labels = [label for label, _ in workflow_labels]

/spec vs /fix decision table scope (docs/docusaurus/docs/workflows/fix.md:105–108)

The table was tightened from file-count / root-cause locatability criteria to intent-based criteria ("You want a fix without ceremony", "You want it done now"). The closing paragraph now says /fix "handles the full range — from typos to multi-step debugging" and bails out only for "truly architectural" bugs. This is a deliberate scope expansion consistent with the redesign — flagging it for awareness, not as an issue.


Still open from prior review

  • CLAUDE_CONFIG_DIR env var in .devcontainer/devcontainer.json (CodeRabbit nitpick, not in this push's diff — unchanged)

Summary

Push 2 closes all three previously flagged issues (sequential retry coverage, MD040 code fences, threshold inconsistency) and adds the CodeRabbit-suggested label assertions. The two new findings above are minor. No blocking issues.

@maxritter maxritter deleted the dev branch April 29, 2026 09:38
github-actions Bot pushed a commit that referenced this pull request Apr 29, 2026
# [8.5.0](v8.4.1...v8.5.0) (2026-04-29)

### Features

* bugfix workflow redesign with new /fix command ([#136](#136)) ([5bdfbbc](5bdfbbc))
@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 8.5.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant