v2.12.0 — 3 skills, pr-review v2, route-intent hook, Iron Laws #23–26

oliver-kriska released this on 16 Jun 06:27 (commit 85e7612)

[2.12.0] - 2026-06-16

Workflow-completion release driven by an analysis of 400 sessions: three new skills
(/phx:recall, /phx:deps-update, /phx:watch-pr), a /phx:pr-review v2 that
closes the review loop (fetch → fix → reply → resolve), a route-intent.sh
UserPromptSubmit hook that replaces CLAUDE.md prose routing (measured at a ~0%
firing rate), four new Iron Laws (#23–#26), and an eval-hardening pass that
backfilled the AskUserQuestion 4-option check, cross-file consistency tests, and
untracked-file detection. Law count 22 → 26.

Added

  • Iron Law #26 — Comments aren't commit messages (session analysis found
    Oliver asking "remove unnecessary comments" on essentially every PR, 8+
    sessions clustered in June 2026). A change's reasoning (the bug, what it
    replaces, the task) belongs in the commit, PR, or squash message, which git
    persists, not in code comments. No inline issue-reference tags
    (# ENA-1234). Keep only durable, intrinsic facts a future reader needs
    regardless of history: footguns, invariants, library quirks. Wired into
    CLAUDE.md, the inject-iron-laws.sh SubagentStart hook (code-writing
    subagents inherit it), the iron-law-judge agent as detection #19 (so
    /phx:review flags ticket tags, change narration, and comments that merely
    restate what the code does), and the /phx:init injectable template. This
    stops such comments from being added during /phx:work and /phx:quick,
    rather than stripping them at PR time. Law count 25 → 26.
  • UserPromptSubmit routing hook (route-intent.sh) — injects one-line /phx:
    suggestions directly into Claude's context for three high-signal intents: GitHub
    PR URLs / review-feedback phrasing → /phx:pr-review, Tidewave
    <context name="current-page"> blocks → /phx:investigate, Elixir stack-trace
    pastes → /phx:investigate. Replaces CLAUDE.md prose routing rules measured at
    ~0% firing rate across 400 sessions. One suggestion per category per session,
    silent on explicit slash commands, gated on mix.exs, always exits 0
    (UserPromptSubmit exit 2 would erase the user's prompt).
  • /phx:recall — session and history archaeology (git-archaeology sessions
    ran manual git log/diff pipelines with no plugin support). Three evidence
    layers, cheapest first: .claude/solutions/ compound docs → git archaeology
    (--grep, -S pickaxe, --follow, -L) → ccrider MCP session search, gated
    with graceful degradation when the MCP is absent. ONE ccrider fetch = ONE
    subagent (3–15KB responses; writes a ≤30-line summary file). Every answer cites
    its evidence; clean misses are stated, then routed to /phx:compound so the
    next recall stops at layer 1. 100% trigger accuracy.
  • /phx:deps-update — generic dependency freshness workflow (dependency
    maintenance was a recurring session pattern with no plugin support). Inventory
    via mix hex.outdated (exit 1 = normal "outdated" signal), changelog deltas via
    the built-in mix hex.package diff <pkg> <v1>..<v2> (no project-specific mix
    tasks), updates with coupled-group enforcement (Phoenix core, Ecto, Ash, Oban,
    telemetry families move together), breaking-change fixes, and PR splitting
    (patches bundled, minors by area, majors solo). Majors require an explicit
    mix.exs edit; override: true only when the per-package constraint table
    shows a transitive blocker. Hands off security to /phx:deps-audit (Mode B) and
    verification to /phx:verify. The only mutating deps skill — audit/vet stay
    read-only. 89% trigger accuracy.
  • /phx:watch-pr — token-conscious PR/CI watching (replaces hand-rolled
    60-min foreground sleep loops observed in session analysis). A quiet
    background watcher (scripts/watch-pr.sh, Monitor-tool-first with
    run_in_background fallback) polls gh pr view --json in its own process and
    emits ONE line per genuinely-new event (review, comment, CI conclusion, merged/
    closed, watchdog, gh-failure) — raw JSON never enters Claude's context, and
    Claude takes zero turns while idle (no cache-TTL straddling). --checks-only
    delegates to gh pr checks --watch --fail-fast (exit code is the signal).
    Routes actionable reviews to /phx:pr-review and CI failures to
    /phx:investigate. 100% trigger accuracy on the new fixture.
  • /phx:pr-review v2 — closes the review loop (fetch → fix → reply → resolve).
    The old skill drafted replies but used REST endpoints that expose neither thread
    IDs nor resolved status, so it could never resolve a thread or skip handled ones.
    v2 fetches threads via GraphQL reviewThreads (thread ID + isResolved +
    isOutdated, paginated), replies via REST to the thread root, resolves via
    resolveReviewThread, and is idempotent across review rounds — GitHub's
    isResolved is the state. New flags: --bots-only (triage CI bot passes —
    Copilot/Codex/CodeRabbit detected via __typename == "Bot"), --no-resolve.
    New Iron Laws: never resolve without a reply, never claim a fix without a shown
    diff, bot findings get the same scrutiny as humans. New references:
    gh-commands.md (3 comment surfaces, pagination, bot detection),
    bot-triage.md (batch flow + Elixir false-positive patterns).
  • Three new Iron Laws (#23–#25) from the 400-session analysis, wired into
    elixir-idioms, liveview-patterns, the /phx:init template, the SubagentStart
    injection hook, and iron-law-judge detection patterns:
    • #23 Mix tasks start only what they need — Mix.Task.run("app.config") +
      Application.ensure_all_started/1, never Mix.Task.run("app.start") (boots the
      full tree: endpoint binds the port, Oban starts consuming jobs). The
      mix-tasks.md reference previously taught the anti-pattern; now fixed.
    • #24 LiveView handlers match {:error, %Ecto.Changeset{}} explicitly — bare
      {:error, _} silently swallows form validation errors.
    • #25 Capture Gettext/CLDR locale before spawning Task/GenServer — locale is
      process-local; spawned processes reset to default.
  • Pre-migration safety section in ecto-patterns/references/migrations.md:
    check for duplicates (including soft-deleted rows) before adding unique
    indexes, with partial-index, data-fix, and composite-key resolutions.
  • Tidewave reliability guards in tidewave-integration — worktree/port
    verification (multi-worktree setups debug the wrong server), schema introspection
    before SQL, output-size caps, browser_eval server-side fallbacks, and a
    QA-walkthrough pattern for feature smoke tests.
  • Eval: AskUserQuestion 4-option-limit check (askuserquestion_option_limit
    matcher) — the tool silently drops a 5th option; brainstorm shipped that way for
    months. Scans option lists after every AskUserQuestion mention (YAML - label:
    blocks and bullet/numbered runs), stops at headings, and skips sibling list items
    when the mention is itself inside a list. Backfilled into all 50 skill evals and
    the generator template; caught a real second instance in /phx:plan.
  • Eval: cross-file consistency tests (lab/eval/tests/test_consistency.py) —
    two bug classes per-skill scoring can't see: references teaching anti-patterns
    their own Iron Laws ban (mix-tasks.md shipped the app.start pattern Iron Law
    #23 bans), and skill scripts using cwd-relative .claude/ paths (the
    nested-state-dir bug class). The path lint caught a 4th live instance in
    scripts/fetch-claude-docs.sh.
  • make eval now sees untracked files — brand-new skills/agents were invisible
    to the git diff-based changed-file detection until first commit;
    git ls-files --others is now merged into both detection paths.
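The detection logic described for route-intent.sh can be illustrated as a pure classification step. This is a minimal Python sketch, not the shipped shell hook: the regexes, the `classify_prompt` name, and the `seen` set standing in for per-session state are hypothetical.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical patterns approximating the three high-signal intents.
PR_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")
REVIEW_PHRASES = re.compile(r"\b(review comments?|address(ing)? (the )?feedback)\b", re.I)
TIDEWAVE_PAGE = re.compile(r'<context name="current-page">')
# Elixir stack-trace frames look like: (my_app 0.1.0) lib/my_app/foo.ex:12: MyApp.Foo.bar/1
STACK_FRAME = re.compile(r"^\s*\(\w+ [\d.]+\) \S+\.exs?:\d+:", re.M)

# (category, matcher, suggested command)
ROUTES: List[Tuple[str, Callable[[str], bool], str]] = [
    ("pr-review", lambda p: bool(PR_URL.search(p) or REVIEW_PHRASES.search(p)), "/phx:pr-review"),
    ("tidewave", lambda p: bool(TIDEWAVE_PAGE.search(p)), "/phx:investigate"),
    ("stacktrace", lambda p: bool(STACK_FRAME.search(p)), "/phx:investigate"),
]


def classify_prompt(prompt: str, seen: Set[str]) -> Optional[str]:
    """Return a one-line suggestion, or None to stay silent."""
    if prompt.lstrip().startswith("/"):
        return None  # explicit slash command: never route on top of it
    for category, matches, command in ROUTES:
        if category not in seen and matches(prompt):
            seen.add(category)  # at most one suggestion per category per session
            return f"Suggestion: {command} fits this request."
    return None
```

A real UserPromptSubmit hook would also check for mix.exs and always exit 0, since exit code 2 erases the user's prompt.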
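The /phx:deps-update splitting rule (patches bundled, minors by area, majors solo) combined with coupled-group enforcement can be sketched as a grouping function. A minimal Python sketch under assumptions: the `Update` record, the `area` labels, and the `COUPLED_GROUPS` contents are hypothetical, versions are assumed to be plain MAJOR.MINOR.PATCH, and the real skill works from mix hex.outdated output.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Update:
    package: str
    current: str  # assumes plain MAJOR.MINOR.PATCH, e.g. "1.7.10"
    latest: str
    area: str     # hypothetical grouping label, e.g. "web" or "data"


# Hypothetical coupled families: members always land in the same PR.
COUPLED_GROUPS = {
    "phoenix": {"phoenix", "phoenix_live_view", "phoenix_html"},
    "ecto": {"ecto", "ecto_sql"},
}
RANK = {"patch": 0, "minor": 1, "major": 2}


def bump_kind(current: str, latest: str) -> str:
    cur = [int(x) for x in current.split(".")[:3]]
    new = [int(x) for x in latest.split(".")[:3]]
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"


def split_prs(updates: List[Update]) -> Dict[str, List[str]]:
    # Merge each coupled family into one unit that takes its largest bump.
    units: Dict[str, List[Update]] = defaultdict(list)
    for u in updates:
        family = next((name for name, members in COUPLED_GROUPS.items()
                       if u.package in members), u.package)
        units[family].append(u)

    prs: Dict[str, List[str]] = defaultdict(list)
    for family, members in units.items():
        kind = max((bump_kind(m.current, m.latest) for m in members), key=RANK.__getitem__)
        names = [m.package for m in members]
        if kind == "major":
            prs[f"major:{family}"] += names           # majors ship solo
        elif kind == "minor":
            prs[f"minor:{members[0].area}"] += names  # minors grouped by area
        else:
            prs["patches"] += names                   # patches bundled together
    return {key: sorted(names) for key, names in prs.items()}
```

A coupled family takes the largest bump among its members, so a minor bump to one Ecto package pulls its sibling into the same minor PR.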
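The /phx:watch-pr watcher emits a line only for genuinely new events, which comes down to diffing successive poll snapshots so that raw JSON never reaches Claude's context. A minimal Python sketch; the simplified snapshot shape (`reviews`, `comments`, `checks`, `state`, with `id`/`author` fields) is an assumption loosely modeled on gh pr view --json output, not its exact schema.

```python
from typing import Dict, List


def new_events(prev: Dict, cur: Dict) -> List[str]:
    """Diff two PR snapshots; return one line per genuinely new event."""
    events: List[str] = []

    seen_reviews = {r["id"] for r in prev.get("reviews", [])}
    for r in cur.get("reviews", []):
        if r["id"] not in seen_reviews:
            events.append(f"review: {r['author']} {r['state']}")

    seen_comments = {c["id"] for c in prev.get("comments", [])}
    for c in cur.get("comments", []):
        if c["id"] not in seen_comments:
            events.append(f"comment: {c['author']}")

    # A check only counts once it has a conclusion that differs from the last poll.
    prev_checks = {c["name"]: c.get("conclusion") for c in prev.get("checks", [])}
    for c in cur.get("checks", []):
        conclusion = c.get("conclusion")
        if conclusion and prev_checks.get(c["name"]) != conclusion:
            events.append(f"ci: {c['name']} {conclusion}")

    if cur.get("state") != prev.get("state") and cur.get("state") in ("MERGED", "CLOSED"):
        events.append(f"pr: {cur['state'].lower()}")
    return events
```

When the returned list is empty, the watcher prints nothing, so Claude takes no turn.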
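The /phx:pr-review v2 flow relies on GraphQL reviewThreads exposing thread IDs and isResolved, which the REST endpoints do not. A minimal Python sketch of the query and the triage step; `pending_threads`, `may_resolve`, and the returned dict shape are hypothetical, and pagination (following endCursor while hasNextPage is true) is left to the caller.

```python
from typing import Dict, List, Set

# One page of review threads; repeat with $cursor = endCursor while hasNextPage.
THREADS_QUERY = """
query($owner: String!, $repo: String!, $pr: Int!, $cursor: String) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $pr) {
      reviewThreads(first: 100, after: $cursor) {
        pageInfo { hasNextPage endCursor }
        nodes {
          id
          isResolved
          isOutdated
          comments(first: 1) {
            nodes { databaseId body author { __typename login } }
          }
        }
      }
    }
  }
}
"""


def pending_threads(nodes: List[Dict], bots_only: bool = False) -> List[Dict]:
    """Keep unresolved threads; GitHub's isResolved is the only state tracked."""
    pending = []
    for thread in nodes:
        if thread["isResolved"]:
            continue  # handled in an earlier round, so reruns are idempotent
        root = thread["comments"]["nodes"][0]
        author = root.get("author") or {}  # defensive: author may be null
        is_bot = author.get("__typename") == "Bot"
        if bots_only and not is_bot:
            continue
        pending.append({
            "thread_id": thread["id"],       # input for resolveReviewThread
            "reply_to": root["databaseId"],  # REST reply targets the root comment
            "bot": is_bot,
            "outdated": thread["isOutdated"],
        })
    return pending


def may_resolve(thread_id: str, replied: Set[str]) -> bool:
    """Never resolve a thread without having replied to it first."""
    return thread_id in replied
```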
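The pre-migration duplicate check from the ecto-patterns bullet can be demonstrated with an in-memory SQLite database standing in for Postgres. A minimal Python sketch; the `users` table, `email` column, and `deleted_at` soft-delete column are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def duplicate_keys(conn: sqlite3.Connection, table: str, column: str,
                   live_only: bool = False) -> List[Tuple]:
    """Values that would violate a UNIQUE index on `column` (trusted identifiers only)."""
    where = "WHERE deleted_at IS NULL" if live_only else ""
    sql = (f"SELECT {column}, COUNT(*) FROM {table} {where} "
           f"GROUP BY {column} HAVING COUNT(*) > 1")
    return conn.execute(sql).fetchall()


def demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)")
    conn.executemany(
        "INSERT INTO users (email, deleted_at) VALUES (?, ?)",
        [("a@example.com", None),
         ("a@example.com", "2026-01-01"),  # soft-deleted duplicate
         ("b@example.com", None)],
    )
    return conn
```

Here `duplicate_keys(demo_db(), "users", "email", live_only=True)` returns `[]`, yet the full check returns `[("a@example.com", 2)]`, so a plain unique index on `email` fails. A partial index restricted to `deleted_at IS NULL` is one of the resolutions the bullet lists.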
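The askuserquestion_option_limit scanning rules (option lists after each mention, headings end the scan, siblings of a mention that is itself a list item are skipped) can be sketched as follows. This is a simplified Python approximation, not the eval's actual matcher; the function name and the list-item regex are assumptions.

```python
import re
from typing import List, Tuple

LIST_ITEM = re.compile(r"^(\s*)(?:[-*+]|\d+[.)])\s+")
HEADING = re.compile(r"^#{1,6}\s")
FENCE = "`" * 3
MAX_OPTIONS = 4


def option_limit_violations(markdown: str) -> List[Tuple[int, int]]:
    """Return (line_number, option_count) for each option list over the limit."""
    lines = markdown.splitlines()
    violations = []
    for i, line in enumerate(lines):
        if "AskUserQuestion" not in line:
            continue
        mention = LIST_ITEM.match(line)
        # When the mention is itself a list item, its siblings are not options.
        mention_indent = len(mention.group(1)) if mention else -1
        run_indent, count = None, 0
        for nxt in lines[i + 1:]:
            stripped = nxt.strip()
            if HEADING.match(nxt):
                break  # a heading always ends the scan
            if not stripped or stripped.startswith(FENCE):
                continue  # blank lines and code fences do not end a run
            indent = len(nxt) - len(nxt.lstrip())
            if LIST_ITEM.match(nxt):
                if indent <= mention_indent or (run_indent is not None and indent < run_indent):
                    break  # sibling of the mention, or the run has ended
                if run_indent is None:
                    run_indent = indent
                if indent == run_indent:
                    count += 1  # nested sub-items are not counted
                continue
            if run_indent is None and stripped.endswith(":"):
                continue  # intro line such as "options:" before a YAML block
            if run_indent is None or indent <= run_indent:
                break  # prose instead of, or after, the option list
        if count > MAX_OPTIONS:
            violations.append((i + 1, count))
    return violations
```

YAML `- label:` entries match the same list-item pattern, and their indented `description:` lines are treated as continuations rather than new options.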
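The two cross-file checks in test_consistency.py can be approximated with plain regexes. A naive Python sketch; `lint_file` and both patterns are hypothetical, and because it scans whole .md files, a prose warning that quotes the banned call would also be flagged.

```python
import re
from typing import List

APP_START = re.compile(r'Mix\.Task\.run\(\s*"app\.start"')
# ".claude/" at the start of a word, optionally "./"-prefixed, so it resolves against cwd.
CWD_RELATIVE_CLAUDE = re.compile(r"""(?:^|[\s"'=(])(?:\./)?\.claude/""")


def lint_file(name: str, text: str) -> List[str]:
    problems = []
    if name.endswith(".md") and APP_START.search(text):
        problems.append(f'{name}: teaches Mix.Task.run("app.start"), banned by Iron Law #23')
    if name.endswith(".sh"):
        for lineno, line in enumerate(text.splitlines(), 1):
            if line.lstrip().startswith("#"):
                continue  # skip shell comments
            if CWD_RELATIVE_CLAUDE.search(line):
                problems.append(f"{name}:{lineno}: cwd-relative .claude/ path")
    return problems
```

Paths anchored to a variable, such as `"${CLAUDE_PROJECT_DIR:-$PWD}/.claude/..."`, are not flagged because `.claude/` follows a `/`.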
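The untracked-file fix for make eval amounts to taking the union of two git listings. A minimal Python sketch; `changed_files` is hypothetical, the `--exclude-standard` flag (which respects .gitignore) is an assumption, and `run` is injectable so the logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, List


def changed_files(base: str = "HEAD", run: Callable = subprocess.run) -> List[str]:
    """Tracked changes since `base` plus untracked files, deduplicated."""
    def git(*args: str) -> List[str]:
        result = run(["git", *args], capture_output=True, text=True, check=True)
        return [path for path in result.stdout.splitlines() if path]

    tracked = git("diff", "--name-only", base)
    # Untracked files never appear in git diff output.
    untracked = git("ls-files", "--others", "--exclude-standard")
    return sorted(set(tracked) | set(untracked))
```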

Changed

  • Workflow handoffs between phases: /phx:investigate now ends with a routing
    step (quick fix vs /phx:plan vs /phx:compound); /phx:review passes the review
    file path to /phx:plan for follow-up plans; /phx:work suggests /phx:compound
    after non-obvious fixes and re-verifies stale plans from earlier sessions.

  • /phx:full deflects existing plan files — description and a usage guard route
    .claude/plans/*/plan.md arguments to /phx:work instead of re-planning.

  • intent-detection hard guard — skips entirely when the message starts with any
    slash command; no more routing suggestions on top of explicit commands.

  • /phx:work batches checkbox updates — one edit pass when several tasks complete
    together, not one Edit call per checkbox.

  • /phx:compound write-block fallback — outputs the solution doc inline and
    points at /phx:permissions instead of silently dropping knowledge;
    /phx:permissions now always recommends workflow-artifact write grants
    (.claude/plans/, .claude/solutions/, .claude/reviews/).

  • AskUserQuestion discipline in brainstorm/triage — decisions only, concrete
    impact per option; fixed brainstorm's Decision Point exceeding the tool's 4-option
    limit (5 options meant one was always silently dropped).

  • security-analyzer — new end-to-end flow checks from the 400-session analysis: IDOR
    via handle_params URL params, data-flow through multi-step transforms, failure-path
    consistency in Ecto.Multi/with chains, soft-delete leakage in authz lookups — all bug
    classes external review bots caught after plugin review passed.

  • elixir-reviewer — failure-path review section (Multi/with error branches,
    short-circuit side effects, multi-step transforms, soft-delete filters), known
    false-positive traps (nil[:key] is nil-safe via Access), and a diff-scoped reading rule
    to stop turn exhaustion on large PRs.

  • verification-runner — compiles FIRST (turn 1 combines discovery + mix compile),
    maxTurns 10 → 15, earlier findings-file write; stops "compiling… let me check again"
    turn exhaustion observed on large PRs.

  • parallel-reviewer + /phx:audit — rate-limit circuit breaker: when 2+ subagents
    fail with rate-limit/API errors, synthesize from existing outputs and tell the user to
    re-run after reset instead of dead-waiting on "continue".

  • ecto-schema-designer — pre-UNIQUE-index migration safety check (duplicates +
    soft-deleted rows silently block production migrations).
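The rate-limit circuit breaker for parallel-reviewer and /phx:audit can be sketched as a decision over subagent results. A minimal Python sketch; the result tuple shape, the error-substring markers, and `after_fanout` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical substrings that mark a rate-limit or API failure.
RATE_LIMIT_MARKERS = ("rate limit", "rate_limit", "429", "overloaded")

# Each result: (agent_name, output or None, error message or None)
Result = Tuple[str, Optional[str], Optional[str]]


def is_rate_limited(error: Optional[str]) -> bool:
    return bool(error) and any(m in error.lower() for m in RATE_LIMIT_MARKERS)


def after_fanout(results: List[Result], threshold: int = 2) -> Optional[Dict[str, List[str]]]:
    """Return a synthesize-and-stop plan once enough subagents hit rate limits."""
    limited = [name for name, _, err in results if is_rate_limited(err)]
    if len(limited) < threshold:
        return None  # normal path: keep going
    completed = [name for name, output, _ in results if output is not None]
    return {"synthesize_from": completed, "rerun_after_reset": limited}
```

The caller would synthesize from the completed outputs and tell the user which agents to re-run after the limit resets, instead of waiting on "continue".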
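The /phx:work batching change (one edit pass instead of one Edit call per checkbox) can be sketched as a single regex substitution. A minimal Python sketch; the `- [ ] task` plan format and matching tasks by exact text are assumptions.

```python
import re
from typing import Set

UNCHECKED = re.compile(r"^(\s*[-*] )\[ \] (.+)$", re.M)


def check_tasks(plan: str, completed: Set[str]) -> str:
    """Tick every completed task in a single pass over the plan text."""
    def tick(m):
        if m.group(2).strip() in completed:
            return f"{m.group(1)}[x] {m.group(2)}"
        return m.group(0)  # leave unfinished tasks untouched
    return UNCHECKED.sub(tick, plan)
```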

Fixed

  • Iron Law verifier is now blame-aware: iron-law-verifier.sh scans only the content
    the current Edit/Write introduced (new_string/content), not the whole file.
    Pre-existing violations in untouched regions no longer force unrelated refactors.
  • block-dangerous-ops.sh fails open on script errors — a corrupted hook file (e.g.
    merge-conflict markers) once blocked ALL Bash calls with no recovery; hooks.json now
    appends || exit 0 and the script documents the JSON-deny/exit-0 contract.
  • Stop hook warns about uncommitted feature-branch changes — prevents the
    lost-work-after-rebase incident class observed in session analysis.
  • liveview-architect + ecto-schema-designer missing Write — both agents still had
    the pre-v2.8.1 disallowedTools: Write, ... frontmatter and fell back to inline output
    when spawned as reviewers ("I only have Read, Grep, and Glob"). Write now allowed for
    their own findings file; Edit stays disallowed.
  • web-researcher could never write its output file — research workers were asked to
    save findings but had Write disallowed; agents burned all turns on fetches then lost the
    output. Write allowed + reserve-last-turns-for-output guard.
  • /phx:plan post-plan AskUserQuestion exceeded the 4-option limit: with 5
    options, one was always silently dropped. The fix merges "Review the plan" and
    "Adjust the plan" into one option in the skill, planning-orchestrator, and both
    hook scripts that echo the list (precompact-rules.sh, plan-stop-reminder.sh).
  • scripts/fetch-claude-docs.sh wrote its cache relative to cwd — anchored to
    ${CLAUDE_PROJECT_DIR:-$PWD} like the other skill scripts.
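The blame-aware verifier fix comes down to scanning only what the current tool call introduces. A minimal Python sketch, assuming the hook receives stdin JSON with `tool_name` and `tool_input` (`new_string` for Edit, `content` for Write); `introduced_text`, `violations`, and the single sample rule are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# One sample rule for illustration; the real verifier checks many Iron Laws.
APP_START = re.compile(r'Mix\.Task\.run\(\s*"app\.start"')


def introduced_text(hook_input: Dict[str, Any]) -> str:
    """Only the text this Edit/Write adds, never the rest of the file."""
    tool_input = hook_input.get("tool_input", {})
    if hook_input.get("tool_name") == "Edit":
        return tool_input.get("new_string", "")
    if hook_input.get("tool_name") == "Write":
        return tool_input.get("content", "")
    return ""


def violations(raw_stdin: str) -> List[str]:
    text = introduced_text(json.loads(raw_stdin))
    if APP_START.search(text):
        return ['Iron Law #23: Mix.Task.run("app.start") in new code']
    return []
```

Because the file on disk is never read, a pre-existing violation elsewhere in the file cannot trigger a finding.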
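The JSON-deny/exit-0 contract documented for block-dangerous-ops.sh can be mirrored in Python. The shipped fix is in shell (`|| exit 0` appended in hooks.json); this sketch only illustrates the contract, assuming Claude Code's PreToolUse `hookSpecificOutput` / `permissionDecision` output fields: denial travels as JSON on stdout, any internal error falls through to an allow, and the exit code is always 0. The dangerous-pattern list and function names are hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Hypothetical patterns; the real script has its own list.
DANGEROUS = ("rm -rf /", "git push --force")


def decide(hook_input: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    command = hook_input.get("tool_input", {}).get("command", "")
    if any(pattern in command for pattern in DANGEROUS):
        return {"hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": f"blocked dangerous command: {command}",
        }}
    return None


def run_hook(raw_stdin: str) -> Tuple[int, str]:
    """Return (exit_code, stdout). Denials are JSON, so the exit code stays 0."""
    try:
        decision = decide(json.loads(raw_stdin))
        return 0, json.dumps(decision) if decision else ""
    except Exception:
        return 0, ""  # fail open: a broken hook must not block every Bash call
```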
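The `${CLAUDE_PROJECT_DIR:-$PWD}` anchoring now used by scripts/fetch-claude-docs.sh has a direct Python analogue. A minimal sketch; `cache_dir` and the `docs-cache` subdirectory name are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def cache_dir(env: Optional[Mapping[str, str]] = None, cwd: Optional[str] = None) -> Path:
    """Anchor the cache to the project root instead of the current directory.

    Like the shell `:-` operator, an unset or empty CLAUDE_PROJECT_DIR falls back.
    """
    env = os.environ if env is None else env
    root = env.get("CLAUDE_PROJECT_DIR") or cwd or os.getcwd()
    return Path(root) / ".claude" / "docs-cache"  # hypothetical subdirectory
```

Running from a nested directory such as `lib/` still resolves to the project's `.claude/`, which is what prevents the nested-state-dir bug class.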