[2.12.0] - 2026-06-16
Workflow-completion release driven by 400-session analysis: three new skills
(/phx:recall, /phx:deps-update, /phx:watch-pr), /phx:pr-review v2 that
closes the review loop (fetch → fix → reply → resolve), a route-intent.sh
UserPromptSubmit hook replacing ~0%-firing CLAUDE.md prose routing, four new
Iron Laws (#23–#26), and an eval-hardening pass that backfilled the
AskUserQuestion 4-option check, cross-file consistency tests, and untracked-file
detection. Law count 22 → 26.
Added
- Iron Law #26: Comments aren't commit messages (session analysis found
Oliver asking "remove unnecessary comments" on essentially every PR, 8+
sessions clustered June 2026). A change's reasoning (the bug, what it
replaces, the task) belongs in the commit/PR/squash, which git persists, not
in code comments. No issue-reference tags inline (# ENA-1234). Keep only
durable intrinsic facts a future reader needs regardless of history:
footguns, invariants, library quirks. Wired into CLAUDE.md, the
inject-iron-laws.sh SubagentStart hook (code-writing subagents inherit it),
the iron-law-judge agent as detection #19 (so /phx:review flags ticket
tags, change-narration, and what-comments), and the init injectable
template. Stops the comments being added during /phx:work and /phx:quick
rather than stripping them at PR time. Law count 25 → 26.
- UserPromptSubmit routing hook (
route-intent.sh): injects one-line /phx:
suggestions directly into Claude's context for three high-signal intents: GitHub
PR URLs / review-feedback phrasing → /phx:pr-review, Tidewave
<context name="current-page"> blocks → /phx:investigate, Elixir stack-trace
pastes → /phx:investigate. Replaces CLAUDE.md prose routing rules measured at
~0% firing rate across 400 sessions. One suggestion per category per session,
silent on explicit slash commands, gated on mix.exs, always exits 0
(UserPromptSubmit exit 2 would erase the user's prompt).
- /phx:recall: session and history archaeology (git-archaeology sessions
ran manual git log/diff pipelines with no plugin support). Three evidence
layers, cheapest first: .claude/solutions/ compound docs → git archaeology
(--grep, -S pickaxe, --follow, -L) → ccrider MCP session search, gated
with graceful degradation when the MCP is absent. ONE ccrider fetch = ONE
subagent (3–15KB responses; writes a ≤30-line summary file). Every answer cites
its evidence; clean misses are stated, then routed to /phx:compound so the
next recall stops at layer 1. 100% trigger accuracy.
- /phx:deps-update: generic dependency freshness workflow (dependency
maintenance was a recurring session pattern with no plugin support). Inventory
via mix hex.outdated (exit 1 = normal "outdated" signal), changelog deltas via
the built-in mix hex.package diff <pkg> <v1>..<v2> (no project-specific mix
tasks), updates with coupled-group enforcement (Phoenix core, Ecto, Ash, Oban,
telemetry families move together), breaking-change fixes, and PR splitting
(patches bundled, minors by area, majors solo). Majors require an explicit
mix.exs edit; override: true only when the per-package constraint table
shows a transitive blocker. Hands off security to /phx:deps-audit (Mode B) and
verification to /phx:verify. The only mutating deps skill; audit/vet stay
read-only. 89% trigger accuracy.
- /phx:watch-pr: token-conscious PR/CI watching (replaces hand-rolled
60-min foreground sleep loops observed in session analysis). A quiet
background watcher (scripts/watch-pr.sh, Monitor-tool-first with
run_in_background fallback) polls gh pr view --json in its own process and
emits ONE line per genuinely-new event (review, comment, CI conclusion,
merged/closed, watchdog, gh-failure). Raw JSON never enters Claude's context,
and Claude takes zero turns while idle (no cache-TTL straddling). --checks-only
delegates to gh pr checks --watch --fail-fast (exit code is the signal).
Routes actionable reviews to /phx:pr-review and CI failures to
/phx:investigate. 100% trigger accuracy on the new fixture.
- /phx:pr-review v2: closes the review loop (fetch → fix → reply → resolve).
The old skill drafted replies but used REST endpoints that expose neither thread
IDs nor resolved status, so it could never resolve a thread or skip handled ones.
v2 fetches threads via GraphQL reviewThreads (thread ID + isResolved +
isOutdated, paginated), replies via REST to the thread root, resolves via
resolveReviewThread, and is idempotent across review rounds: GitHub's
isResolved is the state. New flags: --bots-only (triage CI bot passes;
Copilot/Codex/CodeRabbit detected via __typename == "Bot"), --no-resolve.
New Iron Laws: never resolve without a reply, never claim a fix without a shown
diff, bot findings get the same scrutiny as humans. New references:
gh-commands.md (3 comment surfaces, pagination, bot detection),
bot-triage.md (batch flow + Elixir false-positive patterns).
- Three new Iron Laws (#23–#25) from the 400-session analysis, wired into
elixir-idioms, liveview-patterns, the /phx:init template, the SubagentStart
injection hook, and iron-law-judge detection patterns:
  - #23 Mix tasks start only what they need: Mix.Task.run("app.config") +
    Application.ensure_all_started/1, never Mix.Task.run("app.start") (boots the
    full tree: endpoint binds the port, Oban starts consuming jobs). The
    mix-tasks.md reference previously taught the anti-pattern; now fixed.
  - #24 LiveView handlers match {:error, %Ecto.Changeset{}} explicitly: bare
    {:error, _} silently swallows form validation errors.
  - #25 Capture Gettext/CLDR locale before spawning Task/GenServer: locale is
    process-local; spawned processes reset to default.
- Pre-migration safety section in ecto-patterns/references/migrations.md:
check duplicates (including soft-deleted rows) before unique indexes, with
partial-index/data-fix/composite-key resolutions.
- Tidewave reliability guards in tidewave-integration: worktree/port
verification (multi-worktree setups debug the wrong server), schema introspection
before SQL, output-size caps, browser_eval server-side fallbacks, and a
QA-walkthrough pattern for feature smoke tests.
- Eval: AskUserQuestion 4-option-limit check (
askuserquestion_option_limit
matcher): the tool silently drops a 5th option; brainstorm shipped that way for
months. Scans option lists after every AskUserQuestion mention (YAML - label:
blocks and bullet/numbered runs), stops at headings, and skips sibling list items
when the mention is itself inside a list. Backfilled into all 50 skill evals and
the generator template; caught a real second instance in /phx:plan.
- Eval: cross-file consistency tests (
lab/eval/tests/test_consistency.py): covers
two bug classes that per-skill scoring can't see: references teaching anti-patterns
their own Iron Laws ban (mix-tasks.md shipped the app.start pattern Iron Law
#23 bans), and skill scripts using cwd-relative .claude/ paths (the
nested-state-dir bug class). The path lint caught a 4th live instance in
scripts/fetch-claude-docs.sh.
- make eval now sees untracked files: brand-new skills/agents were invisible
to the git diff-based changed-file detection until first commit;
git ls-files --others is now merged into both detection paths.
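
The routing rules listed for the route-intent.sh hook above can be pictured with a small shell sketch. This is an illustration under stated assumptions, not the plugin's actual script: the function names classify_prompt and route_intent are hypothetical, the regular expressions are guesses at what "PR URL" and "stack-trace paste" might look like, and the review-feedback phrasing match and the once-per-category-per-session bookkeeping are left out.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the plugin's actual route-intent.sh.
# classify_prompt and route_intent are hypothetical names.

# Print a /phx: suggestion for a prompt, or nothing when no intent matches.
classify_prompt() {
  prompt=$1
  case $prompt in
    /*) return 0 ;;  # explicit slash command: stay silent
  esac
  if printf '%s\n' "$prompt" | grep -Eq 'https://github\.com/[^/ ]+/[^/ ]+/pull/[0-9]+'; then
    echo "/phx:pr-review"
  elif printf '%s\n' "$prompt" | grep -Fq '<context name="current-page">'; then
    echo "/phx:investigate"
  elif printf '%s\n' "$prompt" | grep -Eq '^\*\* \([A-Z][A-Za-z0-9_.]*\)'; then
    echo "/phx:investigate"  # a line shaped like "** (KeyError) ..."
  fi
  return 0
}

# Gate on mix.exs and always succeed, so the hook can never block a prompt.
route_intent() {
  project_dir=$1
  [ -f "$project_dir/mix.exs" ] || return 0
  classify_prompt "$2"
  return 0
}
```

Both functions return 0 on every path, mirroring the "always exits 0" rule: a suggestion is advisory, so a missed match must degrade to silence rather than to a blocked prompt.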
Changed
- Workflow handoffs between phases: /phx:investigate now ends with a routing
step (quick fix vs /phx:plan vs /phx:compound); /phx:review passes the review
file path to /phx:plan for follow-up plans; /phx:work suggests /phx:compound
after non-obvious fixes and re-verifies stale plans from earlier sessions.
- /phx:full deflects existing plan files: description and a usage guard route
.claude/plans/*/plan.md arguments to /phx:work instead of re-planning.
- intent-detection hard guard: skips entirely when the message starts with any
slash command; no more routing suggestions on top of explicit commands.
- /phx:work batches checkbox updates: one edit pass when several tasks complete
together, not one Edit call per checkbox.
- /phx:compound write-block fallback: outputs the solution doc inline and
points at /phx:permissions instead of silently dropping knowledge;
/phx:permissions now always recommends workflow-artifact write grants
(.claude/plans/, .claude/solutions/, .claude/reviews/).
- AskUserQuestion discipline in brainstorm/triage: decisions only, concrete
impact per option; fixed brainstorm's Decision Point exceeding the tool's 4-option
limit (5 options meant one was always silently dropped).
- security-analyzer: new end-to-end flow checks from the 400-session analysis,
covering IDOR via handle_params URL params, data-flow through multi-step
transforms, failure-path consistency in Ecto.Multi/with chains, and soft-delete
leakage in authz lookups. All are bug classes external review bots caught after
plugin review passed.
- elixir-reviewer: failure-path review section (Multi/with error branches,
short-circuit side effects, multi-step transforms, soft-delete filters), known
false-positive traps (nil[:key] is nil-safe via Access), and diff-scoped reading rule
to stop turn exhaustion on large PRs.
- verification-runner: compiles FIRST (turn 1 combines discovery + mix compile),
maxTurns 10 → 15, earlier findings-file write; stops "compiling… let me check again"
turn exhaustion observed on large PRs.
- parallel-reviewer + /phx:audit: rate-limit circuit breaker. When 2+ subagents
fail with rate-limit/API errors, synthesize from existing outputs and tell the user to
re-run after reset instead of dead-waiting on "continue".
- ecto-schema-designer: pre-UNIQUE-index migration safety check (duplicates +
soft-deleted rows silently block production migrations).
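
The /phx:full deflection above amounts to a path-pattern check on the argument. A minimal shell sketch of that decision follows, assuming the argument arrives as a single string; the entry above says the real guard lives in the skill's description and usage text, so the function name route_full_argument and the idea of expressing it as a script are purely illustrative.

```shell
#!/bin/sh
# Illustrative sketch only; the changelog describes the real guard as part of
# the skill's description and usage text. route_full_argument is a
# hypothetical name.

# Print the command that should handle an argument passed to /phx:full.
route_full_argument() {
  case $1 in
    .claude/plans/*/plan.md) echo "/phx:work" ;;  # existing plan: execute it
    *)                       echo "/phx:full" ;;  # anything else: plan from scratch
  esac
}
```

Note that in a shell case pattern the * also matches slashes, so a more deeply nested path ending in /plan.md would be deflected as well; a stricter check would need an explicit test for a single path segment.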
Fixed
- Iron Law verifier is now blame-aware: iron-law-verifier.sh scans only the content
the current Edit/Write introduced (new_string/content), not the whole file.
Pre-existing violations in untouched regions no longer force unrelated refactors.
- block-dangerous-ops.sh fails open on script errors: a corrupted hook file (e.g.
merge-conflict markers) once blocked ALL Bash calls with no recovery; hooks.json now
appends || exit 0 and the script documents the JSON-deny/exit-0 contract.
- Stop hook warns about uncommitted feature-branch changes: prevents the
lost-work-after-rebase incident class observed in session analysis.
- liveview-architect + ecto-schema-designer missing Write: both agents still had
the pre-v2.8.1 disallowedTools: Write, ... frontmatter and fell back to inline output
when spawned as reviewers ("I only have Read, Grep, and Glob"). Write now allowed for
their own findings file; Edit stays disallowed.
- web-researcher could never write its output file: research workers were asked to
save findings but had Write disallowed; agents burned all turns on fetches then lost the
output. Write allowed + reserve-last-turns-for-output guard.
- /phx:plan post-plan AskUserQuestion exceeded the 4-option limit: 5 options
("Review the plan" / "Adjust the plan" merged into one) meant one was always
silently dropped. Fixed in the skill, planning-orchestrator, and both hook
scripts that echo the list (precompact-rules.sh, plan-stop-reminder.sh).
- scripts/fetch-claude-docs.sh wrote its cache relative to cwd: now anchored to
${CLAUDE_PROJECT_DIR:-$PWD} like the other skill scripts.
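
The cwd-relative path fix in the last entry can be sketched as follows. This is not the contents of scripts/fetch-claude-docs.sh; the helper name state_dir and the cache subdirectory are hypothetical, and only the ${CLAUDE_PROJECT_DIR:-$PWD} expansion comes from the entry itself.

```shell
#!/bin/sh
# Illustrative sketch only; not the contents of scripts/fetch-claude-docs.sh.
# state_dir and the "cache" subdirectory are hypothetical names.

# Resolve a state directory under the project root instead of the current
# working directory, so running from a subdirectory cannot create a nested
# .claude/ tree.
state_dir() {
  printf '%s/.claude/%s\n' "${CLAUDE_PROJECT_DIR:-$PWD}" "$1"
}
```

With CLAUDE_PROJECT_DIR set, state_dir cache resolves under that directory whatever the current directory is. When the variable is unset, the expansion falls back to $PWD, which is the old cwd-relative behaviour.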