bkit v2.1.25 — Claude 5 Model Alignment + Issue Response (GitHub Release notes DRAFT)
Status: FINAL DRAFT — version confirmed v2.1.25 (2026-07-02).
Basis: `docs/02-design/features/claude-model-alignment.design.en.md` (Option C,
user-approved) + empirical reproductions R1–R4 (`.bkit/research/v2125-reproduction-log.md`)
Highlights
- 4-tier role-based model matrix: 9 fable (verification
& orchestration core) / 7 opus (deep reasoning & security) / 16 sonnet
(implementers) / 2 haiku (monitors); final tree 34 agents. 16 model pins
changed; every assignment argued per-agent (No Guessing).
- ~4.5–5.3K tokens saved on every session wakeup (#129, @NEXCODE-MK): agent
descriptions compacted 30,065B → 16,919B (−44%) with a compact 8-language
trigger encoding: full EN + KO keyword lists stay, other languages keep one
anchor keyword each, and "Do NOT use for" guidance moved into agent bodies
(loaded only on invocation). bkit's own 8-language routing is untouched.
- Deprecated agents off your prompt surface (#128, @NEXCODE-MK): the 6
`pdca-eval-*` tombstone stubs are gone from `agents/` (−1,387B more, and no
accidentally-spawnable entries). Deprecation governance now lives in a
machine-readable registry (`test/contract/deprecation-registry.json`) under
ADR 0014; contract L4/L5 gates fully preserved.
- bkit's verification core now runs on Claude Fable 5: gap-detector,
design-validator, pdca-iterator, and all long-horizon leads (cto-lead,
sprint-orchestrator, sprint-master-planner, pm-lead, qa-lead, sprint-qa-flow)
are pinned to `fable`, the model class built for long-horizon orchestration and
honest self-verification.
- Dual-floor compatibility, zero hard breakage: the install floor stays at
Claude Code v2.1.143. Below the new model floor (v2.1.170), bkit shows an
actionable SessionStart advisory (ENH-368) with a one-line workaround instead
of mystery spawn errors.
- Cost accuracy: token-cost dashboards previously overstated opus spend 3x
(stale $15/$75 pricing). Pricing is now synced to published Claude API list
prices: fable $10/$50, opus $5/$25, sonnet $3/$15, haiku $1/$5 per MTok.
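The description-size figures above can be checked with a few lines of Node.js. This is a minimal sketch, not bkit's regression test: the function names are hypothetical, the byte counts are the ones quoted in these notes, and the 700-byte budget is the per-agent lock mentioned for #129.

```javascript
// Byte counts quoted in these release notes.
const BEFORE_BYTES = 30065;
const AFTER_BYTES = 16919;

// Per-agent budget stated in these notes (regression lock for #129).
const MAX_DESCRIPTION_BYTES = 700;

// Percentage reduction between two sizes.
function reductionPercent(before, after) {
  return (1 - after / before) * 100;
}

// UTF-8 byte length, so multi-byte trigger keywords (e.g. Korean) are counted
// by their encoded size rather than by character count.
function descriptionBytes(description) {
  return Buffer.byteLength(description, "utf8");
}

// Hypothetical budget check for a single agent description.
function withinBudget(description, maxBytes = MAX_DESCRIPTION_BYTES) {
  return descriptionBytes(description) <= maxBytes;
}

console.log(Math.round(reductionPercent(BEFORE_BYTES, AFTER_BYTES))); // 44
console.log(BEFORE_BYTES - AFTER_BYTES); // 13146 bytes removed
```

Measuring bytes rather than characters matters here because a single Hangul syllable occupies three bytes in UTF-8.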
What changes for you
- Higher verification quality: gap analysis (match-rate SSoT), design
validation, and the Evaluator-Optimizer iteration loop now run on Fable 5.
The checks that decide whether your implementation matches your design get
bkit's strongest model.
- Better orchestration: `/pdca team`, `/sprint`, PM and QA team workflows are
led by Fable-pinned leads; synthesis quality compounds across every downstream
phase.
- Accurate cost reporting: `/pdca-watch` and token reports reflect real
Claude 5 list prices (opus costs were overstated 3x before).
- Leaner sessions: every Claude Code session with bkit carries ~44% less
always-resident agent-description text (#129) and zero deprecated tombstone
entries (#128), reflected in cache-read billing on every API call.
- Security-sensitive work stays on Opus 4.8: security-architect,
code-analyzer, and self-healing intentionally remain opus. Opus 4.8 is
strongest on cybersecurity, and Fable's safety classifier can reroute/refuse
security-adjacent or headless work.
Model matrix
| Tier | Count | Agents |
|---|---|---|
| fable — verification & orchestration core | 9 | cto-lead, sprint-orchestrator, sprint-master-planner, pm-lead, qa-lead, gap-detector, design-validator, pdca-iterator, sprint-qa-flow |
| opus — deep reasoning & security | 7 | security-architect, code-analyzer, self-healing, infra-architect, enterprise-expert, bkit-impact-analyst, cc-version-researcher |
| sonnet — implementers | 16 | bkend-expert, frontend-architect, pipeline-guide, pm-discovery, pm-lead-skill-patch, pm-prd, pm-research, pm-strategy, product-manager, qa-debug-analyst, qa-strategist, qa-test-generator, qa-test-planner, skill-needs-extractor, sprint-report-writer, starter-guide |
| haiku — monitors | 2 | qa-monitor, report-generator |
Reassignments: 9 opus→fable, 1 opus→sonnet (sprint-report-writer). 7 opus agents
preserved deliberately. The 6 deprecated pdca-eval-* stubs were removed from
agents/ per #128 / ADR 0014 (governance in the deprecation registry).
Compatibility & floors
- Recommended Claude Code: v2.1.198. The `sonnet` alias resolves to
Sonnet 5 only on CC ≥ v2.1.197.
- Model floor: v2.1.170. The `fable` alias exists only from this version.
On CC 2.1.143–2.1.169 the 9 fable-pinned agents fail to spawn (empirically
reproduced, R2); bkit detects this at SessionStart and shows the advisory below.
- Install minimum: v2.1.143 (unchanged; plugin-manifest `displayName`).
Runtime minimum: v2.1.78 (unchanged).
- Below the model floor, workaround: upgrade with
`npm install -g @anthropic-ai/claude-code@latest`, or temporarily
`export CLAUDE_CODE_SUBAGENT_MODEL=sonnet` (forces ALL subagents to sonnet
until you unset it).
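The floors above amount to a small version-classification rule. The sketch below is one way to express it, assuming plain dotted numeric versions; the function names and the returned labels are hypothetical and are not bkit's SessionStart implementation. Only the version numbers come from these notes.

```javascript
// Version floors as stated in these release notes.
const FLOORS = {
  install: "2.1.143",
  model: "2.1.170", // first version where the fable alias exists
  recommended: "2.1.198",
};

// Compare two dotted numeric versions segment by segment.
// Returns -1, 0, or 1. Numeric comparison matters: "78" < "143" as numbers,
// but not as strings.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Hypothetical labels for where a Claude Code version sits relative to the floors.
function classifyClaudeCodeVersion(version) {
  if (compareVersions(version, FLOORS.install) < 0) return "below-install-floor";
  if (compareVersions(version, FLOORS.model) < 0) return "below-model-floor";
  if (compareVersions(version, FLOORS.recommended) < 0) return "supported";
  return "recommended";
}

console.log(classifyClaudeCodeVersion("2.1.169")); // below-model-floor
console.log(classifyClaudeCodeVersion("2.1.170")); // supported
```

A version classified as `below-model-floor` is the case where the advisory and the `CLAUDE_CODE_SUBAGENT_MODEL` workaround apply.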
Provider alias table (R1)
Alias resolution depends on your provider path — bkit makes no universal
"Sonnet 5" promise:
| Provider path | `fable` | `opus` | `sonnet` |
|---|---|---|---|
| Anthropic API (CC ≥ 2.1.197) | Fable 5 (CC ≥ 2.1.170) | Opus 4.8 | Sonnet 5 |
| Claude Platform on AWS | provider-specific full ID required | Opus 4.7 | Sonnet 4.6 |
| Bedrock / Vertex / Foundry | provider-specific full ID required | Opus 4.6 | Sonnet 4.5 |
Footguns & caveats
- `CLAUDE_CODE_SUBAGENT_MODEL` overrides ALL frontmatter model pins: while
set, every subagent (including the 9 fable pins) runs on that model. It is the
documented below-floor workaround; remember to unset it after upgrading.
- Enterprise `availableModels` exclusions fall back silently: an excluded
model does not error; the agent inherits the main conversation model instead.
- Fable safety-classifier headless refusals: Fable may reroute or refuse
security-adjacent or non-interactive (`claude -p`) requests. This is why the
security/headless agents stay on Opus 4.8.
- Provider aliases resolve to older models on AWS/Bedrock/Vertex (see table
above); `fable` there needs a provider-specific full model ID.
- `evals/*/eval.yaml` `model_baseline` values are historical capture records and
are intentionally unchanged.
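The first two caveats describe a precedence order, which can be sketched as a small function. This models only the behavior described in these notes and is not Claude Code's actual resolver. The function name and parameter shape are hypothetical, and the sketch assumes the environment override is applied before any `availableModels` check, which these notes do not specify.

```javascript
// Illustrative model of the precedence described above (hypothetical names).
function resolveSubagentModel({ pinnedModel, mainModel, env = {}, availableModels = null }) {
  // 1. The environment override wins over every frontmatter pin.
  if (env.CLAUDE_CODE_SUBAGENT_MODEL) {
    return env.CLAUDE_CODE_SUBAGENT_MODEL;
  }
  // 2. A pin excluded by availableModels does not error; the agent
  //    inherits the main conversation model instead.
  if (availableModels && !availableModels.includes(pinnedModel)) {
    return mainModel;
  }
  // 3. Otherwise the frontmatter pin applies.
  return pinnedModel;
}

// With the override set, a fable pin runs on sonnet.
console.log(resolveSubagentModel({
  pinnedModel: "fable",
  mainModel: "opus",
  env: { CLAUDE_CODE_SUBAGENT_MODEL: "sonnet" },
})); // sonnet
```

Both fallbacks are silent, so a session left in either state can run the verification agents on a different model than the matrix suggests without any visible error.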
Issue response (community-driven)
- #128 (@NEXCODE-MK): deprecated `pdca-eval-*` stubs removed from the
prompt surface; deprecation registry + ADR 0014 supersede the v2.1.22
permanent-retention decision without touching contract baselines. Bonus: a
pre-existing exit-2 crash in the L4 missing-stub path and 6 pre-existing
agents-effort test failures were fixed along the way.
- #129 (@NEXCODE-MK): token diet via compact 8-language trigger encoding
(−44% agent-description surface; regression-locked at ≤700B per agent).
Locale-scoped generation (issue proposal 1) is deferred with rationale: CC
plugins are read-only marketplace checkouts, so no install-time generation hook
exists.
- #130 (@s99606931): `learning-stop.js` piped-stdin `isTTY === false` dead
gate fixed with the shared `readStdinSync()` helper (same precedent as
#125/#126); 9-TC regression test added; repo-wide sweep confirms zero
remaining `isTTY === false` code gates.
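The dead gate in #130 follows from how Node.js reports TTY state: `process.stdin.isTTY` is `true` for a terminal and `undefined` (not `false`) when stdin is piped. The sketch below shows why a strict `=== false` comparison never fires. The predicate names are hypothetical, and this is not the code of `learning-stop.js` or of bkit's `readStdinSync()` helper.

```javascript
// Dead gate: for piped stdin, isTTY is undefined, so this is never true.
function deadGate(stdin) {
  return stdin.isTTY === false;
}

// Working gate: anything that is not explicitly a TTY counts as piped input.
function isPipedStdin(stdin) {
  return stdin.isTTY !== true;
}

// Stand-ins for the two shapes process.stdin can take.
const terminalStdin = { isTTY: true };
const pipedStdin = {}; // isTTY is absent, i.e. undefined

console.log(deadGate(pipedStdin)); // false: the piped branch is skipped
console.log(isPipedStdin(pipedStdin)); // true
console.log(isPipedStdin(terminalStdin)); // false
```

Checking for `!== true` (or simply `!stdin.isTTY`) covers both the `undefined` case and any environment that does set the property to `false`.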
Fixes
- Opus pricing 3x overstatement in `lib/pdca/token-report.js` (15/75 → 5/25);
haiku synced to 1/5; fable 10/50 added with `_modelClass()` fable branch.
- 3 pre-existing doc-drift bugs: `commands/bkit.md` claimed "36 total /
13 opus / 21 sonnet / 2 haiku" (actual: 40 files); pm-lead was listed as
`sonnet` (it was opus, now fable); test-checklist PM-T10 claimed all 5 PM
agents use sonnet.
- Lockstep updates: `VALID_MODELS` + runtime whitelist gained `fable`,
28 contract baseline JSONs regenerated (model field only), security assertions
SEC-AF-030/037/038 updated, team default ctoAgent opus → fable,
`RECOMMENDED_VERSION` 2.1.150 → 2.1.198, `FABLE_MODEL_FLOOR` 2.1.170 added.
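To illustrate the pricing fix, here is a minimal cost estimate using the per-MTok figures from these notes. It assumes each pair reads as input/output price in USD, ignores cache-read and cache-write pricing, and uses hypothetical names throughout; it does not reproduce `lib/pdca/token-report.js`.

```javascript
// Prices per million tokens (MTok) from these notes, read as input/output USD.
const PRICE_PER_MTOK = {
  fable: { input: 10, output: 50 },
  opus: { input: 5, output: 25 },
  sonnet: { input: 3, output: 15 },
  haiku: { input: 1, output: 5 },
};

// Stale opus pricing that caused the overstatement.
const STALE_OPUS = { input: 15, output: 75 };

// Cost in USD for a token count under a given price pair.
function costUsd(price, inputTokens, outputTokens) {
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Hypothetical lookup by model class; unknown classes are rejected
// rather than silently priced at zero.
function estimateCostUsd(modelClass, inputTokens, outputTokens) {
  const price = PRICE_PER_MTOK[modelClass];
  if (!price) throw new Error(`unknown model class: ${modelClass}`);
  return costUsd(price, inputTokens, outputTokens);
}

const current = estimateCostUsd("opus", 1_000_000, 1_000_000);
const stale = costUsd(STALE_OPUS, 1_000_000, 1_000_000);
console.log(current); // 30
console.log(stale / current); // 3
```

Because both the input and output prices were stale by the same factor, the overstatement is exactly 3x regardless of the input/output mix.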