bkit v2.1.25 — Claude 5 Model Alignment + Issue Response

@agent-kay-it agent-kay-it released this 02 Jul 05:49
523af2b

Status: FINAL DRAFT — version confirmed v2.1.25 (2026-07-02).
Basis: docs/02-design/features/claude-model-alignment.design.en.md (Option C,
user-approved) + empirical reproductions R1–R4 (.bkit/research/v2125-reproduction-log.md)

Highlights

  • 4-tier role-based model matrix: 9 fable (verification
    & orchestration core) / 7 opus (deep reasoning & security) / 16 sonnet
    (implementers) / 2 haiku (monitors) — final tree 34 agents. 16 model pins
    changed; every assignment argued per-agent (No Guessing).
  • ~4.5–5.3K tokens saved on every session wakeup (#129, @NEXCODE-MK): agent
    descriptions compacted 30,065B → 16,919B (−44%) with a compact 8-language
    trigger encoding — full EN + KO keyword lists stay, other languages keep one
    anchor keyword each, and "Do NOT use for" guidance moved into agent bodies
    (loaded only on invocation). bkit's own 8-language routing is untouched.
  • Deprecated agents off your prompt surface (#128, @NEXCODE-MK): the 6
    pdca-eval-* tombstone stubs are gone from agents/ (−1,387B more, and no
    accidentally-spawnable entries). Deprecation governance now lives in a
    machine-readable registry (test/contract/deprecation-registry.json) under
    ADR 0014 — contract L4/L5 gates fully preserved.
  • bkit's verification core now runs on Claude Fable 5: gap-detector,
    design-validator, pdca-iterator, and all long-horizon leads (cto-lead,
    sprint-orchestrator, sprint-master-planner, pm-lead, qa-lead, sprint-qa-flow)
    are pinned to fable — the model class built for long-horizon orchestration and
    honest self-verification.
  • Dual-floor compatibility, zero hard breakage: the install floor stays at
    Claude Code v2.1.143. Below the new model floor (v2.1.170), bkit shows an
    actionable SessionStart advisory (ENH-368) with a one-line workaround instead
    of mystery spawn errors.
  • Cost accuracy: token-cost dashboards previously overstated opus spend 3x
    (stale $15/$75 pricing). Pricing is now synced to published Claude API list
    prices: fable $10/$50, opus $5/$25, sonnet $3/$15, haiku $1/$5 per MTok.

What changes for you

  • Higher verification quality: gap analysis (match-rate SSoT), design
    validation, and the Evaluator-Optimizer iteration loop now run on Fable 5 —
    the checks that decide whether your implementation matches your design get
    bkit's strongest model.
  • Better orchestration: /pdca team, /sprint, PM and QA team workflows are
    led by Fable-pinned leads; synthesis quality compounds across every downstream
    phase.
  • Accurate cost reporting: /pdca-watch and token reports reflect real
    Claude 5 list prices (opus costs were overstated 3x before).
  • Leaner sessions: every Claude Code session with bkit carries ~44% less
    always-resident agent-description text (#129) and zero deprecated tombstone
    entries (#128) — reflected in cache-read billing on every API call.
  • Security-sensitive work stays on Opus 4.8: security-architect,
    code-analyzer, and self-healing intentionally remain opus — Opus 4.8 is
    strongest on cybersecurity, and Fable's safety classifier can reroute/refuse
    security-adjacent or headless work.
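
The size figures behind "Leaner sessions" can be rechecked with a few lines of arithmetic. The sketch below also shows one way a per-agent byte budget such as the ≤700B lock from #129 could be enforced; `findOverBudget` and its input shape are hypothetical and are not bkit's actual regression test.

```javascript
// Figures quoted in the Highlights section.
const BEFORE_BYTES = 30065;
const AFTER_BYTES = 16919;
// Per-agent ceiling quoted for the #129 regression lock.
const MAX_DESCRIPTION_BYTES = 700;

// Percentage reduction, rounded to one decimal place.
function reductionPercent(before, after) {
  return Math.round(((before - after) / before) * 1000) / 10;
}

// Hypothetical checker: returns the names of agents whose description
// exceeds the budget. It measures UTF-8 bytes, not characters, so
// multi-byte trigger keywords (for example Korean) count at full size.
function findOverBudget(descriptions, maxBytes = MAX_DESCRIPTION_BYTES) {
  return Object.entries(descriptions)
    .filter(([, text]) => Buffer.byteLength(text, 'utf8') > maxBytes)
    .map(([name]) => name);
}

console.log(reductionPercent(BEFORE_BYTES, AFTER_BYTES)); // 43.7
```

The computed 43.7% is what the notes round to −44%.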

Model matrix

| Tier | Count | Agents |
|---|---|---|
| fable (verification & orchestration core) | 9 | cto-lead, sprint-orchestrator, sprint-master-planner, pm-lead, qa-lead, gap-detector, design-validator, pdca-iterator, sprint-qa-flow |
| opus (deep reasoning & security) | 7 | security-architect, code-analyzer, self-healing, infra-architect, enterprise-expert, bkit-impact-analyst, cc-version-researcher |
| sonnet (implementers) | 16 | bkend-expert, frontend-architect, pipeline-guide, pm-discovery, pm-lead-skill-patch, pm-prd, pm-research, pm-strategy, product-manager, qa-debug-analyst, qa-strategist, qa-test-generator, qa-test-planner, skill-needs-extractor, sprint-report-writer, starter-guide |
| haiku (monitors) | 2 | qa-monitor, report-generator |

Reassignments: 9 opus→fable, 1 opus→sonnet (sprint-report-writer). 7 opus agents
preserved deliberately. The 6 deprecated pdca-eval-* stubs were removed from
agents/ per #128 / ADR 0014 (governance in the deprecation registry).
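
The matrix counts can be cross-checked mechanically. The following is a minimal sketch that encodes the matrix as data and tallies it. The agent names come from the matrix above; `MODEL_MATRIX`, `tierCounts`, and `findDuplicates` are illustrative names, not part of bkit.

```javascript
// Tier assignments copied from the model matrix in these notes.
const MODEL_MATRIX = {
  fable: [
    'cto-lead', 'sprint-orchestrator', 'sprint-master-planner', 'pm-lead',
    'qa-lead', 'gap-detector', 'design-validator', 'pdca-iterator',
    'sprint-qa-flow',
  ],
  opus: [
    'security-architect', 'code-analyzer', 'self-healing', 'infra-architect',
    'enterprise-expert', 'bkit-impact-analyst', 'cc-version-researcher',
  ],
  sonnet: [
    'bkend-expert', 'frontend-architect', 'pipeline-guide', 'pm-discovery',
    'pm-lead-skill-patch', 'pm-prd', 'pm-research', 'pm-strategy',
    'product-manager', 'qa-debug-analyst', 'qa-strategist',
    'qa-test-generator', 'qa-test-planner', 'skill-needs-extractor',
    'sprint-report-writer', 'starter-guide',
  ],
  haiku: ['qa-monitor', 'report-generator'],
};

// Number of agents per tier.
function tierCounts(matrix) {
  return Object.fromEntries(
    Object.entries(matrix).map(([tier, agents]) => [tier, agents.length])
  );
}

// Agents listed more than once anywhere in the matrix.
function findDuplicates(matrix) {
  const seen = new Set();
  const dupes = new Set();
  for (const agent of Object.values(matrix).flat()) {
    if (seen.has(agent)) dupes.add(agent);
    seen.add(agent);
  }
  return [...dupes];
}

const counts = tierCounts(MODEL_MATRIX);
const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
console.log(counts, total); // { fable: 9, opus: 7, sonnet: 16, haiku: 2 } 34
```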

Compatibility & floors

  • Recommended Claude Code: v2.1.198 — the sonnet alias resolves to
    Sonnet 5 only on CC ≥ v2.1.197.
  • Model floor: v2.1.170 — the fable alias exists only from this version.
    On CC 2.1.143–2.1.169 the 9 fable-pinned agents fail to spawn (empirically
    reproduced, R2); bkit detects this at SessionStart and shows the advisory below.
  • Install minimum: v2.1.143 (unchanged — plugin-manifest displayName).
    Runtime minimum: v2.1.78 (unchanged).
  • Below the model floor — workaround: upgrade with
    npm install -g @anthropic-ai/claude-code@latest, or temporarily
    export CLAUDE_CODE_SUBAGENT_MODEL=sonnet (forces ALL subagents to sonnet
    until you unset it).
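
The floors above amount to a small classification over the installed Claude Code version. Here is a minimal sketch, assuming plain `major.minor.patch` version strings. The constant values come from these notes, while `compareVersions` and `classifyVersion` are hypothetical helpers and not bkit's SessionStart code.

```javascript
const INSTALL_FLOOR = '2.1.143';
const FABLE_MODEL_FLOOR = '2.1.170';
const RECOMMENDED_VERSION = '2.1.198';

// Numeric, component-wise comparison. A plain string comparison would
// rank '2.1.78' above '2.1.143', which is wrong.
function compareVersions(a, b) {
  const pa = a.replace(/^v/, '').split('.').map(Number);
  const pb = b.replace(/^v/, '').split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Maps a Claude Code version to the situation described in the list above.
function classifyVersion(version) {
  if (compareVersions(version, INSTALL_FLOOR) < 0) return 'below-install-floor';
  if (compareVersions(version, FABLE_MODEL_FLOOR) < 0) return 'below-model-floor';
  if (compareVersions(version, RECOMMENDED_VERSION) < 0) return 'below-recommended';
  return 'ok';
}

console.log(classifyVersion('2.1.169')); // below-model-floor
console.log(classifyVersion('2.1.198')); // ok
```

A result of `below-model-floor` corresponds to the range where the 9 fable-pinned agents fail to spawn and the advisory applies.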

Provider alias table (R1)

Alias resolution depends on your provider path — bkit makes no universal
"Sonnet 5" promise:

| Provider path | fable | opus | sonnet |
|---|---|---|---|
| Anthropic API (CC ≥ 2.1.197) | Fable 5 (CC ≥ 2.1.170) | Opus 4.8 | Sonnet 5 |
| Claude Platform on AWS | provider-specific full ID required | Opus 4.7 | Sonnet 4.6 |
| Bedrock / Vertex / Foundry | provider-specific full ID required | Opus 4.6 | Sonnet 4.5 |
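
For scripting against this table, the rows can be held as a lookup. In the sketch below the provider keys (`anthropic`, `claude-platform-aws`, `bedrock-vertex-foundry`) and the `resolveAlias` helper are made up for illustration. Only the cell values come from the table, and the Anthropic row assumes the Claude Code versions stated there.

```javascript
// null means "provider-specific full ID required" in the table.
const ALIAS_TABLE = {
  'anthropic': { fable: 'Fable 5', opus: 'Opus 4.8', sonnet: 'Sonnet 5' },
  'claude-platform-aws': { fable: null, opus: 'Opus 4.7', sonnet: 'Sonnet 4.6' },
  'bedrock-vertex-foundry': { fable: null, opus: 'Opus 4.6', sonnet: 'Sonnet 4.5' },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns the model an alias resolves to on a provider path, and whether
// a provider-specific full model ID must be supplied instead.
function resolveAlias(providerPath, alias) {
  if (!has(ALIAS_TABLE, providerPath) || !has(ALIAS_TABLE[providerPath], alias)) {
    throw new Error(`No table entry for ${providerPath}/${alias}`);
  }
  const model = ALIAS_TABLE[providerPath][alias];
  return { model, needsFullId: model === null };
}

console.log(resolveAlias('claude-platform-aws', 'sonnet'));
// { model: 'Sonnet 4.6', needsFullId: false }
```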

Footguns & caveats

  • CLAUDE_CODE_SUBAGENT_MODEL overrides ALL frontmatter model pins — while
    set, every subagent (including the 9 fable pins) runs on that model. It is the
    documented below-floor workaround; remember to unset it after upgrading.
  • Enterprise availableModels exclusions fall back silently: an excluded
    model does not error — the agent inherits the main conversation model instead.
  • Fable safety-classifier headless refusals: Fable may reroute or refuse
    security-adjacent or non-interactive (claude -p) requests. This is why the
    security/headless agents stay on Opus 4.8.
  • Provider aliases resolve to older models on AWS/Bedrock/Vertex (see table
    above); fable there needs a provider-specific full model ID.
  • evals/*/eval.yaml model_baseline values are historical capture records and
    are intentionally unchanged.
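
The first two caveats describe an effective precedence for which model a subagent ends up on. The sketch below models that behavior as a pure function. It is an illustration of the caveats as stated here, not Claude Code's implementation: `resolveSubagentModel` and its parameters are hypothetical, and checking the environment override before the exclusion list is an assumption.

```javascript
// Models the caveats above:
//  1. CLAUDE_CODE_SUBAGENT_MODEL, while set, wins over every frontmatter pin.
//  2. A pin excluded by availableModels does not error; the agent
//     inherits the main conversation model.
//  3. Otherwise the frontmatter pin applies.
function resolveSubagentModel({ pinnedModel, mainModel, env = {}, availableModels = null }) {
  const override = env.CLAUDE_CODE_SUBAGENT_MODEL;
  if (override) {
    return { model: override, reason: 'env-override' };
  }
  if (availableModels && !availableModels.includes(pinnedModel)) {
    return { model: mainModel, reason: 'excluded-fallback' };
  }
  return { model: pinnedModel, reason: 'frontmatter-pin' };
}

console.log(resolveSubagentModel({
  pinnedModel: 'fable',
  mainModel: 'opus',
  env: { CLAUDE_CODE_SUBAGENT_MODEL: 'sonnet' },
}));
// { model: 'sonnet', reason: 'env-override' }
```

Returning a `reason` alongside the model makes the silent fallback in case 2 visible to whatever logs the result.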

Issue response (community-driven)

  • #128 (@NEXCODE-MK) — deprecated pdca-eval-* stubs removed from the
    prompt surface; deprecation registry + ADR 0014 supersede the v2.1.22
    permanent-retention decision without touching contract baselines. Bonus: a
    pre-existing exit-2 crash in the L4 missing-stub path and 6 pre-existing
    agents-effort test failures were fixed along the way.
  • #129 (@NEXCODE-MK) — token diet via compact 8-language trigger encoding
    (−44% agent-description surface; regression-locked at ≤700B per agent).
    Locale-scoped generation (issue proposal 1) is deferred with rationale: CC
    plugins are read-only marketplace checkouts — no install-time generation hook
    exists.
  • #130 (@s99606931): the piped-stdin isTTY === false dead gate in
    learning-stop.js is fixed with the shared readStdinSync() helper (same
    precedent as #125/#126); a 9-TC regression test was added, and a repo-wide
    sweep confirms zero remaining isTTY === false code gates.

Fixes

  • Opus pricing 3x overstatement in lib/pdca/token-report.js (15/75 → 5/25);
    haiku synced to 1/5; fable 10/50 added with _modelClass() fable branch.
  • 3 pre-existing doc-drift bugs: commands/bkit.md claimed "36 total /
    13 opus / 21 sonnet / 2 haiku" (actual: 40 files); pm-lead was listed as
    sonnet (it was opus, now fable); test-checklist PM-T10 claimed all 5 PM
    agents use sonnet.
  • Lockstep updates: VALID_MODELS + runtime whitelist gained fable,
    28 contract baseline JSONs regenerated (model field only), security assertions
    SEC-AF-030/037/038 updated, team default ctoAgent opus → fable,
    RECOMMENDED_VERSION 2.1.150 → 2.1.198, FABLE_MODEL_FLOOR 2.1.170 added.
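
The 3x overstatement follows directly from the price table. Here is a minimal sketch of the arithmetic, assuming each price pair is input/output USD per million tokens (the notes give only the pairs). `modelClass` is a hypothetical stand-in for `_modelClass()`, and the model ID strings are made up.

```javascript
// USD per MTok. Reading each pair as input/output is an assumption.
const PRICING = {
  fable: { input: 10, output: 50 },
  opus: { input: 5, output: 25 },
  sonnet: { input: 3, output: 15 },
  haiku: { input: 1, output: 5 },
};
const STALE_OPUS = { input: 15, output: 75 };

// Hypothetical classifier: maps a model ID or alias to a pricing class
// by substring match, with fable as its own branch.
function modelClass(modelId) {
  const id = String(modelId).toLowerCase();
  for (const cls of ['fable', 'opus', 'sonnet', 'haiku']) {
    if (id.includes(cls)) return cls;
  }
  return null;
}

// Cost in USD for a given token usage at a given price pair.
function costUsd(price, inputTokens, outputTokens) {
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

const cls = modelClass('example-opus-id'); // made-up ID for illustration
const current = costUsd(PRICING[cls], 2_000_000, 1_000_000);
const stale = costUsd(STALE_OPUS, 2_000_000, 1_000_000);
console.log(current, stale, stale / current); // 35 105 3
```

Because both the input and output prices were exactly three times too high, the ratio is 3 for any token mix.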