Releases: mcp-tool-shop-org/role-os
v2.9.1 — Stage A health pass + specialists design lock
Fixed — health pass (134 verified findings, adversarially confirmed)
- roleos swarm and roleos audit work again. Both crashed on every invocation. The validated manifest now drives run construction: swarm runs carry stage/domain/gate metadata (swarm status groups by stage, swarm approve persists gate approvals), and audit runs scale 2N+K+3 with the manifest.
- Pack-level runs are built from real roles: catalog-valid steps, artifacts by role lookup, and the final review gate restored (Critic Reviewer → verdict; Judge → judge-report for brainstorm).
- Reject verdicts route to the producing role (was: back to the reviewer).
- The specialist quota window now actually slides (route-tagged v2 state, tolerant migration), so there is no more permanent lockout. The citation gate counts distinct identifiers, so the house "arXiv ID + URL" format is no longer flagged. The generated capability-gate hook is self-contained and fails closed in npx/global installs.
- Docs are accurate again: an install step in the quick start, the 61-role table, the opt-in egress threat model, the real npm package name on the landing page, source-verified handbook counts, a genericized starter pack, and a pinned npm on the OIDC publish path.
- 6 new regression suites pin the above as contracts. Suite: 1404 → 1435 tests (1432 pass, 3 deliberate skips), 0 fail.
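The sliding quota fix can be pictured with a minimal sketch: each route keeps only the timestamps that still fall inside the window, so old calls age out instead of accumulating into a permanent lockout. The v2 state shape, migrate, and tryAcquire are hypothetical names for illustration, not role-os's actual state format, and the "unrecognized state becomes empty state" migration is an assumed policy.

```typescript
// Hypothetical sketch: per-route sliding-window quota. Not role-os's real state format.
interface QuotaStateV2 {
  version: 2;
  // Route tag -> timestamps (ms) of calls that may still be inside the window.
  calls: Record<string, number[]>;
}

// Assumed tolerant migration: anything that is not recognizable v2 state
// becomes an empty v2 state instead of throwing or locking the specialist out.
function migrate(raw: unknown): QuotaStateV2 {
  const s = raw as Partial<QuotaStateV2> | null;
  if (s && s.version === 2 && typeof s.calls === "object" && s.calls !== null) {
    return { version: 2, calls: s.calls };
  }
  return { version: 2, calls: {} };
}

// Admit a call if fewer than `limit` calls on this route fall inside the last `windowMs`.
// Expired timestamps are dropped on every check, so the window slides.
function tryAcquire(
  state: QuotaStateV2,
  route: string,
  now: number,
  limit: number,
  windowMs: number,
): boolean {
  const recent = (state.calls[route] ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= limit) {
    state.calls[route] = recent;
    return false;
  }
  recent.push(now);
  state.calls[route] = recent;
  return true;
}

const state = migrate(undefined);
console.log(tryAcquire(state, "research", 0, 2, 1000));    // true
console.log(tryAcquire(state, "research", 100, 2, 1000));  // true
console.log(tryAcquire(state, "research", 200, 2, 1000));  // false: 2 calls already in window
console.log(tryAcquire(state, "research", 1050, 2, 1000)); // true: the call at t=0 has aged out
```

Tagging state by route keeps one busy route from consuming another route's quota.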
Added
- design/specialists-layer.md: design lock for the specialists progression layer (grade bands, the Record, cross-training, techniques, operating profiles, form). Research-grounded by a 40-finding study-swarm; citation-verified (19 identifiers checked, 0 fabricated, 0 misattributed). Implementation lands in future minor releases.
Architecture grounded in (selection): Deci/Koestner/Ryan 1999 (competence feedback vs rewards) · Moldon et al. 2021 (streaks induce junk work) · Ilharco et al. 2022 (task arithmetic) · Yadav et al. 2023 (TIES interference) · Boubdir et al. 2023 (Elo pitfalls) · Schemmer et al. 2023 (appropriate reliance) · full bibliography in the design doc.
🤖 Generated with Claude Code
v2.9.0 — Crew Dossier + Operating Posture dispatch wiring
Crew Dossier + Operating Posture
A character sheet for every role that doubles as run-time config.
- Six aptitudes (rigor / pace / range / skepticism / autonomy / candor, 0–5) mapped to real dispatch knobs, an 8-archetype disposition layer carrying a behavioral instruction, a painted portrait, and a grade — for all 64 roles (the 61 roles + 3 specialty auditors).
- Operating Posture dispatch wiring (opt-in, non-breaking): buildRolePrompt injects the disposition's instruction plus a posture line derived from the role's aptitudes when a dossier exists; roles without one produce byte-identical prompts to before. Runtime data ships in src/role-dossiers.json.
- A self-contained crew gallery (dossier/dossier.html): each role's radar shows its tuned build against its canonical ideal.
- Aptitude profiles calibrated like an instrument: a cloud-model panel (per-axis median consensus) plus an external-verifier pass from a different model family, yielding 64 unique, knob-faithful fingerprints.
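The opt-in posture wiring can be illustrated with a small sketch: when a dossier exists, the disposition instruction and a posture line built from the six aptitudes are appended; when it does not, the prompt passes through untouched. The Dossier shape, the posture-line wording, and withPosture are hypothetical; role-os's actual buildRolePrompt may format this differently and also maps aptitudes to dispatch knobs not modeled here.

```typescript
// Hypothetical sketch of opt-in posture injection. Shapes and wording are assumptions.
type Aptitude = "rigor" | "pace" | "range" | "skepticism" | "autonomy" | "candor";

interface Dossier {
  dispositionInstruction: string;      // behavioral instruction carried by the archetype
  aptitudes: Record<Aptitude, number>; // each on the 0-5 scale
}

function withPosture(basePrompt: string, dossier?: Dossier): string {
  // No dossier: return the prompt unchanged, so output stays byte-identical.
  if (!dossier) return basePrompt;
  const axes = Object.entries(dossier.aptitudes)
    .map(([name, level]) => `${name} ${level}/5`)
    .join(", ");
  return [basePrompt, "", dossier.dispositionInstruction, `Operating posture: ${axes}.`].join("\n");
}
```

Returning the original string object on the no-dossier path is what makes the change non-breaking for roles that have no dossier.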
Full suite 1404 tests green. Handbook: https://mcp-tool-shop-org.github.io/role-os/handbook/crew-dossier/
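Per-axis median consensus scores each aptitude independently as the median of the panel's ratings, so a single outlier model cannot drag an axis. A minimal sketch, assuming each panelist returns a flat axis-to-score map; the names are hypothetical and the different-family verifier pass is not modeled.

```typescript
// Hypothetical sketch: per-axis median consensus over a panel of model ratings.
type Profile = Record<string, number>;

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: the middle value. Even count: mean of the two middle values.
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function consensus(panel: Profile[]): Profile {
  const axes = Object.keys(panel[0] ?? {});
  const result: Profile = {};
  for (const axis of axes) {
    result[axis] = median(panel.map((p) => p[axis]));
  }
  return result;
}

// Three panelists; the third is an outlier on rigor.
console.log(consensus([{ rigor: 4, pace: 2 }, { rigor: 5, pace: 3 }, { rigor: 1, pace: 3 }]));
```

With an odd panel size every consensus score is one of the submitted ratings, which keeps results on the 0-5 integer scale.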
v2.8.0 — capability gate + conformance live-catalog rollout
Added
- Capability gate: deterministic least privilege on irreversible tool calls. It combines a gated set of irreversible, world-touching actions (npm/PyPI publish, gh release/pr/repo edit, git push, Pages deploy), a director-authored .claude/role-os/capabilities.json grant manifest, and capabilityGate(). It is opt-in (ROLEOS_CAPABILITY_GATE, default OFF, which makes it a pure no-op), fail-closed for the gated set, and deterministic (no model). It is wired into onPreToolUse (deny path) and the generated PreToolUse hook (exit 2), alongside the advisory, fail-open conformance floor. It bounds what a wrong verdict, whether an honest mistake or an injected one, can actually DO; it is the preventive complement to the named-compensator rule (POLA / CaMeL).
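The gate's three properties (opt-in no-op, deterministic matching, fail-closed for gated actions) can be sketched as follows. Everything concrete here is an assumption for illustration: the capability names, the command patterns (a subset of the gated set), the manifest shape { grants: { role: capability[] } }, treating the value "1" as enabling the flag, and the function name capabilityGateSketch. This is not role-os's capabilityGate() signature.

```typescript
// Hypothetical sketch of a deterministic, fail-closed capability gate.
type Decision = { allow: boolean; reason: string };

// A subset of irreversible actions, detected by plain pattern matching (no model).
const GATED: Array<[capability: string, pattern: RegExp]> = [
  ["npm-publish", /^npm\s+publish\b/],
  ["pypi-publish", /^twine\s+upload\b/],
  ["git-push", /^git\s+push\b/],
  ["gh-release", /^gh\s+release\b/],
];

function capabilityGateSketch(
  command: string,
  role: string,
  manifestJson: string | undefined,
  env: Record<string, string | undefined>,
): Decision {
  // Opt-in: with the flag unset, the gate is a pure no-op.
  if (env.ROLEOS_CAPABILITY_GATE !== "1") return { allow: true, reason: "gate off" };

  const hit = GATED.find(([, re]) => re.test(command.trim()));
  if (!hit) return { allow: true, reason: "not a gated action" };
  const [capability] = hit;

  // Fail closed: a missing or unparseable manifest denies every gated action.
  let grants: Record<string, string[]>;
  try {
    grants = JSON.parse(manifestJson ?? "").grants ?? {};
  } catch {
    return { allow: false, reason: "manifest missing or invalid" };
  }
  const granted = grants[role];
  return Array.isArray(granted) && granted.includes(capability)
    ? { allow: true, reason: `granted ${capability}` }
    : { allow: false, reason: `${role} lacks ${capability}` };
}
```

A generated PreToolUse hook could call this with process.env and translate allow: false into exit code 2, the deny path named above. Ungated commands always pass, which keeps the gate narrow.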
Changed
- Wedge #1 conformance: live tool-contracts catalog rollout. The deterministic schema + computable-contract floor runs at the live onPreToolUse seam against .claude/role-os/tool-contracts.json (advisory, fail-open), and generated hook scripts emit the current Claude Code wire protocol.
Full changelog: CHANGELOG.md.
v2.7.1 — budget consult docs
Documentation release.
The README and a new handbook page now cover budget-aware dispatch: Role OS can consult a local Token Budget Analyst for each dispatch step and attach an advisory spend forecast (opt-in via ROLEOS_BUDGET_CONSULT, failing open to a deterministic baseline, and never blocking a dispatch). No code changes from 2.7.0.
role-os v2.7.0
Token Budget Analyst — production budget consult (opt-in, default-off).
consultBudgetForManifest and buildDispatchManifestWithBudget consult the budgeter specialist for each dispatch step, attaching an advisory budget forecast and receipt to each step. Enable with ROLEOS_BUDGET_CONSULT=1. The consult fails open to the deterministic baseline max(ctx*1.5, 50000) (not Claude), and it is advisory only: it never blocks a dispatch. This release also lands the budgeter dataset tooling under tools/token-budget-dataset/.
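The fail-open behavior can be sketched in a few lines. Only the baseline formula max(ctx*1.5, 50000) comes from the release note; the BudgetForecast shape, the consult callback, and forecastStep are hypothetical stand-ins for the budgeter specialist call.

```typescript
// Hypothetical sketch of an advisory, fail-open budget forecast.
interface BudgetForecast {
  tokens: number;
  source: "specialist" | "baseline";
}

// Deterministic baseline from the release note: max(ctx * 1.5, 50000).
function baselineBudget(contextTokens: number): number {
  return Math.max(contextTokens * 1.5, 50_000);
}

async function forecastStep(
  contextTokens: number,
  consult: ((ctx: number) => Promise<number>) | undefined,
): Promise<BudgetForecast> {
  if (consult) {
    try {
      const tokens = await consult(contextTokens);
      if (Number.isFinite(tokens) && tokens > 0) return { tokens, source: "specialist" };
    } catch {
      // Fail open: any consult error falls through to the deterministic baseline.
    }
  }
  return { tokens: baselineBudget(contextTokens), source: "baseline" };
}
```

Because the forecast is advisory, a dispatcher would attach it to the step and proceed either way; forecastStep never rejects, so it cannot block a dispatch.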
Full notes: CHANGELOG.md. 1334 tests green. Compensator: roleos specialist rollback.
role-os v2.6.0 — local panel judges against prism's full abstract
verify-citations --local-panel now judges against prism's full abstract, not just one span.
The local entailment panel previously re-checked each supported citation against only prism's source_title + the single supporting_span the groundedness lens surfaced. A faithful claim the whole abstract entails — but no single span does — was escalated as a panel disagreement. buildEvidence now prefers prism's full source_abstract (surfaced by prism v1.0+), falling back to the span on older prism builds — so faithful claims land cleanly while genuine false-confirms are still caught. gateCitations threads source_abstract through; backward-compatible, no API change.
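The evidence-selection rule reduces to "prefer the full abstract, else the span". A minimal sketch: the field names follow the release note (source_title, source_abstract, supporting_span), while the record type, the output layout, and buildEvidenceSketch are assumptions rather than role-os's buildEvidence.

```typescript
// Hypothetical sketch of evidence selection for the local panel.
interface PrismVerdict {
  source_title: string;
  supporting_span: string;
  source_abstract?: string; // surfaced by prism v1.0+; absent on older builds
}

function buildEvidenceSketch(v: PrismVerdict): string {
  // Prefer the full abstract so claims entailed only by the abstract as a whole
  // are judged fairly; fall back to the single span on older prism builds.
  const abstract = v.source_abstract?.trim();
  const body = abstract ? abstract : v.supporting_span.trim();
  return `${v.source_title}\n\n${body}`;
}
```

Treating an empty or whitespace-only abstract as missing keeps the fallback path backward-compatible.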
Pairs with prism-verify 1.0.0 (which surfaces source_abstract) and tensor-engine-knowledge wave-9. Full suite: 1199 tests green. Published via npm Trusted Publishing (provenance).
Install: npm install -g role-os · npx role-os
v2.5.0 — verify-citations --local-panel
A second, family-different verifier seat for the citation gate.
roleos verify-citations --local-panel adds a local 3-seat entailment panel (Qwen3-4B + Qwen3-14B + Mistral-Nemo-12B) that re-checks each citation an external verifier marked supported, and escalates to human review on disagreement — it can only tighten the gate, never loosen it. Runs entirely on local models, zero cost.
Why it matters: the panel's measured property is zero false-confirms — it never stamps a false claim "supported." On a real 16-case arXiv citation set, one model false-confirmed a claim that inverted a paper's finding; the panel held it at insufficient.
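The "tighten only" property can be sketched as a function that only ever sees citations already marked supported. The verdict labels beyond "supported" and "insufficient", and the specific rule (unanimous agreement keeps supported, unanimous rejection holds at insufficient, a split escalates), are assumptions for illustration, not the documented panel policy.

```typescript
// Hypothetical sketch of a tighten-only panel on top of an external verdict.
type Verdict = "supported" | "insufficient" | "needs_human_review";

// `panel` holds one boolean per seat: true if that seat judges the claim entailed.
function applyPanel(external: Verdict, panel: boolean[]): Verdict {
  // Only citations the external verifier marked supported are re-checked,
  // so the panel can never upgrade a weaker verdict.
  if (external !== "supported") return external;
  const agree = panel.filter(Boolean).length;
  if (agree === panel.length) return "supported"; // every seat confirms
  if (agree === 0) return "insufficient";         // every seat rejects
  return "needs_human_review";                    // split panel: escalate
}
```

Every branch either keeps "supported" or moves to a stricter outcome, which is the property that rules out new false-confirms.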
Opt in with --local-panel (off by default; needs a local llama-swap + offload). +16 tests, 1196 total. See CHANGELOG.md for details.
v1.2.0 — Pack Promotion
Calibrated packs promoted to default entry. Auto-selection, mismatch detection, alternative suggestion, free-routing fallback. See CHANGELOG.md.
v1.1.0 — Full Spine Complete
See CHANGELOG.md for full notes. 31 roles, 7 proven team packs, 212 tests, 35 execution trials.