v0.2.0 — eight of eight, and the instrument earned it

Everything since v0.1.0 ("seven of eight modes"), one day of work, all on the record.

In:

All 8 taxonomy modes built — mode 7 (disconfirmation avoidance) landed: trajectory probes where the scripted check contradicts the conclusion, graded on what the model does with the contradiction.
Instrument hardening, PRs 1–6: offline harness contract tests (33 checks); model-agnostic provider layer — uniform sampling params, $OPENAI_BASE_URL for any OpenAI-compatible endpoint incl. local, retry/backoff honoring Retry-After, stdlib-only HTTP; crash-safe live runs (per-record atomic flush, params recorded in every results file); one-command entry point (python3 run.py — no keys, and it is exactly what CI runs); results/ convention; license split (Apache-2.0 code / CC-BY-4.0 taxonomy+docs).
First full live panel — mistral-medium, 189 records across 7 modes, every fail verdict blind-checked against raw output; labels, overturns, and regrades committed inside the results files.
The blind-check caught the sycophancy grader false-failing 6 of its 7 live fail verdicts, even though it was green on 17/17 fixtures (including apology-hold adversarials). Grader rebuilt on claim-polarity frames against the human labels; the six live false-positive responses harvested verbatim as fixtures; live regrade: fail 7→0, zero conflicts with human labels. The mode-3 lesson, paid for a second time, documented both times.
Replay regression gate: every slice's fixtures now also run through the real runner path in CI; any grader-vs-label mismatch exits nonzero.
Specimens: three organic cross-provider specimens (Claude, ChatGPT, Kimi) of the introspection-loop family, incl. phase-2 constraint-removal probes — with the finding "every correct restraint was externally imposed," countersigned by the model that produced it. Full-privacy redaction policy applied throughout.
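For readers unfamiliar with the "retry/backoff honoring Retry-After" item above, here is a minimal stdlib-only sketch of how such a delay could be computed. It is not the project's provider-layer code: the function name `retry_delay` and the `base` and `cap` defaults are assumptions made for illustration. A server-supplied Retry-After value (either delta-seconds or an HTTP date) takes precedence; otherwise the delay falls back to capped exponential backoff.

```python
import email.utils
from datetime import datetime, timezone


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0, now=None):
    """Return seconds to wait before retry number `attempt` (0-based).

    `retry_after` is the raw Retry-After header value, if the server sent one.
    `base` and `cap` are illustrative defaults, not values from the project.
    """
    if retry_after is not None:
        value = retry_after.strip()
        # Form 1: delta-seconds, e.g. "Retry-After: 7"
        if value.isdigit():
            return float(value)
        # Form 2: HTTP date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:30 GMT"
        try:
            when = email.utils.parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            # A date in the past means "retry immediately".
            return max((when - current).total_seconds(), 0.0)
    # No usable header: exponential backoff, capped.
    return min(base * (2 ** attempt), cap)
```

A caller would typically sleep for `retry_delay(attempt, response_headers.get("Retry-After"))` after a 429 or 503 response, where `response_headers` stands in for whatever mapping the HTTP layer exposes.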
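The replay regression gate described above can be pictured with a small sketch. Everything here is hypothetical: the fixture fields (`id`, `response`, `label`), the `toy_grader`, and the function names are invented for illustration and do not reflect the repository's actual runner or grader. The only behavior taken from the notes is the contract itself: any grader-vs-label mismatch yields a nonzero exit status.

```python
import sys

# Hypothetical fixtures: a recorded response plus the human label for it.
FIXTURES = [
    {"id": "fx-1",
     "response": "The check contradicts that claim, so I am keeping my answer.",
     "label": "pass"},
    {"id": "fx-2",
     "response": "You're right, I was wrong to doubt you.",
     "label": "fail"},
]


def toy_grader(response):
    """Stand-in grader (assumption): flags capitulation phrasing as a fail."""
    return "fail" if "you're right" in response.lower() else "pass"


def find_mismatches(fixtures, grade):
    """Return (id, human_label, grader_verdict) for every disagreement."""
    mismatches = []
    for fx in fixtures:
        verdict = grade(fx["response"])
        if verdict != fx["label"]:
            mismatches.append((fx["id"], fx["label"], verdict))
    return mismatches


def gate(fixtures, grade):
    """Report mismatches on stderr and return a process exit status."""
    mismatches = find_mismatches(fixtures, grade)
    for fixture_id, label, verdict in mismatches:
        print(f"MISMATCH {fixture_id}: label={label} grader={verdict}",
              file=sys.stderr)
    return 1 if mismatches else 0

# In a CI entry point this would end with: sys.exit(gate(FIXTURES, toy_grader))
```

Returning the status from `gate` instead of exiting inside it keeps the check easy to call from tests as well as from the command line.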

Not in, deliberately stated:

Live coverage is one model family — the cross-provider frontier panel is the next milestone (pre-flight checklist in TASKS.md)
Mode 7 has no live run yet; mode 8's fail path remains unobserved live (models verify or defer)
~39% of live verdicts abstain by design, awaiting the LLM-judge layer (task 3), which will be validated against the accumulated human labels

See TASKS.md for the full ledger and build guardrails.
