v0.2.0 — eight of eight

@mightbesaad mightbesaad released this 02 Jul 14:36
607a185

v0.2.0 — eight of eight, and the instrument earned it

Everything since v0.1.0 ("seven of eight modes"), one day of work, all on the record.

In:

  • All 8 taxonomy modes built — mode 7 (disconfirmation avoidance) landed: trajectory probes where the scripted check contradicts the conclusion, graded on what the model does with the contradiction.
  • Instrument hardening, PRs 1–6: offline harness contract tests (33 checks); model-agnostic provider layer — uniform sampling params, $OPENAI_BASE_URL for any OpenAI-compatible endpoint incl. local, retry/backoff honoring Retry-After, stdlib-only HTTP; crash-safe live runs (per-record atomic flush, params recorded in every results file); one-command entry point (python3 run.py — no keys, and it is exactly what CI runs); results/ convention; license split (Apache-2.0 code / CC-BY-4.0 taxonomy+docs).
  • First full live panel — mistral-medium, 189 records across 7 modes, every fail verdict blind-checked against raw output; labels, overturns, and regrades committed inside the results files.
  • The blind-check caught the sycophancy grader false-failing on 6 of its 7 live fail verdicts, even though it was green on 17/17 fixtures (including apology-hold adversarials). Grader rebuilt on claim-polarity frames against the human labels; the six live false-positive responses harvested verbatim as fixtures; live regrade: fail 7→0, zero conflicts with human labels. The mode-3 lesson, paid for a second time, documented both times.
  • Replay regression gate: every slice's fixtures now also run through the real runner path in CI; any grader-vs-label mismatch exits nonzero.
  • Specimens: three organic cross-provider specimens (Claude, ChatGPT, Kimi) of the introspection-loop family, incl. phase-2 constraint-removal probes — with the finding "every correct restraint was externally imposed," countersigned by the model that produced it. Full-privacy redaction policy applied throughout.
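
The replay regression gate described above can be pictured with a minimal sketch. This is not the repository's code: the names `replay_gate`, `toy_grader`, and `FIXTURES`, and the fixture fields `id`, `response`, and `label`, are all hypothetical, chosen only to illustrate the idea of replaying labeled fixtures through a grader and exiting nonzero on any grader-vs-label mismatch.

```python
import sys
from typing import Callable, Dict, List, Tuple

# Hypothetical fixture shape: an id, the model response, and the human label.
FIXTURES: List[Dict[str, str]] = [
    {"id": "fx-001", "response": "You're right, I was wrong.", "label": "fail"},
    {"id": "fx-002", "response": "I checked again and the claim still holds.", "label": "pass"},
]


def toy_grader(response: str) -> str:
    """Stand-in grader for illustration only; the real graders are not shown here."""
    return "fail" if "you're right" in response.lower() else "pass"


def replay_gate(
    fixtures: List[Dict[str, str]], grade: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Replay every fixture through the grader and collect mismatches.

    Returns a list of (fixture id, expected label, grader verdict) tuples.
    """
    mismatches = []
    for fx in fixtures:
        verdict = grade(fx["response"])
        if verdict != fx["label"]:
            mismatches.append((fx["id"], fx["label"], verdict))
    return mismatches


def main(
    fixtures: List[Dict[str, str]] = FIXTURES,
    grade: Callable[[str], str] = toy_grader,
) -> int:
    """Return the process exit code: 0 when all verdicts match, 1 otherwise."""
    mismatches = replay_gate(fixtures, grade)
    for fx_id, expected, got in mismatches:
        print(f"MISMATCH {fx_id}: label={expected} grader={got}", file=sys.stderr)
    return 1 if mismatches else 0


if __name__ == "__main__":
    sys.exit(main())
```

Returning the exit code from `main` rather than calling `sys.exit` inside it keeps the gate easy to test in isolation, while a CI step that runs the script still fails on any mismatch.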

Not in, deliberately stated:

  • Live coverage is one model family — the cross-provider frontier panel is the next milestone (pre-flight checklist in TASKS.md)
  • Mode 7 has no live run yet; mode 8's fail path remains unobserved live (models verify or defer)
  • ~39% of live verdicts abstain by design, awaiting the LLM-judge layer (task 3), which will be validated against the accumulated human labels

See TASKS.md for the full ledger and build guardrails.