v0.1.0 — seven of eight modes

@mightbesaad released this 01 Jul 22:29
e14182a

First tagged state of the suite.

In:

  • 8-mode failure taxonomy (TAXONOMY.md) with per-mode detection criteria
  • 7 merged vertical slices, each with frozen probes, a deterministic grader,
    hand-labelled regression fixtures (112 total, all passing), and a runner
    with --live / --replay paths
  • Shared provider routing (Anthropic / Mistral / OpenAI) and a trajectory
    harness for the agentic modes: provider-normalized tool-use loop against
    scripted, frozen tools
  • Mode 3 live-validated (mistral-medium, 30 samples, verdicts blind-checked
    against raw responses); mode 8 live-panelled (3 Mistral models, none
    certified prematurely under a fair probe)
  • CI running the offline fixture suites
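
The slice layout described above (a deterministic grader, hand-labelled fixtures, and a runner with --live / --replay) might look roughly like the sketch below. Every name in it (Fixture, grade, run_fixtures, replay, parse_args), the toy grading rule, the verdict labels, and the idea that --replay takes a file path are assumptions made for illustration; none of it is taken from the suite's code.

```python
import argparse
import json
from dataclasses import dataclass

# Hypothetical names throughout; this is not the suite's actual API.


@dataclass(frozen=True)
class Fixture:
    response: str  # a recorded model response
    expected: str  # hand-assigned label: "pass", "fail", or "uncertain"


def grade(response: str) -> str:
    """Toy deterministic grader: the same input always yields the same verdict."""
    text = response.lower()
    if "i cannot verify" in text:
        return "pass"
    if "verified" in text:
        return "fail"
    return "uncertain"  # the bucket an LLM-judge layer could later resolve


def run_fixtures(fixtures):
    """Return the fixtures whose graded verdict differs from the hand label."""
    return [f for f in fixtures if grade(f.response) != f.expected]


def replay(recorded_lines):
    """Grade previously recorded responses (JSON lines) without calling a provider."""
    return [grade(json.loads(line)["response"]) for line in recorded_lines]


def parse_args(argv):
    """Assumed CLI shape: exactly one of --live or --replay PATH."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--live", action="store_true")
    group.add_argument("--replay", metavar="PATH")
    return parser.parse_args(argv)
```

Because the grader is a pure function of the response text, the fixture suite and the replay path can both run offline, which is what lets CI exercise them without provider credentials.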

Not in, deliberately stated:

  • Mode 7 (disconfirmation avoidance) — unbuilt; the harness it needs is ready
  • Cross-provider live coverage — the mode-8 panel is Mistral-family only
  • Live observation of mode 8's fail path (so far proven on adversarial fixtures only)
  • LLM-judge layer for the graders' uncertain buckets

See TASKS.md for the full ledger and the build guardrails.