Skip to content

Confidence Labels

Oak Dev-inter edited this page Apr 26, 2026 · 2 revisions

Confidence Labels

Tag every non-trivial claim. [unsure] never gets silently guessed.

Introduced in v0.3.0 as Rule 11. Mandatory on Haiku, recommended on Sonnet/Opus for security, architecture, and cross-file findings.

The four labels

Label Meaning How to earn it
[high] verified by running code, reading source, or official docs runtime confirmed / source cited / docs linked
[medium] strong pattern match from codebase, not yet runtime-verified same pattern seen 2+ times in the same repo, behavior inferred not measured
[low] inferred from naming/comments, not actual behavior best guess from symbols, no behavioral evidence
[unsure] do not know missing context, ambiguous code, conflicting signals

The rule

[unsure] items are listed separately for user decision. Never silently guessed.

A finding with no label is a finding without provenance. On Haiku, that is treated as a framework violation. The label is the claim to truth the agent is making.

Output format

πŸ”΄ HIGH severity
[high] SalesController.php:3708 SQL injection β€” $var concat in raw()
       Fix: use ? placeholder with execute([$var])

[high] FinanceController.php:894 path traversal β€” $_FILES direct use
       Fix: basename() before fopen; whitelist upload dir

🟑 MED severity
[medium] PoController.php:113 N+1 query
         Pattern match; not profiled. Profile before optimize.

❓ UNSURE β€” requires user review
[unsure] NewWmsController.php:531 possible race condition
         Need concurrency context. User to confirm transaction boundary.

Severity (HIGH/MED/LOW) is a separate axis from confidence. A finding can be [high] LOW severity or [unsure] HIGH severity. Both dimensions matter.

Per-tier enforcement

  • Opus β€” should tag on security, architecture, and cross-file findings. Trivial findings may skip tags.
  • Sonnet β€” same as Opus; tag more aggressively for security-critical output.
  • Haiku β€” mandatory on every non-trivial finding. Missing tag = framework violation. [unsure] items must be listed separately, never silently guessed. The verifier pass (see Checklists) re-reads findings and lowers confidence where unjustified.

Why this matters

Observed pattern before Rule 11: agents produced 20 findings with identical assertive tone. User accepted all as equally valid. ~40% turned out wrong.

With confidence labels: agents produce the same 20 findings, but user can triage β€” act on [high] immediately, defer [medium] to profiling, escalate [unsure] to decision. False-positive cost drops; signal-to-noise rises. This was the single largest quality lift on Haiku-class models observed during framework validation (v0.7.4).

Downgrading labels

Agents self-downgrade when a pattern's confidence drops:

  • Contradictory evidence found β†’ downgrade [high] β†’ [medium]
  • Pattern match count drops below 2 β†’ [medium] β†’ [low]
  • Still cannot determine after retry β†’ [low] β†’ [unsure]

Upgrading requires new evidence (new test, new doc citation, runtime confirmation). Label inflation is an anti-pattern.

Anti-patterns

  • ❌ Stripping [unsure] to make a report look confident
  • ❌ Tagging [high] without citing source / file:line / doc URL
  • ❌ Mixing severity and confidence into one tag ([critical], [likely] β€” both lose information)
  • ❌ Silently dropping [unsure] items β€” user must see them to decide
  • ❌ On Haiku, producing any non-trivial finding without a label

v0.10 runtime verification

kasidit-verify.py (PostToolUse + Stop hooks) cross-checks [high] claims against actual tool calls in the same turn:

  • A [high] finding pointing at file:line with no Read of that file in the turn β†’ downgrade notice printed
  • A [high] claim with no Bash / runtime confirmation for "verified" wording β†’ downgrade notice
  • "Delegating to specialist" claim paired with direct Edit / Write / Bash β†’ master orchestrator violation flagged

The hook does not modify the AI output β€” it logs and prints, the AI sees the flag in subsequent turns and adjusts. See Backend-Hooks#kasidit-verify for the payload contract.

This is the first runtime layer enforcing label discipline. Prior versions relied entirely on prompt-level convention.

See also

Kasidit

Core

Version History

Concepts

Commands

Agents

Deprecated v0.10 (stubs β†’ audit-specialist --focus=..., removed in v0.11)

Clone this wiki locally