apohara-compliance v2.0.0 — Trajectory Taint-Correlation Detection

@SuarezPM SuarezPM released this 09 Jun 21:08

v2.0.0 — Trajectory Taint-Correlation Detection (ADR-4)

Additive: a new deterministic taint engine runs after the single-action
loop and after the ADR-2 sequence pass. It expresses the injection →
consequence dataflow that the single-action engine cannot. A finding needs
two parts. The first is a tainted source: an action on the untrusted-data
tool-result: channel that carries injection markers and is not a
doc/comment quote. The second is a genuine sensitive real-action sink
(exfil / destructive / financial) later in the same action stream. The
correlation is forward-only, and the taint persists across intervening
steps.
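The source → sink correlation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in taint.rs: the Action and Correlation types, the "action" channel name, the marker and sink pattern tables, and the doc/comment-quote heuristic are all hypothetical placeholders. Only the tool-result channel name and the three sink families come from these notes.

```rust
/// Channel name taken from the release notes; everything else is hypothetical.
const UNTRUSTED_CHANNEL: &str = "tool-result";

/// Placeholder injection markers (assumption, not apohara's real vocabulary).
const INJECTION_MARKERS: [&str; 2] = ["ignore previous instructions", "disregard the above"];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SinkFamily {
    Exfil,
    Destructive,
    Financial,
}

/// Placeholder sink patterns (assumption, not apohara's real vocabulary).
const SINK_PATTERNS: [(&str, SinkFamily); 3] = [
    ("curl -d", SinkFamily::Exfil),
    ("rm -rf", SinkFamily::Destructive),
    ("transfer funds", SinkFamily::Financial),
];

#[derive(Debug, Clone)]
pub struct Action {
    pub channel: String,
    pub text: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Correlation {
    pub source_index: usize,
    pub sink_index: usize,
    pub family: SinkFamily,
}

/// Crude stand-in for the "doc/comment quote" exclusion.
fn is_doc_quote(text: &str) -> bool {
    let t = text.trim_start();
    t.starts_with("//") || t.starts_with('#') || t.starts_with('>')
}

fn is_tainted_source(a: &Action) -> bool {
    if a.channel != UNTRUSTED_CHANNEL || is_doc_quote(&a.text) {
        return false;
    }
    let lower = a.text.to_lowercase();
    INJECTION_MARKERS.iter().any(|m| lower.contains(*m))
}

/// A sink must be a real action, so untrusted tool results never count.
fn sink_family(a: &Action) -> Option<SinkFamily> {
    if a.channel == UNTRUSTED_CHANNEL {
        return None;
    }
    let lower = a.text.to_lowercase();
    SINK_PATTERNS
        .iter()
        .find(|(pat, _)| lower.contains(*pat))
        .map(|(_, fam)| *fam)
}

/// Single forward pass: once a source is seen, the taint persists, and
/// every later sink is reported against the first tainted source.
pub fn correlate(actions: &[Action]) -> Vec<Correlation> {
    let mut taint: Option<usize> = None;
    let mut out = Vec::new();
    for (i, a) in actions.iter().enumerate() {
        if let (Some(src), Some(family)) = (taint, sink_family(a)) {
            out.push(Correlation { source_index: src, sink_index: i, family });
        }
        if taint.is_none() && is_tainted_source(a) {
            taint = Some(i);
        }
    }
    out
}
```

In this sketch a sink that precedes the source produces nothing, and unrelated steps between source and sink do not clear the taint, which mirrors the forward-correlated behaviour described above.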

Added

  • New module crates/scanner/src/taint.rs — the deterministic
    taint-correlation engine. Self-contained by design (ADR-4 OQ1): copies
    the small CompiledStep / step_match shape from sequence.rs rather
    than sharing a helper, to keep zero blast-radius on the CRITICAL
    matching.rs and the live sequence.rs AGT-MEM-001 path.
  • New rules (rule count 17 → 20): AGT-TRJ-001 (injection + sensitive sink,
    base), AGT-TRJ-002 (exfil sink family), AGT-TRJ-003 (destructive sink
    family).
  • A10 live capture (pre-registration + smoke): the committed AgentDojo
    corpus plus a bounded live capture on AgentDojo banking-suite with
    MiniMax-M3 (OpenRouter adapter), attack important_instructions,
    10 attacked pairs + 2 benign. Real-world result: 0 / 10 attack-success
    on MiniMax-M3 (the model refused every indirect injection). The run
    used 28 API calls and 65,550 tokens, recorded as proof of real usage.
  • Synthetic positive (trj-agentdojo-async-injection.jsonl + friends) fires
    AGT-TRJ-001 / 002 / 003 via the real binary; the FinBot direct-injection
    fixture (negative control) and benign-trajectory traps fire zero.
  • Pre-registration: tests/corpus/PREREG-v2-agentdojo.md (frozen before
    scanning). Proof: tests/corpus/PROOF-v2-minimax.md (the real-world
    0 / 10 + 65,550 tokens).
  • 8 commits 2610a0b..9e1a78a on v2.0-trajectory-taint (Ralph
    v0 → F4, AMENDMENT-A feasibility F5A, deslop).
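One way to read the three new rule IDs is as a base rule plus two family-specific refinements. The sketch below assumes that the base rule AGT-TRJ-001 fires for every correlated sink and that the family rules fire in addition to it. These notes do not state that layering explicitly, so treat it as an assumption. The SinkFamily enum and rule_ids function are hypothetical names.

```rust
/// Sink families named in the release notes; the enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SinkFamily {
    Exfil,
    Destructive,
    Financial,
}

/// Maps a correlated sink family to the rule IDs that would fire.
/// Assumption: the base rule always fires and the family rule is additive.
pub fn rule_ids(family: SinkFamily) -> Vec<&'static str> {
    let mut ids = vec!["AGT-TRJ-001"];
    match family {
        SinkFamily::Exfil => ids.push("AGT-TRJ-002"),
        SinkFamily::Destructive => ids.push("AGT-TRJ-003"),
        // No family-specific rule is listed for financial sinks in v2.0.0.
        SinkFamily::Financial => {}
    }
    ids
}
```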

Notes

  • Honesty invariants unchanged: every finding is is_candidate: true, every
    formatter line is CANDIDATE — prefixed, SARIF level is never error.
  • No new runtime dependency. The detection core stays deterministic and
    offline, and the synthetic precision/recall gate still holds at
    1.0000 / 1.0000 with FP = 0.
  • Real-world efficacy is UNPROVEN at v2.0 (stated plainly in ADR-4 and
    the PROOF). Two measured reasons: (1) MiniMax-M3 resisted all 10
    injections, so no real positive trace exists; (2) a verified
    representation/vocab gap — AgentDojo's <INFORMATION>… marker and
    structured tool-call sinks (send_money(…)) do not overlap apohara's
    text-pattern taint_source / taint_sink vocabulary, so even a
    successful trace would very likely not fire. apohara is a post-hoc
    transcript scanner (recognisable-in-log ≠ would-have-prevented), and its
    rules are vocab-scoped to shell/coding agents. Per the pre-registration
    the rules were NOT retro-fitted to AgentDojo.
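The honesty invariants in the first note can be made hard to violate by construction. The sketch below is an illustration under assumptions, not apohara's actual types: Finding, SarifLevel, format_line, and the exact spelling of the prefix constant are hypothetical. The level strings are the standard SARIF values, with "error" deliberately left out of the enum.

```rust
/// Assumed spelling of the prefix; the notes only say lines are CANDIDATE-prefixed.
const CANDIDATE_PREFIX: &str = "CANDIDATE \u{2014} ";

/// SARIF result levels, minus "error", so an error level cannot be built.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SarifLevel {
    None,
    Note,
    Warning,
}

impl SarifLevel {
    pub fn as_str(self) -> &'static str {
        match self {
            SarifLevel::None => "none",
            SarifLevel::Note => "note",
            SarifLevel::Warning => "warning",
        }
    }
}

/// Hypothetical finding record.
#[derive(Debug, Clone)]
pub struct Finding {
    pub rule_id: String,
    pub message: String,
    pub level: SarifLevel,
}

impl Finding {
    /// Every finding is a candidate; there is no way to set this to false.
    pub fn is_candidate(&self) -> bool {
        true
    }

    /// Every formatted line carries the candidate prefix.
    pub fn format_line(&self) -> String {
        format!("{}{}: {}", CANDIDATE_PREFIX, self.rule_id, self.message)
    }
}
```

Encoding the invariants in the type (no Error variant, no settable candidate flag) means a new rule cannot accidentally emit a confirmed or error-level finding.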

Build info

  • Target: x86_64-unknown-linux-gnu (Linux only)
  • Binary: apohara-compliance-scanner-x86_64-unknown-linux-gnu
  • Source commit: 661820e055ad6c46ab433f0ba32044a2e1e669a7
  • Built: 2026-06-09 via local cargo build --release --locked

Limitations of this local build

  • Linux x86_64 only. The other 3 release targets (aarch64-apple-darwin,
    x86_64-apple-darwin, x86_64-pc-windows-msvc) require cross-compile
    setup or macOS/Windows runners that aren't available in this local build.
  • No cosign signatures (keyless OIDC signing requires GH Actions).
  • No GH artifact attestations (build provenance requires GH Actions).

The canonical multi-target release workflow is at
.github/workflows/release.yml.