apohara-compliance v2.0.0 — Trajectory Taint-Correlation Detection
v2.0.0 — Trajectory Taint-Correlation Detection (ADR-4)
Additive — a new deterministic taint engine runs AFTER the single-action
loop AND after the ADR-2 sequence pass. It expresses the injection →
consequence dataflow the single-action engine cannot: a TAINTED source
(an action on the untrusted-data tool-result: channel carrying injection
markers, AND not a doc/comment quote) FOLLOWED BY a genuine sensitive
real-action sink (exfil / destructive / financial) later in the same action
stream (forward-correlated: the taint persists across intervening steps).
Added
- New module
crates/scanner/src/taint.rs— the deterministic
taint-correlation engine. Self-contained by design (ADR-4 OQ1): copies
the smallCompiledStep/step_matchshape fromsequence.rsrather
than sharing a helper, to keep zero blast-radius on the CRITICAL
matching.rsand the livesequence.rsAGT-MEM-001 path. - New rules (rule count 17 → 20):
AGT-TRJ-001(injection + sensitive sink,
base),AGT-TRJ-002(exfil sink family),AGT-TRJ-003(destructive sink
family). - A10 live capture (pre-registration + smoke): the committed AgentDojo
corpus + a bounded live capture on AgentDojo banking-suite with
MiniMax-M3 (OpenRouter adapter), attackimportant_instructions,
10 attacked pairs + 2 benign. Real-world result: 0 / 10 attack-success
on MiniMax (the model refused every indirect injection); 28 API calls,
65,550 tokens; real-usage proof. - Synthetic positive (
trj-agentdojo-async-injection.jsonl+ friends) fires
AGT-TRJ-001 / 002 / 003 via the real binary; the FinBot direct-injection
fixture (negative control) and benign-trajectory traps fire zero. - Pre-registration:
tests/corpus/PREREG-v2-agentdojo.md(frozen before
scanning). Proof:tests/corpus/PROOF-v2-minimax.md(the real-world
0 / 10 + 65,550 tokens). - Added: 8 commits
2610a0b..9e1a78aonv2.0-trajectory-taint(Ralph
v0 → F4, AMENDMENT-A feasibility F5A, deslop).
Notes
- Honesty invariants unchanged: every finding is
is_candidate: true, every
formatter line isCANDIDATE —prefixed, SARIFlevelis nevererror. - No new runtime dependency; the detection core stays deterministic and
offline; the synthetic precision/recall gate still
1.0000 / 1.0000 / FP = 0. - Real-world efficacy is UNPROVEN at v2.0 (stated plainly in ADR-4 and
the PROOF). Two measured reasons: (1) MiniMax-M3 resisted all 10
injections, so no real positive trace exists; (2) a verified
representation/vocab gap — AgentDojo's<INFORMATION>…marker and
structured tool-call sinks (send_money(…)) do not overlap apohara's
text-patterntaint_source/taint_sinkvocabulary, so even a
successful trace would very likely not fire. apohara is a post-hoc
transcript scanner (recognisable-in-log ≠ would-have-prevented), and its
rules are vocab-scoped to shell/coding agents. Per the pre-registration
the rules were NOT retro-fitted to AgentDojo.
Build info
- Target:
x86_64-unknown-linux-gnu(Linux only) - Binary:
apohara-compliance-scanner-x86_64-unknown-linux-gnu - Source commit:
661820e055ad6c46ab433f0ba32044a2e1e669a7 - Built: 2026-06-09 via local
cargo build --release --locked
Limitations of this local build
- Linux x86_64 only. The other 3 release targets (
aarch64-apple-darwin,
x86_64-apple-darwin,x86_64-pc-windows-msvc) require cross-compile
setup or macOS/Windows runners that aren't available in this local build. - No cosign signatures (keyless OIDC signing requires GH Actions).
- No GH artifact attestations (build provenance requires GH Actions).
The canonical multi-target release workflow is at
.github/workflows/release.yml.