v0.8.2 — Metis-inspired corpus block-rate regression + airlock corpus-bench CLI
Monday cut on top of v0.8.1. Minor bump — one new release-gate primitive (MetisInspiredCorpusBlockRateGuard), one CLI subcommand (airlock corpus-bench). No breaking changes.
Honest framing (called out before the code)
This release is inspired by the Metis paper (arXiv:2605.10067, ICML 2026) but does not reproduce its POMDP attacker. Metis measures response-level Attack Success Rate (ASR) on a closed-loop LLM; agent-airlock validates tool-call arguments and never sees the model's response — the threat models do not compose. What v0.8.2 ships instead:
- A deterministic exploit-shape corpus (25 entries: 17 exploit-shape + 8 benign baseline)
- A block-rate metric (the complement of ASR): block_rate = blocked_count / total_prompts
- A one-sided downward gate that fires when block_rate < baseline_block_rate - drift_threshold (default 5%)
The Metis paper is cited as motivation for adopting a structured failure-mode taxonomy as a release-gate input — not as a source of prompts.
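A minimal sketch of the metric and the one-sided gate, assuming drift_threshold is an absolute offset on the 0..1 block-rate scale (5% = 0.05). GateResult, block_rate and check_gate are hypothetical names for illustration, not agent-airlock's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    block_rate: float
    floor: float   # baseline_block_rate - drift_threshold
    fired: bool    # True means the release gate fails


def block_rate(blocked_count: int, total_prompts: int) -> float:
    """Fraction of corpus prompts blocked by the guard chain."""
    if total_prompts <= 0:
        raise ValueError("corpus must contain at least one prompt")
    return blocked_count / total_prompts


def check_gate(
    blocked_count: int,
    total_prompts: int,
    baseline_block_rate: float,
    drift_threshold: float = 0.05,  # assumption: absolute, not relative, drift
) -> GateResult:
    """One-sided downward gate: only a drop below the floor fires."""
    rate = block_rate(blocked_count, total_prompts)
    floor = baseline_block_rate - drift_threshold
    return GateResult(block_rate=rate, floor=floor, fired=rate < floor)


# Against the 0.68 (17/25) baseline, 15/25 = 0.60 falls below the 0.63 floor
# and fires; 16/25 = 0.64 passes; 25/25 also passes because the gate is one-sided.
print(check_gate(15, 25, baseline_block_rate=0.68).fired)
```

Making the gate one-sided means an improvement in block rate never fails a release; only a regression past the tolerance does.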
ADD-1 — MetisInspiredCorpusBlockRateGuard
- Module: agent_airlock.regression_corpus
- Default chain: EvalRCEGuard + StdioCommandInjectionGuard
- Corpus: tests/cves/corpora/metis_inspired_corpus_2026_05_18.json
- Baseline locked at first run: 0.68 block rate (17/25)
- Anchors: CVE-2026-44717 (eval RCE) + 2026-05-05 MCP STDIO injection class
- Decision dataclass mirrors the v0.7.x / v0.8.x family (allowed: bool) for chain-friendly composition
- Factory: policy_presets.metis_inspired_corpus_block_rate_regression_defaults_2026_05_18
- Doc: docs/policies/metis-inspired-corpus-block-rate.md
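To show why a Decision with a single allowed: bool composes cleanly into a chain, here is a minimal sketch. The two guards are toy stand-ins using crude substring checks; they are NOT the logic of EvalRCEGuard or StdioCommandInjectionGuard, and Decision's extra fields, run_chain, corpus_block_rate and the corpus entry shape (a dict of tool-call arguments) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping, Sequence


@dataclass(frozen=True)
class Decision:
    allowed: bool
    guard: str = ""   # hypothetical: which guard denied
    reason: str = ""  # hypothetical: why it denied


Guard = Callable[[Mapping[str, str]], Decision]


def toy_eval_guard(args: Mapping[str, str]) -> Decision:
    # Stand-in only: flags eval/exec-shaped argument values.
    for value in args.values():
        if "eval(" in value or "exec(" in value:
            return Decision(False, "toy_eval_guard", "eval/exec shape")
    return Decision(True)


def toy_stdio_guard(args: Mapping[str, str]) -> Decision:
    # Stand-in only: flags shell metacharacters in a "command" argument.
    command = args.get("command", "")
    if any(tok in command for tok in (";", "|", "&", "`", "$(")):
        return Decision(False, "toy_stdio_guard", "shell metacharacter")
    return Decision(True)


def run_chain(chain: Sequence[Guard], args: Mapping[str, str]) -> Decision:
    # The first denial short-circuits; allowed only if every guard allows.
    for guard in chain:
        decision = guard(args)
        if not decision.allowed:
            return decision
    return Decision(True)


def corpus_block_rate(chain: Sequence[Guard], corpus: Iterable[Mapping[str, str]]) -> float:
    entries = list(corpus)
    blocked = sum(1 for args in entries if not run_chain(chain, args).allowed)
    return blocked / len(entries)


# Two exploit-shaped entries and two benign ones.
toy_corpus = [
    {"code": "eval(input())"},
    {"command": "ls; rm -rf /"},
    {"code": "print('hi')"},
    {"command": "ls -la"},
]
toy_chain = [toy_eval_guard, toy_stdio_guard]
print(corpus_block_rate(toy_chain, toy_corpus))  # 2 of 4 blocked
```

Because every guard returns the same shape, the chain only needs to look at allowed, and the block-rate metric reduces to counting denials over the corpus.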
ADD-2 — airlock corpus-bench CLI
python -m agent_airlock.cli.corpus_bench \
--corpus-path tests/cves/corpora/metis_inspired_corpus_2026_05_18.json \
--report json
Exit codes: 0 gate pass · 1 generic error · 2 argparse usage error · 3 gate FAILED. structlog output is routed to stderr, so stdout stays clean for piping into jq.
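A minimal sketch of a CI step that runs the command above and branches on the documented exit codes. It uses only the module path and flags shown here; the JSON report schema is not described in these notes, so the sketch returns stdout unparsed. EXIT_MEANINGS and the helper functions are hypothetical names.

```python
import subprocess
import sys

# Exit codes as documented for airlock corpus-bench in v0.8.2.
EXIT_MEANINGS = {
    0: "gate pass",
    1: "generic error",
    2: "argparse usage error",
    3: "gate FAILED",
}

DEFAULT_CORPUS = "tests/cves/corpora/metis_inspired_corpus_2026_05_18.json"


def corpus_bench_argv(corpus_path: str = DEFAULT_CORPUS) -> list[str]:
    return [
        sys.executable, "-m", "agent_airlock.cli.corpus_bench",
        "--corpus-path", corpus_path,
        "--report", "json",
    ]


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def is_gate_failure(code: int) -> bool:
    # Only exit code 3 means the block-rate gate itself fired.
    return code == 3


def run_corpus_bench(corpus_path: str = DEFAULT_CORPUS) -> tuple[int, str]:
    # Capture stdout (the report); stderr is inherited so structlog
    # output still shows up in the CI log without polluting the report.
    proc = subprocess.run(
        corpus_bench_argv(corpus_path),
        stdout=subprocess.PIPE,
        text=True,
        check=False,
    )
    return proc.returncode, proc.stdout
```

Keeping "gate FAILED" (3) separate from tooling errors (1, 2) lets a pipeline report a genuine block-rate regression differently from a broken invocation.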
NOTE — Suggestion 3 deferred
The 2026-05-18 Product Improvements doc proposed a policy_presets.microsoft_agt_compat interop preset for the Microsoft Agent Governance Toolkit (launched 2026-04-02). The doc itself tagged it [major-needs-decision] and deferred the prompt. v0.8.2 does NOT include it — the strategic question is logged at ROADMAP_2026.md#post-v082-strategic-question-2026-05-18 for resolution.
Stats
- 2,438 tests · 82.94% coverage (gate 82%)
- CI 7/7 green (lint, security, GitGuardian, test 3.10/3.11/3.12/3.13)
- Surface additions in __all__: 5 new symbols + 1 new factory function
Install
pip install agent-airlock==0.8.2