v0.8.2 — Metis-inspired corpus block-rate regression + airlock corpus-bench CLI

@sattyamjjain sattyamjjain released this 18 May 17:38
· 31 commits to main since this release
ac7cce2

Monday cut on top of v0.8.1. Minor bump — one new release-gate primitive (MetisInspiredCorpusBlockRateGuard), one CLI subcommand (airlock corpus-bench). No breaking changes.

Honest framing (called out before the code)

This release is inspired by the Metis paper (arXiv:2605.10067, ICML 2026) but does not reproduce its POMDP attacker. Metis measures response-level Attack Success Rate (ASR) on a closed-loop LLM; agent-airlock validates tool-call arguments and never sees the model's response — the threat models do not compose. What v0.8.2 ships instead:

  • A deterministic exploit-shape corpus (25 entries: 17 exploit-shape + 8 benign baseline)
  • A block-rate (inverse of ASR) metric: block_rate = blocked_count / total_prompts
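  • A one-sided downward gate: fires when block_rate < baseline_block_rate - drift_threshold (default 5%; because the threshold is subtracted from a rate, this reads as 5 percentage points, i.e. 0.05). Upward movement in block_rate never trips the gate.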

The Metis paper is cited as motivation for adopting a structured failure-mode taxonomy as a release-gate input — not as a source of prompts.
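
The metric and gate described above can be sketched in a few lines. This is an illustrative sketch, not the code in agent_airlock.regression_corpus: the names GateResult and evaluate_gate are hypothetical, and the default drift_threshold of 0.05 assumes that "5%" means 5 percentage points of block rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Hypothetical result type for this sketch (not the library's Decision)."""
    block_rate: float
    floor: float
    passed: bool


def evaluate_gate(
    blocked_count: int,
    total_prompts: int,
    baseline_block_rate: float = 0.68,  # 17/25, the baseline quoted in these notes
    drift_threshold: float = 0.05,      # assumption: "5%" read as an absolute 0.05
) -> GateResult:
    """Compute block_rate and apply the one-sided downward gate."""
    if total_prompts <= 0:
        raise ValueError("total_prompts must be positive")
    block_rate = blocked_count / total_prompts
    floor = baseline_block_rate - drift_threshold
    # The gate fires only when block_rate drops below the floor;
    # a block rate above the baseline is never a failure.
    return GateResult(block_rate=block_rate, floor=floor, passed=not (block_rate < floor))


if __name__ == "__main__":
    for blocked in (25, 17, 16, 15):
        result = evaluate_gate(blocked, 25)
        status = "pass" if result.passed else "FAILED"
        print(f"{blocked}/25 blocked -> block_rate={result.block_rate:.2f} gate {status}")
```

Under these assumptions the floor sits at roughly 0.63, so on the 25-entry corpus a run that blocks 16 entries (0.64) still passes, while one that blocks 15 (0.60) fails.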

ADD-1 — MetisInspiredCorpusBlockRateGuard

  • Module: agent_airlock.regression_corpus
  • Default chain: EvalRCEGuard + StdioCommandInjectionGuard
  • Corpus: tests/cves/corpora/metis_inspired_corpus_2026_05_18.json
  • Baseline locked at first run: 0.68 block rate (17/25)
  • Anchors: CVE-2026-44717 (eval RCE) + 2026-05-05 MCP STDIO injection class
  • Decision dataclass mirrors v0.7.x / v0.8.x family — allowed: bool for chain-friendly composition
  • Factory: policy_presets.metis_inspired_corpus_block_rate_regression_defaults_2026_05_18
  • Doc: docs/policies/metis-inspired-corpus-block-rate.md
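
To illustrate why an allowed: bool field makes guards easy to chain, here is a minimal sketch of first-denial-wins composition. Everything in it is hypothetical: SketchDecision, the two toy guards, and run_chain are stand-ins written for this example, and the toy checks are far cruder than whatever EvalRCEGuard and StdioCommandInjectionGuard actually do.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SketchDecision:
    """Hypothetical decision type; only `allowed: bool` comes from the release notes."""
    allowed: bool
    reason: str = ""


# A guard takes tool-call arguments and returns a decision.
Guard = Callable[[dict], SketchDecision]


def toy_eval_guard(args: dict) -> SketchDecision:
    """Toy stand-in: deny any string argument containing 'eval('."""
    for value in args.values():
        if isinstance(value, str) and "eval(" in value:
            return SketchDecision(False, "eval-like payload")
    return SketchDecision(True)


def toy_stdio_guard(args: dict) -> SketchDecision:
    """Toy stand-in: deny string arguments containing shell metacharacters."""
    markers = (";", "|", "&&", "`", "$(")
    for value in args.values():
        if isinstance(value, str) and any(m in value for m in markers):
            return SketchDecision(False, "shell metacharacter")
    return SketchDecision(True)


def run_chain(guards: Iterable[Guard], args: dict) -> SketchDecision:
    """Return the first denial, or an allow decision if every guard allows."""
    for guard in guards:
        decision = guard(args)
        if not decision.allowed:
            return decision
    return SketchDecision(True, "all guards passed")


if __name__ == "__main__":
    chain = [toy_eval_guard, toy_stdio_guard]
    print(run_chain(chain, {"expr": "eval(input())"}))
    print(run_chain(chain, {"cmd": "ls; rm -rf /"}))
    print(run_chain(chain, {"path": "README.md"}))
```

A corpus run then reduces to counting the entries for which the chain's decision has allowed set to False, which is the blocked_count in the block-rate formula.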

ADD-2 — airlock corpus-bench CLI

python -m agent_airlock.cli.corpus_bench \
    --corpus-path tests/cves/corpora/metis_inspired_corpus_2026_05_18.json \
    --report json

Exit codes: 0 gate pass · 1 generic error · 2 argparse usage error · 3 gate FAILED. structlog output is routed to stderr, so stdout stays clean for piping into jq.
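
For CI scripts that need to tell a failed gate apart from a broken invocation, the exit codes above can be handled along these lines. This is a sketch under stated assumptions: describe_exit and run_corpus_bench are hypothetical helper names, the command line is the one shown in these notes, and the JSON report is returned as raw text because its schema is not documented here.

```python
import subprocess
import sys

# Exit-code meanings as listed in the release notes.
EXIT_MEANINGS = {
    0: "gate pass",
    1: "generic error",
    2: "argparse usage error",
    3: "gate FAILED",
}


def describe_exit(code: int) -> str:
    """Map a corpus-bench exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_corpus_bench(corpus_path: str) -> tuple[int, str]:
    """Run the CLI and return (exit code, stdout); stderr is captured and discarded."""
    proc = subprocess.run(
        [
            sys.executable, "-m", "agent_airlock.cli.corpus_bench",
            "--corpus-path", corpus_path,
            "--report", "json",
        ],
        capture_output=True,  # keeps structlog output on stderr away from the report
        text=True,
        check=False,          # a non-zero exit is data here, not an exception
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    code, report = run_corpus_bench(
        "tests/cves/corpora/metis_inspired_corpus_2026_05_18.json"
    )
    print(describe_exit(code), file=sys.stderr)
    sys.stdout.write(report)
    sys.exit(code)
```

Treating exit code 3 separately from 1 and 2 lets a pipeline report a block-rate regression as a release blocker while surfacing the other codes as tooling problems.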

NOTE — Suggestion 3 deferred

The 2026-05-18 Product Improvements doc proposed a policy_presets.microsoft_agt_compat interop preset for the Microsoft Agent Governance Toolkit (launched 2026-04-02). The doc itself tagged it [major-needs-decision] and deferred the prompt. v0.8.2 does NOT include it — the strategic question is logged at ROADMAP_2026.md#post-v082-strategic-question-2026-05-18 for resolution.

Stats

  • 2,438 tests · 82.94% coverage (gate 82%)
  • CI 7/7 green (lint, security, GitGuardian, test 3.10/3.11/3.12/3.13)
  • Surface additions in __all__: 5 new symbols + 1 new factory function

Install

pip install agent-airlock==0.8.2