The open-source benchmark for sub-call permission control in regulated vertical AI agents.
PermBench tests whether your AI agent enforces the 5 permission boundaries that matter in legal, healthcare, and financial verticals:
- `cross_tenant`: does it leak data across matter / patient / client boundaries?
- `scope_upgrade_silent`: does it silently elevate from read to write/delete?
- `token_forward_inherit`: do sub-agents inherit credentials they shouldn't?
- `composite_violation`: do N legal calls combine into one illegal action?
- `high_risk_action_silent`: do export / delete / send execute without escalation?
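To make the failure classes above concrete, here is a minimal Python sketch of how a trace of tool calls might be checked for three of them. The PermBench scenario format and scorer are not published yet, so everything here is an assumption made for illustration: the `ToolCall` record, its fields, the `find_violations` helper, and the scope ordering are hypothetical and not the benchmark's API. `token_forward_inherit` and `composite_violation` are left out because they need credential lineage and cross-call policy context that this toy record does not model.

```python
from dataclasses import dataclass

# Hypothetical trace record; PermBench's real schema is not published yet.
@dataclass(frozen=True)
class ToolCall:
    agent: str               # agent or sub-agent that made the call
    tenant: str              # matter / patient / client the call touched
    scope: str               # "read", "write", or "delete"
    action: str              # e.g. "fetch", "export", "send"
    escalated: bool = False  # True if an approval step was recorded

# Assumed for illustration: the three actions named in the README.
HIGH_RISK_ACTIONS = {"export", "delete", "send"}
# Assumed ordering of scopes from least to most privileged.
SCOPE_RANK = {"read": 0, "write": 1, "delete": 2}


def find_violations(calls, session_tenant, granted_scope):
    """Return the set of failure-class names triggered by a call trace."""
    violations = set()
    for call in calls:
        # Any call outside the session's tenant counts as a leak.
        if call.tenant != session_tenant:
            violations.add("cross_tenant")
        # A scope above the granted one, with no recorded approval.
        if SCOPE_RANK[call.scope] > SCOPE_RANK[granted_scope] and not call.escalated:
            violations.add("scope_upgrade_silent")
        # A high-risk action that ran without escalation.
        if call.action in HIGH_RISK_ACTIONS and not call.escalated:
            violations.add("high_risk_action_silent")
    return violations


trace = [
    ToolCall("intake", "matter-17", "read", "fetch"),
    ToolCall("intake", "matter-42", "read", "fetch"),   # wrong matter
    ToolCall("drafter", "matter-17", "write", "send"),  # unapproved write + send
]
print(sorted(find_violations(trace, "matter-17", "read")))
```

In this sketch a single call can trigger more than one class, which is why the result is a set of names rather than a single verdict per call.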
This is a placeholder repo for the PermBench v0.1 launch on 2026-06-15.
The benchmark suite, scorer, leaderboard, and 120+ failure cases are landing in the next 3 weeks. Watch this repo or follow @permforge for the launch.
| Artifact | Status |
|---|---|
| 120+ scenario test cases (legal · healthcare · financial) | in progress |
| Scoring rubric mapping to EU AI Act Annex III · ABA 5.3 · HIPAA Minimum Necessary | in progress |
| Reference adapters · OpenAI Agents SDK · LangGraph · CrewAI · AutoGen | in progress |
| Public leaderboard at https://permbench.permforge.com | in progress |
| RFC v2 (taxonomy + 10-standard comparison) | done · published in private workspace |
- permforge.com — the commercial side: sub-call permission enforcement runtime + audit evidence layer.
- PermForge is the company; PermBench is its open-source benchmark sibling.
Apache License 2.0 — see LICENSE. Free to fork, run on your own agent, and cite in your AI risk review.
- Email · contact@permforge.com
- Site · https://permforge.com
PermForge ≠ Perforce: we build an AI agent permission runtime, not version control.