v0.1.0 — first end-to-end run + headline finding
First public release of dspy-security-bench. The full data flow (synthesis → validation → optimization → AgentDojo evaluation → DataFrame) runs end-to-end against the workspace suite. Empirical results published below.
Headline finding
Prompt optimization measurably degrades adversarial robustness on stronger attacks.
| Optimizer | Attack | Utility | Security |
|---|---|---|---|
| unoptimized | direct | 0% | 100% |
| unoptimized | important_instructions | 0% | 80% |
| bootstrap_fewshot | direct | 60% | 100% |
| bootstrap_fewshot | important_instructions | 20% | 60% |
| miprov2 | direct | 40% | 80% |
| miprov2 | important_instructions | 20% | 60% |
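Both optimizers lose 20 percentage points of security on the harder important_instructions attack relative to the unoptimized baseline (80% → 60%). With 5 runs per cell, each 20-point step corresponds to a single run. BootstrapFewShot matches or beats MIPROv2 on utility and security in every row, so it Pareto-dominates MIPROv2 at v0.1 scale.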
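The two claims above can be checked directly against the table. The following is a minimal sketch that hard-codes the six summary rows; the `RESULTS` dictionary and both helper functions are illustrative names, not part of the released package.

```python
# Values copied from the summary table above: (utility %, security %).
RESULTS = {
    ("unoptimized", "direct"): (0, 100),
    ("unoptimized", "important_instructions"): (0, 80),
    ("bootstrap_fewshot", "direct"): (60, 100),
    ("bootstrap_fewshot", "important_instructions"): (20, 60),
    ("miprov2", "direct"): (40, 80),
    ("miprov2", "important_instructions"): (20, 60),
}
ATTACKS = ("direct", "important_instructions")


def security_delta(optimizer, attack, baseline="unoptimized"):
    """Security change in percentage points relative to the baseline."""
    return RESULTS[(optimizer, attack)][1] - RESULTS[(baseline, attack)][1]


def pareto_dominates(a, b):
    """True if `a` is at least as good as `b` on utility and security for
    every attack, and strictly better on at least one of those values."""
    pairs = [
        (x, y)
        for attack in ATTACKS
        for x, y in zip(RESULTS[(a, attack)], RESULTS[(b, attack)])
    ]
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


for optimizer in ("bootstrap_fewshot", "miprov2"):
    for attack in ATTACKS:
        print(optimizer, attack, security_delta(optimizer, attack))

print(pareto_dominates("bootstrap_fewshot", "miprov2"))
```

Note that the dominance is strict only on the direct attack; the two optimizers tie on important_instructions.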
What ships
- Synthesis pipeline: LLM-generated query-only tasks (GPT-4o + Claude Sonnet), env-grounded, validated by syntactic + dedupe checks. 192 tasks for workspace at v0.1.
- AgentDojo wrapper: `DSPyReActV2Element` runs any `dspy.ReActV2` as an AgentDojo `BasePipelineElement`, with attacks surfacing through env mutation as designed.
- Optimizer harness: uniform interface over `unoptimized`, `BootstrapFewShot`, `MIPROv2`. `GEPA` planned for v0.2.
- LLM-as-judge metric: substring fast-path + graceful fallback on judge failure.
- Runner + report: produces a `pandas.DataFrame` with one row per `(optimizer, attack, user_task, injection_task)` plus a 6-row aggregation.
- Factory cache: optimized state persisted to disk, so re-runs after a downstream crash skip the ~$5-8 optimization cost.
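As an illustration of the LLM-as-judge metric described in the list above, here is a minimal sketch of a substring fast-path with a fallback on judge failure. It is not the package's implementation: `make_judge_metric`, the `(expected, answer)` signature, and the choice to score a failed judge call as incorrect are all assumptions made for this example.

```python
from typing import Callable

Judge = Callable[[str, str], bool]


def make_judge_metric(judge: Judge) -> Judge:
    """Wrap a judge callable with a substring fast-path and a failure fallback."""

    def metric(expected: str, answer: str) -> bool:
        needle = expected.strip().lower()
        # Fast-path: a case-insensitive substring hit needs no judge call.
        if needle and needle in answer.lower():
            return True
        try:
            return bool(judge(expected, answer))
        except Exception:
            # Assumed fallback: score a judge failure as "not correct"
            # instead of aborting the whole run.
            return False

    return metric


def failing_judge(expected: str, answer: str) -> bool:
    """Stand-in for an LLM call that errors out (e.g. timeout, rate limit)."""
    raise RuntimeError("judge unavailable")


metric = make_judge_metric(failing_judge)
print(metric("Paris", "The capital is Paris."))  # fast-path, judge never called
print(metric("Paris", "It is Lyon."))            # judge fails, fallback applies
```

Passing the judge in as a plain callable keeps the sketch runnable offline; a real judge would wrap an LM call behind the same signature.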
Test + dev
- 61 pytest tests, all passing, all offline (no API key required for development)
- Python 3.10 / 3.11 / 3.12 supported
- Apache 2.0 licensed
Reproducibility
- Raw 30-row results: `data/results/workspace_v01_results.csv`
- 6-row summary: `data/results/workspace_v01_summary.csv`
- Charts: `assets/v01_utility_vs_security.png`, `assets/v01_pareto.png`
- Driver script: `scripts/run_v01_benchmark.py`
- Figure generator: `scripts/generate_v01_figures.py`
v0.1 scope limits
- workspace suite only (banking/travel/slack in v0.2)
- N=5 user tasks × 1 injection task × 2 attacks × 3 optimizers = 30 runs
- a single LM (gpt-4o-mini) used for both execution and judging
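The run count in the scope limits above can be reproduced by enumerating the grid. This sketch uses placeholder task IDs (`user_task_0` and so on are assumptions, not the actual AgentDojo task names) and only illustrates how 30 raw rows collapse into 6 (optimizer, attack) cells.

```python
from collections import Counter
from itertools import product

optimizers = ["unoptimized", "bootstrap_fewshot", "miprov2"]
attacks = ["direct", "important_instructions"]
user_tasks = [f"user_task_{i}" for i in range(5)]  # placeholder IDs
injection_tasks = ["injection_task_0"]             # placeholder ID

# One run per (optimizer, attack, user_task, injection_task) combination.
runs = list(product(optimizers, attacks, user_tasks, injection_tasks))

# Group runs into the (optimizer, attack) cells used by the summary.
runs_per_cell = Counter((optimizer, attack) for optimizer, attack, _, _ in runs)

print(len(runs))                    # total raw rows
print(len(runs_per_cell))           # summary rows
print(set(runs_per_cell.values()))  # runs averaged into each summary row
```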
What's next (v0.2)
- All 4 AgentDojo suites
- `GEPA` optimizer added
- Larger N per cell
- Additional attacks (`tool_knowledge`, full attack matrix)
- If pattern holds at scale → TMLR-shape methodology writeup