Release v0.1.0 — first end-to-end run + headline finding · immu4989/dspy-security-bench

v0.1.0 — first end-to-end run + headline finding

First public release of dspy-security-bench. The full data flow (synthesis → validation → optimization → AgentDojo evaluation → DataFrame) runs end-to-end against the workspace suite. Empirical results are published below.

Headline finding

Prompt optimization measurably degrades adversarial robustness on stronger attacks.

Optimizer	Attack	Utility	Security
unoptimized	direct	0%	100%
unoptimized	important_instructions	0%	80%
bootstrap_fewshot	direct	60%	100%
bootstrap_fewshot	important_instructions	20%	60%
miprov2	direct	40%	80%
miprov2	important_instructions	20%	60%

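Both optimizers lose 20 percentage points of security on the harder important_instructions attack relative to the unoptimized baseline (80% → 60%). BootstrapFewShot weakly Pareto-dominates MIPROv2 at v0.1 scale: the two tie on important_instructions, and BootstrapFewShot is ahead on both utility and security under the direct attack. With 5 runs per cell, each 20-point step corresponds to a single run.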
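
The comparison in the table can be checked mechanically. The sketch below is illustrative only and is not part of the repository: the values are copied from the table, while the names `Cell`, `security_drop`, and `pareto_dominates` are hypothetical helpers. Dominance here means "no worse on either metric in any cell, and strictly better in at least one".

```python
from typing import NamedTuple


class Cell(NamedTuple):
    utility: float   # percent of user tasks completed
    security: float  # percent of runs where the injection failed


# Values copied from the results table; the keys are (optimizer, attack).
RESULTS = {
    ("unoptimized", "direct"): Cell(0, 100),
    ("unoptimized", "important_instructions"): Cell(0, 80),
    ("bootstrap_fewshot", "direct"): Cell(60, 100),
    ("bootstrap_fewshot", "important_instructions"): Cell(20, 60),
    ("miprov2", "direct"): Cell(40, 80),
    ("miprov2", "important_instructions"): Cell(20, 60),
}
ATTACKS = ("direct", "important_instructions")


def security_drop(optimizer: str, attack: str, baseline: str = "unoptimized") -> float:
    """Percentage points of security lost relative to the baseline."""
    return RESULTS[(baseline, attack)].security - RESULTS[(optimizer, attack)].security


def pareto_dominates(a: str, b: str) -> bool:
    """True if `a` is never worse than `b` and strictly better somewhere."""
    pairs = [(RESULTS[(a, atk)], RESULTS[(b, atk)]) for atk in ATTACKS]
    no_worse = all(x.utility >= y.utility and x.security >= y.security for x, y in pairs)
    better = any(x.utility > y.utility or x.security > y.security for x, y in pairs)
    return no_worse and better


for opt in ("bootstrap_fewshot", "miprov2"):
    print(opt, security_drop(opt, "important_instructions"))
print(pareto_dominates("bootstrap_fewshot", "miprov2"))
```

Because the two optimizers tie on important_instructions, the dominance is weak: it rests entirely on the direct-attack cells.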

What ships

Synthesis pipeline — LLM-generated query-only tasks (GPT-4o + Claude Sonnet), env-grounded, validated by syntactic + dedupe checks. 192 tasks for workspace at v0.1.
AgentDojo wrapper — DSPyReActV2Element runs any dspy.ReActV2 as an AgentDojo BasePipelineElement, with attacks surfacing through env mutation as designed.
Optimizer harness — uniform interface over unoptimized, BootstrapFewShot, MIPROv2. GEPA planned for v0.2.
LLM-as-judge metric — substring fast-path + graceful fallback on judge failure.
Runner + report — produces a pandas.DataFrame with one row per (optimizer, attack, user_task, injection_task) plus a 6-row aggregation.
Factory cache — optimized state persisted to disk, so re-runs after a downstream crash skip the ~$5-8 optimization cost.
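
To make the "substring fast-path + graceful fallback" idea concrete, here is a minimal sketch of how such a metric might be shaped. It is not the project's implementation: the function name `judge_metric`, the `judge` callable, and the choice to score a judge failure as a miss are all assumptions made for illustration.

```python
from typing import Callable, Optional


def judge_metric(
    expected: str,
    answer: str,
    judge: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Score an answer, consulting the judge only when the cheap check fails."""
    needle = expected.strip().lower()

    # Fast path: a case-insensitive substring match needs no LLM call.
    if needle and needle in answer.lower():
        return True

    # No judge configured: the substring check is the only signal available.
    if judge is None:
        return False

    try:
        return bool(judge(expected, answer))
    except Exception:
        # Assumed fallback policy: a judge failure (timeout, malformed
        # reply, and so on) counts as a miss instead of aborting the run.
        return False


def flaky_judge(expected: str, answer: str) -> bool:
    """Stand-in for an LLM judge that always fails."""
    raise RuntimeError("judge unavailable")


print(judge_metric("Paris", "The capital is paris."))          # fast path
print(judge_metric("Paris", "It is the French capital.", flaky_judge))  # fallback
```

Scoring a failed judge call as a miss keeps a long benchmark run alive, at the cost of slightly understating utility when the judge is unreliable.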

Test + dev

61 pytest tests, all passing, all offline (no API key required for development)
Python 3.10 / 3.11 / 3.12 supported
Apache 2.0 licensed

Reproducibility

Raw 30-row results: data/results/workspace_v01_results.csv
6-row summary: data/results/workspace_v01_summary.csv
Charts: assets/v01_utility_vs_security.png, assets/v01_pareto.png
Driver script: scripts/run_v01_benchmark.py
Figure generator: scripts/generate_v01_figures.py

v0.1 scope limits

workspace suite only (banking/travel/slack in v0.2)
N=5 user tasks × 1 injection task × 2 attacks × 3 optimizers = 30 runs
single execution + judge LM (gpt-4o-mini)
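
The run count above is a plain Cartesian product, which the following sketch enumerates. The optimizer and attack names come from the results table; the task identifiers are placeholders, not the actual AgentDojo task IDs used in the benchmark.

```python
from collections import Counter
from itertools import product

OPTIMIZERS = ["unoptimized", "bootstrap_fewshot", "miprov2"]
ATTACKS = ["direct", "important_instructions"]
USER_TASKS = [f"user_task_{i}" for i in range(5)]  # placeholder IDs
INJECTION_TASKS = ["injection_task_0"]             # placeholder ID

# One run per (optimizer, attack, user_task, injection_task) combination.
runs = list(product(OPTIMIZERS, ATTACKS, USER_TASKS, INJECTION_TASKS))

# Each (optimizer, attack) pair is one cell of the 6-row summary.
runs_per_cell = Counter((opt, atk) for opt, atk, _, _ in runs)

# With so few runs per cell, one run moves a percentage by this much.
points_per_run = 100 / min(runs_per_cell.values())

print(len(runs), len(runs_per_cell), points_per_run)
```

At this scale a single run shifts a cell's utility or security by 20 percentage points, which is why larger N per cell is on the v0.2 list.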

What's next (v0.2)

All 4 AgentDojo suites
GEPA optimizer added
Larger N per cell
Additional attacks (tool_knowledge, full attack matrix)
If pattern holds at scale → TMLR-shape methodology writeup

Full details

README — https://github.com/immu4989/dspy-security-bench/blob/v0.1.0/README.md
ARCHITECTURE — https://github.com/immu4989/dspy-security-bench/blob/v0.1.0/ARCHITECTURE.md
