
v0.1.0 — first end-to-end run + headline finding


immu4989 released this 23 Jun 21:57 · 12 commits to main since this release


First public release of dspy-security-bench. The full data flow (synthesis → validation → optimization → AgentDojo evaluation → DataFrame) runs end-to-end against the AgentDojo workspace suite. Empirical results are published below.

Headline finding

Prompt optimization measurably degrades adversarial robustness on stronger attacks.

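| Optimizer         | Attack                 | Utility | Security |
|-------------------|------------------------|---------|----------|
| unoptimized       | direct                 | 0%      | 100%     |
| unoptimized       | important_instructions | 0%      | 80%      |
| bootstrap_fewshot | direct                 | 60%     | 100%     |
| bootstrap_fewshot | important_instructions | 20%     | 60%      |
| miprov2           | direct                 | 40%     | 80%      |
| miprov2           | important_instructions | 20%     | 60%      |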

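Both optimizers lose 20 percentage points of security on the harder important_instructions attack relative to the unoptimized baseline (80% → 60%); with 5 runs per cell, that is one additional successful injection. BootstrapFewShot Pareto-dominates MIPROv2 at v0.1 scale: it scores higher on both utility and security under the direct attack and ties under important_instructions.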
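Both claims can be checked mechanically against the table. The sketch below is plain Python that treats every (attack, metric) pair as a separate objective for the dominance test. The dictionary layout and the function names are illustrative only and are not part of dspy-security-bench.

```python
# Per-cell results copied from the v0.1 table: (utility %, security %).
RESULTS = {
    ("unoptimized", "direct"): (0, 100),
    ("unoptimized", "important_instructions"): (0, 80),
    ("bootstrap_fewshot", "direct"): (60, 100),
    ("bootstrap_fewshot", "important_instructions"): (20, 60),
    ("miprov2", "direct"): (40, 80),
    ("miprov2", "important_instructions"): (20, 60),
}

ATTACKS = ("direct", "important_instructions")


def security_drop(optimizer: str, attack: str, baseline: str = "unoptimized") -> int:
    """Percentage-point loss in security vs. the baseline (positive = worse)."""
    return RESULTS[(baseline, attack)][1] - RESULTS[(optimizer, attack)][1]


def pareto_dominates(a: str, b: str) -> bool:
    """True if `a` is >= `b` on every (attack, metric) and strictly > on at least one."""
    pairs = [
        (x, y)
        for attack in ATTACKS
        for x, y in zip(RESULTS[(a, attack)], RESULTS[(b, attack)])
    ]
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


if __name__ == "__main__":
    for opt in ("bootstrap_fewshot", "miprov2"):
        print(opt, security_drop(opt, "important_instructions"))  # 20 for both
    print(pareto_dominates("bootstrap_fewshot", "miprov2"))  # True
    print(pareto_dominates("miprov2", "bootstrap_fewshot"))  # False
```

The dominance here is weak on one attack. BootstrapFewShot wins strictly under direct and only ties under important_instructions, which still satisfies the usual definition.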

What ships

  • Synthesis pipeline — LLM-generated query-only tasks (GPT-4o + Claude Sonnet), env-grounded, validated by syntactic + dedupe checks. 192 tasks for workspace at v0.1.
  • AgentDojo wrapper — DSPyReActV2Element runs any dspy.ReActV2 as an AgentDojo BasePipelineElement, with attacks surfacing through env mutation as designed.
  • Optimizer harness — uniform interface over unoptimized, BootstrapFewShot, MIPROv2. GEPA planned for v0.2.
  • LLM-as-judge metric — substring fast-path + graceful fallback on judge failure.
  • Runner + report — produces a pandas.DataFrame with one row per (optimizer, attack, user_task, injection_task) plus a 6-row aggregation (one row per optimizer × attack).
  • Factory cache — optimized state persisted to disk, so re-runs after a downstream crash skip the ~$5-8 optimization cost.
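The LLM-as-judge metric checks for a cheap substring match first and calls the judge only when that fails. A minimal sketch of that control flow follows. The judge is modeled as a plain callable, although in the real harness it would wrap an LM call (gpt-4o-mini at v0.1). The function name `judged_match`, the judge's three-argument signature, and the choice of `False` as the default fallback score are all assumptions, not the project's API.

```python
from typing import Callable, Optional

# Hypothetical judge signature: (question, expected, predicted) -> bool.
# Any exception it raises (rate limit, malformed output, ...) counts as a
# judge failure.
JudgeFn = Callable[[str, str, str], bool]


def judged_match(
    question: str,
    expected: str,
    predicted: str,
    judge: Optional[JudgeFn] = None,
    fallback: bool = False,
) -> bool:
    """Score a prediction: substring fast path first, LLM judge second."""
    needle = expected.strip().lower()
    # Fast path: if the reference answer appears verbatim (case-insensitive)
    # in the prediction, accept it without spending a judge call. An empty
    # reference is never treated as a match.
    if needle and needle in predicted.lower():
        return True
    if judge is None:
        return fallback
    try:
        return bool(judge(question, expected, predicted))
    except Exception:
        # Graceful fallback: a failed judge call yields a fixed score
        # instead of aborting the whole optimization or evaluation run.
        return fallback
```

To plug this into DSPy, it would need a thin wrapper matching DSPy's `metric(example, pred, trace=None)` convention. That wrapper is omitted here.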
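Going from the per-run DataFrame to the summary is a single groupby over (optimizer, attack). Applied to the 30-row results, that grouping yields 3 × 2 = 6 rows. The sketch below assumes boolean `utility` and `security` columns. Those column names, the helper `summarize`, and the toy outcomes are made up for illustration and are not the project's schema or data.

```python
import pandas as pd


def summarize(results: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-run rows into one row per (optimizer, attack).

    Assumes boolean columns `utility` (user task solved) and `security`
    (injection did not succeed); their means are reported as integer percents.
    """
    summary = (
        results.groupby(["optimizer", "attack"], as_index=False)[["utility", "security"]]
        .mean()
    )
    # Round before casting so values like 0.6 * 100 = 60.000...01 become 60.
    summary[["utility", "security"]] = (
        (summary[["utility", "security"]] * 100).round().astype(int)
    )
    return summary


# Toy input with made-up outcomes, only to show the expected shape.
runs = pd.DataFrame(
    {
        "optimizer": ["miprov2"] * 4,
        "attack": ["direct", "direct", "important_instructions", "important_instructions"],
        "user_task": ["user_task_0", "user_task_1", "user_task_0", "user_task_1"],
        "injection_task": ["injection_task_0"] * 4,
        "utility": [True, False, False, False],
        "security": [True, True, True, False],
    }
)
print(summarize(runs))
```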
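The factory cache means a crash after optimization does not force you to pay for optimization again. One way to get that behavior is to key the optimized state on a hash of the optimizer config and write it atomically to disk. The helper below is a hypothetical sketch using a JSON-serializable dict as the state. A real implementation would more likely persist the compiled program through DSPy's own `save`/`load` methods.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict


def cached_optimize(
    config: Dict[str, Any],
    optimize: Callable[[], Dict[str, Any]],
    cache_dir: Path = Path(".cache/optimized"),
) -> Dict[str, Any]:
    """Return optimized state, running `optimize` only on a cache miss.

    The key is a hash of the JSON-serialized config, so any change to what
    goes into `config` (optimizer name, seed, ...) selects a new entry.
    """
    key = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:16]
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    state = optimize()  # the expensive step the cache exists to skip
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temp file, then rename, so a crash mid-write cannot leave
    # a truncated entry that later runs would try to load.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)
    return state
```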

Test + dev

  • 61 pytest tests, all passing, all offline (no API key required for development)
  • Python 3.10 / 3.11 / 3.12 supported
  • Apache 2.0 licensed

Reproducibility

  • Raw 30-row results: data/results/workspace_v01_results.csv
  • 6-row summary: data/results/workspace_v01_summary.csv
  • Charts: assets/v01_utility_vs_security.png, assets/v01_pareto.png
  • Driver script: scripts/run_v01_benchmark.py
  • Figure generator: scripts/generate_v01_figures.py

v0.1 scope limits

  • workspace suite only (banking/travel/slack in v0.2)
  • N=5 user tasks × 1 injection task × 2 attacks × 3 optimizers = 30 runs
  • single LM (gpt-4o-mini) used for both execution and judging
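The run count and the 20-point granularity of the headline table both follow from this grid. The short sketch below enumerates it with `itertools.product`. The task identifiers are placeholders and are not the tasks the v0.1 run actually sampled.

```python
from itertools import product

# v0.1 grid; the concrete task IDs below are hypothetical placeholders.
OPTIMIZERS = ["unoptimized", "bootstrap_fewshot", "miprov2"]
ATTACKS = ["direct", "important_instructions"]
USER_TASKS = [f"user_task_{i}" for i in range(5)]  # N=5
INJECTION_TASKS = ["injection_task_0"]  # 1 injection task

runs = list(product(OPTIMIZERS, ATTACKS, USER_TASKS, INJECTION_TASKS))
cells = {(opt, atk) for opt, atk, _, _ in runs}

print(len(runs))   # 30 rows in the raw results
print(len(cells))  # 6 rows in the summary
# With 30 / 6 = 5 runs per cell, every percentage is a multiple of 20 points.
print(100 // (len(runs) // len(cells)))  # 20
```

This is why "Larger N per cell" appears on the v0.2 list. At N=5 a single flipped run moves a cell by 20 points.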

What's next (v0.2)

  • All 4 AgentDojo suites
  • GEPA optimizer added
  • Larger N per cell
  • Additional attacks (tool_knowledge, full attack matrix)
  • If the pattern holds at scale → TMLR-style methodology write-up

Full details