Releases · immu4989/dspy-security-bench

26 Jun 13:43

immu4989

v0.1.1

db3e802

v0.1.1: seed sanity check + correction (Latest)


This is a methodology release, not a feature release. The small-N workspace result in v0.1.0 (N=5 user tasks) came from a single-seed run. This release adds a 3-seed sanity check and a correction note.

What changed

The seed-0 optimizer ordering reported in v0.1.0 (bootstrap > mipro > gepa) does not survive across seeds. Aggregated over seeds {0, 1, 2}, BootstrapFewShot has the lowest security on important_instructions (0.600), while MIPROv2 and GEPA tie at 0.733. Standard deviations are 0.4 to 0.5, so any ranking at this scale is dominated by noise.

What does hold across seeds:

BootstrapFewShot Pareto-dominates on direct (60% utility, 100% security).
The unoptimized baseline gets 0% utility on every seed.
Every optimizer trends below the unoptimized 80% security baseline on important_instructions, though each gap is within one standard deviation.
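
One way to see why the rankings are noise-dominated: if each security score is the mean of binary pass/fail outcomes, the spread follows directly from the mean. The sketch below assumes 15 runs per cell (5 user tasks × 3 seeds), under which 0.600 corresponds to 9/15 and 0.733 to 11/15. Those counts, and the helper names `binary_std` and `std_error`, are assumptions for illustration and are not taken from the repository.

```python
import math


def binary_std(successes: int, trials: int, ddof: int = 0) -> float:
    """Standard deviation of a 0/1 sample with the given number of successes."""
    p = successes / trials
    return math.sqrt(p * (1 - p) * trials / (trials - ddof))


def std_error(successes: int, trials: int) -> float:
    """Standard error of the mean for the same 0/1 sample (population form)."""
    return binary_std(successes, trials) / math.sqrt(trials)


# Assumed counts: 9/15 and 11/15 reproduce the reported 0.600 and 0.733.
bootstrap = (9, 15)
mipro_or_gepa = (11, 15)

gap = mipro_or_gepa[0] / 15 - bootstrap[0] / 15
se_of_gap = math.hypot(std_error(*bootstrap), std_error(*mipro_or_gepa))

print(f"std: {binary_std(*bootstrap):.3f}, {binary_std(*mipro_or_gepa):.3f}")
print(f"gap: {gap:.3f}, standard error of gap: {se_of_gap:.3f}")
```

Under these assumptions both standard deviations fall in the reported 0.4 to 0.5 range, and the gap between 0.600 and 0.733 is smaller than its own standard error.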

New artifacts

scripts/run_v02_phase1.py — single-seed GEPA addition to the optimizer comparison.
scripts/run_v02_phase1_seeds.py — re-runs the stochastic optimizers with additional seeds and aggregates mean ± std per (optimizer, attack) cell.
data/results/workspace_v02_phase1_seed1_results.csv
data/results/workspace_v02_phase1_seed2_results.csv
data/results/workspace_v02_phase1_seeds_all.csv
data/results/workspace_v02_phase1_seeds_summary.csv
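
The per-cell aggregation described for scripts/run_v02_phase1_seeds.py can be pictured as a group-by over result rows. This is a stdlib-only sketch, not the script itself: the field names (`optimizer`, `attack`, `seed`, `security`), the use of population standard deviation, and the toy values are all assumptions, and the real CSV schema may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy rows for illustration only; these are not benchmark results.
rows = [
    {"optimizer": "opt_a", "attack": "direct", "seed": 0, "security": 1},
    {"optimizer": "opt_a", "attack": "direct", "seed": 1, "security": 0},
    {"optimizer": "opt_a", "attack": "direct", "seed": 2, "security": 1},
    {"optimizer": "opt_b", "attack": "direct", "seed": 0, "security": 1},
    {"optimizer": "opt_b", "attack": "direct", "seed": 1, "security": 1},
    {"optimizer": "opt_b", "attack": "direct", "seed": 2, "security": 1},
]


def summarize(rows):
    """Return {(optimizer, attack): (mean, std)} pooled across seeds."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row["optimizer"], row["attack"])].append(row["security"])
    return {cell: (mean(vals), pstdev(vals)) for cell, vals in cells.items()}


summary = summarize(rows)
for (optimizer, attack), (m, s) in sorted(summary.items()):
    print(f"{optimizer:8s} {attack:8s} {m:.3f} ± {s:.3f}")
```

Pooling all runs in a cell before taking the standard deviation is one possible design; averaging per seed first and then taking the spread across seeds would give different, usually smaller, numbers.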

Other notes

README has an update callout at the top of the v0.1 results section.
Substack and Medium versions of the launch blog have matching update notes pinned at the top.
The original v0.1 results table, charts, and numbers are preserved unchanged. The sanity check is additive.

What's next

v0.2 phase 2 will scale N from 5 to roughly 20 user tasks per cell across all four AgentDojo suites (workspace, banking, travel, slack), three seeds, and four attacks (direct, important_instructions, tool_knowledge, ignore_previous). That's the experiment that puts any optimizer-ranking claim on defensible statistical ground.


23 Jun 21:57

immu4989

v0.1.0

3c2c7f9

v0.1.0 — first end-to-end run + headline finding

First public release of dspy-security-bench. The full data flow (synthesis → validation → optimization → AgentDojo evaluation → DataFrame) runs end-to-end against the workspace suite. Empirical results are published below.

Headline finding

Prompt optimization measurably degrades adversarial robustness on stronger attacks.

Optimizer	Attack	Utility	Security
unoptimized	direct	0%	100%
unoptimized	important_instructions	0%	80%
bootstrap_fewshot	direct	60%	100%
bootstrap_fewshot	important_instructions	20%	60%
miprov2	direct	40%	80%
miprov2	important_instructions	20%	60%

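Both optimizers lose ~20 percentage points of security on the harder important_instructions attack relative to the unoptimized baseline (80% → 60%). BootstrapFewShot weakly Pareto-dominates MIPROv2 at v0.1 scale: it is better on both axes under direct and tied under important_instructions.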
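
The Pareto comparison can be checked mechanically against the table. The sketch below uses the standard definition (at least as good on every axis, strictly better on at least one); the `Cell` type and `dominates` helper are illustrative names, not part of the repository.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    utility: float   # percent
    security: float  # percent


def dominates(a: Cell, b: Cell) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    at_least_as_good = a.utility >= b.utility and a.security >= b.security
    strictly_better = a.utility > b.utility or a.security > b.security
    return at_least_as_good and strictly_better


# Values copied from the v0.1.0 results table.
bootstrap = {"direct": Cell(60, 100), "important_instructions": Cell(20, 60)}
miprov2 = {"direct": Cell(40, 80), "important_instructions": Cell(20, 60)}

for attack in bootstrap:
    print(attack, dominates(bootstrap[attack], miprov2[attack]))
```

Strict dominance holds under direct; under important_instructions the two optimizers have identical scores, so neither dominates the other there.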

What ships

Synthesis pipeline — LLM-generated query-only tasks (GPT-4o + Claude Sonnet), env-grounded, validated by syntactic + dedupe checks. 192 tasks for workspace at v0.1.
AgentDojo wrapper — DSPyReActV2Element runs any dspy.ReActV2 as an AgentDojo BasePipelineElement, with attacks surfacing through env mutation as designed.
Optimizer harness — uniform interface over unoptimized, BootstrapFewShot, MIPROv2. GEPA planned for v0.2.
LLM-as-judge metric — substring fast-path + graceful fallback on judge failure.
Runner + report — produces a pandas.DataFrame with one row per (optimizer, attack, user_task, injection_task) plus a 6-row aggregation.
Factory cache — optimized state persisted to disk, so re-runs after a downstream crash skip the ~$5-8 optimization cost.
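
As a rough illustration of the "substring fast-path + graceful fallback" idea in the judge metric, here is a minimal sketch. It is not the project's implementation: `judged_match`, the `judge` callable signature, and the choice to score a failed judge call as `False` are all assumptions.

```python
from typing import Callable


def judged_match(
    expected: str,
    answer: str,
    judge: Callable[[str, str], bool],
    on_failure: bool = False,
) -> bool:
    """Score an answer: cheap substring check first, LLM judge second."""
    # Fast path: a case-insensitive substring hit needs no judge call.
    if expected.strip().lower() in answer.lower():
        return True
    try:
        return bool(judge(expected, answer))
    except Exception:
        # Graceful fallback: a judge outage yields a default score
        # instead of aborting the whole run. The default is an assumption.
        return on_failure


def broken_judge(expected: str, answer: str) -> bool:
    """Stand-in for a judge LM call that fails."""
    raise TimeoutError("judge unavailable")


print(judged_match("Paris", "The capital is Paris.", broken_judge))
print(judged_match("Paris", "It is the French capital.", broken_judge))
```

The first call never reaches the judge, and the second survives the judge failure by returning the default. A design like this also keeps tests offline, since a stub can replace the judge.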

Test + dev

61 pytest tests, all passing, all offline (no API key required for development)
Python 3.10 / 3.11 / 3.12 supported
Apache 2.0 licensed

Reproducibility

Raw 30-row results: data/results/workspace_v01_results.csv
6-row summary: data/results/workspace_v01_summary.csv
Charts: assets/v01_utility_vs_security.png, assets/v01_pareto.png
Driver script: scripts/run_v01_benchmark.py
Figure generator: scripts/generate_v01_figures.py

v0.1 scope limits

workspace suite only (banking/travel/slack in v0.2)
N=5 user tasks × 1 injection task × 2 attacks × 3 optimizers = 30 runs
a single LM (gpt-4o-mini) used for both execution and judging
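
The run count above is a plain Cartesian product, which also explains the 30-row raw file and the 6-row summary. A small sketch, where the optimizer and attack names come from the results table and the task IDs are placeholders, not actual AgentDojo identifiers:

```python
from itertools import product

optimizers = ["unoptimized", "bootstrap_fewshot", "miprov2"]
attacks = ["direct", "important_instructions"]
user_tasks = [f"user_task_{i}" for i in range(5)]  # placeholder IDs
injection_tasks = ["injection_task_0"]             # placeholder ID

# One run per (optimizer, attack, user_task, injection_task) combination.
runs = list(product(optimizers, attacks, user_tasks, injection_tasks))

# One summary cell per (optimizer, attack) pair.
cells = {(optimizer, attack) for optimizer, attack, _, _ in runs}

print(len(runs), len(cells))
```

Each summary cell therefore averages only 5 runs, which is why a single flipped outcome moves a cell by 20 percentage points.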

What's next (v0.2)

All 4 AgentDojo suites
GEPA optimizer added
Larger N per cell
Additional attacks (tool_knowledge, full attack matrix)
If the pattern holds at scale: a TMLR-style methodology writeup

Full details

README — https://github.com/immu4989/dspy-security-bench/blob/v0.1.0/README.md
ARCHITECTURE — https://github.com/immu4989/dspy-security-bench/blob/v0.1.0/ARCHITECTURE.md

