FEAT: Add StrongREJECT seed dataset loader by romanlutz · Pull Request #1800 · microsoft/PyRIT

romanlutz · 2026-05-25T19:28:20Z

Description

Adds the StrongREJECT (Souly et al., NeurIPS 2024) 313-prompt refusal-robustness dataset as a new remote seed loader. StrongREJECT is a widely-cited jailbreak-success benchmark, and adding it closes a gap surfaced in a broader AI red-team toolkit audit comparing PyRIT against competing frameworks.

The loader follows the _HarmBenchDataset template: single concrete _RemoteDatasetLoader subclass, pinned-commit raw GitHub URL, emits SeedObjective rows. Per-row category is preserved verbatim in harm_categories, and the upstream source column (AdvBench / DAN / HarmfulQ / MaliciousInstruct / MasterKey / "Jailbreaking via Prompt Engineering" / OpenAI System Card / custom) is preserved in metadata["strong_reject_source"] so users can filter by provenance later.
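The row-to-seed mapping described above can be sketched in plain Python. This is a minimal illustration, not the PyRIT implementation: `SeedRow` is a hypothetical stand-in for `SeedObjective`, `rows_to_seeds` is a made-up helper, and the column names `category`, `source`, and `forbidden_prompt` are assumptions about the upstream CSV header.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SeedRow:
    """Hypothetical stand-in for a seed objective; not PyRIT's SeedObjective."""

    value: str
    harm_categories: list[str]
    metadata: dict[str, str] = field(default_factory=dict)


# Assumed column names; the real CSV header may differ.
REQUIRED_KEYS = {"category", "source", "forbidden_prompt"}


def rows_to_seeds(csv_text: str) -> list[SeedRow]:
    """Map each CSV row to one seed, keeping category and source verbatim."""
    seeds = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        missing = REQUIRED_KEYS - row.keys()
        if missing:
            raise ValueError(f"Missing keys in row: {sorted(missing)}")
        seeds.append(
            SeedRow(
                value=row["forbidden_prompt"],
                harm_categories=[row["category"]],
                # Provenance is kept under its own key so it can be filtered on later.
                metadata={"strong_reject_source": row["source"]},
            )
        )
    if not seeds:
        raise ValueError("Dataset is empty")
    return seeds


if __name__ == "__main__":
    sample = (
        "category,source,forbidden_prompt\n"
        "cat-a,AdvBench,placeholder prompt one\n"
        "cat-b,custom,placeholder prompt two\n"
    )
    for seed in rows_to_seeds(sample):
        print(seed.harm_categories, seed.metadata["strong_reject_source"])
```

Keeping the source in `metadata` rather than folding it into `harm_categories` leaves the category field free of provenance labels, which is the separation the description calls out.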

A few non-obvious design choices worth a reviewer's attention:

Tags are {"safety", "jailbreak"}, explicitly NOT "default". StrongREJECT is a jailbreak-success benchmark, not a harm-category coverage dataset, so opting it into the default set would change every scenario's default-dataset surface area without those scenarios opting in.
The companion 60-prompt strongreject_small_dataset.csv is intentionally not shipped as a sibling loader. It is a strict prompt-subset of the full set, but its metadata is hand-edited (three rows have their source rewritten to "custom" even though the same prompts are attributed to AdvBench/DAN in the full CSV). Shipping it would surface conflicting provenance for identical prompts. Users who want a smaller balanced sample can post-filter the full loader at runtime.
No scenario PR. Users compose Jailbreak --dataset-names strong_reject themselves; the StrongREJECT rubric scorer is owned by a parallel planning session.
groups=["UC Berkeley"]. The lead authors are at UC Berkeley's Center for Human-Compatible AI (not the Center for AI Safety, which authors HarmBench).
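For the post-filtering mentioned above, one possible approach is sketched below. It assumes the loaded seeds have already been flattened into plain dicts with `harm_category` and `source` keys; that shape, and the helper names `balanced_sample` and `filter_by_source`, are hypothetical and not part of PyRIT.

```python
import random
from collections import defaultdict


def filter_by_source(seeds: list[dict], allowed_sources: set[str]) -> list[dict]:
    """Keep only seeds whose provenance is in allowed_sources."""
    return [seed for seed in seeds if seed["source"] in allowed_sources]


def balanced_sample(seeds: list[dict], per_category: int, rng_seed: int = 0) -> list[dict]:
    """Draw up to per_category seeds from each harm category, reproducibly."""
    rng = random.Random(rng_seed)  # fixed seed so reruns pick the same rows
    by_category: dict[str, list[dict]] = defaultdict(list)
    for seed in seeds:
        by_category[seed["harm_category"]].append(seed)

    sample: list[dict] = []
    for category in sorted(by_category):  # sorted for a stable output order
        group = by_category[category]
        sample.extend(rng.sample(group, min(per_category, len(group))))
    return sample


if __name__ == "__main__":
    # Placeholder data: six categories with 50 rows each.
    demo = [
        {"harm_category": f"cat-{i}", "source": "custom", "prompt": f"placeholder {i}-{j}"}
        for i in range(6)
        for j in range(50)
    ]
    print(len(balanced_sample(demo, per_category=10)))
```

With six categories, `per_category=10` yields a 60-row sample drawn from the full set, so every sampled row keeps the provenance recorded in the full CSV.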

Tests and Documentation

New tests/unit/datasets/test_strong_reject_dataset.py with 6 unit tests covering happy path, per-row metadata preservation, missing-key validation, empty-dataset validation, and class-level metadata. All pass.
Full tests/unit/datasets/ suite (500 tests) still green.
Live sanity check against the pinned CSV verified the loader produces 313 seeds across 6 categories (50/50/50/50/54/59) and 8 distinct source values (custom=221, DAN=35, AdvBench=25, MaliciousInstruct=12, HarmfulQ=11, MasterKey=3, OpenAI System Card=3, "Jailbreaking via Prompt Engineering"=3).
SeedDatasetProvider.get_all_dataset_names_async() discovers strong_reject end-to-end.
New BibTeX entry @souly2024strongreject added to doc/references.bib and doc/bibliography.md.
doc/code/datasets/1_loading_datasets.py prose updated to include StrongREJECT in the alphabetical paper list. Paired .ipynb regenerated with jupytext --to ipynb --update (markdown-only change, no execution needed).
ruff format, ruff check, and ty check all pass on changed files (verified via the pre-commit run during commit).
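The counts reported in the live sanity check can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the figures quoted above; it does not download or parse the CSV, and the variable and function names are illustrative.

```python
# Figures copied from the sanity-check line above.
EXPECTED_TOTAL = 313
category_counts = [50, 50, 50, 50, 54, 59]
source_counts = {
    "custom": 221,
    "DAN": 35,
    "AdvBench": 25,
    "MaliciousInstruct": 12,
    "HarmfulQ": 11,
    "MasterKey": 3,
    "OpenAI System Card": 3,
    "Jailbreaking via Prompt Engineering": 3,
}


def totals_consistent() -> bool:
    """Both breakdowns should account for every one of the 313 prompts."""
    return (
        sum(category_counts) == EXPECTED_TOTAL
        and sum(source_counts.values()) == EXPECTED_TOTAL
    )


if __name__ == "__main__":
    print(len(category_counts), len(source_counts), totals_consistent())
```

Both the per-category and per-source breakdowns sum to 313, so the two tallies agree with each other and with the stated dataset size.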

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add StrongREJECT seed dataset loader

c4db0bb

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

hannahwestra25 reviewed May 27, 2026

Comment thread pyrit/datasets/seed_datasets/remote/strong_reject_dataset.py

hannahwestra25 approved these changes May 27, 2026

romanlutz and others added 2 commits May 30, 2026 06:44

Merge origin/main into romanlutz/plan-strongreject-benchmark

63661cc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge origin/main into romanlutz/plan-strongreject-benchmark

8422af1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz enabled auto-merge May 30, 2026 20:45

romanlutz added this pull request to the merge queue May 30, 2026

Merged via the queue into microsoft:main with commit eabb501 May 30, 2026
48 checks passed

romanlutz deleted the romanlutz/plan-strongreject-benchmark branch May 30, 2026 21:12

FEAT: Add StrongREJECT seed dataset loader #1800
romanlutz merged 3 commits into microsoft:main from romanlutz:romanlutz/plan-strongreject-benchmark

2 participants
