FEAT Add JailbreakV_28k dataset from HF by adrian-gavrila · Pull Request #1098 · microsoft/PyRIT

adrian-gavrila · 2025-09-22T22:09:48Z

Description

This PR adds support for the JailbreakV_28k dataset to PyRIT.
One notable departure from the multimodal dataset fetching already present in PyRIT is that this dataset needs a local download of the images, via a Google Drive link provided by the owners of the HF dataset. The share link to the zip file is in the function comments. The function does not work without a local copy of this zip because a large number of images are missing from HF.
If the extracted directory is not present at the provided path, the function unzips the archive. As of right now we do not use HF at all for image download due to the large number of missing images, so the zip directory is a mandatory parameter.
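
The unzip-if-missing behavior described above could look roughly like the following. This is a minimal sketch under assumptions, not the code in this PR: `ensure_extracted` and its parameter names are hypothetical.

```python
import zipfile
from pathlib import Path


def ensure_extracted(zip_path: Path, extract_dir: Path) -> Path:
    """Extract zip_path into extract_dir unless extract_dir already has content.

    Hypothetical helper for illustration; names are not taken from PyRIT.
    """
    if extract_dir.is_dir() and any(extract_dir.iterdir()):
        return extract_dir  # already extracted, nothing to do
    if not zip_path.is_file():
        raise FileNotFoundError(f"Image archive not found: {zip_path}")
    extract_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
    return extract_dir
```

Checking for a non-empty directory first keeps repeated calls cheap, which matters for a large image archive.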

Addresses https://github.com/Azure/PyRIT/issues/1007

Changes Made:

Added integration for JailbreakV_28k
Normalizes the dataset's "policy" column and associates it with harm-category
Allows for filtering on harm categories (policy values)
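
As a rough illustration of the policy normalization and filtering listed above, here is a minimal sketch. The function names and the sample rows are hypothetical; only the "policy" column and the idea of mapping it to a harm category come from this PR.

```python
def normalize_policy(policy: str) -> str:
    """Lower-case a policy label and join its words with underscores."""
    return "_".join(policy.strip().lower().split())


def filter_by_harm_category(rows, harm_categories=None):
    """Attach a normalized harm_category to each row and optionally filter on it."""
    wanted = {normalize_policy(c) for c in harm_categories} if harm_categories else None
    result = []
    for row in rows:
        category = normalize_policy(row["policy"])
        if wanted is None or category in wanted:
            result.append({**row, "harm_category": category})
    return result


# Illustrative rows only; the real dataset has many more columns and values.
rows = [
    {"policy": "Child Abuse Content", "redteam_query": "..."},
    {"policy": "Hate Speech", "redteam_query": "..."},
]
selected = filter_by_harm_category(rows, ["hate speech"])
```

Normalizing both the requested categories and the row values means callers do not need to match the dataset's exact casing or spacing.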

Files Added/Modified:

pyrit/datasets/fetch_jailbreakv_28k_dataset.py - Main implementation
pyrit/datasets/__init__.py - Added exports for new functions
tests/unit/datasets/test_fetch_jailbreakv_28k_dataset.py - Unit tests
tests/integration/datasets/test_fetch_datasets.py - Integration tests added

Tests and Documentation

PyTest parametrized testing for filtering and choice of text field (dataset has jailbreak and redteaming prompts)
Dataset mocking with both text fields and policy mapped to harm_category

romanlutz

Thanks for getting started on this!

The integration test for datasets is missing, but I suspect it will require a custom one as the dataset is meant to be multimodal (see other comment).

romanlutz

Great work! Two small adjustments and we're ready to merge.

romanlutz · 2025-12-07T14:08:47Z

@AdrGav941 a lot changed in datasets the last couple of weeks. We should have really tried to merge it before the changes but didn't quite get to it. Please let me know if you want to make the changes yourself or if we should make the change.

adrian-gavrila · 2025-12-07T14:23:30Z

@romanlutz I'm happy to make the changes. I'm on vacation until the 19th but can get it working again when I get back!

romanlutz · 2025-12-07T14:53:22Z

> @romanlutz I'm happy to make the changes. I'm on vacation until the 19th but can get it working again when I get back!

No hurry 🙂

romanlutz

Couple minor comments, otherwise this looks good to me. Just need to try it out once to make sure it works.

romanlutz · 2026-01-04T23:58:29Z

Was just trying this out. Downloaded the zip file, put it in the home directory, and then ran it.

README.md: 7.27kB [00:00, 15.4MB/s]
mini_JailBreakV_28K.csv: 230kB [00:00, 3.45MB/s]
JailBreakV_28K/JailBreakV_28K.csv: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 23.2M/23.2M [00:02<00:00, 9.00MB/s]
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/workspace/pyrit/datasets/seed_datasets/remote/jailbreakv_28k_dataset.py", line 245, in fetch_dataset
    raise ValueError(
ValueError: JailBreakV-28K fetch failed: 100.0% of items are missing images (280 out of 280 items processed). Only 0 valid pairs were created. At least 50% of items must have valid images. Please ensure the ZIP file contains the full image set.

Have you seen this before? This is on Linux (devcontainer). On Windows it works for me.
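
A failure that shows up on Linux but not on Windows is often a path casing problem, since Linux file systems are usually case-sensitive, and a later commit title in this PR does mention a casing issue impacting Linux. Assuming that is the failure mode, a case-insensitive lookup could be sketched as below. `resolve_case_insensitive` is a hypothetical helper, not the resolver used in the PR.

```python
from pathlib import Path
from typing import Optional


def resolve_case_insensitive(root: Path, relative: str) -> Optional[Path]:
    """Walk `relative` under `root`, matching each path component ignoring case.

    Returns None when a component cannot be matched.
    """
    current = root
    for part in Path(relative).parts:
        if not current.is_dir():
            return None
        exact = current / part
        if exact.exists():
            current = exact  # fast path: casing already matches
            continue
        matches = [child for child in current.iterdir() if child.name.lower() == part.lower()]
        if not matches:
            return None
        current = matches[0]
    return current
```

The exact-match fast path avoids listing directories when the CSV path and the extracted folder already agree on casing.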

What confuses me, though, is that I got 280 pairs (560 total) with "mini" and 28000 (56000 total) with the full split, yet the zip file has the following folders for images:

- query_related with 6001 items (which maps to 6k rows in the CSV)
- llm_transfer_attack with 6002 items (which maps to 20k rows in the CSV, 5k of them are just using the blank image, about 2.8k of them are used more than once and up to 17 times, the remaining ~1k are used just once, and curiously there are also ~2.2k that are never used at all)
- figstep with 4000 items (which maps to 2k rows, apparently none of the images with name "query_image_*" are used)
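
Counts like the ones above can be reproduced with a small script over the CSV. The following is only a sketch: it assumes the CSV has an `image_path` column whose first path component is the folder name, and the sample rows and file names are made up.

```python
import csv
import io
from collections import Counter


def image_usage(csv_text: str, column: str = "image_path") -> Counter:
    """Count how many CSV rows reference each image path."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader if row.get(column))


def usage_by_folder(usage: Counter) -> dict:
    """Per top-level folder: rows, distinct images, and images used more than once."""
    summary = {}
    for path, count in usage.items():
        folder = path.split("/", 1)[0]
        stats = summary.setdefault(folder, {"rows": 0, "images": 0, "reused": 0})
        stats["rows"] += count
        stats["images"] += 1
        stats["reused"] += int(count > 1)
    return summary


# Made-up sample; the real CSV is JailBreakV_28K.csv from the dataset.
sample = """image_path,redteam_query
figstep/a.png,q1
llm_transfer_attack/blank.png,q2
llm_transfer_attack/blank.png,q3
"""
summary = usage_by_folder(image_usage(sample))
```

Finding images that are never used would additionally require listing the extracted folders and subtracting the paths seen in the CSV.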

I guess we can ignore the question of why they decided to put it together this way for this PR since it's not about "what to select from this" yet (that would be a follow-up task). I would, however, like to capture the metadata here:

policy is already captured via the harm categories, but the others... I imagine we'll do something in this direction in the not too distant future and being able to trace it back to the original dataset could prove helpful.

Somewhat concerning: ~~I've found that many repetitions of images have the same text prompts ("redteam_query") as well. The difference is only in the "jailbreak_query".~~ Figured it out! The paper explains this fairly well:

So here's what I'm thinking: jailbreak_query maps to what we call SeedPrompt (i.e., the text prompt being sent) and redteam_query maps to what we call SeedObjective (in other words: the goal behind what the text+image is trying to achieve)

This leaves us with a few options:

1. We provide the jailbreak_query as SeedPrompt and ignore redteam_query for this dataset. That means we give people exactly the things the dataset provides to send to a target.
2. We additionally provide the redteam_query as SeedObjective. This is preferable even if we don't send it to the target because it'll help in scoring. The scorer works a lot better when the objective is clearly spelled out and some of the jailbreak_query contents are obfuscated (on purpose).
3. Additionally, provide a dataset of just the objectives. This would be enormously useful for AI-led attacks as they need good representative objectives. They reference RedTeam-2K a ton in this as the pre-step. I would love to provide that additionally as a separate dataset. See this distribution by topic (nice!):
   There's a separate CSV file in the zip for this and it has 2K (as the name says) rows. I checked for the number of unique redteam_query items and those are also 2k so I'm willing to bet they match (I checked a few but not all).

I think we want to go with number 2 AND 3, but as separate fetchers.
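
To make the proposed mapping concrete, here is a minimal sketch using plain dataclasses as stand-ins. `Objective` and `Prompt` are hypothetical placeholders for what the thread calls SeedObjective and SeedPrompt; they are not PyRIT's classes, and the sample rows are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:  # stand-in for SeedObjective (hypothetical)
    value: str


@dataclass(frozen=True)
class Prompt:  # stand-in for SeedPrompt (hypothetical)
    value: str
    image_path: str
    objective: Objective


def rows_to_prompts(rows):
    """jailbreak_query becomes the prompt text, redteam_query the objective."""
    return [
        Prompt(row["jailbreak_query"], row["image_path"], Objective(row["redteam_query"]))
        for row in rows
    ]


def unique_objectives(rows):
    """One objective per distinct redteam_query, in first-seen order."""
    seen = dict.fromkeys(row["redteam_query"] for row in rows)
    return [Objective(text) for text in seen]


rows = [
    {"jailbreak_query": "obfuscated text A", "redteam_query": "goal 1", "image_path": "figstep/a.png"},
    {"jailbreak_query": "obfuscated text B", "redteam_query": "goal 1", "image_path": "figstep/b.png"},
]
prompts = rows_to_prompts(rows)
objectives = unique_objectives(rows)
```

Several prompts can share one objective, which matches the observation that repeated rows differ only in jailbreak_query. Deriving the objectives from the 28K rows is an assumption here; the thread suggests the separate RedTeam-2K CSV as the source for the objectives-only dataset.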

Separate note: we don't have an attack where there's an objective and the adversarial target generates both the text AND the image for a multi-modal attack on an objective_target. I've wanted that for a while and this should happen sometime soon 😆


- Use _validate_enums helper instead of bespoke validation
- Preserve source casing in per-seed harm_categories
- Use canonical empty-result error string
- Add unit tests (28K: 92% coverage, RedTeam-2K: 100% coverage)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…tebook
- Add @luo2024jailbreakv BibTeX entry and cite key, reference from both loader docstrings per dataset instructions
- Fix RedTeam-2K _HarmCategory.CHILD_ABUSE_CONTENT -> CHILD_ABUSE so the filter actually matches the upstream policy value ('Child Abuse' in the RedTeam-2K config, vs 'Child Abuse Content' in the 28K config)
- Regenerate 1_loading_datasets.ipynb so the get_all_dataset_names_async output cell picks up jailbreakv_28k and jailbreakv_redteam_2k

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
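
The category fix described in this commit message comes down to exact string matching between an enum value and the upstream policy label. A minimal sketch of why the mismatch returned nothing, with hypothetical enum and function names and an assumed "policy" column; only the two label strings come from the commit message:

```python
from enum import Enum


class RedTeam2KCategory(Enum):  # hypothetical name
    CHILD_ABUSE = "Child Abuse"  # label used in the RedTeam-2K config


class JailBreakV28KCategory(Enum):  # hypothetical name
    CHILD_ABUSE_CONTENT = "Child Abuse Content"  # label used in the 28K config


def filter_rows(rows, category: Enum):
    """Keep rows whose policy equals the enum member's value exactly."""
    return [row for row in rows if row["policy"] == category.value]


redteam_rows = [{"policy": "Child Abuse"}]
matched = filter_rows(redteam_rows, RedTeam2KCategory.CHILD_ABUSE)
unmatched = filter_rows(redteam_rows, JailBreakV28KCategory.CHILD_ABUSE_CONTENT)
```

With the 28K label applied to RedTeam-2K rows, the filter silently returns an empty list, which is the behavior the fix removes.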

The 16GB image set lives behind a Google Form -> Google Drive link and cannot be auto-fetched in CI. Mirror the existing PromptIntel pattern in test_all_datasets.py and skip the parameterized fetch when ~/JailBreakV_28K.zip isn't present. The text-only RedTeam-2K sibling fetches from the public HF metadata and continues to run unconditionally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
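
The skip-when-missing pattern can be sketched as follows. To stay dependency-free this uses the standard library's `unittest` rather than pytest (where `pytest.mark.skipif` plays the same role); the test class, helper name, and placeholder test body are hypothetical and do not mirror test_all_datasets.py.

```python
import unittest
from pathlib import Path

# Location named in the commit message above.
ZIP_PATH = Path.home() / "JailBreakV_28K.zip"


def zip_available(path: Path = ZIP_PATH) -> bool:
    """True when the manually downloaded image archive is present."""
    return path.is_file()


class JailBreakVFetchTest(unittest.TestCase):
    @unittest.skipUnless(zip_available(), f"{ZIP_PATH} not present; download it manually first")
    def test_fetch_with_images(self):
        # Placeholder: a real test would call the dataset fetcher here.
        self.assertTrue(ZIP_PATH.is_file())
```

A skip, rather than a failure, keeps CI green on machines that cannot obtain the archive while still exercising the fetch wherever the file exists.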

…o 100%
- doc/references.bib: drop -28K from the BibTeX title so it renders cleanly ({JailBreakV} instead of {JailBreakV-28K}) and swap author order to match the upstream README (Xiaoyu Guo before Chaowei Xiao).
- Mirror the corrected author order in both loaders' docstrings, SeedObjective, and SeedPrompt authors lists.
- Add three unit tests to cover the previously-untested 9 lines in jailbreakv_28k_dataset.py (zip-extract branch, empty image_path row, _resolve_image_path exception fallback). Coverage now 100%.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz reviewed Sep 23, 2025

Comment thread pyrit/datasets/fetch_jailbreakv_28k_dataset.py Outdated

Comment thread pyrit/datasets/fetch_jailbreakv_28k_dataset.py Outdated

Comment thread pyrit/datasets/fetch_jailbreakv_28k_dataset.py Outdated

hannahwestra25 reviewed Sep 23, 2025

Comment thread pyrit/datasets/fetch_jailbreakv_28k_dataset.py Outdated

romanlutz reviewed Sep 26, 2025

romanlutz self-assigned this Sep 28, 2025

romanlutz mentioned this pull request Oct 7, 2025

FEAT: add support for multimodal data from HarmBench #1110

Merged

romanlutz reviewed Oct 22, 2025

Comment thread pyrit/datasets/fetch_jailbreakv_28k_dataset.py Outdated

romanlutz reviewed Oct 22, 2025

Comment thread pyrit/datasets/fetch_jailbreakv_28k_dataset.py Outdated

Comment thread pyrit/datasets/fetch_jailbreakv_28k_dataset.py Outdated

romanlutz reviewed Oct 28, 2025

Comment thread tests/integration/datasets/test_fetch_datasets.py Outdated

Comment thread pyrit/datasets/fetch_jailbreakv_28k_dataset.py Outdated

adrian-gavrila requested a review from romanlutz November 12, 2025 18:52

Restructuring JailbreakV dataset to work with overall dataset refactor

f117a58

adrian-gavrila force-pushed the add__HF_jailbreakV_28K_dataset branch from 8c52bc6 to f117a58 Compare December 29, 2025 22:46

adrian-gavrila and others added 4 commits December 29, 2025 17:47

Merge branch 'main' into add__HF_jailbreakV_28K_dataset

f93fbc7

Pre-commit hooks

2e5d6cb

Merge remote-tracking branch 'refs/remotes/adrgav941/add__HF_jailbrea…

6298c64

…kV_28K_dataset' into add__HF_jailbreakV_28K_dataset

Merge branch 'main' into add__HF_jailbreakV_28K_dataset

94e6727

romanlutz reviewed Jan 3, 2026

Comment thread pyrit/datasets/seed_datasets/remote/jailbreakv_28k_dataset.py Outdated

romanlutz reviewed Jan 3, 2026

Comment thread tests/integration/datasets/test_seed_dataset_provider_integration.py Outdated

romanlutz approved these changes Jan 3, 2026

Comment clarity and making category enum private

ea87e90

romanlutz reviewed Jan 6, 2026

Comment thread pyrit/datasets/seed_datasets/remote/jailbreakv_28k_dataset.py Outdated

Adrian Gavrila and others added 3 commits January 7, 2026 11:25

Removing text field specification, adding redteam query as ObjectiveS…

48ec348

…eed, fixing casing issue impacting Linux

Adding RedTeam_2K subset of JailbreakV as separate fetcher

cce6ed7

Merge branch 'main' into add__HF_jailbreakV_28K_dataset

e60297d

romanlutz mentioned this pull request Mar 28, 2026

FEAT: Add JailBreakV-28K dataset loader #1548

Closed

BaedrianG and others added 2 commits May 29, 2026 17:05

Merge branch 'main' into add__HF_jailbreakV_28K_dataset

f377c6c

Modernize JailbreakV loaders for current conventions

8ddc37a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

adrian-gavrila force-pushed the add__HF_jailbreakV_28K_dataset branch from 86bed80 to 8ddc37a Compare May 29, 2026 21:17

BaedrianG and others added 2 commits May 29, 2026 17:37

romanlutz approved these changes Jun 1, 2026

romanlutz and others added 2 commits June 1, 2026 12:07

Merge branch 'main' into add__HF_jailbreakV_28K_dataset

4b4e77e

romanlutz reviewed Jun 1, 2026

Comment thread doc/references.bib Outdated

romanlutz reviewed Jun 1, 2026

Comment thread doc/references.bib Outdated

romanlutz and others added 2 commits June 1, 2026 17:27

Merge remote-tracking branch 'origin/main' into add__HF_jailbreakV_28…

5a9a4d0

…K_dataset # Conflicts: # doc/bibliography.md # doc/references.bib # tests/end_to_end/test_all_datasets.py

romanlutz added this pull request to the merge queue Jun 2, 2026

Merged via the queue into microsoft:main with commit 135a146 Jun 2, 2026
48 checks passed
