FEAT Add HarmfulQA dataset loader #1421

Merged: romanlutz merged 19 commits into microsoft:main from romanlutz:romanlutz/add-harmful-qa-dataset on Mar 5, 2026

Conversation

@romanlutz
Contributor

Add remote dataset loader for HarmfulQA (declare-lab/HarmfulQA), containing ~2k harmful questions organized by academic topic and subtopic for testing LLM susceptibility to harm-inducing question-answering.

Copilot AI review requested due to automatic review settings March 1, 2026 14:14
@romanlutz romanlutz force-pushed the romanlutz/add-harmful-qa-dataset branch from f8de803 to e996238 Compare March 1, 2026 14:16
Contributor

Copilot AI left a comment

Pull request overview

Adds a new remote seed-dataset provider for the HuggingFace declare-lab/HarmfulQA dataset so it can be fetched and used as SeedPrompt entries within PyRIT’s dataset discovery/registration system.

Changes:

  • Introduced _HarmfulQADataset remote loader that fetches HarmfulQA from HuggingFace and converts rows into SeedPrompts.
  • Exported the new loader from pyrit.datasets.seed_datasets.remote to trigger auto-registration.
  • Added unit tests validating basic fetch + conversion behavior and dataset_name.
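To illustrate the row-to-prompt conversion described in the list above, here is a minimal sketch. It does not use PyRIT's actual classes: `SeedPromptSketch` and `rows_to_seed_prompts` are hypothetical stand-ins, and the row keys `question`, `topic`, and `subtopic` are assumed from the dataset description rather than taken from this PR's code.

```python
from collections.abc import Iterable, Mapping
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SeedPromptSketch:
    """Hypothetical stand-in for a seed prompt; this is NOT PyRIT's SeedPrompt."""

    value: str
    dataset_name: str
    harm_categories: tuple[str, ...] = field(default_factory=tuple)


def rows_to_seed_prompts(
    rows: Iterable[Mapping[str, str]],
    dataset_name: str = "harmful_qa",
) -> list[SeedPromptSketch]:
    """Convert raw dataset rows into prompt objects.

    Assumed row keys (not confirmed by the PR): "question", "topic", "subtopic".
    """
    prompts: list[SeedPromptSketch] = []
    for row in rows:
        question = (row.get("question") or "").strip()
        if not question:
            # Skip rows that carry no usable question text.
            continue
        # Keep whichever of topic/subtopic is present, in that order.
        categories = tuple(
            c for c in (row.get("topic"), row.get("subtopic")) if c
        )
        prompts.append(
            SeedPromptSketch(
                value=question,
                dataset_name=dataset_name,
                harm_categories=categories,
            )
        )
    return prompts


# Placeholder rows; real rows would come from the remote dataset.
example_rows = [
    {"question": " placeholder question ", "topic": "Topic A", "subtopic": "Subtopic B"},
    {"question": "", "topic": "Topic A", "subtopic": "Subtopic C"},
]
example_prompts = rows_to_seed_prompts(example_rows)
```

Keeping the conversion as a pure function over plain mappings makes it easy to unit-test without any network access, which is the kind of check the third bullet refers to.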

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `pyrit/datasets/seed_datasets/remote/harmful_qa_dataset.py` | New remote dataset loader implementation for HarmfulQA -> SeedDataset/SeedPrompt conversion. |
| `pyrit/datasets/seed_datasets/remote/__init__.py` | Re-export/import the new loader so it’s discoverable/registered alongside other remote loaders. |
| `tests/unit/datasets/test_harmful_qa_dataset.py` | Unit tests for fetching/conversion and dataset_name behavior. |
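Why a re-export can matter for registration: one common pattern is for a base class to record every subclass at class-definition time, which only happens once the defining module has been imported. The sketch below assumes that mechanism for illustration. This thread does not say how PyRIT implements registration, and every name here is hypothetical.

```python
class RemoteLoaderSketch:
    """Hypothetical base class; subclasses register themselves when defined."""

    # Maps dataset_name -> loader class. Shared by all subclasses.
    registry: dict[str, type] = {}
    dataset_name: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # The class body has already run, so dataset_name is set by now.
        if cls.dataset_name:
            RemoteLoaderSketch.registry[cls.dataset_name] = cls


# Defining (that is, importing the module that defines) the subclass is
# what triggers registration; no explicit register() call is needed.
class HarmfulQARegisteredSketch(RemoteLoaderSketch):
    dataset_name = "harmful_qa"
```

Under this assumption, a loader module that is never imported is never registered, which is why adding the import to the package's `__init__.py` makes the loader discoverable.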

@romanlutz romanlutz force-pushed the romanlutz/add-harmful-qa-dataset branch from e996238 to d441180 Compare March 1, 2026 14:26
Add remote dataset loader for HarmfulQA (declare-lab/HarmfulQA), containing ~2k
harmful questions organized by academic topic and subtopic for testing LLM
susceptibility to harm-inducing question-answering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 13:00
@romanlutz romanlutz force-pushed the romanlutz/add-harmful-qa-dataset branch from d441180 to b4c033f Compare March 2, 2026 13:00
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

romanlutz and others added 2 commits March 2, 2026 05:36
The HF dataset identifier is now a class constant HF_DATASET_NAME
instead of a constructor parameter, consistent with other loaders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
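The class-constant change described in the commit message above can be sketched as follows. This is an illustrative shape only, not the PR's code: the class name and the `split` parameter are hypothetical, while `HF_DATASET_NAME` and the `declare-lab/HarmfulQA` identifier come from this thread.

```python
class HarmfulQAConstSketch:
    """Hypothetical loader shape; not the class added by this PR."""

    # Fixed identifier: callers cannot point this loader at another dataset.
    HF_DATASET_NAME = "declare-lab/HarmfulQA"

    def __init__(self, *, split: str = "train"):
        # `split` is an assumed example of a setting that may still vary
        # per instance; the dataset identifier is no longer one of them.
        self.split = split

    def describe_source(self) -> str:
        """Return a human-readable description of what would be fetched."""
        return f"{self.HF_DATASET_NAME} [{self.split}]"


loader = HarmfulQAConstSketch()
source = loader.describe_source()
```

With the identifier on the class, passing a dataset name to the constructor fails with a `TypeError`, so a loader cannot silently be reused for a dataset whose schema it does not understand.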
Copilot AI review requested due to automatic review settings March 2, 2026 13:46
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

romanlutz and others added 2 commits March 2, 2026 05:53
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 2, 2026 14:02
romanlutz and others added 2 commits March 2, 2026 06:05
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

romanlutz and others added 3 commits March 2, 2026 14:44
…-qa-dataset

# Conflicts:
#	doc/code/datasets/1_loading_datasets.ipynb
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 00:50
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

romanlutz and others added 2 commits March 2, 2026 19:56
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 04:50
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

romanlutz and others added 2 commits March 2, 2026 21:01
Copilot AI review requested due to automatic review settings March 3, 2026 05:04
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@varunj-msft
Contributor

Reviewed the PR: the code follows established patterns, the tests pass, and the loader fetches all seeds from HuggingFace successfully. Looks good to me, ready to merge.

Copilot AI review requested due to automatic review settings March 4, 2026 06:03
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@hannahwestra25 hannahwestra25 self-assigned this Mar 4, 2026
romanlutz and others added 2 commits March 4, 2026 13:03
…-qa-dataset

# Conflicts:
#	doc/code/datasets/1_loading_datasets.ipynb
Includes beaver_tails and harmful_qa in the dataset listing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz merged commit e3f03c2 into microsoft:main Mar 5, 2026
38 checks passed
riyosha pushed a commit to riyosha/PyRIT that referenced this pull request Mar 24, 2026

5 participants