FEAT: Add harm_categories to Babelscape ALERT dataset (#449) by CheerathAniketh · Pull Request #1551 · microsoft/PyRIT

CheerathAniketh · 2026-03-29T08:59:42Z

Description

Fixes #449

Added harm_categories field to the Babelscape ALERT dataset fetcher,
as requested by @romanlutz in the issue discussion.

Each prompt now includes its harm category from the dataset:
{"category": "crime_injury", "prompt": "..."} → harm_categories=["crime_injury"]
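As an illustration of the mapping described above, here is a minimal sketch. `ExampleSeed` and `records_to_seeds` are hypothetical names invented for this example and are not PyRIT's actual classes or the code in this PR; only the `category` and `prompt` keys come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class ExampleSeed:
    """Hypothetical stand-in for a seed object; not the PyRIT class."""

    value: str
    harm_categories: list[str] = field(default_factory=list)


def records_to_seeds(records: list[dict]) -> list[ExampleSeed]:
    """Wrap each record's single category string in a one-item list."""
    return [
        ExampleSeed(value=record["prompt"], harm_categories=[record["category"]])
        for record in records
    ]


seeds = records_to_seeds([{"category": "crime_injury", "prompt": "..."}])
print(seeds[0].harm_categories)
```

Storing the category as a list keeps the field shape consistent with datasets where a prompt may carry several harm categories.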

Tests and Documentation

Added tests/unit/datasets/test_babelscape_alert_dataset.py with 4 unit tests
Tests cover: dataset loading, harm_categories population, dataset name, and invalid category validation
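The test file itself is not shown in this conversation. The sketch below only illustrates what the harm_categories and invalid-category checks might look like; `harm_categories_from_record` and its validation rule (reject a missing, non-string, or blank category) are assumptions made for this example, not the PR's code.

```python
def harm_categories_from_record(record: dict) -> list[str]:
    """Hypothetical helper: return the record's category as a one-item list."""
    category = record.get("category")
    # Assumed rule: a category must be a non-blank string.
    if not isinstance(category, str) or not category.strip():
        raise ValueError(f"Invalid harm category: {category!r}")
    return [category]


def test_harm_categories_populated() -> None:
    record = {"category": "crime_injury", "prompt": "..."}
    assert harm_categories_from_record(record) == ["crime_injury"]


def test_invalid_category_rejected() -> None:
    bad_records = (
        {"prompt": "..."},                  # category missing
        {"category": "", "prompt": "..."},  # blank category
        {"category": 3, "prompt": "..."},   # wrong type
    )
    for bad in bad_records:
        try:
            harm_categories_from_record(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")
```

Both functions follow the pytest naming convention, so a test runner would collect them without extra setup.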

romanlutz

LGTM! A maintainer needs to run integration tests for this before merging, though, and check that the harm category shows up.

behnam-o · 2026-03-30T22:18:47Z

> LGTM! A maintainer needs to run integration tests for this before merging, though, and check that the harm category shows up.

I was able to do a quick test in a notebook, load the dataset, and verify all seeds have harm_categories correctly populated:

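Verification code:

from collections import Counter

from pyrit.datasets import SeedDatasetProvider  # import assumed; omitted in the original snippet

# Fetch only the Babelscape ALERT dataset
datasets = await SeedDatasetProvider.fetch_datasets_async(dataset_names=["babelscape_alert"])
babel = datasets[0]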
print(f"Total seeds: {len(babel.seeds)}")

# Unique harm categories
all_categories = [cat for seed in babel.seeds for cat in seed.harm_categories]
unique_categories = set(all_categories)
print(f"\nUnique harm categories: {len(unique_categories)}")

# Distribution of seeds per harm category
category_counts = Counter(all_categories)
print("\nSeeds per harm category:")
for cat, count in category_counts.most_common():
    print(f"  {cat}: {count}")

# Distribution of how many harm categories each seed has
num_categories_per_seed = [len(seed.harm_categories) for seed in babel.seeds]
category_count_distribution = Counter(num_categories_per_seed)
print("\nDistribution of harm categories per seed:")
for num_cats, count in sorted(category_count_distribution.items()):
    print(f"  {num_cats} harm_categories: {count} seeds")

Output:

Total seeds: 30968

Unique harm categories: 32

Seeds per harm category:
  crime_injury: 3555
  hate_ethnic: 2950
  crime_theft: 2843
  crime_propaganda: 2823
  hate_other: 2559
  hate_women: 1464
  substance_drug: 1362
  substance_other: 1114
  crime_cyber: 1048
  hate_religion: 1040
  weapon_other: 985
  substance_alcohol: 877
  hate_lgbtq+: 790
  sex_harassment: 700
  sex_other: 637
  crime_other: 619
  crime_tax: 574
  weapon_chemical: 561
  crime_kidnap: 554
  substance_cannabis: 547
  weapon_biological: 478
  crime_privacy: 437
  hate_body: 328
  self_harm_suicide: 311
  hate_disabled: 293
  sex_porn: 289
  weapon_radioactive: 280
  substance_tobacco: 264
  weapon_firearm: 224
  self_harm_thin: 186
  hate_poor: 180
  self_harm_other: 96

Distribution of harm categories per seed:
  1 harm_categories: 30968 seeds
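The output above can be cross-checked offline. The sketch below copies the per-category counts from that output and confirms that they sum to the reported total and that there are 32 distinct categories. Grouping by the prefix before the last underscore is an assumption about the naming scheme, used here only to give a coarser view.

```python
from collections import Counter

# Counts copied from the notebook output above.
counts = {
    "crime_injury": 3555, "hate_ethnic": 2950, "crime_theft": 2843,
    "crime_propaganda": 2823, "hate_other": 2559, "hate_women": 1464,
    "substance_drug": 1362, "substance_other": 1114, "crime_cyber": 1048,
    "hate_religion": 1040, "weapon_other": 985, "substance_alcohol": 877,
    "hate_lgbtq+": 790, "sex_harassment": 700, "sex_other": 637,
    "crime_other": 619, "crime_tax": 574, "weapon_chemical": 561,
    "crime_kidnap": 554, "substance_cannabis": 547, "weapon_biological": 478,
    "crime_privacy": 437, "hate_body": 328, "self_harm_suicide": 311,
    "hate_disabled": 293, "sex_porn": 289, "weapon_radioactive": 280,
    "substance_tobacco": 264, "weapon_firearm": 224, "self_harm_thin": 186,
    "hate_poor": 180, "self_harm_other": 96,
}

total_seeds = 30968  # "Total seeds" line from the output

# With exactly one category per seed, the counts must sum to the seed total.
assert sum(counts.values()) == total_seeds
assert len(counts) == 32

# Assumed convention: the text before the last underscore is the broad group.
group_totals = Counter()
for name, count in counts.items():
    group_totals[name.rsplit("_", 1)[0]] += count

for group, count in group_totals.most_common():
    print(f"{group}: {count}")
```

Because every seed has exactly one harm category, the sum check is meaningful; with multi-label seeds the counts could exceed the seed total.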

FEAT: Add harm_categories to Babelscape ALERT dataset (microsoft#449)

e57b90a

CheerathAniketh mentioned this pull request Mar 29, 2026

FEAT Add Babelscape/ALERT Dataset #449

Open

romanlutz approved these changes Mar 29, 2026

FIX: add newline at end of test file

34d9d5b

CheerathAniketh wants to merge 2 commits into microsoft:main from CheerathAniketh:fix/alert-harm-categories
