
FEAT: Add harm_categories to Babelscape ALERT dataset (#449) #1551

Open
CheerathAniketh wants to merge 2 commits into microsoft:main from CheerathAniketh:fix/alert-harm-categories



@CheerathAniketh

Description

Fixes #449

Added a harm_categories field to the Babelscape ALERT dataset fetcher,
as requested by @romanlutz in the issue discussion.

Each prompt now carries its harm category from the dataset:
{"category": "crime_injury", "prompt": "..."} → harm_categories=["crime_injury"]
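A minimal sketch of the mapping described above, assuming each ALERT record is a dict with `category` and `prompt` keys as in the example. `AlertSeed` and `alert_record_to_seed` are hypothetical names used only for illustration; they are not PyRIT's actual seed class or the fetcher code in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class AlertSeed:
    """Hypothetical stand-in for a PyRIT seed, keeping only the fields relevant here."""

    value: str
    dataset_name: str
    harm_categories: list[str] = field(default_factory=list)


def alert_record_to_seed(record: dict) -> AlertSeed:
    """Copy the record's single ALERT category into a one-element harm_categories list."""
    return AlertSeed(
        value=record["prompt"],
        dataset_name="babelscape_alert",
        harm_categories=[record["category"]],
    )


seed = alert_record_to_seed({"category": "crime_injury", "prompt": "..."})
print(seed.harm_categories)  # ['crime_injury']
```

Wrapping the category in a list keeps the field compatible with seeds that carry several harm categories, even though every ALERT record has exactly one.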

Tests and Documentation

  • Added tests/unit/datasets/test_babelscape_alert_dataset.py with 4 unit tests
  • Tests cover: dataset loading, harm_categories population, dataset name, and invalid category validation
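A sketch of how the harm_categories and invalid-category checks could be written without network access. `fetch_alert_seeds` is a hypothetical in-memory stand-in, and the accepted subset names `alert` and `alert_adversarial` are an assumption based on the two ALERT subsets Babelscape publishes; the tests in the PR's actual file may be structured differently.

```python
# Assumption: ALERT is published as two subsets named "alert" and "alert_adversarial".
VALID_CONFIGS = ("alert", "alert_adversarial")


def fetch_alert_seeds(records: list[dict], config: str = "alert") -> list[dict]:
    """Hypothetical fetcher: validate the subset name, then attach harm_categories."""
    if config not in VALID_CONFIGS:
        raise ValueError(f"Invalid ALERT category {config!r}; expected one of {VALID_CONFIGS}")
    return [{"value": r["prompt"], "harm_categories": [r["category"]]} for r in records]


# Small in-memory fixture standing in for the downloaded dataset.
RECORDS = [
    {"category": "crime_injury", "prompt": "p1"},
    {"category": "hate_ethnic", "prompt": "p2"},
]


def test_harm_categories_populated():
    seeds = fetch_alert_seeds(RECORDS)
    assert [s["harm_categories"] for s in seeds] == [["crime_injury"], ["hate_ethnic"]]


def test_invalid_category_raises():
    try:
        fetch_alert_seeds(RECORDS, config="not_a_subset")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unknown ALERT subset")


if __name__ == "__main__":
    test_harm_categories_populated()
    test_invalid_category_raises()
    print("all checks passed")
```

The `test_` functions are also collected automatically if the file is run under pytest.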

@romanlutz (Contributor) left a comment


LGTM! A maintainer needs to run integration tests for this before merging, though, and check that the harm category shows up.

@behnam-o (Contributor)

> LGTM! A maintainer needs to run integration tests for this before merging, though, and check that the harm category shows up.

I ran a quick test in a notebook, loaded the dataset, and verified that every seed has harm_categories correctly populated:

verification code:

from collections import Counter

# Assumed import path for PyRIT's dataset provider; adjust to your PyRIT version if needed.
from pyrit.datasets import SeedDatasetProvider

# Top-level await works in a Jupyter notebook; in a plain script, wrap this call in asyncio.run(...).
datasets = await SeedDatasetProvider.fetch_datasets_async(dataset_names=["babelscape_alert"])
babel = datasets[0]
print(f"Total seeds: {len(babel.seeds)}")

# Unique harm categories
all_categories = [cat for seed in babel.seeds for cat in seed.harm_categories]
unique_categories = set(all_categories)
print(f"\nUnique harm categories: {len(unique_categories)}")

# Distribution of seeds per harm category
category_counts = Counter(all_categories)
print("\nSeeds per harm category:")
for cat, count in category_counts.most_common():
    print(f"  {cat}: {count}")

# Distribution of how many harm categories each seed has
num_categories_per_seed = [len(seed.harm_categories) for seed in babel.seeds]
category_count_distribution = Counter(num_categories_per_seed)
print("\nDistribution of harm categories per seed:")
for num_cats, count in sorted(category_count_distribution.items()):
    print(f"  {num_cats} harm_categories: {count} seeds")

output:

Total seeds: 30968

Unique harm categories: 32

Seeds per harm category:
  crime_injury: 3555
  hate_ethnic: 2950
  crime_theft: 2843
  crime_propaganda: 2823
  hate_other: 2559
  hate_women: 1464
  substance_drug: 1362
  substance_other: 1114
  crime_cyber: 1048
  hate_religion: 1040
  weapon_other: 985
  substance_alcohol: 877
  hate_lgbtq+: 790
  sex_harassment: 700
  sex_other: 637
  crime_other: 619
  crime_tax: 574
  weapon_chemical: 561
  crime_kidnap: 554
  substance_cannabis: 547
  weapon_biological: 478
  crime_privacy: 437
  hate_body: 328
  self_harm_suicide: 311
  hate_disabled: 293
  sex_porn: 289
  weapon_radioactive: 280
  substance_tobacco: 264
  weapon_firearm: 224
  self_harm_thin: 186
  hate_poor: 180
  self_harm_other: 96

Distribution of harm categories per seed:
  1 harm_categories: 30968 seeds
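Because every seed has exactly one harm category, the per-category counts should add up to the total seed count. A quick cross-check using the numbers copied verbatim from the output above:

```python
# Per-category seed counts copied from the notebook output above.
category_counts = {
    "crime_injury": 3555, "hate_ethnic": 2950, "crime_theft": 2843,
    "crime_propaganda": 2823, "hate_other": 2559, "hate_women": 1464,
    "substance_drug": 1362, "substance_other": 1114, "crime_cyber": 1048,
    "hate_religion": 1040, "weapon_other": 985, "substance_alcohol": 877,
    "hate_lgbtq+": 790, "sex_harassment": 700, "sex_other": 637,
    "crime_other": 619, "crime_tax": 574, "weapon_chemical": 561,
    "crime_kidnap": 554, "substance_cannabis": 547, "weapon_biological": 478,
    "crime_privacy": 437, "hate_body": 328, "self_harm_suicide": 311,
    "hate_disabled": 293, "sex_porn": 289, "weapon_radioactive": 280,
    "substance_tobacco": 264, "weapon_firearm": 224, "self_harm_thin": 186,
    "hate_poor": 180, "self_harm_other": 96,
}

# 32 unique categories, and with one category per seed the counts sum to the seed total.
print(len(category_counts), sum(category_counts.values()))  # 32 30968
```

Both numbers match the reported "Unique harm categories: 32" and "Total seeds: 30968", which is consistent with the single-category-per-seed distribution.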



Development

Successfully merging this pull request may close these issues.

FEAT Add Babelscape/ALERT Dataset

3 participants