FEAT: Add harm_categories to Babelscape ALERT dataset (#449) #1551
Open
CheerathAniketh wants to merge 2 commits into microsoft:main
romanlutz (Contributor) approved these changes on Mar 29, 2026 and left a comment:
LGTM! A maintainer needs to run integration tests for this before merging, though, and check that the harm category shows up.
Contributor
I was able to do a quick test in a notebook: I loaded the dataset and verified that all seeds have harm_categories correctly populated. Verification code:

    from collections import Counter

    # Import path assumed; adjust if SeedDatasetProvider lives elsewhere in your PyRIT version.
    from pyrit.datasets import SeedDatasetProvider

    # Top-level await works in a notebook cell; wrap in asyncio.run(...) in a script.
    datasets = await SeedDatasetProvider.fetch_datasets_async(dataset_names=["babelscape_alert"])
    babel = datasets[0]
    print(f"Total seeds: {len(babel.seeds)}")

    # Unique harm categories
    all_categories = [cat for seed in babel.seeds for cat in seed.harm_categories]
    unique_categories = set(all_categories)
    print(f"\nUnique harm categories: {len(unique_categories)}")

    # Distribution of seeds per harm category
    category_counts = Counter(all_categories)
    print("\nSeeds per harm category:")
    for cat, count in category_counts.most_common():
        print(f"  {cat}: {count}")

    # Distribution of how many harm categories each seed has
    num_categories_per_seed = [len(seed.harm_categories) for seed in babel.seeds]
    category_count_distribution = Counter(num_categories_per_seed)
    print("\nDistribution of harm categories per seed:")
    for num_cats, count in sorted(category_count_distribution.items()):
        print(f"  {num_cats} harm_categories: {count} seeds")
Description
Fixes #449
Added a harm_categories field to the Babelscape ALERT dataset fetcher, as requested by @romanlutz in the issue discussion.
Each prompt now includes its harm category from the dataset:
{"category": "crime_injury", "prompt": "..."} → harm_categories=["crime_injury"]

Tests and Documentation
tests/unit/datasets/test_babelscape_alert_dataset.py with 4 unit tests
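
For reviewers who want to see the row-to-seed mapping from the description in isolation, here is a minimal sketch. It does not use the PyRIT classes or the code from this PR: `ExampleSeed` and `row_to_seed` are hypothetical names introduced only for illustration, and the sketch assumes each ALERT row carries exactly one `category` string alongside its `prompt`.

```python
from dataclasses import dataclass, field


@dataclass
class ExampleSeed:
    """Hypothetical stand-in for a seed object; not the real PyRIT class."""

    value: str
    harm_categories: list[str] = field(default_factory=list)


def row_to_seed(row: dict) -> ExampleSeed:
    """Wrap the row's single category string in a one-element list."""
    return ExampleSeed(value=row["prompt"], harm_categories=[row["category"]])


# Same example row as in the description above.
row = {"category": "crime_injury", "prompt": "..."}
seed = row_to_seed(row)
print(seed.harm_categories)  # ['crime_injury']
```

Storing the category as a one-element list rather than a bare string is what lets the notebook check in the conversation flatten categories across seeds with a single comprehension.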