updated the wino_bias dataset #1930
datasets/wino_bias/wino_bias.py
Outdated
name='dev_type1_pro',
description = "winoBias dev_type1_pro_stereotype data in cornll format",
data_url = _URL + "/dev_type1_pro_stereotype.v4_auto_conll"
),
I think we should only have the configurations "type1_pro", "type1_anti", "type2_pro" and "type2_anti".
Then each configuration can have train/dev/test splits. What do you think ?
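To make the suggestion concrete, here is a minimal sketch of how four configurations could each map their splits to a data file. It is not the code from this PR: the base URL is a placeholder (the real `_URL` is not shown in this thread), the helper `data_urls` is hypothetical, and the file naming scheme is extrapolated from the single file name `dev_type1_pro_stereotype.v4_auto_conll` quoted above. Which splits exist depends on the files actually published.

```python
# Placeholder base URL; the real _URL in wino_bias.py is not shown in this thread.
_URL = "https://example.com/wino_bias"

# The four configurations proposed in the review.
CONFIG_NAMES = ("type1_pro", "type1_anti", "type2_pro", "type2_anti")

# Splits to expose per configuration; add "train" only if matching files exist.
SPLITS = ("dev", "test")


def data_urls(config_name, splits=SPLITS, base_url=_URL):
    """Map each split name to the file URL for one configuration (hypothetical helper)."""
    # Assumed naming scheme, extrapolated from dev_type1_pro_stereotype.v4_auto_conll.
    return {
        split: f"{base_url}/{split}_{config_name}_stereotype.v4_auto_conll"
        for split in splits
    }


# One entry per configuration, each holding its own split -> URL mapping.
ALL_URLS = {name: data_urls(name) for name in CONFIG_NAMES}
```

With a table like this, the split name moves out of the configuration name, so `dev_type1_pro` becomes the `dev` split of `type1_pro`.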
Good point! Will update that. Thanks!
Hi @JieyuZhao ! Have you had a chance to add the different configurations ?
Hi @lhoestq Yes, I've updated the code. Now each configuration has dev/test splits.
Cool thanks !
This looks perfect this way.
Now we just need to update the dataset_infos.json (it contains the metadata of the dataset) and add dummy data to be able to test this script automatically.
To update the dataset_infos.json, you just need to delete the current one at ./datasets/wino_bias/dataset_infos.json and then run this command:
datasets-cli test ./datasets/wino_bias --save_infos --all_configs --ignore_verifications
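The two steps above (delete the stale file, then regenerate it) can be sketched as a small Python helper. This is only a convenience wrapper under assumptions: the function name `regenerate_infos` is hypothetical, the dataset folder is assumed to be datasets/wino_bias (as in the file path of this PR), and the command is exactly the one quoted above. By default it only builds the command; pass `run=True` to execute it, which requires `datasets-cli` to be installed.

```python
import subprocess
from pathlib import Path


def regenerate_infos(dataset_dir, run=False):
    """Delete the stale dataset_infos.json and return the datasets-cli test command."""
    dataset_dir = Path(dataset_dir)

    # Step 1: remove the old metadata file if it is there (missing_ok needs Python 3.8+).
    (dataset_dir / "dataset_infos.json").unlink(missing_ok=True)

    # Step 2: the command quoted in the review, as an argument list.
    cmd = [
        "datasets-cli", "test", str(dataset_dir),
        "--save_infos", "--all_configs", "--ignore_verifications",
    ]
    if run:
        # Raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `regenerate_infos("./datasets/wino_bias", run=True)` would perform both steps from the repository root.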
There's also a tool to add the dummy data automatically.
First delete the folder at ./datasets/wino_bias/dummy and then run
datasets-cli dummy_data ./datasets/wino_bias --auto_generate --match_text_files "*conll" --n_lines 15
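As a rough illustration of what the last two flags express: the pattern "*conll" selects the data files to treat as line-by-line text, and 15 is the number of lines kept from each. The sketch below is not the tool's implementation; the helper names `is_text_file` and `first_lines` are hypothetical, and shell-style matching via `fnmatch` is an assumption about how the pattern is interpreted.

```python
from fnmatch import fnmatch
from itertools import islice


def is_text_file(filename, patterns=("*conll",)):
    """Return True if the file name matches any of the shell-style patterns."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


def first_lines(lines, n_lines=15):
    """Keep at most the first n_lines items of an iterable of lines."""
    return list(islice(lines, n_lines))
```

Under these assumptions, a file named dev_type1_pro_stereotype.v4_auto_conll matches the pattern and would be truncated to its first 15 lines for the dummy data.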
Let me know if you have questions :)
Also don't forget to run make style to format the code properly.
Thanks for the instructions! I've updated the metadata and the dummy data and also done the formatting. Please let me know if more is needed. :)
Thank you !
Updated the wino_bias.py script.