updated the wino_bias dataset #1930

Merged
merged 7 commits into from Apr 7, 2021

JieyuZhao
Contributor

Updated the wino_bias.py script.

  • updated the data_url
  • added different configurations for different data splits
  • added the coreference_cluster to the data features
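For readers unfamiliar with coreference annotations in CoNLL-style files, here is a minimal sketch of how a per-token coreference column can be turned into clusters of token spans. It assumes the CoNLL-2012 convention, where the column holds `-` for no mention, `(0` to open a mention of cluster 0, `0)` to close it, and `(0)` for a single-token mention. The function name `coref_spans` and the example column are hypothetical, and this is not necessarily how wino_bias.py builds its coreference_cluster feature.

```python
from collections import defaultdict


def coref_spans(coref_column):
    """Map each cluster id to its (start, end) token spans, both inclusive."""
    open_starts = defaultdict(list)  # cluster id -> stack of open start indices
    clusters = defaultdict(list)
    for index, cell in enumerate(coref_column):
        if cell == "-":
            continue  # token is not part of any mention boundary
        for part in cell.split("|"):  # several mentions may share one token
            cluster_id = int(part.strip("()"))
            if part.startswith("("):
                open_starts[cluster_id].append(index)
            if part.endswith(")"):
                start = open_starts[cluster_id].pop()
                clusters[cluster_id].append((start, index))
    return dict(clusters)


# Hypothetical column: a two-token mention (tokens 0-1) and a later
# single-token pronoun (token 7), both in cluster 0.
example_column = ["(0", "0)", "-", "-", "-", "-", "-", "(0)", "-"]
spans = coref_spans(example_column)
```

Spans are stored as inclusive token indices so a pronoun resolves to a one-token span such as `(7, 7)`.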

name='dev_type1_pro',
description = "winoBias dev_type1_pro_stereotype data in conll format",
data_url = _URL + "/dev_type1_pro_stereotype.v4_auto_conll"
),
Member

I think we should only have the configurations "type1_pro", "type1_anti", "type2_pro" and "type2_anti".
Then each configuration can have train/dev/test splits. What do you think ?

Contributor Author

Good point! Will update that. Thanks!

Member

lhoestq commented Mar 29, 2021

Hi @JieyuZhao ! Have you had a chance to add the different configurations ?
Thanks again for your help on this !

Contributor Author

Hi @lhoestq, yes, I've updated the code. Each configuration now has dev/test splits.
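The layout agreed on in this thread (four configurations, each with dev and test splits) could be sketched roughly as follows. This is an illustration in plain Python, not the actual wino_bias.py code or the datasets library API: the class `WinoBiasConfigSketch`, the placeholder `_URL` value, and the file-name pattern (generalized from the single file name quoted in the review) are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical base URL; the real value of _URL is not shown in this thread.
_URL = "https://example.com/winobias"

# Configuration names proposed in the review.
CONFIG_NAMES = ["type1_pro", "type1_anti", "type2_pro", "type2_anti"]
# Splits the author reports after the update.
SPLITS = ["dev", "test"]


@dataclass(frozen=True)
class WinoBiasConfigSketch:
    """Illustrative stand-in for one dataset configuration."""

    name: str

    def data_url(self, split: str) -> str:
        # Pattern assumed from the quoted file name
        # "dev_type1_pro_stereotype.v4_auto_conll".
        if split not in SPLITS:
            raise ValueError(f"unknown split: {split!r}")
        return f"{_URL}/{split}_{self.name}_stereotype.v4_auto_conll"


CONFIGS = {name: WinoBiasConfigSketch(name) for name in CONFIG_NAMES}

# One URL per (configuration, split) pair.
ALL_URLS = {
    (name, split): config.data_url(split)
    for name, config in CONFIGS.items()
    for split in SPLITS
}
```

Keeping the split out of the configuration name means a user picks a bias type once and then gets every split for it, which is the point of the reviewer's suggestion.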

Member

@lhoestq lhoestq left a comment

Cool thanks !
This looks perfect this way.

Now we just need to update the dataset_infos.json (it contains the metadata of the dataset) and add dummy data to be able to test this script automatically.

To update the dataset_infos.json you just need to delete the current one at ./datasets/wino_bias/dataset_infos.json, and then run this command:

datasets-cli test ./datasets/wino_bias --save_infos --all_configs --ignore_verifications

To add the dummy data there's also a tool to add them automatically.
First delete the folder at ./datasets/wino_bias/dummy and then run

datasets-cli dummy_data ./datasets/wino_bias --auto_generate --match_text_files "*conll" --n_lines 15

Let me know if you have questions :)
Also don't forget to run make style to format the code properly.
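The steps in this comment can be collected into one ordered checklist. The sketch below only assembles and prints the commands; it does not run them. The helper name `regeneration_steps` is hypothetical, the `rm` commands are one assumed way to do the two deletions, and the default directory assumes the dataset folder is named after the wino_bias.py script. The datasets-cli flags are copied from the comment as written.

```python
import shlex


def regeneration_steps(dataset_dir="./datasets/wino_bias"):
    """Return (description, command) pairs in the order they should be run."""
    return [
        ("remove the stale metadata file",
         ["rm", "-f", f"{dataset_dir}/dataset_infos.json"]),
        ("regenerate dataset_infos.json",
         ["datasets-cli", "test", dataset_dir,
          "--save_infos", "--all_configs", "--ignore_verifications"]),
        ("remove the old dummy data",
         ["rm", "-rf", f"{dataset_dir}/dummy"]),
        ("regenerate the dummy data",
         ["datasets-cli", "dummy_data", dataset_dir, "--auto_generate",
          "--match_text_files", "*conll", "--n_lines", "15"]),
        ("format the code",
         ["make", "style"]),
    ]


# Dry run: print each step as a shell-quoted command line.
for description, command in regeneration_steps():
    print(f"# {description}")
    print(shlex.join(command))
```

Printing rather than executing keeps the sketch safe to try; to run the steps for real, each command list could be passed to `subprocess.run(command, check=True)` from the repository root.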

Contributor Author

Thanks for the instructions! I've updated the metadata and the dummy data, and also done the formatting. Please let me know if more is needed. :)

Member

@lhoestq lhoestq left a comment

Thank you !

@lhoestq lhoestq merged commit b72b9bf into huggingface:master Apr 7, 2021