Add Jigsaw unintended Bias #2935

Iwontbecreative · 2021-09-17T16:12:31Z

Hi,

Here's a first attempt at this dataset. It would be great if it could be merged relatively quickly, as it is needed for BigScience-related work.

This requires a manual download, and I had some trouble generating dummy_data in this setting, so feedback there is welcome.

Iwontbecreative · 2021-09-17T16:37:25Z

Note that the tests currently seem to fail because of a bug in an Exception; see #2936 for the fix.

lhoestq

Nice :) thanks a lot for adding this dataset !

a few comments:

datasets/jigsaw_unintended_bias/README.md

lhoestq · 2021-09-21T09:45:12Z

datasets/jigsaw_unintended_bias/jigsaw_unintended_bias.py

@@ -0,0 +1,156 @@
+# coding=utf-8
+# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.


Suggested change

-# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+# Copyright 2021 The HuggingFace Datasets Authors and the current dataset script contributor.

I had replaced this but went back, seeing as templates/new_dataset_script.py includes 2020. Might be worth updating this on your end.

indeed ! thanks

datasets/jigsaw_unintended_bias/jigsaw_unintended_bias.py

Iwontbecreative · 2021-09-22T00:12:28Z

@lhoestq implemented your changes, I think this might be ready for another look.

lhoestq

Awesome thank you !

I added my final comments (just some nitpicking)

lhoestq · 2021-09-23T15:07:30Z

datasets/jigsaw_unintended_bias/jigsaw_unintended_bias.py

+        # It is in charge of opening the given file and yielding (key, example) tuples from the dataset
+        # The key is not important, it's more here for legacy reason (legacy from tfds)
+
+        data = pd.read_csv(path)


This will load all the data in memory (1GB). Would it be possible to iterate over the CSV file line by line instead?

Done, it now reads the file in chunks of 50k rows, which should be small enough.
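
For readers following along, here is a minimal sketch of what chunked reading can look like, assuming pandas' `chunksize` option on `read_csv`. It is not the code merged in this PR: the function name `iter_examples` and the three sample columns are illustrative assumptions, and the tiny in-memory CSV stands in for the real file.

```python
import io

import pandas as pd


def iter_examples(csv_source, chunksize=50_000):
    """Yield (key, example) pairs without loading the whole CSV at once.

    `iter_examples` is a hypothetical helper, not the PR's actual method.
    """
    key = 0
    # With chunksize set, read_csv returns an iterator of DataFrames,
    # each holding at most `chunksize` rows.
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        for record in chunk.to_dict(orient="records"):
            yield key, record
            key += 1


# Illustrative data only; the column names are assumptions for the demo.
sample = io.StringIO(
    "id,comment_text,target\n"
    "1,hello,0.0\n"
    "2,world,0.5\n"
    "3,again,1.0\n"
)
examples = list(iter_examples(sample, chunksize=2))
```

Only one chunk is held in memory at a time, so peak memory scales with `chunksize` rather than with the file size. The running `key` counter keeps keys unique across chunks.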

datasets/jigsaw_unintended_bias/README.md

Iwontbecreative · 2021-09-23T15:50:57Z

Thanks @lhoestq, implemented the changes, let me know if anything else pops up.

lhoestq

Nice thank you ! LGTM :)

Iwontbecreative added 5 commits September 17, 2021 11:46

Add working processing script

b6d9767

Add dummy data attempt

883dbc7

Missing updates

d42adbc

Style nit

0e73672

Add tags

e47d249

Iwontbecreative mentioned this pull request Sep 21, 2021

Add 10 orig task templates for Jigsaw Unintended Bias bigscience-workshop/promptsource#451

Merged

lhoestq reviewed Sep 21, 2021

Iwontbecreative added 3 commits September 21, 2021 16:47

Merge remote-tracking branch 'upstream/master' into jigsaw_unintended

e741382

Update README

7cc9598

Add required citation info

60cd207

lhoestq reviewed Sep 23, 2021

Avoid loading entire dataset in memory at once

05c5ddc

lhoestq approved these changes Sep 24, 2021

lhoestq merged commit d3649fc into huggingface:master Sep 24, 2021
