[Names Bias] Code for creating sets of names #4836

EricMichaelSmith · 2022-10-21T19:50:16Z

Release the code used to process and produce the two sets of names used in https://arxiv.org/pdf/2109.03300.pdf, starting from the lists of names in the original papers from which these sets were created.

davides

LGTM overall, just a few minor things to clarify.

davides · 2022-10-21T22:09:28Z

projects/dialogue_bias/util.py

+import pandas as pd
+
+
+RACES_ETHNICITIES = ['hispanic', 'white', 'black', 'api', 'aian', '2prace']


Are the spellings of the last two items here intentional?

Yeah, that's how Tzioumis spelled them - adding a comment for this

davides · 2022-10-21T22:15:36Z

projects/dialogue_bias/util.py

+    names_to_new_lists = {}
+    for name_list, names in orig_name_lists.items():
+        for name in names:
+            proc_name = name.replace('-', '')


It seems like this will concatenate hyphenated names. Why is that required?

It's to match the formatting of the baby-name lists - adding a comment for this
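For readers following the thread, here is a minimal sketch of the hyphen-stripping step discussed above. The loop shape and the `orig_name_lists` name come from the diff; the helper `strip_hyphens`, the list label, and the sample names are hypothetical and only illustrate the effect on hyphenated names.

```python
def strip_hyphens(name: str) -> str:
    """Remove hyphens so a hyphenated name is concatenated into one token."""
    return name.replace('-', '')


# Hypothetical input: maps a list label to its raw names.
orig_name_lists = {'example_list': ['Mary-Ann', 'Jose', 'Jean-Luc']}

# Apply the same normalization to every name in every list.
processed = {
    label: [strip_hyphens(name) for name in names]
    for label, names in orig_name_lists.items()
}
print(processed)
```

Names without a hyphen pass through unchanged, so the step is safe to apply to every list.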

davides · 2022-10-21T22:27:17Z

projects/dialogue_bias/util.py

+        .sort_values('obs_of_this_ethnicity', ascending=False)
+    )
+    tzioumis_plurality_names = percent_plurality_names_df.iloc[
+        :200


Not sure I follow, why 200? Can you please add a comment to clarify?

Semi-arbitrary, to prevent the lists of names from being too big - adding a comment
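One way to make the cutoff self-documenting is to lift it into a named constant. The sketch below assumes a DataFrame with the `obs_of_this_ethnicity` column seen in the diff; the constant `MAX_NAMES_PER_LIST`, the helper `top_names`, the `firstname` column, and the toy rows are hypothetical.

```python
import pandas as pd

# Semi-arbitrary cap that keeps each name list from growing too large.
MAX_NAMES_PER_LIST = 200

# Hypothetical toy data; only 'obs_of_this_ethnicity' appears in the diff.
df = pd.DataFrame(
    {
        'firstname': ['A', 'B', 'C', 'D'],
        'obs_of_this_ethnicity': [5, 50, 20, 1],
    }
)


def top_names(frame: pd.DataFrame, max_names: int = MAX_NAMES_PER_LIST) -> pd.DataFrame:
    """Return at most `max_names` rows, most frequently observed first."""
    ranked = frame.sort_values('obs_of_this_ethnicity', ascending=False)
    return ranked.iloc[:max_names]


top_two = top_names(df, max_names=2)
print(top_two['firstname'].tolist())
```

Slicing with `.iloc[:max_names]` returns every row when the frame is shorter than the cap, so no length check is needed.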

davides · 2022-10-21T22:28:36Z

projects/dialogue_bias/util.py

+        ]
+        if mapped_ethnicity == 'aa':
+            # Avoid the same name in two lists by removing it from this one
+            female_race_gender_name_list.remove('Yolanda')


Is there a programmatic way to detect duplicates? That way it would just work for updated versions of the source datasets (assuming the schema stays the same).

Good call - haha, this was obviously a hack to account for the current duplicates. Adding a TODO here
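A possible shape for the programmatic check suggested above is sketched here. It is not the code in the PR: the function names, the list labels, and the toy names other than 'Yolanda' are hypothetical. Which list should give up a shared name is a policy choice, so it is passed in explicitly as an assumption.

```python
from collections import defaultdict


def find_duplicate_names(name_lists):
    """Map each name found in more than one list to the sorted labels containing it."""
    labels_by_name = defaultdict(set)
    for label, names in name_lists.items():
        for name in names:
            labels_by_name[name].add(label)
    return {
        name: sorted(labels)
        for name, labels in labels_by_name.items()
        if len(labels) > 1
    }


def remove_duplicates(name_lists, remove_from):
    """Drop from the `remove_from` list every name it shares with another list."""
    duplicates = find_duplicate_names(name_lists)
    return {
        label: [
            name
            for name in names
            if not (label == remove_from and name in duplicates)
        ]
        for label, names in name_lists.items()
    }


# Hypothetical toy lists; the labels and list contents are made up.
toy_lists = {
    'list_a': ['Yolanda', 'Alice'],
    'list_b': ['Yolanda', 'Beth'],
}
cleaned = remove_duplicates(toy_lists, remove_from='list_a')
print(find_duplicate_names(toy_lists))
print(cleaned)
```

Because the duplicates are computed from the data, the same call would keep working if the source datasets were updated, provided their schema stays the same. Names shared only between two other lists are left in place, so a fuller version would need a rule for those cases as well.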

Add name processing code

58caff3

EricMichaelSmith requested review from melanierk, davides and adinawilliams October 21, 2022 19:50

facebook-github-bot added the CLA Signed label Oct 21, 2022

EricMichaelSmith added 2 commits October 21, 2022 15:53

Update util.py

b4f423c

Update util.py

a237efb

EricMichaelSmith requested a review from meganung October 21, 2022 19:58

davides reviewed Oct 21, 2022


Update util.py

6c4106c

EricMichaelSmith requested a review from davides October 24, 2022 14:47

davides approved these changes Oct 24, 2022


Merge branch 'main' into names-bias-name-set-code

0429a33

EricMichaelSmith merged commit b9317bd into main Oct 31, 2022

EricMichaelSmith deleted the names-bias-name-set-code branch October 31, 2022 17:56
