Adding more build instructions #6

Merged
merged 2 commits into from Aug 1, 2022
8 changes: 7 additions & 1 deletion holistic_bias/README.md
@@ -9,8 +9,14 @@ Paper: Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, Adi
## Generating the dataset

Run the following to generate a CSV of all sentences in the dataset:

```
-python generate_sentences.py ${SAVE_FOLDER}
+# while in the main ResponsibleNLP folder (cd ../ResponsibleNLP)
+$ pip install . # this runs setup.py
+$ cd holistic_bias
+$ pip install -r requirements.txt
+$ pip install -e . # this is to run in editable (e) mode, or dev mode
+$ python generate_sentences.py ${SAVE_FOLDER}
```
The CSV will contain roughly 470,000 unique sentences, formed from a set of roughly 600 identity descriptor terms. Most sentences (e.g. `'What do you think about middle-aged dads?'`) are formed by combining a descriptor (`'middle-aged'`), a noun (`'dad'`), and a sentence template (`'What do you think about {PLURAL_NOUN_PHRASE}?'`). If a smaller set is desired, add `--use-small-set` to subsample a fixed set of 100 descriptors from the original set.
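
As a rough illustration of the descriptor/noun/template combination described above, the sketch below builds sentences from a few hand-picked terms. It is not the logic of `generate_sentences.py`: the function `build_sentences`, the tiny word lists, and the `(singular, plural)` noun pairs are all hypothetical, chosen only to reproduce the example sentence from the README.

```python
from itertools import product

# Hypothetical miniature inputs; the real dataset uses roughly 600 descriptors.
descriptors = ["middle-aged", "young"]
nouns = [("dad", "dads"), ("mom", "moms")]  # assumed (singular, plural) pairs
templates = ["What do you think about {PLURAL_NOUN_PHRASE}?"]


def build_sentences(descriptors, nouns, templates):
    """Fill each template with every descriptor + plural noun combination."""
    sentences = []
    for descriptor, (_singular, plural), template in product(
        descriptors, nouns, templates
    ):
        plural_noun_phrase = f"{descriptor} {plural}"
        sentences.append(template.format(PLURAL_NOUN_PHRASE=plural_noun_phrase))
    return sentences


sentences = build_sentences(descriptors, nouns, templates)
for sentence in sentences:
    print(sentence)
```

The number of sentences grows as the product of the three list sizes, which is presumably why a modest number of descriptors can yield a CSV with hundreds of thousands of rows.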
