Zari model checkpoints
This directory contains the Zari checkpoints from our work, Measuring and Reducing Gendered Correlations in Pre-trained NLP Models, presented as a blog post and written up as a paper. Zari checkpoints are derived from BERT and ALBERT model checkpoints and are trained to reduce the gendered correlations learned during pre-training. To do this, we use two techniques:
- Dropout models were initialized from the relevant publicly available checkpoint, and pre-training was continued over Wikipedia with an increased dropout rate.
- CDA (counterfactual data augmentation) models were pre-trained from scratch over Wikipedia. Word substitutions for data augmentation are determined using the word lists provided at corefBias (Zhao et al., 2018); a minimal sketch of this substitution step follows this list.
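For intuition, the sketch below illustrates the kind of word-pair substitution that CDA relies on. The word pairs shown are a tiny illustrative subset (the full lists come from corefBias), and the function and variable names are ours rather than part of this release or its training pipeline.

```python
# Illustrative CDA-style substitution over a gendered word-pair list.
# Only a tiny subset of pairs is shown; the real lists come from corefBias
# (Zhao et al., 2018). Casing of swapped tokens is not preserved here.
GENDERED_PAIRS = [
    ("he", "she"), ("him", "her"), ("his", "her"),
    ("man", "woman"), ("men", "women"),
    ("father", "mother"), ("son", "daughter"),
]

# Build a bidirectional swap table.
SWAP = {}
for a, b in GENDERED_PAIRS:
    SWAP[a] = b
    SWAP[b] = a


def counterfactual_sentence(sentence: str) -> str:
    """Return a copy of `sentence` with each listed gendered term swapped."""
    swapped = []
    for token in sentence.split():
        core = token.strip(".,!?").lower()
        if core in SWAP:
            swapped.append(token.lower().replace(core, SWAP[core]))
        else:
            swapped.append(token)
    return " ".join(swapped)


# The augmented corpus would contain both the original and the swapped copy.
print(counterfactual_sentence("The doctor said he would call his son."))
# -> "The doctor said she would call her daughter."
```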
Four pre-trained models are provided in this release:
- `bert-dropout`: Trained from BERT Large Uncased, with `attention_probs_dropout_prob=0.15` and `hidden_dropout_prob=0.20` (see the configuration sketch after this list).
- `bert-cda`: Trained with the BERT Large Uncased config, with an augmented dataset, for 1M steps.
- `albert-dropout`: Trained from ALBERT Large, with `attention_probs_dropout_prob=0.05` and `hidden_dropout_prob=0.05`.
- `albert-cda`: Trained with the ALBERT Large config, with an augmented dataset, for 125k steps.
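As a point of reference for the dropout settings above, both `attention_probs_dropout_prob` and `hidden_dropout_prob` are standard fields in BERT's `bert_config.json`. The sketch below shows one way the `bert-dropout` values might be applied before continuing pre-training; the checkpoint path is a placeholder, and the script is our illustration rather than part of this release.

```python
# Sketch: set the two dropout fields in a standard bert_config.json to the
# values used for the bert-dropout checkpoint. The path is a placeholder.
import json

CONFIG_PATH = "uncased_L-24_H-1024_A-16/bert_config.json"  # BERT Large Uncased

with open(CONFIG_PATH) as f:
    config = json.load(f)

# The released BERT configs set both of these fields to 0.1 by default.
config["attention_probs_dropout_prob"] = 0.15
config["hidden_dropout_prob"] = 0.20

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```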
If you use these models in your work, kindly cite:
@misc{zari,
title={Measuring and Reducing Gendered Correlations in Pre-trained Models},
author={Kellie Webster and Xuezhi Wang and Ian Tenney and Alex Beutel and Emily Pitler and Ellie Pavlick and Jilin Chen and Slav Petrov},
year={2020},
eprint={2010.06032},
archivePrefix={arXiv},
primaryClass={cs.CL}
}