Debiasing Pre-trained Contextualised Embeddings

Masahiro Kaneko, Danushka Bollegala

Code and debiased word embeddings for the paper: "Debiasing Pre-trained Contextualised Embeddings" (EACL 2021). If you use any part of this work, please include the following citation:

@inproceedings{kaneko-bollegala-2021-context,
    title = {Debiasing Pre-trained Contextualised Embeddings},
    author = {Masahiro Kaneko and Danushka Bollegala},
    booktitle = {Proc. of the 16th European Chapter of the Association for Computational Linguistics (EACL)},
    year = {2021}
}

Requirements

  • python==3.7.3
  • torch==1.5.0
  • nltk==3.5
  • transformers==2.8.0
  • tensorboard==2.0.2
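
If you manage dependencies with pip, the pinned versions can be installed in one step. transformers is deliberately omitted here, since the copy bundled with this repository is installed from source in the next section:

pip install torch==1.5.0 nltk==3.5 tensorboard==2.0.2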

Installation

cd transformers
pip install .
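
This installs the transformers source included in this repository. To confirm it was picked up, you can check the reported version, which should match the 2.8.0 pin above:

python -c "import transformers; print(transformers.__version__)"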

To debias your contextualised embeddings

curl -Lo data/news-commentary-v15.en.gz https://data.statmt.org/news-commentary/v15/training-monolingual/news-commentary-v15.en.gz
gunzip data/news-commentary-v15.en.gz
cd script
./preprocess.sh [bert/roberta/albert/dbert/electra] ../data/news-commentary-v15.en
./debias.sh [bert/roberta/albert/dbert/electra] gpu_id
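
The bracketed argument selects one of the supported models. As a concrete example, the following runs the whole pipeline for BERT, assuming gpu_id is the CUDA device index (here 0):

./preprocess.sh bert ../data/news-commentary-v15.en
./debias.sh bert 0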

Our debiased contextualised embeddings

You can directly download our all-token debiased contextualised embeddings.
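
Once downloaded and unpacked, a checkpoint should load like any local transformers model. As a minimal sketch, with ./debiased-bert standing in for wherever you unpacked the download:

python -c "from transformers import AutoModel; AutoModel.from_pretrained('./debiased-bert')"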

License

See the LICENSE file.