Debiasing-Multilingual-Word-Embeddings-A-Case-Study-of-Three-Indian-Languages

This repository contains the code and data for the paper "Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages" by Srijan Bansal, Vishal Garimella, Ayush Suhane, and Animesh Mukherjee, published in the Proceedings of ACM HyperText 2021.

Setup Environment

Create a conda environment with Python 3.7.11.
Install the requirements inside the conda environment using: pip install -r requirements.txt
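Before installing anything, it can help to confirm that the active interpreter is the one the environment was created with. The check below is a minimal sketch; the function name version_ok and the decision to compare only the major and minor version are assumptions, not part of this repository.

```python
import sys

# The README asks for Python 3.7.11; comparing only major.minor is an assumption.
REQUIRED = (3, 7)


def version_ok(version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor version matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if not version_ok(sys.version_info):
        found = "%d.%d" % tuple(sys.version_info[:2])
        print("Warning: expected Python %d.%d, found %s" % (REQUIRED[0], REQUIRED[1], found))
```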

Download data

Run bash download_embedding.sh. The embeddings will be saved in the data/embedding/ folder. Download options:

All: all the datasets and embeddings will be downloaded
Essential: fastText embeddings, aligned text files, and MKB datasets will be downloaded. The rest of the embeddings can be generated using the scripts
MKB: contains language pairs
MKB_pickle: contains MKB dataset
aligned_files: contains aligned files
Bilingual: bilingual aligned embeddings (can be generated by generate_cross_lingual_embeddings.py)
EQR: debiased embeddings (can be generated by running debias_run.py)
LID: debiased embeddings (can be generated by running debias_run.py)
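After a download finishes, a quick look at the directory layout shows whether the later steps have what they need. The sketch below only uses directory names that appear in this README; which of them a given download option actually creates is an assumption to check against download_embedding.sh, and the helper name missing_dirs is hypothetical.

```python
from pathlib import Path

# Directory names are taken from this README. Whether each one exists after a
# given download option is an assumption, not something the README guarantees.
EXPECTED_DIRS = [
    "data/MKB",
    "data/aligned_files",
    "data/embedding",
]


def missing_dirs(repo_root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that are absent under repo_root."""
    root = Path(repo_root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    for d in missing_dirs():
        print("missing:", d)
```

Running it from the repository root prints one line per missing directory and nothing when all of them are present.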

Scripts

debias_run_runner.py: wrapper code for debias_run.py.
- Run using python debias_run_runner.py
debias_run.py: Generates debiased embeddings
- Run using python debias_run.py lang1 lang2 pre num_lang_pairs
- ex: python debias_run.py en hi LID 10
generate_cross_lingual_embeddings_runner.py: wrapper code for generate_cross_lingual_embeddings.py
- Run using python generate_cross_lingual_embeddings_runner.py
generate_cross_lingual_embeddings.py: Aligns embeddings from language lang1 to language lang2
- Run using python generate_cross_lingual_embeddings.py lang1 lang2
- ex: python generate_cross_lingual_embeddings.py bn hi
MKB/alignment_runner.py: wrapper code for alignment.py
- Run using python MKB/alignment_runner.py
MKB/alignment.py: Evaluates debiased embeddings
- Run using python alignment.py pre1 pre2 lang1 lang2
- ex: python alignment.py fin fin bn en
MKB/bilingual_dict.py: Generates the pkl files
- It may throw an error because of problems with the googletranslator API. In that case, the pkl files can be downloaded using the MKB_pickle option of download_embedding.sh
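The wrapper scripts are described only as wrappers, so their exact contents are not shown here. As a rough illustration of what such a wrapper might do, the sketch below builds one debias_run.py command line per language pair and prefix, following the documented argument order lang1 lang2 pre num_lang_pairs. The language codes and prefixes are the ones used in the examples above; the function name debias_commands, the use of ordered pairs, and the default of 10 are assumptions.

```python
import itertools

# Language codes and prefixes appear in this README's examples; the full set
# used by the real runner scripts is not stated, so treat these as placeholders.
LANGS = ["en", "hi", "bn"]
PREFIXES = ["LID", "EQR"]


def debias_commands(langs=LANGS, prefixes=PREFIXES, num_lang_pairs=10):
    """Build one argv list per ordered language pair and prefix.

    Each entry mirrors: python debias_run.py lang1 lang2 pre num_lang_pairs
    """
    cmds = []
    for lang1, lang2 in itertools.permutations(langs, 2):
        for pre in prefixes:
            cmds.append(
                ["python", "debias_run.py", lang1, lang2, pre, str(num_lang_pairs)]
            )
    return cmds


if __name__ == "__main__":
    for cmd in debias_commands():
        print(" ".join(cmd))
```

Printing the commands before running them makes it easy to compare against what debias_run_runner.py actually executes.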

How to run

Download the datasets and embeddings by running download_embedding.sh with the Essential option.
Run generate_cross_lingual_embeddings_runner.py to generate all bilingual embeddings.
- Input: MKB dataset and aligned text files from data/MKB and data/aligned_files
- Output: Aligned embeddings will be saved to data/embedding/Bilingual/
Run debias_run_runner.py to generate all debiased embeddings.
- Input: Bilingual embeddings from data/embedding/Bilingual/
- Output: Debiased embeddings will be saved to data/embedding/LID/, data/embedding/EQR/ etc
Run MKB/alignment_runner.py to evaluate the debiased embeddings. Results will be saved to MKB/Alignment_runner.csv
- Input: Debiased embeddings from data/embedding/LID/, data/embedding/EQR/ etc
- Output: Scores for the debiased embeddings will be saved to Alignment_Runner.csv
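Once the download is done, the remaining steps above can be chained into a single driver. This is a minimal sketch under the assumption that each runner script is launched from the repository root with no arguments; the function name run_pipeline and the dry_run flag are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Runner script names come from the steps above. Launching them from the
# repository root without arguments is an assumption.
PIPELINE = [
    "generate_cross_lingual_embeddings_runner.py",
    "debias_run_runner.py",
    "MKB/alignment_runner.py",
]


def run_pipeline(dry_run=True):
    """Return the stage commands in order; execute them unless dry_run is True."""
    commands = [[sys.executable, script] for script in PIPELINE]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later stages never run on incomplete inputs.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline(dry_run=True):
        print(" ".join(cmd))
```

The order matters because each stage reads what the previous one wrote: bilingual embeddings feed the debiasing step, and the debiased embeddings feed the evaluation.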
