This repository extends the work presented in "Evaluating Gender Bias in Machine Translation" by Gabriel Stanovsky, Noah A. Smith, and Luke Zettlemoyer (ACL 2019), and "Gender Coreference and Bias Evaluation at WMT 2020" by Tom Kocmi, Tomasz Limisiewicz, and Gabriel Stanovsky (WMT 2020).
Our project builds upon the foundational research by addressing additional biases and incorporating support for Portuguese, reflecting our commitment to enhancing fairness in machine translation across diverse languages.
- fast_align: install it and point an environment variable called FAST_ALIGN_BASE to its root folder (the one containing the build folder).
- Create a Conda environment:
    conda create -n mypython3 python=3.8
    source activate mypython3
    conda install anaconda
- Clone the mt_gender and fast_align repositories:
    git clone https://github.com/gabrielStanovsky/mt_gender.git
    git clone https://github.com/clab/fast_align.git
    conda install cmake
- Compile fast_align:
    cd fast_align
    mkdir -p build
    cd build
    cmake ..
    make
- Check that it was installed properly:
    cd ../../ && fast_align/build/fast_align
- Set the environment variable FAST_ALIGN_BASE to the root folder of fast_align:
    export FAST_ALIGN_BASE=/path/to/fast_align
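Before moving on, it can help to confirm that FAST_ALIGN_BASE points at a compiled checkout. The following is a minimal sketch, not part of the repository: check_fast_align_base is a hypothetical helper name, and it only assumes that the build steps above leave an executable at build/fast_align under the root folder.

```shell
# Hypothetical helper (not shipped with mt_gender or fast_align):
# checks that a directory looks like a compiled fast_align checkout.
check_fast_align_base() {
    base="$1"
    if [ -z "$base" ]; then
        echo "FAST_ALIGN_BASE is not set" >&2
        return 1
    fi
    # Assumption: the compile step above produced build/fast_align.
    if [ ! -x "$base/build/fast_align" ]; then
        echo "No executable found at $base/build/fast_align" >&2
        return 1
    fi
    echo "fast_align found at $base/build/fast_align"
}

# Report the result without aborting the current shell.
check_fast_align_base "${FAST_ALIGN_BASE:-}" \
    || echo "Review the installation steps before running the evaluation."
```

If the check fails, the most likely causes are a missing export in the current shell session or a path that points at the build folder rather than its parent.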
In this updated version of the project, the following significant enhancements have been made:
- Error Correction: Numerous errors identified in the original project have been corrected to enhance the stability and accuracy of the evaluations.
- Language Support: Added comprehensive support for the Portuguese language, facilitating the assessment of gender bias in Portuguese translations, thereby broadening the applicability of the project.
- Project unbIAs: These changes were made as part of the unbIAs project, which aims to reduce bias in artificial intelligence systems and promote fairness in AI technologies.
After completing the installation steps:
- Ensure all dependencies are installed by running:
    pip install -r requirements.txt
- Configure the necessary environment variables as described in the Installation section.
- For the general gender accuracy number, run:
    cd /content/mt_gender/src && ../scripts/evaluate_all_languages.sh ../data/aggregates/en.txt ../../winomtout &> ../../winomtout/baseline
- For accuracy on the pro-stereotypical subset, run:
    cd /content/mt_gender/src && ../scripts/evaluate_all_languages.sh ../data/aggregates/en_pro.txt ../../winomtout &> ../../winomtout/pro
- For accuracy on the anti-stereotypical subset, run:
    cd /content/mt_gender/src && ../scripts/evaluate_all_languages.sh ../data/aggregates/en_anti.txt ../../winomtout &> ../../winomtout/anti
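The three runs above differ only in the aggregate file (en.txt, en_pro.txt, en_anti.txt) and the log name, so they can be driven from one small wrapper. This is a sketch under stated assumptions: run_subset, OUT_DIR, and DRY_RUN are illustrative names rather than options of the repository scripts, and the current directory is assumed to be mt_gender/src, as in the commands above. It defaults to printing the commands so it can be inspected before anything is executed.

```shell
# Sketch: run the three WinoMT evaluations through one helper.
# OUT_DIR and DRY_RUN are illustrative names, not repository options.
OUT_DIR="${OUT_DIR:-../../winomtout}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

run_subset() {
    # $1: aggregate file name under ../data/aggregates
    # $2: log file name written inside OUT_DIR
    if [ "$DRY_RUN" = "1" ]; then
        echo "../scripts/evaluate_all_languages.sh ../data/aggregates/$1 $OUT_DIR > $OUT_DIR/$2 2>&1"
    else
        # Create the output folder first so the redirection cannot fail.
        mkdir -p "$OUT_DIR"
        ../scripts/evaluate_all_languages.sh "../data/aggregates/$1" "$OUT_DIR" > "$OUT_DIR/$2" 2>&1
    fi
}

run_subset en.txt baseline
run_subset en_pro.txt pro
run_subset en_anti.txt anti
```

Setting DRY_RUN=0 before sourcing the snippet runs the evaluations for real. The `> file 2>&1` form is the portable equivalent of the `&>` redirection used in the commands above.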
For detailed step-by-step instructions, refer to the provided notebook (WinoMT_Scores_add_portuguese.ipynb), which includes specific configurations and examples.
This project uses the following license: MIT.