Skip to content

pmichel31415/11731-assignment-2-baseline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Baseline for Assignment 2 of 11-731 "Machine Translation and Sequence-to-sequence Models" Fall 2019

This repo contains a simplified implementation of the Transformer (Vaswani et al, 2019) to serve as a baseline.

Requirements

This code was written for python >=3.6 and pytorch 1.1, although higher versions of pytorch might also work. You will need a few additional packages. Here's how you can set up the environment (assuming you have python >=3.6):

# [OPTIONAL ]Setup virtual environment
pip install virtualenv
virtualenv env
source env/bin/activate
# Install pytorch 1.1.0 (you can also try the latest version although the code hasn't been tested with it)
pip install torch==1.1.0
# Install TQDM (for fancy progress bars)
pip install tqdm
# Install sentencepiece (this is used for subword segmentation)
pip install sentencepiece
# Install sacrebleu (this is used to evaluate BLEU score on detokenized text)
pip install sacrebleu

Preparing the Data

The data has been compiled from 3 language pairs from the Ukuxhumana dataset. You are tasked with translating from English to 3 low resource South African languages: Afrikaans (af), Xitsonga (ts) and Northern Sotho (nso). Download and extract the data in the root of this repo:

wget http://www.phontron.com/data/cs11731-2019-assignment2.zip
unzip cs11731-2019-assignment2.zip

For my baseline I learned a joint BPE model of vocabulary size 8000 for each language pair. The code for all things related to subwords is in baseline/subwords.py, take a look to understand how it works. You can reproduce my preprocessing by running bash scripts/prepare_bpe.sh.

Training the Baseline

You can train a baseline system for each language pair by running the commands in scripts/train_{af,ts,nso}.sh. Depending on the language pair the models should take between 1 to 3 hours to train, although of course this may vary depending on your GPU.

Results

As shown in the training scripts, you can decode from a trained model by running python baseline/translate.py with the appropriate arguments. BLEU score is evaluated by calling

cat [out file] | sacrebleu -w 2 [ref file]

Note that both output and reference file should be detokenized. Sacrebleu will perform its own tokenization. This is to make sure that the BLEU scores are comparable irrespective of the tokenization you use in your model. The -w 2 means that we want BLEU score with 2 decimals, although keep in mind that BLEU differences smaller than 0.5 are unlikely to be statistically significant (on a scale of 100).

For the simple baselines provided here, the scores are as follows:

en-af en-ts en-nso
Dev 31.39 32.10 18.27
Test 31.38 33.27 21.20

Variance in Scores

Since there is some variance in the final BLEU score depending on the random seed/environment, we also provide the average BLEU over 5 random reruns for each language pair (standard deviation in parentheses):

en-af en-ts en-nso
Dev 31.12 (±1.35) 32.86 (±1.28) 17.68 (±0.47)
Test 30.80 (±0.76) 32.63 (±1.31) 21.14 (±0.44)

This code is adapted from a tutorial I wrote for the JSALT 2019 summer school: https://github.com/pmichel31415/jsalt-2019-mt-tutorial

About

Baseline for assignment 2 of 11-731 "Machine Translation and Sequence-to-sequence Models" Fall 2019

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published