PidginBaseline : Towards Supervised and Unsupervised Neural Machine Translation Baselines for Nigerian Pidgin
This repository contains the code for the paper "Towards Supervised and Unsupervised Neural Machine Translation Baselines for Nigerian Pidgin", presented at the AfricaNLP workshop at the International Conference on Learning Representations (ICLR) 2020, April 2020, Addis Ababa, Ethiopia.
Link to paper - https://arxiv.org/abs/2003.12660
git clone https://github.com/orevaoghene/pidgin-baseline
cd pidgin-baseline
pip install -r requirements.txt
./get_data.sh
The above commands will:
- Clone the repository
- Change your present working directory to the cloned repository
- Install all requirements
- Download and preprocess the train, dev and test sets.
Now that you have the data, you can specify your required training configuration in the config.yaml file. For more information about the configuration options, please refer to the JoeyNMT configuration documentation. The configuration files used in our experiments are available in the experiments folder.
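To give a sense of the file's shape, here is a minimal JoeyNMT-style configuration sketch. The section and field names follow JoeyNMT's documented schema, but the specific paths, language codes, and hyperparameter values below are illustrative assumptions, not the settings used in the paper; see the experiments folder for the actual configurations.

```yaml
name: "pidgin_baseline_sketch"    # hypothetical experiment name

data:
    src: "pcm"                    # assumed source language code
    trg: "en"                     # assumed target language code
    train: "data/train"           # assumed data path prefixes
    dev: "data/dev"
    test: "data/test"
    level: "bpe"                  # or "word" for the word-level models
    lowercase: False

training:
    random_seed: 42
    optimizer: "adam"
    learning_rate: 0.001
    batch_size: 64
    epochs: 30
    model_dir: "models/pidgin_baseline_sketch"

model:
    encoder:
        type: "transformer"
        num_layers: 4
        hidden_size: 256
    decoder:
        type: "transformer"
        num_layers: 4
        hidden_size: 256
```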
If you plan to train with byte pair encodings, you will need to run the learn_bpe shell script before training, as it learns the byte pair encodings needed.
./learn_bpe.sh
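For intuition on what this step does: byte pair encoding repeatedly merges the most frequent pair of adjacent symbols in the training vocabulary, so frequent words stay whole while rare words split into subword units. Below is a toy, self-contained sketch of the learning loop in the style of Sennrich et al. (2016); the repository's learn_bpe.sh presumably wraps a full subword library rather than code like this.

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a {word: frequency} dict (toy version)."""
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        merged = best[0] + best[1]
        # Rewrite the vocabulary with the chosen pair merged into one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# "lo" and then "low" are merged first because they occur most often.
print(learn_bpe({"low": 5, "lower": 2, "lowest": 2}, num_merges=2))
```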
Once you have specified the necessary configurations and learned the byte pair encodings (if needed), you can start training by running the train_model shell script.
./train_model.sh
After you run the train_model shell script, you will be prompted to specify an experiment name.
To run the unsupervised baselines, follow the instructions in the PidginUNMT repository.
English to Pidgin Translation (BLEU):
- Unsupervised Model (word-level) - 5.18
- Supervised Model (word-level) - 17.73
- Supervised Model (BPE) - 24.29
Pidgin to English Translation (BLEU):
- Unsupervised Model (word-level) - 7.93
- Supervised Model (word-level) - 24.67
- Supervised Model (BPE) - 13.00
Please refer to the experiments folder to see the translations produced by the different models, as well as to access the trained model weights.
Special thanks to the Masakhane group (website and GitHub) for catalysing this work.