<a href="https://colab.research.google.com/github/gyasifred/TW-FR-MT/blob/main/tutorials/Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Twi-French Machine Translation

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive



In this tutorial, we illustrate how to:

1. Fine-tune the pre-trained [Helsinki-NLP/opus-mt-tw-fr](https://huggingface.co/Helsinki-NLP/opus-mt-tw-fr) model from the [OPUS-MT](https://opus.nlpl.eu/Opus-MT/) repository for machine translation from Twi (a Ghanaian language) to French.
2. Perform translations with the fine-tuned OPUS-MT model.
3. Estimate the machine translation quality with the [BLEU score](https://aclanthology.org/P02-1040.pdf), the AzunreBLEU score, the [TER score](https://aclanthology.org/2006.amta-papers.25/), and the [SacreBLEU score](https://aclanthology.org/W18-6319.pdf).

**Note:** The AzunreBLEU score is a variant of the BLEU score that was used to evaluate machine translation quality in the paper [English-Twi Parallel Corpus for Machine Translation](https://arxiv.org/pdf/2103.15625.pdf). It focuses on "adequacy" rather than "fluency" in the translations.
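
The BLEU-family scores listed above are built on clipped (modified) n-gram precision. The sketch below shows only that building block for a single sentence pair. It is not the implementation used by the repository's evaluation scripts, it omits the brevity penalty and the geometric mean over n-gram orders, and it does not reproduce the AzunreBLEU weighting. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hypothesis.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


hyp = "il est prêt maintenant"  # hypothetical system output
ref = "il est prêt"             # hypothetical reference
p1 = modified_precision(hyp, ref, 1)  # 3 of 4 unigrams match
p2 = modified_precision(hyp, ref, 2)  # 2 of 3 bigrams match
print(p1, p2)
```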


## [Optional] Download already-fine-tuned Twi models.


Fine-tuned Twi OPUS-MT models can be downloaded from [Google Drive](https://drive.google.com/drive/folders/13irIvPsqnryP_NJ5y6PneFKrQDJs5Qm6?usp=sharing).


This tutorial uses a GPU. To enable it, go to Edit on the menu bar, open Notebook settings, choose GPU as the hardware accelerator, and save.
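
A quick way to confirm that the runtime actually sees a GPU is sketched below. It assumes PyTorch is available, which is usually the case on Colab, and falls back to looking for the `nvidia-smi` tool otherwise. The helper name `gpu_available` is hypothetical and not part of the repository.

```python
import shutil


def gpu_available():
    """Return True if a CUDA GPU appears to be visible to this runtime."""
    try:
        import torch  # assumed to be installed; usually true on Colab
        return bool(torch.cuda.is_available())
    except ImportError:
        # Fallback: check whether the NVIDIA driver tool is on PATH.
        return shutil.which("nvidia-smi") is not None


print("GPU available:", gpu_available())
```

If this prints `False`, the hardware accelerator setting has probably not been applied to the current session.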

## Clone the TW-FR-MT GitHub Repository



In [2]:
!git clone https://github.com/gyasifred/TW-FR-MT.git

Cloning into 'TW-FR-MT'...
remote: Enumerating objects: 229, done.
remote: Counting objects: 100% (229/229), done.
remote: Compressing objects: 100% (166/166), done.
remote: Total 229 (delta 114), reused 145 (delta 58), pack-reused 0
Receiving objects: 100% (229/229), 1.10 MiB | 1.00 MiB/s, done.
Resolving deltas: 100% (114/114), done.


In [3]:
%cd TW-FR-MT/

/content/TW-FR-MT


In [4]:
!ls

__init__.py  MT_systems  requirements.txt  TW_FR_EN_corpus
LICENSE      README.md	 tutorials


## Install Dependencies


In [5]:
!pip install -r requirements.txt

Collecting scikit-learn==1.0.2 (from -r requirements.txt (line 1))
  Downloading scikit_learn-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.5 MB)
Collecting transformers (from -r requirements.txt (line 2))
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting sentencepiece (from -r requirements.txt (line 3))
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting evaluate==0.3.0 (from -r requirements.txt (line 4))
  Downloading evaluate-0.3.0-py3-none-any.whl (72 kB)

## Training, Validation, and Test sets




To see the usage message for any script, pass the `-h` flag.

In [None]:
#!python /content/TW-FR-MT/TW_FR_EN_corpus/scripts/train_test_split.py -h
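
For readers unfamiliar with how a `-h` flag typically works in Python scripts, here is a minimal sketch using the standard `argparse` module. It is an assumption that the repository's scripts are built this way. The parser, its description, and the default value below are hypothetical and only mimic the shape of the commands used in this tutorial.

```python
import argparse


def build_parser():
    """Build a small, hypothetical command-line parser."""
    parser = argparse.ArgumentParser(description="Hypothetical example script.")
    # A positional argument, like the input file paths passed in this tutorial.
    parser.add_argument("input_file", help="path to a text file, one sentence per line")
    # An optional argument with a default value.
    parser.add_argument("--output_name", default="translations",
                        help="name of the output file")
    return parser


parser = build_parser()
# argparse adds -h/--help automatically; this is the text it would print.
help_text = parser.format_help()
print(help_text)

# Parsing an explicit argument list instead of sys.argv, for illustration.
args = parser.parse_args(["test_tw.txt", "--output_name", "demo"])
print(args.input_file, args.output_name)
```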

In [23]:
!python /content/TW-FR-MT/TW_FR_EN_corpus/scripts/train_test_split.py \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_en.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_fr.txt \
--id_tw tw \
--id_en en \
--id_fr fr \
--train_ouput_path /content/TW-FR-MT/TW_FR_EN_corpus/data/training \
--test_ouput_path /content/TW-FR-MT/TW_FR_EN_corpus/data/test \
--val_ouput_path  /content/TW-FR-MT/TW_FR_EN_corpus/data/validation \
--val_set True

SUCCESS
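
The key requirement when splitting a parallel corpus is that the Twi, English, and French files are shuffled with the same permutation, so that line *i* in each output file is still the same sentence. The sketch below illustrates that idea in plain Python. It is not the repository's `train_test_split.py`. The function name, the split fractions, and the seed are assumptions chosen for illustration.

```python
import random


def split_parallel(tw, en, fr, test_frac=0.1, val_frac=0.1, seed=0):
    """Split three aligned sentence lists into train/val/test, keeping alignment."""
    if not (len(tw) == len(en) == len(fr)):
        raise ValueError("parallel files must have the same number of lines")
    # Shuffle indices once and reuse them for all three languages.
    indices = list(range(len(tw)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_frac)
    n_val = int(len(indices) * val_frac)
    parts = {
        "test": indices[:n_test],
        "val": indices[n_test:n_test + n_val],
        "train": indices[n_test + n_val:],
    }
    return {name: [(tw[i], en[i], fr[i]) for i in ids] for name, ids in parts.items()}


# Toy data: the numeric suffix lets us check that triples stay aligned.
tw_lines = [f"tw-{i}" for i in range(10)]
en_lines = [f"en-{i}" for i in range(10)]
fr_lines = [f"fr-{i}" for i in range(10)]
splits = split_parallel(tw_lines, en_lines, fr_lines)
print({name: len(rows) for name, rows in splits.items()})
```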


## Fine-tuning the OPUS-MT Model


In [None]:
#!python /content/TW-FR-MT/MT_systems/opus/fine_tune_opus.py -h

In [None]:
!python /content/TW-FR-MT/MT_systems/opus/fine_tune_opus.py \
Helsinki-NLP/opus-mt-tw-fr \
/content/TW-FR-MT/TW_FR_EN_corpus/data/training/train_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/training/train_fr.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/validation/val_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/validation/val_fr.txt \
tw \
fr \
OPUS-mt-tw-fr-tuned-1 \
--max_length 128 \
--batch_size 8 \
--epoch 24 \
--warmup_steps 10 \
--savedir /content/drive/MyDrive/MT

MODEL PATH: /content/drive/MyDrive/MT
FINED-TURNED MODEL NAME: OPUS-mt-tw-fr-tuned-1
Downloading builder script: 100% 8.15k/8.15k [00:00<00:00, 8.12MB/s]
Downloading (…)okenizer_config.json: 100% 42.0/42.0 [00:00<00:00, 269kB/s]
Downloading (…)lve/main/config.json: 100% 1.38k/1.38k [00:00<00:00, 7.80MB/s]
Downloading (…)olve/main/source.spm: 100% 788k/788k [00:00<00:00, 12.9MB/s]
Downloading (…)olve/main/target.spm: 100% 845k/845k [00:00<00:00, 13.5MB/s]
Downloading (…)olve/main/vocab.json: 100% 1.44M/1.44M [00:00<00:00, 12.6MB/s]
Downloading pytorch_model.bin: 100% 302M/302M [00:05<00:00, 52.6MB/s]
Downloading (…)neration_config.json: 100% 293/293 [00:00<00:00, 1.93MB/s]
  4% 1071/25704 [01:18<26:42, 15.37it/s]
  0% 0/134 [00:00<?, ?it/s]
  1% 1/134 [00:00<00:40,  3.31it/s]
  1% 2/134 [00:00<00:38,  3.39it/s]
  2% 3/134 [00:00<00:34,  3.81it/s]
  3% 4/134 [00:01<00:35,  3.61it/s]
  4% 5/134 [00:01<00:33,  3.84it/s]
  4% 6/134 [00:01<00:32,  3.95it/s]
  5% 7/134 [0
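
The progress bars above can be read as a small piece of arithmetic. The sketch below works it out, assuming one optimizer step per batch (no gradient accumulation, a single device) and assuming that the 134-step bar is the validation pass running with the same batch size of 8. Neither assumption is confirmed by the log itself.

```python
total_steps = 25704   # from the training progress bar
epochs = 24           # --epoch 24
batch_size = 8        # --batch_size 8
val_batches = 134     # from the nested progress bar

# 25704 steps over 24 epochs gives the number of batches per epoch.
steps_per_epoch, remainder = divmod(total_steps, epochs)
print(steps_per_epoch, remainder)  # 1071 0

# The last batch of an epoch may be partial, so the corpus size is a range.
train_pairs_max = steps_per_epoch * batch_size            # 8568
train_pairs_min = (steps_per_epoch - 1) * batch_size + 1  # 8561

# Same reasoning for the validation set (assumed batch size of 8).
val_pairs_max = val_batches * batch_size                  # 1072
val_pairs_min = (val_batches - 1) * batch_size + 1        # 1065

print(train_pairs_min, train_pairs_max, val_pairs_min, val_pairs_max)
```

Under these assumptions, the nested bar appears at step 1071 because that is the end of the first epoch, which is when validation would run.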


## Translate with the Fine-tuned OPUS-MT Model



In [7]:
!python /content/TW-FR-MT/MT_systems/opus/opus_direct_translate.py \
/content/drive/MyDrive/MT/OPUS-mt-tw-fr-tuned-1 \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
--output_name tw-fr-opus-translate
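
For reference, translating with an OPUS-MT (Marian) checkpoint directly from Python might look like the sketch below, using the `MarianMTModel` and `MarianTokenizer` classes from Hugging Face `transformers`. This is not the code in `opus_direct_translate.py`. The helper names, the batch size, and the maximum length are assumptions, and the model is loaded on the CPU for simplicity.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_lines(lines, model_name, batch_size=8, max_length=128):
    """Translate a list of sentences with a Marian checkpoint (name or local path)."""
    # Imported lazily so the batching helper works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    translations = []
    for batch in batched(lines, batch_size):
        # Pad to the longest sentence in the batch and truncate overly long inputs.
        encoded = tokenizer(batch, return_tensors="pt", padding=True,
                            truncation=True, max_length=max_length)
        generated = model.generate(**encoded, max_length=max_length)
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations


print(list(batched(["a", "b", "c", "d", "e"], 2)))
```

A call such as `translate_lines(["Wayɛ krado."], "Helsinki-NLP/opus-mt-tw-fr")` would download the pre-trained model on first use; a local directory containing a fine-tuned model can be passed in the same way.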



## Evaluate the Translation Quality of the OPUS-MT Model

### BLEU Score

In [8]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

Downloading builder script: 100% 9.99k/9.99k [00:00<00:00, 24.2MB/s]
('2-GRAMS: 0.55', '3-GRAMS: 0.44', '4-GRAMS: 0.36')


### AzunreBLEU Score

In [9]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt \
--azunre True

BLEU SCORE: 0.76


### SacreBLEU Score

In [10]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_sacrebleu.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

SacreBLEU SCORE: 36.55


### TER Score

In [11]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/ter.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

TER SCORE: 61.05
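
TER counts the edits needed to turn the system output into the reference, divided by the reference length, so lower is better (unlike BLEU). The sketch below is a simplified variant that uses word-level edit distance only and leaves out TER's block-shift operation, so it can only overestimate the true TER. It is not the repository's `ter.py`, and the reference sentence is hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            cur.append(min(prev[j] + 1,          # delete a hypothesis word
                           cur[j - 1] + 1,       # insert a reference word
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


def ter_no_shift(hypotheses, references):
    """Corpus-level edit rate in percent, without the shift operation of full TER."""
    edits = sum(word_edit_distance(h.split(), r.split())
                for h, r in zip(hypotheses, references))
    ref_words = sum(len(r.split()) for r in references)
    return 100.0 * edits / ref_words


hyps = ["Il n'a pas répondu ma question."]
refs = ["Il n'a pas répondu à ma question."]  # hypothetical reference
score = ter_no_shift(hyps, refs)  # one missing word over seven reference words
print(f"{score:.2f}")
```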


## Pre-trained OPUS-MT Model

In [12]:
!python /content/TW-FR-MT/MT_systems/opus/opus_direct_translate.py \
Helsinki-NLP/opus-mt-tw-fr \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
--output_name tw-fr-opus-translate-pretrained

Downloading (…)olve/main/source.spm: 100% 788k/788k [00:00<00:00, 941kB/s]
Downloading (…)olve/main/target.spm: 100% 845k/845k [00:00<00:00, 1.00MB/s]
Downloading (…)olve/main/vocab.json: 100% 1.44M/1.44M [00:00<00:00, 6.72MB/s]
Downloading (…)okenizer_config.json: 100% 42.0/42.0 [00:00<00:00, 261kB/s]
Downloading (…)lve/main/config.json: 100% 1.38k/1.38k [00:00<00:00, 7.73MB/s]
Downloading pytorch_model.bin: 100% 302M/302M [00:00<00:00, 311MB/s]
Downloading (…)neration_config.json: 100% 293/293 [00:00<00:00, 1.72MB/s]


In [24]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

('2-GRAMS: 0.39', '3-GRAMS: 0.29', '4-GRAMS: 0.22')


In [25]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt \
--azunre True

BLEU SCORE: 0.62


In [26]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_sacrebleu.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

SacreBLEU SCORE: 20.91


### TER Score

In [31]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/ter.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

TER SCORE: 91.51


## Google Translate API



### Translate with the Google Translate API


In [16]:
!python /content/TW-FR-MT/MT_systems/Google_MT/googleAPIdirect_translate.py \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
ak \
fr \
--output_name twi-fr-google-translate


### Evaluate the Translation Quality of the Google API Translation


#### BLEU Score

In [27]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

('2-GRAMS: 0.60', '3-GRAMS: 0.49', '4-GRAMS: 0.41')


#### AzunreBLEU Score

In [28]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt \
--azunre True

BLEU SCORE: 0.79


#### SacreBLEU Score

In [29]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_sacrebleu.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

SacreBLEU SCORE: 44.40


#### TER Score

In [30]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/ter.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

TER SCORE: 55.59
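
With the three direct systems scored, it can help to put the numbers side by side. The sketch below only tabulates the scores printed by the cells in this tutorial; it runs no evaluation itself. The dictionary layout and helper name are hypothetical, and "BLEU (4-grams)" refers to the `4-GRAMS` value reported by `get_bleu.py`.

```python
# Scores copied from the cell outputs in this tutorial.
scores = {
    "OPUS-MT fine-tuned": {"BLEU (4-grams)": 0.36, "AzunreBLEU": 0.76,
                           "SacreBLEU": 36.55, "TER": 61.05},
    "OPUS-MT pre-trained": {"BLEU (4-grams)": 0.22, "AzunreBLEU": 0.62,
                            "SacreBLEU": 20.91, "TER": 91.51},
    "Google Translate API": {"BLEU (4-grams)": 0.41, "AzunreBLEU": 0.79,
                             "SacreBLEU": 44.40, "TER": 55.59},
}
# TER is an error rate, so lower is better; the BLEU variants are the opposite.
higher_is_better = {"BLEU (4-grams)": True, "AzunreBLEU": True,
                    "SacreBLEU": True, "TER": False}


def best_system(metric):
    """Return the system with the best value for the given metric."""
    pick = max if higher_is_better[metric] else min
    return pick(scores, key=lambda system: scores[system][metric])


print(f"{'System':<22}" + "".join(f"{m:>16}" for m in higher_is_better))
for system, row in scores.items():
    print(f"{system:<22}" + "".join(f"{row[m]:>16.2f}" for m in higher_is_better))

# Improvement from fine-tuning, measured in SacreBLEU points.
sacrebleu_gain = round(scores["OPUS-MT fine-tuned"]["SacreBLEU"]
                       - scores["OPUS-MT pre-trained"]["SacreBLEU"], 2)
print("SacreBLEU gain from fine-tuning:", sacrebleu_gain)
```

On these numbers, fine-tuning improves the OPUS-MT model on every metric, while the Google Translate API output still scores best on this test set.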


## OPUS Pivot Translation


To perform pivot translation, you need to fine-tune additional models.<br>
Alternatively, you can download our project's fine-tuned models from [Google Drive](https://drive.google.com/drive/folders/13irIvPsqnryP_NJ5y6PneFKrQDJs5Qm6?usp=sharing).


$\langle \text{Source} \rangle \rightarrow \langle \text{English} \rangle \rightarrow \langle \text{Target} \rangle$


We demonstrate pivot translation for Twi -> English -> French.<br>
In this tutorial, we use our fine-tuned models. To use pre-trained OPUS-MT models instead, pass the names of the OPUS-MT models, as shown below.<br>

For French -> English -> Twi using pre-trained OPUS-MT models:
```
!python /content/TW-FR-MT/MT_systems/opus/opus_pivot_translate.py \
Helsinki-NLP/opus-mt-fr-en \
Helsinki-NLP/opus-mt-en-tw \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt \
--to_console True

```
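
Conceptually, pivot translation is just the composition of two translators: source to pivot, then pivot to target. The sketch below shows that structure with two stand-in lookup functions instead of real models, so it runs without downloading anything. It is not the code in `opus_pivot_translate.py`, and the function names and toy dictionaries are hypothetical.

```python
def pivot_translate(sentences, source_to_pivot, pivot_to_target):
    """Translate each sentence in two hops and keep the intermediate pivot text."""
    results = []
    for source in sentences:
        pivot = source_to_pivot(source)
        target = pivot_to_target(pivot)
        results.append((source, pivot, target))
    return results


# Stand-in "models": plain dictionary lookups used purely for illustration.
tw_to_en = {"Wayɛ krado.": "He is ready."}
en_to_fr = {"He is ready.": "Il est prêt."}

triples = pivot_translate(["Wayɛ krado."], tw_to_en.get, en_to_fr.get)
for source, pivot, target in triples:
    print(f"Source: {source}\nPivot: {pivot}\nTarget: {target}\n")
```

Because the second model only ever sees the pivot text, any error made in the first hop is carried into the final translation.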




In [21]:
!python /content/TW-FR-MT/MT_systems/opus/opus_pivot_translate.py \
/content/drive/MyDrive/MT/OPUS-mt-tw-en-tuned \
/content/drive/MyDrive/MT/OPUS-mt-en-fr-tuned \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
--to_console True

Source: Wayɛ krado.
Pivot: He is ready.
Target: Il est prêt.

Source: Ɔnsua ade pii.
Pivot: He wouldn't study enough.
Target: Il n'étudierait pas assez.

Source: Smith buae sɛ wanu ne ho.
Pivot: Asamoah answered he was sorry.
Target: Asamoah a répondu qu'il était désolé.

Source: Fa safoa no ma wo nua.
Pivot: Give the key to your brother.
Target: Donne la clé à ton frère.

Source: Ɛyɛ asɛm a ɛhaw yɛn nyinaa.
Pivot: That's all what's worried.
Target: C'est tout ce qui est inquiet.

Source: Wamma m'asɛmmisa no ho mmuae.
Pivot: He didn't answer my question.
Target: Il n'a pas répondu ma question.

Source: Seesei minni sika.
Pivot: I have no money now.
Target: Je n'ai plus d'argent maintenant.

Source: Ná ɛyɛ awerɛhosɛm.
Pivot: It was sad nervous.
Target: C'était triste et nerveux.

Source: Me nuabarima no wɔ sika pii a obetumi de atɔ kar.
Pivot: My brother has more money to buy a car.
Target: Mon frère a plus d'argent pour acheter une voiture.

Source: Asiane pii wɔ hɔ.
Pivot: There are a