<a href="https://colab.research.google.com/github/gyasifred/TW-FR-MT/blob/main/tutorials/Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Twi-French Machine Translation

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive



In this tutorial, we illustrate how to:

1. fine-tune the pre-trained [Helsinki-NLP/opus-mt-tw-fr](https://huggingface.co/Helsinki-NLP/opus-mt-tw-fr) model from the [OPUS-MT](https://opus.nlpl.eu/Opus-MT/) repository for machine translation from Twi (a language spoken in Ghana) to French;
2. perform translations with the fine-tuned OPUS-MT model;
3. estimate the machine translation quality with the [BLEU score](https://aclanthology.org/P02-1040.pdf), the AzunreBLEU score, the [TER score](https://aclanthology.org/2006.amta-papers.25/) and the [SacreBLEU score](https://aclanthology.org/W18-6319.pdf).

**Note:** The AzunreBLEU score is a variant of the BLEU score that was used to evaluate machine translation quality in the paper [English-Twi Parallel Corpus for Machine Translation](https://arxiv.org/pdf/2103.15625.pdf). It focuses on "adequacy" rather than "fluency" in the translations.


## [Optional] Download already-fine-tuned Twi models.


Fine-tuned Twi OPUS-MT models can be downloaded from [Google Drive](https://drive.google.com/drive/folders/13irIvPsqnryP_NJ5y6PneFKrQDJs5Qm6?usp=sharing). 


This tutorial uses a GPU. To enable it, go to Edit on the menu bar, open Notebook settings, choose GPU as the hardware accelerator, and save.
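
Before running the heavier cells, it can help to confirm that the runtime actually sees a GPU. The snippet below is a minimal sketch that uses only the Python standard library and assumes the NVIDIA `nvidia-smi` tool is on the path when a GPU runtime is active; the function name `gpu_visible` is introduced here for illustration only.

```python
import shutil
import subprocess

def gpu_visible():
    """Return True if nvidia-smi exists and lists at least one GPU."""
    # Without the NVIDIA tool on the path there is no GPU runtime to query.
    if shutil.which("nvidia-smi") is None:
        return False
    # `nvidia-smi -L` prints one line per detected GPU.
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU available:", gpu_visible())
```

If this prints `False`, re-check the hardware accelerator setting before fine-tuning.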

## Clone the TW-FR-MT GitHub Repository



In [None]:
!git clone https://github.com/gyasifred/TW-FR-MT.git

Cloning into 'TW-FR-MT'...
remote: Enumerating objects: 156, done.
remote: Counting objects: 100% (156/156), done.
remote: Compressing objects: 100% (111/111), done.
remote: Total 156 (delta 71), reused 112 (delta 42), pack-reused 0
Receiving objects: 100% (156/156), 1.05 MiB | 7.66 MiB/s, done.
Resolving deltas: 100% (71/71), done.


In [None]:
%cd TW-FR-MT/

/content/TW-FR-MT


In [None]:
!ls

__init__.py  MT_systems  requirements.txt  TW_FR_EN_corpus
LICENSE      README.md	 tutorials


## Install Dependencies


In [None]:
!pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting scikit-learn==1.0.2
  Downloading scikit_learn-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.5 MB)
Collecting transformers
  Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.98-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting evaluate==0.3.0
  Downloading evaluate-0.3.0-py3-none-any.whl (72 kB)

## Training, Validation, and Test sets




Pass the `-h` flag to any script to print its help message.

In [None]:
#!python /content/TW-FR-MT/TW_FR_EN_corpus/scripts/train_test_split.py -h

In [None]:
!python /content/TW-FR-MT/TW_FR_EN_corpus/scripts/train_test_split.py \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_en.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_fr.txt \
--id_tw tw \
--id_en en \
--id_fr fr \
--train_ouput_path /content/TW-FR-MT/TW_FR_EN_corpus/data/training \
--test_ouput_path /content/TW-FR-MT/TW_FR_EN_corpus/data/test \
--val_ouput_path  /content/TW-FR-MT/TW_FR_EN_corpus/data/validation \
--val_set True

SUCCESS
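
The command above splits three aligned files so that line *i* of the Twi, English and French files stays together. The following is a minimal sketch of that idea using only the standard library. It is not the code of `train_test_split.py`; the function name, the split fractions and the seed are assumptions chosen for illustration.

```python
import random

def split_parallel(tw, en, fr, test_frac=0.1, val_frac=0.1, seed=0):
    """Shuffle aligned sentence triples and split them into train/val/test."""
    if not (len(tw) == len(en) == len(fr)):
        raise ValueError("The three files must have the same number of lines.")
    # Zip first so that shuffling never separates a sentence from its translations.
    triples = list(zip(tw, en, fr))
    random.Random(seed).shuffle(triples)
    n_test = int(len(triples) * test_frac)
    n_val = int(len(triples) * val_frac)
    return {
        "test": triples[:n_test],
        "val": triples[n_test:n_test + n_val],
        "train": triples[n_test + n_val:],
    }

# Toy data standing in for total_tw.txt, total_en.txt and total_fr.txt.
tw = [f"tw {i}" for i in range(10)]
en = [f"en {i}" for i in range(10)]
fr = [f"fr {i}" for i in range(10)]

splits = split_parallel(tw, en, fr)
print({name: len(rows) for name, rows in splits.items()})
```

Shuffling the zipped triples rather than each file separately is what keeps the three languages aligned in every split.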


## Fine-tuning the OPUS-MT Model


In [None]:
#!python /content/TW-FR-MT/MT_systems/opus/fine_tune_opus.py -h

In [1]:
!python /content/TW-FR-MT/MT_systems/opus/fine_tune_opus.py \
Helsinki-NLP/opus-mt-tw-fr \
/content/TW-FR-MT/TW_FR_EN_corpus/data/training/train_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/training/train_fr.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/validation/val_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/validation/val_fr.txt \
tw \
fr \
OPUS-mt-tw-fr-tuned-1 \
--max_length 128 \
--batch_size 8 \
--epoch 24 \
--warmup_steps 10 \
--savedir /content/drive/MyDrive/MT


## Translate with the Fine-tuned OPUS-MT Model



In [None]:
!python /content/TW-FR-MT/MT_systems/opus/opus_direct_translate.py \
/content/drive/MyDrive/MT/OPUS-mt-tw-fr-tuned-1 \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
--output_name tw-fr-opus-translate

2023-04-02 01:24:58.494485: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
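
For readers who prefer to call the model from Python, the sketch below shows one way to translate a list of sentences in batches with the Hugging Face `transformers` Marian classes. It is not the code of `opus_direct_translate.py`, and it assumes the fine-tuned checkpoint was saved in the standard Hugging Face format. The helper names `batched` and `translate_lines` are introduced here for illustration.

```python
def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def translate_lines(model_name_or_path, lines, batch_size=8, max_length=128):
    """Translate a list of sentences with a MarianMT checkpoint."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name_or_path)
    model = MarianMTModel.from_pretrained(model_name_or_path)
    translations = []
    for batch in batched(lines, batch_size):
        # Pad to the longest sentence in the batch and cut overly long inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=max_length)
        outputs = model.generate(**inputs)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations

# Example (downloads the checkpoint on first use):
# print(translate_lines("Helsinki-NLP/opus-mt-tw-fr", ["Wayɛ krado."]))
```

Passing a local directory such as `/content/drive/MyDrive/MT/OPUS-mt-tw-fr-tuned-1` instead of a hub name should work the same way, provided the directory holds both the model and the tokenizer files.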


## Evaluate the Translation Quality of the OPUS-MT Model

### BLEU Score

In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

2023-04-02 01:43:27.365605: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
('2-GRAMS: 0.54', '3-GRAMS: 0.42', '4-GRAMS: 0.35')
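
The output above reports a score per n-gram order. To make that notion concrete, here is a minimal sketch of clipped n-gram precision, the core quantity behind BLEU. It deliberately omits the brevity penalty and the combination of orders, and it is not the implementation used by `get_bleu.py`; the function names are chosen for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(hypothesis, reference, n):
    """Clipped n-gram precision of one hypothesis against one reference."""
    hyp_counts = ngrams(hypothesis.split(), n)
    ref_counts = ngrams(reference.split(), n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total

hyp = "Il est prêt maintenant"
ref = "Il est prêt"
for n in (1, 2, 3):
    print(n, round(clipped_precision(hyp, ref, n), 2))
```

Precision typically drops as `n` grows, because longer word sequences are harder to match exactly. This is consistent with the decreasing 2-, 3- and 4-gram scores reported by the script.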


### AzunreBLEU Score

In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt \
--azunre True

2023-04-02 01:49:00.486137: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
BLEU SCORE: 0.75


### SacreBLEU Score

In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_sacrebleu.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt 

2023-04-02 01:48:13.574975: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
SacreBLEU SCORE: 35.93


### TER Score

In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/ter.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt 

2023-04-02 01:47:48.756483: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TER SCORE: 67.52
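
TER counts the edits needed to turn a translation into its reference, so lower values are better. The sketch below computes a simplified variant: word-level edit distance (insertions, deletions, substitutions) divided by the reference length. Full TER also allows block shifts, which this sketch leaves out, so its numbers will not match the output of `ter.py` exactly. The function names are illustrative.

```python
def word_edit_distance(hyp_tokens, ref_tokens):
    """Levenshtein distance over words (insert, delete, substitute)."""
    previous = list(range(len(ref_tokens) + 1))
    for i, hyp_word in enumerate(hyp_tokens, start=1):
        current = [i]
        for j, ref_word in enumerate(ref_tokens, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1,          # delete hyp_word
                               current[j - 1] + 1,       # insert ref_word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]

def simple_ter(hypothesis, reference):
    """Edits per reference word, as a percentage (no shift operation)."""
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("Reference must not be empty.")
    edits = word_edit_distance(hypothesis.split(), ref_tokens)
    return 100.0 * edits / len(ref_tokens)

print(simple_ter("Il n'a pas répondu ma question.",
                 "Il n'a pas répondu à ma question."))
```

Because the score is normalised by the reference length, it can exceed 100 when a translation needs more edits than the reference has words.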


## Pre-trained OPUS-MT Model

In [None]:
!python /content/TW-FR-MT/MT_systems/opus/opus_direct_translate.py \
Helsinki-NLP/opus-mt-tw-fr \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
--output_name tw-fr-opus-translate-pretrained

2023-04-02 01:49:20.045890: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt

2023-04-02 02:10:13.626383: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
('2-GRAMS: 0.39', '3-GRAMS: 0.29', '4-GRAMS: 0.22')


In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt \
--azunre True

2023-04-02 02:08:48.462655: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
BLEU SCORE: 0.62


In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_sacrebleu.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt 

2023-04-02 02:09:36.231139: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
SacreBLEU SCORE: 20.79


### TER Score

In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/ter.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt

2023-04-02 02:07:46.432310: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TER SCORE: 91.43


## Google Translate API



### Translate with the Google Translate API


In [None]:
!python /content/TW-FR-MT/MT_systems/Google_MT/googleAPIdirect_translate.py \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
ak \
fr \
--output_name twi-fr-google-translate


### Evaluate the Translation Quality of the Google API Translation


#### BLEU Score

In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt 

2023-04-02 01:18:32.083340: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Downloading builder script: 100% 9.99k/9.99k [00:00<00:00, 7.04MB/s]
('2-GRAMS: 0.60', '3-GRAMS: 0.49', '4-GRAMS: 0.41')


#### AzunreBLEU Score

In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt \
--azunre True

2023-04-02 01:18:43.691607: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
BLEU SCORE: 0.79


#### SacreBLEU Score

In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_sacrebleu.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt 

2023-04-02 01:18:53.536807: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
SacreBLEU SCORE: 44.39


#### TER Score

In [None]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/ter.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt 

2023-04-02 01:23:31.978910: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TER SCORE: 56.03


## OPUS Pivot Translation


To perform pivot translation, you will need to fine-tune additional models.<br>
Alternatively, you can download our fine-tuned models from [Google Drive](https://drive.google.com/drive/folders/13irIvPsqnryP_NJ5y6PneFKrQDJs5Qm6?usp=sharing).


`<source>` -> `<English>` -> `<target>`


We will demonstrate pivot Twi -> English -> French translation.<br>
In this tutorial, we use our fine-tuned models. To use pre-trained OPUS-MT models, pass the name of the OPUS-MT model as shown below.<br>

For French -> English -> Twi using pre-trained OPUS-MT models:
```
!python /content/TW-FR-MT/MT_systems/opus/opus_pivot_translate.py \
Helsinki-NLP/opus-mt-fr-en \
Helsinki-NLP/opus-mt-en-tw \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt \
--to_console True
```
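
Conceptually, pivot translation is the composition of two translation steps. The sketch below illustrates this with two toy lookup tables standing in for the source-to-English and English-to-target models. The function `pivot_translate` and both tables are hypothetical stand-ins for illustration, not part of `opus_pivot_translate.py`.

```python
def pivot_translate(sentences, source_to_pivot, pivot_to_target):
    """Translate via an intermediate language and keep every stage."""
    pivots = source_to_pivot(sentences)
    targets = pivot_to_target(pivots)
    return list(zip(sentences, pivots, targets))

# Toy lookup tables standing in for the two models (hypothetical stand-ins).
tw_en = {"Wayɛ krado.": "He is ready."}
en_fr = {"He is ready.": "Il est prêt."}

rows = pivot_translate(
    ["Wayɛ krado."],
    lambda batch: [tw_en[s] for s in batch],
    lambda batch: [en_fr[s] for s in batch],
)
for source, pivot, target in rows:
    print(f"Source: {source}\nPivot: {pivot}\nTarget: {target}")
```

Keeping the pivot sentence alongside the source and target makes it easier to see at which stage an error was introduced, since a mistake in the first step is carried into the second.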
 



In [None]:
!python /content/TW-FR-MT/MT_systems/opus/opus_pivot_translate.py \
/content/drive/MyDrive/MT/OPUS-mt-tw-en-tuned \
/content/drive/MyDrive/MT/OPUS-mt-en-fr-tuned \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
--to_console True

Source: Wayɛ krado.
Pivot: He is ready.
Target: Il est prêt.

Source: Ɔnsua ade pii.
Pivot: He wouldn't study enough.
Target: Il n'étudierait pas assez.

Source: Smith buae sɛ wanu ne ho.
Pivot: Asamoah answered he was sorry.
Target: Asamoah a répondu qu'il était désolé.

Source: Fa safoa no ma wo nua.
Pivot: Give the key to your brother.
Target: Donne la clé à ton frère.

Source: Ɛyɛ asɛm a ɛhaw yɛn nyinaa.
Pivot: That's all what's worried.
Target: C'est tout ce qui est inquiet.

Source: Wamma m'asɛmmisa no ho mmuae.
Pivot: He didn't answer my question.
Target: Il n'a pas répondu ma question.

Source: Seesei minni sika.
Pivot: I have no money now.
Target: Je n'ai plus d'argent maintenant.

Source: Ná ɛyɛ awerɛhosɛm.
Pivot: It was sad nervous.
Target: C'était triste et nerveux.

Source: Me nuabarima no wɔ sika pii a obetumi de atɔ kar.
Pivot: My brother has more money to buy a car.
Target: Mon frère a plus d'argent pour acheter une voiture.

Source: Asiane pii wɔ hɔ.
Pivot: There are a