<a href="https://colab.research.google.com/github/gyasifred/TW-FR-MT/blob/main/tutorials/Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Twi-French Machine Translation

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive



In this tutorial, we illustrate how to:

1. fine-tune the pre-trained [Helsinki-NLP/opus-mt-tw-fr](https://huggingface.co/Helsinki-NLP/opus-mt-tw-fr) model from the [OPUS-MT](https://opus.nlpl.eu/Opus-MT/) repository for machine translation from Twi (a language spoken in Ghana) to French;
2. perform translations with the fine-tuned OPUS-MT model;
3. estimate the machine translation quality with the [BLEU score](https://aclanthology.org/P02-1040.pdf), the AzunreBLEU score, the [TER score](https://aclanthology.org/2006.amta-papers.25/), and the [SacreBLEU score](https://aclanthology.org/W18-6319.pdf).

Note: The AzunreBLEU score is a variant of the BLEU score that was used to evaluate machine translation quality in the paper [English-Twi Parallel Corpus for Machine Translation](https://arxiv.org/pdf/2103.15625.pdf). It places the focus on "adequacy" rather than "fluency" in the translations.


## [Optional] Download already-fine-tuned Twi models.


Fine-tuned Twi OPUS-MT models can be downloaded from [Google Drive](https://drive.google.com/drive/folders/13irIvPsqnryP_NJ5y6PneFKrQDJs5Qm6?usp=sharing). 


This tutorial uses a GPU. To enable it, open Edit > Notebook settings in the menu bar, choose GPU as the hardware accelerator, and save.
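
Before running the training cells, it can help to confirm that the runtime actually exposes a GPU. The sketch below is one way to check with only the standard library. It assumes the NVIDIA `nvidia-smi` tool is on the `PATH`, as is typical on Colab GPU runtimes; the helper name `gpu_available` is hypothetical and not part of the repository.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tooling on PATH: most likely a CPU-only runtime.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"],  # `-L` lists the visible GPUs, one per line
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_available())
```

If this prints `False`, the fine-tuning cell will probably still run, but much more slowly.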

## Clone the TW-FR-MT GitHub Repository



In [2]:
!git clone https://github.com/gyasifred/TW-FR-MT.git

Cloning into 'TW-FR-MT'...
remote: Enumerating objects: 94, done.
remote: Counting objects: 100% (94/94), done.
remote: Compressing objects: 100% (73/73), done.
remote: Total 94 (delta 28), reused 76 (delta 18), pack-reused 0
Unpacking objects: 100% (94/94), 1001.83 KiB | 7.00 MiB/s, done.


In [3]:
%cd TW-FR-MT/

/content/TW-FR-MT


In [4]:
!ls

__init__.py  MT_systems  requirements.txt  TW_FR_EN_corpus
LICENSE      README.md	 tutorials


## Install Dependencies


In [5]:
!pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting scikit-learn==1.0.2
  Downloading scikit_learn-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.4 MB)
Collecting transformers==4.24.0
  Downloading transformers-4.24.0-py3-none-any.whl (5.5 MB)
Collecting sentencepiece==0.1.97
  Downloading sentencepiece-0.1.97-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting evaluate==0.3.0
  Downloading evaluate-0.3.0-py3-none-any.whl (72 kB)

## Training, Validation, and Test sets




To see the usage message for any script, pass the `-h` flag.

In [None]:
#!python /content/TW-FR-MT/TW_FR_EN_corpus/scripts/train_test_split.py -h

Normalized split

In [6]:
!python /content/TW-FR-MT/TW_FR_EN_corpus/scripts/train_test_split.py \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_en.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_fr.txt \
--id_tw tw \
--id_en en \
--id_fr fr \
--train_ouput_path /content/TW-FR-MT/TW_FR_EN_corpus/data/training \
--test_ouput_path /content/TW-FR-MT/TW_FR_EN_corpus/data/test \
--val_ouput_path  /content/TW-FR-MT/TW_FR_EN_corpus/data/validation \
--val_set True

SUCCESS


Unnormalized split

In [7]:
!python /content/TW-FR-MT/TW_FR_EN_corpus/scripts/unormalized_train_test_split.py \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_en.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/total/total_fr.txt \
--id_tw tw \
--id_en en \
--id_fr fr \
--train_ouput_path /content/TW-FR-MT/TW_FR_EN_corpus/data/training \
--test_ouput_path /content/TW-FR-MT/TW_FR_EN_corpus/data/test \
--val_ouput_path  /content/TW-FR-MT/TW_FR_EN_corpus/data/validation \
--val_set True

SUCCESS SPLITED CORPUS WITHOUT NORMALIZATION
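
The internals of the split scripts are not shown here, but any split of a parallel corpus has to keep the language files line-aligned: line *i* of the Twi, English, and French files must land in the same subset. A minimal sketch of that idea follows. The function name `split_parallel`, the 80/10/10 proportions, and the fixed seed are assumptions for illustration, not values taken from `train_test_split.py`.

```python
import random
from typing import Dict, List


def split_parallel(
    corpora: Dict[str, List[str]],
    test_frac: float = 0.1,   # assumed fraction, not taken from the script
    val_frac: float = 0.1,    # assumed fraction, not taken from the script
    seed: int = 0,
) -> Dict[str, Dict[str, List[str]]]:
    """Split line-aligned corpora into train/val/test with shared indices."""
    sizes = {len(lines) for lines in corpora.values()}
    if len(sizes) != 1:
        raise ValueError("all language files must have the same number of lines")
    n = sizes.pop()

    # Shuffle the line indices once so every language gets the same split.
    indices = list(range(n))
    random.Random(seed).shuffle(indices)

    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    parts = {
        "test": indices[:n_test],
        "val": indices[n_test:n_test + n_val],
        "train": indices[n_test + n_val:],
    }
    return {
        name: {lang: [lines[i] for i in idx] for lang, lines in corpora.items()}
        for name, idx in parts.items()
    }


# Toy corpus: line i of each language stands for the same sentence.
toy = {
    "tw": [f"tw-{i}" for i in range(10)],
    "en": [f"en-{i}" for i in range(10)],
    "fr": [f"fr-{i}" for i in range(10)],
}
splits = split_parallel(toy)
print({name: len(part["tw"]) for name, part in splits.items()})
```

Shuffling indices rather than the sentence lists themselves is what guarantees that the three languages stay aligned after the split.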


## Fine-tuning the OPUS-MT Model


In [None]:
#!python /content/TW-FR-MT/MT_systems/opus/fine_tune_opus.py -h

In [None]:
!python /content/TW-FR-MT/MT_systems/opus/fine_tune_opus.py \
Helsinki-NLP/opus-mt-tw-fr \
/content/TW-FR-MT/TW_FR_EN_corpus/data/training/train_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/training/train_fr.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/validation/val_tw.txt \
/content/TW-FR-MT/TW_FR_EN_corpus/data/validation/val_fr.txt \
tw \
fr \
OPUS-mt-tw-fr-tuned-1 \
--max_length 128 \
--batch_size 8 \
--epoch 24 \
--warmup_steps 10 \
--savedir /content/drive/MyDrive/MT

2023-02-27 21:07:35.117983: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-27 21:07:36.365125: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-27 21:07:36.365240: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
MODEL PATH: /content/drive/MyDrive/MT
FINED-TURNED MODEL NAME: OPUS-mt-tw-fr-tuned-1
Downloadi
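
The `--batch_size`, `--epoch`, and `--warmup_steps` flags together determine how many optimizer steps are run and how quickly the learning rate ramps up. The sketch below works through that arithmetic under two assumptions that are not confirmed by `fine_tune_opus.py`: that one optimizer step is taken per batch, and that the schedule is a linear warmup followed by a linear decay (a common choice for fine-tuning). The corpus size and peak learning rate are placeholders.

```python
import math


def total_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Number of optimizer steps when each batch triggers one update."""
    return math.ceil(num_examples / batch_size) * epochs


def lr_at(step: int, base_lr: float, warmup_steps: int, total: int) -> float:
    """Linear warmup to base_lr, then linear decay to zero at `total`."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total - step)
    return base_lr * remaining / max(1, total - warmup_steps)


# Placeholder values: 1,000 sentence pairs and a 2e-5 peak learning rate
# are illustrative only. Batch size, epochs, and warmup match the cell above.
steps = total_steps(num_examples=1000, batch_size=8, epochs=24)
for s in (0, 5, 10, steps // 2, steps):
    print(s, lr_at(s, base_lr=2e-5, warmup_steps=10, total=steps))
```

With only 10 warmup steps, the learning rate reaches its peak almost immediately relative to the total number of steps, so nearly all of training happens in the decay phase.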


## Translate with the Fine-tuned OPUS-MT Model



In [13]:
!python /content/TW-FR-MT/MT_systems/opus/opus_direct_translate.py \
/content/drive/MyDrive/MT/OPUS-mt-tw-fr-tuned-1 \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
--output_name tw-fr-opus-translate

2023-04-02 01:24:58.494485: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Evaluate the Translation Quality of the OPUS-MT Model

### BLEU Score

In [14]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt

2023-04-02 01:43:27.365605: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
('2-GRAMS: 0.54', '3-GRAMS: 0.42', '4-GRAMS: 0.35')
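
The three values above read like BLEU computed with n-grams up to order 2, 3, and 4. As a rough illustration of what such a score measures, here is a sketch of corpus-level BLEU with clipped n-gram precision, uniform weights, and a brevity penalty, following the BLEU paper linked earlier. Whitespace tokenization, a single reference per sentence, and the absence of smoothing are assumptions of this sketch; `get_bleu.py` may tokenize or smooth differently, so the numbers are not expected to match.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Cumulative BLEU up to max_n with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


hyps = ["le chat est sur le tapis"]
refs = ["le chat est sur le tapis rouge"]
for n in (2, 3, 4):
    print(f"{n}-GRAMS: {corpus_bleu(hyps, refs, max_n=n):.2f}")
```

In this toy case every n-gram of the hypothesis appears in the reference, so the score is determined entirely by the brevity penalty for the missing final word.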


### AzunreBLEU Score

In [29]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt \
--azunre True

2023-04-02 01:49:00.486137: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
BLEU SCORE: 0.75


### SacreBLEU Score

In [27]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_sacrebleu.py \
tw-fr-opus-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt 

2023-04-02 01:48:13.574975: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
SacreBLEU SCORE: 35.93


### TER Score

In [26]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/ter.py \
/content/drive/MyDrive/data/en-tw-translations \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt 

2023-04-02 01:47:48.756483: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TER SCORE: 67.52
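
TER counts the word-level edits needed to turn a hypothesis into its reference, divided by the number of reference words, so lower is better. The sketch below is a simplified version that allows only insertions, deletions, and substitutions. Full TER also treats moving a block of words as a single edit, so this shift-free variant can overestimate the true score. Reporting the result as a percentage is an assumption made to match the scale of the output above; `ter.py` itself may be implemented differently.

```python
from typing import List


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simple_ter(hyps: List[str], refs: List[str]) -> float:
    """Corpus-level edits / reference words, as a percentage (no shifts)."""
    edits = sum(edit_distance(h.split(), r.split()) for h, r in zip(hyps, refs))
    ref_words = sum(len(r.split()) for r in refs)
    return 100.0 * edits / ref_words if ref_words else 0.0


# One missing word against a four-word reference.
print(simple_ter(["le chat dort"], ["le chat noir dort"]))
```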


## Pre-trained OPUS-MT Model

In [30]:
!python /content/TW-FR-MT/MT_systems/opus/opus_direct_translate.py \
Helsinki-NLP/opus-mt-tw-fr \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
--output_name tw-fr-opus-translate-pretrained

2023-04-02 01:49:20.045890: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [38]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt

2023-04-02 02:10:13.626383: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
('2-GRAMS: 0.39', '3-GRAMS: 0.29', '4-GRAMS: 0.22')


In [36]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt \
--azunre True

2023-04-02 02:08:48.462655: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
BLEU SCORE: 0.62


In [37]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_sacrebleu.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt 

2023-04-02 02:09:36.231139: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
SacreBLEU SCORE: 20.79


### TER Score

In [35]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/ter.py \
tw-fr-opus-translate-pretrained \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt

2023-04-02 02:07:46.432310: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TER SCORE: 91.43


## Google Translate API



### Translate with the Google Translate API


In [8]:
!python /content/TW-FR-MT/MT_systems/Google_MT/googleAPIdirect_translate.py \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_tw.txt \
ak \
fr \
--output_name twi-fr-google-translate


### Evaluate the Translation Quality of the Google API Translation


#### BLEU Score

In [9]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt 

2023-04-02 01:18:32.083340: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Downloading builder script: 100% 9.99k/9.99k [00:00<00:00, 7.04MB/s]
('2-GRAMS: 0.60', '3-GRAMS: 0.49', '4-GRAMS: 0.41')


#### AzunreBLEU Score

In [10]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_bleu.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt \
--azunre True

2023-04-02 01:18:43.691607: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
BLEU SCORE: 0.79


#### SacreBLEU Score

In [11]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/get_sacrebleu.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt 

2023-04-02 01:18:53.536807: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
SacreBLEU SCORE: 44.39


#### TER Score

In [12]:
!python /content/TW-FR-MT/MT_systems/evalution_scripts/ter.py \
twi-fr-google-translate \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/un_test_fr.txt 

2023-04-02 01:23:31.978910: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TER SCORE: 56.03


## OPUS Pivot Translation


To perform pivot translation, you need to fine-tune additional models.
Alternatively, you can download the fine-tuned models from our project's [Google Drive folder](https://drive.google.com/drive/folders/13irIvPsqnryP_NJ5y6PneFKrQDJs5Qm6?usp=sharing) linked above.
We demonstrate pivot translation from French to Twi through English (French → English → Twi), using fine-tuned versions of the OPUS-MT models [Helsinki-NLP/opus-mt-fr-en](https://huggingface.co/Helsinki-NLP/opus-mt-fr-en) and [Helsinki-NLP/opus-mt-en-tw](https://huggingface.co/Helsinki-NLP/opus-mt-en-tw).
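
Conceptually, pivot translation just chains two translation systems and keeps the intermediate text. The sketch below shows that composition with translators modeled as plain callables. The function `pivot_translate` and the lookup tables are hypothetical stand-ins for the two OPUS-MT models, and the single sentence is taken from the script output shown further down; this is not how `opus_pivot_translate.py` is necessarily implemented.

```python
from typing import Callable, Iterable, List, Tuple

Translator = Callable[[str], str]


def pivot_translate(
    sentences: Iterable[str],
    src_to_pivot: Translator,
    pivot_to_tgt: Translator,
) -> List[Tuple[str, str, str]]:
    """Translate source -> pivot -> target, keeping the intermediate text."""
    results = []
    for source in sentences:
        pivot = src_to_pivot(source)
        target = pivot_to_tgt(pivot)
        results.append((source, pivot, target))
    return results


# Toy lookup tables standing in for the French-English and English-Twi models.
fr_en = {"C'est pret.": "It's ready."}
en_tw = {"It's ready.": "Yɛasiesie wo ho."}

for source, pivot, target in pivot_translate(
    ["C'est pret."], lambda s: fr_en[s], lambda s: en_tw[s]
):
    print("Source:", source)
    print("Pivot:", pivot)
    print("Target:", target)
```

Because the second system only ever sees the pivot text, any error made in the first step is carried into the final translation, which is why inspecting the pivot output is useful.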
 



In [None]:
!python /content/TW-FR-MT/MT_systems/opus/opus_pivot_translate.py \
/content/drive/MyDrive/MT/OPUS-mt-fr-en-tuned \
/content/drive/MyDrive/MT/OPUS-mt-en-tw-tuned \
/content/TW-FR-MT/TW_FR_EN_corpus/data/test/test_fr.txt \
--to_console True

2023-02-27 22:14:00.895251: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-27 22:14:02.079562: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-27 22:14:02.079679: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
Source: C'est pret.
Pivot: It's ready.
Target: Yɛasiesie wo ho.

Source: Il n'etudie pas assez