# Machine translation using `simpletransformers`: *Yorùbá to English*

[![simple](https://img.shields.io/badge/Simple_Transformers-v0.61.6-0c0c0c?logo=FutureLearn&logoColor=white&style=for-the-badge)](https://github.com/ThilinaRajapakse/simpletransformers)

Simple Transformers is built on the Hugging Face Transformers library and lets you quickly train and evaluate Transformer models. Only three lines of code are needed to initialize, train, and evaluate a model.

**Supports**

- Sequence Classification
- Token Classification (NER)
- Question Answering
- Language Model Fine-Tuning
- Language Model Training
- Language Generation
- T5 Model
- Seq2Seq Tasks
- Multi-Modal Classification
- Conversational AI
- Text Representation Generation

![Picture title](image-20210601-123125.png)

## Notebook was made in Kaggle

## Installing Simple Transformers

In [None]:
!pip install simpletransformers
!pip install fsspec==2021.5.0
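
The badge at the top refers to Simple Transformers v0.61.6, while the install cell leaves that version unpinned. As a small sketch for recording what actually got installed, the standard library can report package versions; `installed_version` is a hypothetical helper, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages installed above.
for name in ("simpletransformers", "fsspec"):
    print(name, installed_version(name))
```

If results need to be reproducible, the reported version could then be pinned in the install cell in the same way `fsspec` is.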

# Data loading
I use the original data with no preprocessing.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split

train = pd.read_csv("../input/yorb-machine-translation/Train.csv")
test = pd.read_csv("../input/yorb-machine-translation/Test.csv")

# `Yoruba` Model
Using the Simple Transformers seq2seq module, I load the pretrained `Helsinki-NLP/opus-mt-mul-en` model, which worked best in our case, and configure it through `Seq2SeqArgs`.

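**Args**
- num_train_epochs = 35
- rag_embed_batch_size = 32 (only used by RAG models; the training batch size stays at the `train_batch_size` default of 8)
- max_length = 120
- src_lang = "yor"
- tgt_lang = "en_XX"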
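
In Simple Transformers, the training and evaluation batch sizes are controlled by `train_batch_size` and `eval_batch_size`. Below is a minimal sketch of how a batch size of 32 could be requested, written as a plain dict (which, to my knowledge, the model classes accept through `args=` in place of an args object). It is an illustration only, not the configuration used in the run recorded in this notebook.

```python
# Hypothetical argument set; values other than the batch sizes mirror the list above.
model_args = {
    "num_train_epochs": 35,
    "train_batch_size": 32,  # batch size used during training
    "eval_batch_size": 32,   # batch size used during evaluation and prediction
    "max_length": 120,
}

for name, value in model_args.items():
    print(f"{name} = {value}")
```

A larger batch needs more GPU memory, so whether 32 fits depends on the accelerator available.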

In [None]:
import logging

import pandas as pd
from simpletransformers.seq2seq import (
    Seq2SeqModel,
    Seq2SeqArgs,
)


logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

train_data = train[["Yoruba","English"]]

train_data = train_data.rename(columns={"Yoruba":"input_text","English":"target_text"})
train_df, eval_df = train_test_split(train_data, test_size=0.05, random_state=42)


model_args = Seq2SeqArgs()
model_args.num_train_epochs = 35
model_args.no_save = True
model_args.evaluate_generated_text = False
model_args.evaluate_during_training = False
model_args.evaluate_during_training_verbose = True
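# Note: `rag_embed_batch_size` only applies to RAG models; the training batch
# size is set by `train_batch_size`, which stays at its default of 8 here.
model_args.rag_embed_batch_size = 32
model_args.max_length = 120
model_args.src_lang = "yor"
model_args.tgt_lang = "en_XX"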

# Initialize model
model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="Helsinki-NLP/opus-mt-mul-en",
    args=model_args,
    use_cuda=True,
)


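def count_matches(labels, preds):
    """Count predictions that exactly match their reference translation."""
    # Print both lists so mismatches can be inspected by eye.
    print(labels)
    print(preds)
    return sum(label == pred for label, pred in zip(labels, preds))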
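
`count_matches` only credits a prediction that is identical to its reference, which is a strict criterion for translation. As a hedged illustration of a softer score with the same `(labels, preds)` signature, here is a whitespace-token F1. The names `token_f1` and `mean_token_f1` are hypothetical, they are not part of Simple Transformers, and this sketch is not a substitute for a standard metric such as BLEU.

```python
from collections import Counter


def token_f1(label, pred):
    """Token-level F1 between one reference and one prediction (whitespace tokens)."""
    label_tokens = label.split()
    pred_tokens = pred.split()
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = sum((Counter(label_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(label_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(labels, preds):
    """Average token-level F1 over paired references and predictions."""
    scores = [token_f1(label, pred) for label, pred in zip(labels, preds)]
    return sum(scores) / len(scores) if scores else 0.0


labels = ["he knows his job", "good morning"]
preds = ["he knows the job", "good morning"]
print(mean_token_f1(labels, preds))
```

Such a function could be passed in the same way as `count_matches`, for example `f1=mean_token_f1`. As far as I know, extra metrics are only computed when evaluation on generated text is enabled.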







## Model Training

In [None]:
# Train the model
model.train_model(
    train_df, eval_data=eval_df, matches=count_matches
)


(41790, 0.9875997785191789)
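
The tuple returned by `train_model` appears to be the global step count followed by the average training loss (an assumption based on Simple Transformers' behaviour, not something the notebook states). The step count can be cross-checked by hand: in this run the training set held 9551 examples and each epoch ran 1194 steps, which implies a batch size of 8. `steps_per_epoch` below is a hypothetical helper.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches per epoch, counting a final partial batch."""
    return math.ceil(num_examples / batch_size)


num_examples = 9551  # training rows in this run
num_epochs = 35
batch_size = 8       # assumed: the default train_batch_size

per_epoch = steps_per_epoch(num_examples, batch_size)
total_steps = per_epoch * num_epochs
print(per_epoch, total_steps)
```

With a batch size of 32, the same data would take 299 steps per epoch.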

## Model Evaluation

In [None]:
# Evaluate the model
results = model.eval_model(eval_df)

In [None]:
results

{'eval_loss': 2.804287210343376}
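
One way to read the loss is to convert it to perplexity. The sketch below assumes `eval_loss` is a mean per-token cross-entropy in nats, so that perplexity is `exp(loss)`. The training value is the second element of the tuple returned by `train_model`, which I take to be an average over the whole run, so the comparison is only rough.

```python
import math

eval_loss = 2.804287210343376    # from `results`
train_loss = 0.9875997785191789  # second value returned by train_model

eval_ppl = math.exp(eval_loss)
train_ppl = math.exp(train_loss)

print(f"eval perplexity  ~ {eval_ppl:.2f}")
print(f"train perplexity ~ {train_ppl:.2f}")
```

Under those assumptions the evaluation perplexity comes out several times higher than the training one, which suggests the model fits the training data much more closely than the held-out 5%.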

## Sample Prediction

In [None]:
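# Use the model for prediction.
# `predict` expects a list of strings, so the single sentence is wrapped in a list.
print(model.predict([test.Yoruba.values[25]]))

The output recorded below came from passing the bare string instead of a list. In that case `predict` appears to slice the string into chunks of `eval_batch_size` characters and translate each chunk separately, which is why six fragments appear rather than one sentence:

['Mother! Oh!', 'put it in', 'over-What', 'Sunday', 'and next', 'e. E.,']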
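
To see why a bare string can produce fragments, here is a minimal sketch of a batching step. It is an assumption modelled on the observed behaviour (six outputs, with a default `eval_batch_size` of 8), not the library's actual code, and `make_batches` is a hypothetical helper.

```python
def make_batches(to_predict, batch_size=8):
    """Split a sequence into consecutive slices of at most batch_size items."""
    return [
        to_predict[i : i + batch_size]
        for i in range(0, len(to_predict), batch_size)
    ]


sentence = "x" * 45  # hypothetical stand-in for a 45-character sentence

# A bare string is a sequence of characters, so it is cut into 8-character pieces.
string_batches = make_batches(sentence)

# A one-element list stays together as a single batch holding the whole sentence.
list_batches = make_batches([sentence])

print(len(string_batches), len(list_batches))
```

Wrapping a single sentence in a list keeps it intact, which is what the full test-set prediction in the next section does with `list(test.Yoruba.values)`.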


## Predicting English from Yoruba `Test`

In [None]:
test["Label"] = model.predict(list(test.Yoruba.values))

In [None]:
test.head()

| | ID | Yoruba | Label |
|---|---|---|---|
| 0 | ID_AAAitMaH | Nínú ìpè kan lẹ́yìn ìgbà náà, wọ́n sọ fún aṣoj... | In a later call, the BlaBlaBlacar▁representati... |
| 1 | ID_AAKKdQwr | Nítorí kò sí nǹkan tí ọkùnrin ò lè ṣe láì náán... | Because there is nothing that a man cannot do ... |
| 2 | ID_ABgAyEOp | Bí i kó pariwo. Kí ó kígbe mọ́ ẹ? | Because he would sing a noise. Why would he li... |
| 3 | ID_ACFgfKQs | Tí ó ń lé e lọ sọ́nà etí odò Akókurà, tí ó bẹ̀... | While▁following, he goes to the Akokurà River,... |
| 4 | ID_ACNPmlhf | Èṣúńiyì mọ̀ iṣẹ́ rẹ̀ dunjú. Màmá tirí bí ó ṣe ... | Eswuniy knows his job. Mother is called on how... |
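
Some generated labels contain the character `▁` (for example `While▁following`), which looks like a SentencePiece word-boundary marker left in the decoded text. Assuming that is what it is, a small post-processing step could replace it with a space; `clean_prediction` is a hypothetical helper, not part of the original notebook.

```python
import re

SPIECE_UNDERLINE = "\u2581"  # the "▁" character


def clean_prediction(text):
    """Replace SentencePiece markers with spaces and collapse repeated whitespace."""
    text = text.replace(SPIECE_UNDERLINE, " ")
    return re.sub(r"\s+", " ", text).strip()


print(clean_prediction("While▁following, he goes to the river"))
```

It could be applied to the whole column with `test["Label"].map(clean_prediction)` before the submission file is written.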


# Submission

In [None]:
test[["ID","Label"]].to_csv("submission.csv",index=False)
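
Before uploading, it can help to confirm that the file has the expected columns and no empty translations. The sketch below uses only the standard library; `check_submission` is a hypothetical helper, and the column names `ID` and `Label` are taken from the cell above.

```python
import csv
import io


def check_submission(csv_file, required=("ID", "Label")):
    """Return (row_count, empty_label_count) for a submission-style CSV file object."""
    reader = csv.DictReader(csv_file)
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    # Count rows whose Label is absent or only whitespace.
    empty = sum(1 for row in rows if not (row["Label"] or "").strip())
    return len(rows), empty


# Hypothetical two-row sample standing in for submission.csv
sample = io.StringIO("ID,Label\nID_1,Good morning\nID_2,\n")
print(check_submission(sample))
```

For the real file, the same function can be called on `open("submission.csv", newline="", encoding="utf-8")`.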

# Results

![Picture title](image-20210618-104819.png)
