<a href="https://colab.research.google.com/github/epcl2/omdenaEthiopiaNLP/blob/master/XLMR_for_Amharic_Text_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning XLM-RoBERTa for Amharic Text Classification





#### Flow of the notebook

1. Importing Python Libraries and preparing the environment
2. Loading data
3. Preparing the Dataset and Dataloader
4. Creating the Neural Network for Fine Tuning
5. Fine Tuning the Model
6. Validating the Model Performance
7. Saving the model

#### Data Details

The dataset is the Amharic News Classification Dataset. It was preprocessed by following the steps in this [notebook](https://github.com/IsraelAbebe/An-Amharic-News-Text-classification-Dataset/blob/main/Amharic-News-Text-classification-Baseline.ipynb), and the result was saved to a separate CSV file.
Note: one row has a missing category and was dropped. Only the processed article column and the category column were saved to the CSV that this notebook loads.

The language model used is XLM-RoBERTa base (`xlm-roberta-base`); XLM-R large or other models can be substituted. XLM stands for cross-lingual language model, i.e. the model has been pre-trained on text in many languages.



### Importing Python Libraries and preparing the environment

In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.27.4-py3-none-any.whl (6.8 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.13.4 tokenizers-0.13.3 transformers-4.27.4


In [None]:
# Importing the libraries needed
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import torch
import seaborn as sns
import transformers
import json
from tqdm import tqdm
from torch.utils.data import Dataset, DataLoader
# from transformers import RobertaModel, RobertaTokenizer
from transformers import AutoTokenizer, XLMRobertaModel
import logging
logging.basicConfig(level=logging.ERROR)

In [None]:
# Setting up the device for GPU usage

from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

In [None]:
# load in the Amharic News Dataset preprocessed with steps from the github notebook
train_df = pd.read_csv('normalised_data_2.csv')

In [None]:
train_df.shape

(51482, 2)

In [None]:
train_df.head()

Unnamed: 0,article,category
0,ብርሀን ፈይሳየኢትዮጵያ ቦክስ ፌዴሬሽን በየአመቱ የሚያዘጋጀው የክለቦች ቻ...,ስፖርት
1,የአዲስ ዘመን ጋዜጣ ቀደምት ዘገባዎች በእጅጉ ተነባቢ ዛሬም ላገኛቸው በ...,መዝናኛ
2,ቦጋለ አበበየአዲስ አበባ ከተማ አስተዳደር ስፖርት ኮሚሽን ከኢትዮጵያ አረ...,ስፖርት
3,ብርሀን ፈይሳአዲስ አበባ የኢትዮጵያ ፕሪምየር ሊግ በሼር ካምፓኒ እንዲተዳ...,ስፖርት
4,ቦጋለ አበበ የኢትዮጵያ ኦሊምፒክ ኮሚቴ አርባ አምስተኛ መደበኛ ጠቅላላ ጉ...,ስፖርት


In [None]:
# 6 categories (nan removed)
train_df['category'].unique()

array(['ስፖርት', 'መዝናኛ', 'ሀገር አቀፍ ዜና', 'ቢዝነስ', 'ዓለም አቀፍ ዜና', 'ፖለቲካ'],
      dtype=object)

In [None]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51482 entries, 0 to 51481
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   article   51474 non-null  object
 1   category  51482 non-null  object
dtypes: object(2)
memory usage: 804.5+ KB


In [None]:
# apparently there are some articles that are just an empty string
# from the github preprocessing notebook. When an empty string is saved
# and reloaded, it becomes NA as above
train_df = train_df.dropna(subset='article').reset_index(drop=True)
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51474 entries, 0 to 51473
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   article   51474 non-null  object
 1   category  51474 non-null  object
dtypes: object(2)
memory usage: 804.4+ KB


In [None]:
train_df.category = train_df.category.astype('category')
# get mappings between category names and integer indices
idx_to_cat = dict(enumerate(train_df['category'].cat.categories))
cat_to_idx = {v: k for k, v in idx_to_cat.items()}
cat_to_idx

{'ሀገር አቀፍ ዜና': 0, 'መዝናኛ': 1, 'ስፖርት': 2, 'ቢዝነስ': 3, 'ዓለም አቀፍ ዜና': 4, 'ፖለቲካ': 5}

In [None]:
# map each category name to its index according to the mapping above
train_df['label'] = train_df['category'].map(cat_to_idx)

In [None]:
new_df = train_df[['article', 'label']]
new_df

Unnamed: 0,article,label
0,ብርሀን ፈይሳየኢትዮጵያ ቦክስ ፌዴሬሽን በየአመቱ የሚያዘጋጀው የክለቦች ቻ...,2
1,የአዲስ ዘመን ጋዜጣ ቀደምት ዘገባዎች በእጅጉ ተነባቢ ዛሬም ላገኛቸው በ...,1
2,ቦጋለ አበበየአዲስ አበባ ከተማ አስተዳደር ስፖርት ኮሚሽን ከኢትዮጵያ አረ...,2
3,ብርሀን ፈይሳአዲስ አበባ የኢትዮጵያ ፕሪምየር ሊግ በሼር ካምፓኒ እንዲተዳ...,2
4,ቦጋለ አበበ የኢትዮጵያ ኦሊምፒክ ኮሚቴ አርባ አምስተኛ መደበኛ ጠቅላላ ጉ...,2
...,...,...
51469,በ2011 በጀት አመት የተከናወኑ የውጭ ዲፕሎማሲያዊ ተግባራት ስኬታማ እን...,5
51470,አቶ አገኘሁ ተሻገር የአማራ ክልል የሰላም ግንባታና የህዝብ ደህንነት ቢሮ...,5
51471,የአማራ ክልል ምክር ቤት የ230 ዳኞችን ሹመት አፀደቀየአማራ ክልል ምክር...,5
51472,በዘንድሮ በጀት አመት ከ4 ቢሊዮን ችግኝ በላይ ለመትከል እቅድ መያዙ ይታ...,0


### Preparing the Dataset and Dataloader

PyTorch ```Dataset``` lets you work with pre-loaded datasets as well as your own data. A ```Dataset``` stores the samples and their corresponding labels, and a ```DataLoader``` wraps an iterable around the ```Dataset``` for easy access to the samples. The ```DataLoader``` feeds the data to the neural network in batches during training. ([Docs](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html))


#### *AmharicData* Dataset Class
- This class accepts the dataframe as input and generates the tokenized output that the model uses for training.
- The [tokenizer](https://huggingface.co/docs/transformers/model_doc/xlm-roberta#transformers.XLMRobertaTokenizer) tokenizes the text in the `article` column of the dataframe.
- The tokenizer's `encode_plus` method performs the tokenization and returns the inputs the model needs: `input_ids`, `attention_mask` and `token_type_ids` (returned by the class as `ids`, `mask` and `token_type_ids`).
- `targets` is the encoded category.
- The *AmharicData* class is used to create the training and validation datasets.
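
To make the tokenizer outputs more concrete, here is a minimal pure-Python sketch of what padding to `max_len` and the attention mask look like. It does not call the real tokenizer: the token ids are made up and `pad_or_truncate` is a hypothetical helper. The ids 0, 1 and 2 are assumed to stand for XLM-R's `<s>`, `<pad>` and `</s>` tokens.

```python
def pad_or_truncate(token_ids, max_len, pad_id=1):
    """Clip or right-pad a list of token ids to exactly max_len entries.

    Returns (ids, mask) where mask is 1 for real tokens and 0 for padding.
    Hypothetical helper for illustration only; the real tokenizer also keeps
    the closing special token when it truncates, which this sketch ignores.
    """
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)
    n_pad = max_len - len(ids)
    ids += [pad_id] * n_pad
    mask += [0] * n_pad
    return ids, mask


# Made-up ids: 0 and 2 stand in for the <s> and </s> special tokens.
short_ids, short_mask = pad_or_truncate([0, 5021, 887, 2], max_len=6)
long_ids, long_mask = pad_or_truncate([0, 11, 12, 13, 14, 15, 16, 2], max_len=6)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Every sample therefore has the same length, which is what allows the samples to be stacked into a batch tensor; the mask tells the model which positions to ignore.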


#### Dataloader
- `DataLoader` is used to create the training and validation dataloaders, which feed data to the neural network in a controlled manner. This is needed because the whole dataset cannot be loaded into memory at once, so the amount of data loaded into memory and passed to the network at each step has to be limited.
- This control is achieved with parameters such as `batch_size` (samples per step) and `max_len` (tokens per sample, applied by the tokenizer).
- The training and validation dataloaders are used in the training and validation parts of the flow, respectively.
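
As a rough sketch of how `batch_size` determines the number of steps, the arithmetic below uses the split sizes and batch sizes that appear in this notebook (41179 training rows with batch size 16, 10295 test rows with batch size 4). `num_batches` is a hypothetical helper, and it assumes the default `DataLoader` behaviour of keeping the final, smaller batch.

```python
import math


def num_batches(n_samples, batch_size):
    """Number of batches when the last, partial batch is kept."""
    return math.ceil(n_samples / batch_size)


train_steps = num_batches(41179, 16)
test_steps = num_batches(10295, 4)

# Size of the final batch in each loader.
last_train_batch = 41179 - (train_steps - 1) * 16
last_test_batch = 10295 - (test_steps - 1) * 4

print(train_steps, last_train_batch)
print(test_steps, last_test_batch)
```

Both loaders happen to need 2574 steps, which matches the `2574it` counters in the training and validation progress output.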

In [None]:
# Defining some key variables that will be used later on in the training
MAX_LEN = 256
TRAIN_BATCH_SIZE = 16
VALID_BATCH_SIZE = 4
# EPOCHS = 1
LEARNING_RATE = 1e-05
tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base', truncation=True)

Downloading (…)lve/main/config.json:   0%|          | 0.00/615 [00:00<?, ?B/s]

Downloading (…)tencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/9.10M [00:00<?, ?B/s]

A custom ```Dataset``` class must implement three methods: ```__init__```, ```__len__```, and ```__getitem__```.

In [None]:
class AmharicData(Dataset):
    def __init__(self, dataframe, tokenizer, max_len):
        self.tokenizer = tokenizer
        self.data = dataframe
        self.text = self.data['article']
        self.targets = self.data['label']
        self.max_len = max_len

    def __len__(self):
        return len(self.text)

    def __getitem__(self, index):
        text = str(self.text[index])
        # text = " ".join(text.split())

        inputs = self.tokenizer.encode_plus(
            text,
            None,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length',  # replaces the deprecated pad_to_max_length=True
            return_token_type_ids=True,
            truncation=True
        )
        ids = inputs['input_ids']
        mask = inputs['attention_mask']
        token_type_ids = inputs["token_type_ids"]

        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long),
            'targets': torch.tensor(self.targets[index], dtype=torch.float)
        }

In [None]:
# stratify split (keep the same % of each class in train & val set)
# set random state so that it is reproducible
train_data, test_data = train_test_split(new_df, test_size=0.2, random_state=0, stratify=new_df['label'])
train_data = train_data.reset_index(drop=True)
test_data = test_data.reset_index(drop=True)

print("FULL Dataset: {}".format(new_df.shape))
print("TRAIN Dataset: {}".format(train_data.shape))
print("TEST Dataset: {}".format(test_data.shape))

training_set = AmharicData(train_data, tokenizer, MAX_LEN)
testing_set = AmharicData(test_data, tokenizer, MAX_LEN)

FULL Dataset: (51474, 2)
TRAIN Dataset: (41179, 2)
TEST Dataset: (10295, 2)


In [None]:
train_params = {'batch_size': TRAIN_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

test_params = {'batch_size': VALID_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

training_loader = DataLoader(training_set, **train_params)
testing_loader = DataLoader(testing_set, **test_params)

<a id='section04'></a>
### Creating the Neural Network for Fine Tuning

#### Neural Network
 - We define the neural network in the `XLMRClass` class.
 - The network consists of the XLM-R language model followed by a fully connected layer, a `dropout` layer and a final `Linear` classification layer that produces the outputs.
 - The outputs of the final layer are compared with the encoded news category to measure the accuracy of the model's predictions. (The size of the intermediate fully connected layer, 128, was chosen arbitrarily.)
 - We create an instance of the network called `model`. This instance is trained and then saved for later inference.
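
To get a feel for how small the classification head is compared with the language model, the sketch below counts the trainable parameters of the two `Linear` layers, assuming the sizes used in `XLMRClass` (768 hidden units, 128 intermediate units, 6 classes). `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


pre_classifier_params = linear_params(768, 128)  # hidden state -> 128
classifier_params = linear_params(128, 6)        # 128 -> 6 classes
head_params = pre_classifier_params + classifier_params

print(pre_classifier_params, classifier_params, head_params)
```

The head adds fewer than 100,000 parameters, so almost all of the weights updated during fine-tuning belong to the pre-trained encoder.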
 
#### Loss Function and Optimizer
 - The `Loss Function` and `Optimizer` are defined in the fine-tuning section below.
 - The `Loss Function` calculates the difference between the output produced by the model and the actual label.
 - The `Optimizer` updates the weights of the neural network to improve its performance.
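
For intuition about the loss values, here is a small pure-Python sketch of softmax cross-entropy for a single example with six classes. It is an illustration, not the PyTorch implementation; `torch.nn.CrossEntropyLoss` computes the same quantity from raw logits and averages it over the batch by default. The logits below are invented.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed in a stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# A model with no preference between the 6 classes: loss equals ln(6).
uniform_loss = cross_entropy([0.0] * 6, target=2)

# A model that is confident in the correct class: loss close to 0.
confident_loss = cross_entropy([0.0, 0.0, 5.0, 0.0, 0.0, 0.0], target=2)

print(round(uniform_loss, 4), round(confident_loss, 4))
```

An untrained classifier over six classes is expected to start near ln(6) ≈ 1.79, which is roughly where the first logged training loss in this notebook sits.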

In [None]:
# this is a sample of how to load
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaModel.from_pretrained("xlm-roberta-base")

inputs = tokenizer("ብርሀን ፈይሳየኢትዮጵያ ቦክስ ፌዴሬሽን በየአመቱ የሚያዘጋጀው የክለቦች ", return_tensors="pt")
outputs = model(**inputs)

Some weights of the model checkpoint at xlm-roberta-base were not used when initializing XLMRobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.decoder.weight']
- This IS expected if you are initializing XLMRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
class XLMRClass(torch.nn.Module):
    def __init__(self):
        super(XLMRClass, self).__init__()
        # XLMR model
        self.l1 = XLMRobertaModel.from_pretrained("xlm-roberta-base")
        # add a fully connected layer on top
        self.pre_classifier = torch.nn.Linear(768, 128)
        self.dropout = torch.nn.Dropout(0.3)
        # classification layer
        self.classifier = torch.nn.Linear(128, 6)

    def forward(self, input_ids, attention_mask, token_type_ids):
        output_1 = self.l1(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        hidden_state = output_1[0]              # [batch_size, seq_len, 768]
        pooler = hidden_state[:, 0]             # [batch_size, 768]                   
        pooler = self.pre_classifier(pooler)    # [batch_size, 128]
        pooler = torch.nn.ReLU()(pooler)
        pooler = self.dropout(pooler)
        output = self.classifier(pooler)
        return output

In [None]:
model = XLMRClass()
model.to(device)

Some weights of the model checkpoint at xlm-roberta-base were not used when initializing XLMRobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.decoder.weight']
- This IS expected if you are initializing XLMRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


XLMRClass(
  (l1): XLMRobertaModel(
    (embeddings): XLMRobertaEmbeddings(
      (word_embeddings): Embedding(250002, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): XLMRobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x XLMRobertaLayer(
          (attention): XLMRobertaAttention(
            (self): XLMRobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): XLMRobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (Laye

<a id='section05'></a>
### Fine Tuning the Model
 
Here we define a training function that trains the model on the training dataset created above for a specified number of epochs (`EPOCHS`). One epoch is one complete pass of the training data through the network.

Following events happen in this function to fine tune the neural network:
- The dataloader passes data to the model in batches of the configured batch size.
- The model's output is compared with the actual category to calculate the loss.
- The loss is used to update the weights of the network.
- Every 100 steps, the running loss and accuracy are printed to the console.
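
One detail worth spelling out: the values printed during training are cumulative averages since the start of the epoch, not averages over the most recent 100 steps. The sketch below reproduces that bookkeeping in plain Python with invented per-batch numbers; `running_log` is a hypothetical helper and it assumes every batch is full.

```python
def running_log(batch_losses, batch_correct, batch_size, every=100):
    """Return (step, mean loss so far, accuracy % so far) at every `every` steps."""
    records = []
    total_loss = 0.0
    total_correct = 0
    for step, (loss, correct) in enumerate(zip(batch_losses, batch_correct)):
        total_loss += loss
        total_correct += correct
        if step % every == 0:
            seen = (step + 1) * batch_size
            records.append(
                (step, total_loss / (step + 1), 100 * total_correct / seen)
            )
    return records


# Invented losses and correct-prediction counts for four batches of 4 samples.
records = running_log([2.0, 1.0, 0.6, 0.4], [1, 2, 3, 4], batch_size=4, every=2)
for record in records:
    print(record)
```

Because early, poorly fitted batches stay in the average, the logged accuracy rises more slowly than the accuracy on the most recent batches.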

In [None]:
# Creating the loss function and optimizer
loss_function = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model.parameters(), lr=LEARNING_RATE)

In [None]:
def calcuate_accuracy(preds, targets):
    n_correct = (preds == targets).sum().item()
    return n_correct

In [None]:
# Defining the training function on the train dataset

def train(epoch):
    tr_loss = 0
    n_correct = 0
    nb_tr_steps = 0
    nb_tr_examples = 0
    model.train()
    for _, data in tqdm(enumerate(training_loader, 0)):
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        token_type_ids = data['token_type_ids'].to(device, dtype = torch.long)
        targets = data['targets'].to(device, dtype = torch.long)

        # forward pass through the model
        outputs = model(ids, mask, token_type_ids)
        # calculate loss
        loss = loss_function(outputs, targets)
        tr_loss += loss.item()
        # to get prediction accuracy
        big_val, big_idx = torch.max(outputs.data, dim=1)
        n_correct += calcuate_accuracy(big_idx, targets)

        nb_tr_steps += 1
        nb_tr_examples+=targets.size(0)
        
        if _%100==0:
            loss_step = tr_loss/nb_tr_steps
            accu_step = (n_correct*100)/nb_tr_examples 
            print(f"Training Loss per 100 steps: {loss_step}")
            print(f"Training Accuracy per 100 steps: {accu_step}")

        # back prop
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f'The Total Accuracy for Epoch {epoch}: {(n_correct*100)/nb_tr_examples}')
    epoch_loss = tr_loss/nb_tr_steps
    epoch_accu = (n_correct*100)/nb_tr_examples
    print(f"Training Loss Epoch: {epoch_loss}")
    print(f"Training Accuracy Epoch: {epoch_accu}")

    return model

In [None]:
EPOCHS = 1
for epoch in range(EPOCHS):
    model = train(epoch)

0it [00:00, ?it/s]

Training Loss per 100 steps: 1.8996349573135376
Training Accuracy per 100 steps: 25.0


100it [01:21,  1.25it/s]

Training Loss per 100 steps: 1.5345033064927205
Training Accuracy per 100 steps: 41.1509900990099


200it [02:42,  1.26it/s]

Training Loss per 100 steps: 1.3543855894857377
Training Accuracy per 100 steps: 50.24875621890547


300it [04:02,  1.25it/s]

Training Loss per 100 steps: 1.236292499046389
Training Accuracy per 100 steps: 54.69269102990033


400it [05:23,  1.25it/s]

Training Loss per 100 steps: 1.141687522653927
Training Accuracy per 100 steps: 58.104738154613464


500it [06:43,  1.25it/s]

Training Loss per 100 steps: 1.069280689110061
Training Accuracy per 100 steps: 61.10279441117765


600it [08:03,  1.24it/s]

Training Loss per 100 steps: 1.007950357163012
Training Accuracy per 100 steps: 63.73752079866888


700it [09:24,  1.25it/s]

Training Loss per 100 steps: 0.962581011351097
Training Accuracy per 100 steps: 65.32631954350927


800it [10:44,  1.24it/s]

Training Loss per 100 steps: 0.9264586062243815
Training Accuracy per 100 steps: 66.56523096129838


900it [12:04,  1.26it/s]

Training Loss per 100 steps: 0.8941112883909957
Training Accuracy per 100 steps: 67.65399556048834


1000it [13:25,  1.24it/s]

Training Loss per 100 steps: 0.8656757207332434
Training Accuracy per 100 steps: 68.65634365634365


1100it [14:45,  1.24it/s]

Training Loss per 100 steps: 0.8437477983365158
Training Accuracy per 100 steps: 69.38578564940963


1200it [16:05,  1.24it/s]

Training Loss per 100 steps: 0.8244283276111458
Training Accuracy per 100 steps: 70.02497918401332


1300it [17:26,  1.23it/s]

Training Loss per 100 steps: 0.8064951842949997
Training Accuracy per 100 steps: 70.60914681014604


1400it [18:46,  1.24it/s]

Training Loss per 100 steps: 0.7893782251439377
Training Accuracy per 100 steps: 71.10546038543897


1500it [20:07,  1.25it/s]

Training Loss per 100 steps: 0.7730003220390909
Training Accuracy per 100 steps: 71.723017988008


1600it [21:27,  1.24it/s]

Training Loss per 100 steps: 0.7592999946412111
Training Accuracy per 100 steps: 72.16973766396002


1700it [22:48,  1.23it/s]

Training Loss per 100 steps: 0.7475982244525637
Training Accuracy per 100 steps: 72.48309817754262


1800it [24:08,  1.23it/s]

Training Loss per 100 steps: 0.7340259105587654
Training Accuracy per 100 steps: 72.97681843420322


1900it [25:29,  1.24it/s]

Training Loss per 100 steps: 0.7218265913187799
Training Accuracy per 100 steps: 73.3890057864282


2000it [26:49,  1.23it/s]

Training Loss per 100 steps: 0.7125621581639069
Training Accuracy per 100 steps: 73.72251374312843


2100it [28:10,  1.25it/s]

Training Loss per 100 steps: 0.701848679971746
Training Accuracy per 100 steps: 74.1432651118515


2200it [29:30,  1.24it/s]

Training Loss per 100 steps: 0.6920992477353949
Training Accuracy per 100 steps: 74.48318945933667


2300it [30:51,  1.25it/s]

Training Loss per 100 steps: 0.6835467665642824
Training Accuracy per 100 steps: 74.76368970013039


2400it [32:12,  1.25it/s]

Training Loss per 100 steps: 0.6771086687404431
Training Accuracy per 100 steps: 74.9947938359017


2500it [33:32,  1.25it/s]

Training Loss per 100 steps: 0.6698550817306067
Training Accuracy per 100 steps: 75.27738904438225


2574it [34:31,  1.24it/s]

The Total Accuracy for Epoch 0: 75.37822676607009
Training Loss Epoch: 0.6657530925589055
Training Accuracy Epoch: 75.37822676607009





<a id='section06'></a>
### Validating the Model

During the validation stage we pass unseen data (the testing dataset) to the model. This step determines how well the model performs on data it was not trained on.

This unseen data was separated out during the dataset creation stage.
During validation the weights of the model are not updated. Only the final output is compared with the actual value, and this comparison is used to calculate the accuracy of the model.
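
The comparison described above can be sketched in plain Python as follows: the predicted class is the index of the largest logit, and accuracy is the share of predictions that match the targets. The logits and targets are invented, and `argmax` and `accuracy_percent` are hypothetical helpers rather than functions from this notebook.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy_percent(batch_logits, targets):
    """Predicted class per row and the percentage of correct predictions."""
    preds = [argmax(row) for row in batch_logits]
    n_correct = sum(int(p == t) for p, t in zip(preds, targets))
    return preds, 100 * n_correct / len(targets)


# One invented batch of 4 samples with 6 class scores each.
logits = [
    [0.1, 2.3, -1.0, 0.0, 0.2, 0.4],
    [1.5, 0.2, 0.3, 0.1, 0.0, -0.5],
    [0.0, 0.1, 0.2, 0.3, 3.0, 0.1],
    [0.2, 0.1, 0.9, 0.3, 0.0, 1.1],
]
targets = [1, 0, 4, 2]

preds, acc = accuracy_percent(logits, targets)
print(preds, acc)
```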

In [None]:
def valid(model, testing_loader):
    # do not update model weights
    model.eval()
    n_correct = 0; n_wrong = 0; total = 0; tr_loss=0; nb_tr_steps=0; nb_tr_examples=0
    # we don't do gradient descent
    with torch.no_grad():
        for _, data in tqdm(enumerate(testing_loader, 0)):
            ids = data['ids'].to(device, dtype = torch.long)
            mask = data['mask'].to(device, dtype = torch.long)
            token_type_ids = data['token_type_ids'].to(device, dtype=torch.long)
            targets = data['targets'].to(device, dtype = torch.long)
            # get predictions; keep the [batch_size, 6] shape so that a final
            # batch of size 1 still works with torch.max(..., dim=1)
            outputs = model(ids, mask, token_type_ids)
            # get loss
            loss = loss_function(outputs, targets)
            tr_loss += loss.item()
            big_val, big_idx = torch.max(outputs.data, dim=1)
            # get accuracy
            n_correct += calcuate_accuracy(big_idx, targets)

            nb_tr_steps += 1
            nb_tr_examples+=targets.size(0)
            
            if _%1000==0:
                loss_step = tr_loss/nb_tr_steps
                accu_step = (n_correct*100)/nb_tr_examples
                print(f"Validation Loss per 1000 steps: {loss_step}")
                print(f"Validation Accuracy per 1000 steps: {accu_step}")
    epoch_loss = tr_loss/nb_tr_steps
    epoch_accu = (n_correct*100)/nb_tr_examples
    print(f"Validation Loss Epoch: {epoch_loss}")
    print(f"Validation Accuracy Epoch: {epoch_accu}")
    
    return epoch_accu


In [None]:
acc = valid(model, testing_loader)
print("Accuracy on test data = %0.2f%%" % acc)

3it [00:00, 12.02it/s]

Validation Loss per 100 steps: 0.6536316871643066
Validation Accuracy per 100 steps: 75.0


2574it [02:58, 14.44it/s]

Validation Loss Epoch: 0.4708679529545914
Validation Accuracy Epoch: 81.74842156386596
Accuracy on test data = 81.75%





<a id='section07'></a>
### Saving the Trained Model 

In [None]:
output_model_file = 'model1'
output_vocab_file = './'

model_to_save = model
torch.save(model_to_save, output_model_file)
tokenizer.save_vocabulary(output_vocab_file)

print('All files saved')

All files saved
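
An alternative worth considering is to save only the model's `state_dict`. Pickling the whole module ties the file to the exact class definition, and recent PyTorch releases (2.6 and later, as far as I know) load with `weights_only=True` by default, which may refuse arbitrary pickled objects. The sketch below uses a tiny `Linear` layer as a hypothetical stand-in for the fine-tuned network and an in-memory buffer instead of a file.

```python
import io

import torch

# Tiny stand-in for the fine-tuned network (hypothetical, for illustration).
net = torch.nn.Linear(4, 2)

# Save only the learned tensors rather than pickling the whole object.
# An in-memory buffer is used here; a file path works the same way.
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)

# Reload: rebuild the architecture first, then copy the saved weights in.
buffer.seek(0)
restored = torch.nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))
restored.eval()

# Both modules should now produce identical outputs.
x = torch.ones(1, 4)
same = torch.allclose(net(x), restored(x))
print(same)
```

For this notebook, the same pattern would mean creating a fresh `XLMRClass()` and calling `load_state_dict` on it before evaluation.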


In [None]:
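# sample to reload the model to make sure that it works
# (weights_only=False because the whole model object was pickled, not just a state_dict)
model2 = torch.load('model1', weights_only=False)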
acc = valid(model2, testing_loader)
print("Accuracy on test data = %0.2f%%" % acc)

2it [00:00, 10.44it/s]

Validation Loss per 100 steps: 0.46105337142944336
Validation Accuracy per 100 steps: 75.0


2574it [03:10, 13.51it/s]

Validation Loss Epoch: 0.47095878469224417
Validation Accuracy Epoch: 81.74842156386596
Accuracy on test data = 81.75%





Note: this is just a sample flow for fine-tuning XLM-RoBERTa for text classification. Several of the hyperparameter choices are arbitrary, and I have not experimented with different settings. Better results should be achievable with more tuning.