# Fine Tuning Transformer for MultiClass Text Classification

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Installing the transformers library and any additional libraries the process needs

!pip install -q transformers

# Code for TPU packages install
# !curl -q https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
# !python pytorch-xla-env-setup.py --apt-packages libomp5 libopenblas-dev


### Introduction

In this tutorial we will be fine tuning a transformer model for the **Multiclass text classification** problem. 
This is one of the most common business problems: a given piece of text (a sentence or a document) needs to be classified into one category from a given list.

#### Flow of the notebook

The notebook is divided into separate sections to provide an organized walkthrough of the process used. This process can be modified for individual use cases. The sections are:

1. [Importing Python Libraries and preparing the environment](#section01)
2. [Importing and Pre-Processing the domain data](#section02)
3. [Preparing the Dataset and Dataloader](#section03)
4. [Creating the Neural Network for Fine Tuning](#section04)
5. [Fine Tuning the Model](#section05)
6. [Validating the Model Performance](#section06)
7. [Saving the model and artifacts for Inference in Future](#section07)

#### Technical Details

This script leverages multiple tools designed by other teams. Details of the tools used are given below. Please ensure that these elements are present in your setup to run this script successfully.

 - Data: 
	 - We are using the News Aggregator dataset available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)
	 - We are referring only to the first csv file from the data dump: `newsCorpora.csv`
	 - There are `422937` rows of data, and each row has the following fields: 
		 - ID Numeric ID  
		 - TITLE News title  
		 - URL Url  
		 - PUBLISHER Publisher name  
		 - CATEGORY News category (b = business, t = science and technology, e = entertainment, m = health)  
		 - STORY Alphanumeric ID of the cluster that includes news about the same story  
		 - HOSTNAME Url hostname  
		 - TIMESTAMP Approximate time the news was published, as the number of milliseconds since the epoch 00:00:00 GMT, January 1, 1970


 - Language Model Used:
	 - DistilBERT is a smaller transformer model than BERT or RoBERTa. It is created by applying knowledge distillation to BERT. 
	 - [Blog-Post](https://medium.com/huggingface/distilbert-8cf3380435b5)
	 - [Research Paper](https://arxiv.org/abs/1910.01108)
	 - [Documentation for python](https://huggingface.co/transformers/model_doc/distilbert.html)


 - Hardware and Software Requirements:
	 - Python 3.6 and above
	 - Pytorch, Transformers and All the stock Python ML Libraries
	 - GPU enabled setup 


 - Script Objective:
	 - The objective of this script is to fine tune DistilBERT to be able to classify a news headline into the following categories:
		 - Business
		 - Technology
		 - Health
		 - Entertainment 


<a id='section01'></a>
### Importing Python Libraries and preparing the environment

At this step we will be importing the libraries and modules needed to run our script. Libraries are:
* Pandas
* Pytorch
* Pytorch Utils for Dataset and Dataloader
* Transformers
* DistilBERT Model and Tokenizer

After that, we will prepare the device for CUDA execution. This configuration is needed if you want to use the onboard GPU. 

In [None]:
# Importing the libraries needed
import pandas as pd
import torch
import transformers
from torch.utils.data import Dataset, DataLoader
from transformers import DistilBertModel, DistilBertTokenizer

In [None]:
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')



In [None]:
# Setting up the device for GPU usage

from torch import cuda

device = 'cuda' if cuda.is_available() else 'cpu'

<a id='section02'></a>
### Importing and Pre-Processing the domain data

We will be working with the data and preparing it for fine tuning. 
*Assuming that `newsCorpora.csv` is already downloaded in your `data` folder*

Import the file into a dataframe and give it the headers as per the documentation.
Clean the file to remove the unwanted columns and create an additional column for training.
The final Dataframe will look something like this:

|TITLE|CATEGORY|ENCODED_CAT|
|--|--|--|
|  title_1|Entertainment | 1 |
|  title_2|Entertainment | 1 |
|  title_3|Business| 2 |
|  title_4|Science| 3 |
|  title_5|Science| 3 |
|  title_6|Health| 4 |
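
The import-and-clean step described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the two sample rows are made up, the file is assumed to be tab-separated with no header row, and the `CATEGORY_NAMES` and `CATEGORY_CODES` mappings are hypothetical, chosen to mirror the example table.

```python
import io
import pandas as pd

# Column names as listed in the dataset description.
COLUMNS = ["ID", "TITLE", "URL", "PUBLISHER", "CATEGORY", "STORY", "HOSTNAME", "TIMESTAMP"]

# Hypothetical mappings chosen to match the example table.
CATEGORY_NAMES = {"e": "Entertainment", "b": "Business", "t": "Science", "m": "Health"}
CATEGORY_CODES = {"Entertainment": 1, "Business": 2, "Science": 3, "Health": 4}

# Made-up stand-in rows so the sketch runs without the real file.
SAMPLE = (
    "1\tMarkets rally on earnings\thttp://example.com/a\tExample Times\tb\tstoryA\texample.com\t1394470370698\n"
    "2\tNew film tops box office\thttp://example.com/b\tExample Post\te\tstoryB\texample.com\t1394470371207\n"
)

def load_news(source):
    """Read the raw file and keep only the columns needed for training."""
    frame = pd.read_csv(source, sep="\t", names=COLUMNS)
    frame = frame[["TITLE", "CATEGORY"]].copy()
    frame["CATEGORY"] = frame["CATEGORY"].map(CATEGORY_NAMES)
    frame["ENCODED_CAT"] = frame["CATEGORY"].map(CATEGORY_CODES)
    return frame

news_df = load_news(io.StringIO(SAMPLE))
print(news_df)
```

For a real run, `load_news` would be given the path to `newsCorpora.csv` instead of the in-memory sample.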

In [None]:
path = '/content/drive/MyDrive/NLP/Final_datasets/9kmax.csv'
df = pd.read_csv(path)
df = df.drop(columns=['Unnamed: 0','Unnamed: 0.1','Unnamed: 0.1.1','Unnamed: 0.1.1.1'])
df.head()

Unnamed: 0,Text,Emotions,sw_Text
0,@ Ember with Dante... I have so much fun with ...,love,ember with dante... i have so much fun with ...
1,i feel stupid typing that,sad,i feel stupid typing that
2,@starlingpoet lol.. that's worrying,worry,starlingpoet lol.. that s worrying
3,And I-I am just trying to figure out why.,sad,and i i am just trying to figure out why.
4,@Dez4jc @goldengoodas thanks hun!! I'm working...,love,dez4jc goldengoodas thanks hun i m working...


In [None]:
df = df.sample(frac=1).reset_index(drop=True)
df.head()

Unnamed: 0,Text,Emotions,sw_Text
0,i kind of feel a little petty about this,anger,i kind of feel a little petty about this
1,I wasn't aware that I had to go to court .,anger,i wasn t aware that i had to go to court .
2,"@ekardmatt well you MY man, you and your truck...",sad,ekardmatt well you my man you and your truck...
3,i feel so wronged but what can i do,anger,i feel so wronged but what can i do
4,Fb I hate when I try support my local booksto...,sad,fb i hate when i try support my local bookstor...


In [None]:
#emo_dict = {'worry':0, 'sadness':1, 'surprise':2, 'love':3, 'neutral':4, 'anger':5}
emo_dict1 = {'neutral':0,
    'worry':1,
    'sad':2,
    'anger':3,
    'surprise':4,
    'love':5,
    'happy':6
     }
     #frustated is not here
def make_label(text):
    return emo_dict1[text]

df.Emotions = df.Emotions.apply(make_label)

In [None]:
df.sw_Text = df.sw_Text.apply(str)
df.Emotions = df.Emotions.apply(int)

In [None]:
def truncate_long_sent(text, max_words=400):
    # Keep at most max_words whitespace-separated words of a text
    words = text.split()
    if len(words) > max_words:
        return ' '.join(words[:max_words])
    return text

df.sw_Text = df.sw_Text.apply(truncate_long_sent)

In [None]:
df = df[['sw_Text','Emotions']].reset_index(drop=True)
df.head()

Unnamed: 0,sw_Text,Emotions
0,i kind of feel a little petty about this,3
1,i wasn t aware that i had to go to court .,3
2,ekardmatt well you my man you and your truck...,2
3,i feel so wronged but what can i do,3
4,fb i hate when i try support my local bookstor...,2


In [None]:
df.Emotions.value_counts()

6    9000
2    9000
0    9000
3    8946
1    8431
5    7241
4    6950
Name: Emotions, dtype: int64

In [None]:
df.dtypes

sw_Text     object
Emotions     int64
dtype: object

In [None]:
df.sw_Text.isna().sum()

0

<a id='section03'></a>
### Preparing the Dataset and Dataloader

We will start by defining a few key variables that will be used later during the training/fine tuning stage.
This is followed by the creation of the Dataset class, which defines how the text is pre-processed before it is sent to the neural network. We will also define the Dataloader that will feed the data in batches to the neural network for training and processing. 
Dataset and Dataloader are constructs of the PyTorch library for defining and controlling the data pre-processing and its passage to the neural network. For further reading on Dataset and Dataloader, see the [docs at PyTorch](https://pytorch.org/docs/stable/data.html).

#### *Triage* Dataset Class
- This class is defined to accept the Dataframe as input and generate tokenized output that is used by the DistilBERT model for training. 
- We are using the DistilBERT tokenizer to tokenize the data in the `TITLE` column of the dataframe. 
- The tokenizer uses the `encode_plus` method to perform tokenization and generate the necessary outputs, namely: `ids`, `attention_mask`
- To read further into the tokenizer, [refer to this document](https://huggingface.co/transformers/model_doc/distilbert.html#distilberttokenizer)
- `target` is the encoded category on the news headline. 
- The *Triage* class is used to create 2 datasets, for training and for validation.
- *Training Dataset* is used to fine tune the model: **80% of the original data**
- *Validation Dataset* is used to evaluate the performance of the model. The model has not seen this data during training. 

#### Dataloader
- The Dataloader is used to create the training and validation dataloaders that load data to the neural network in a defined manner. This is needed because all the data from the dataset cannot be loaded into memory at once, so the amount of data loaded into memory and then passed to the neural network needs to be controlled.
- This control is achieved using parameters such as `batch_size` and `max_len`.
- Training and Validation dataloaders are used in the training and validation parts of the flow respectively.
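
To make the batching behaviour concrete, here is a minimal sketch with a toy dataset. `ToyTextDataset` and its contents are hypothetical stand-ins, not the class used in this notebook. It returns raw text plus an integer label, and the default collation keeps the strings as a list while stacking the label tensors.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyTextDataset(Dataset):
    """Hypothetical stand-in: returns raw text plus an integer label."""
    def __init__(self, texts, labels):
        self.texts = texts
        self.labels = labels

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, index):
        return {"text": self.texts[index],
                "target": torch.tensor(self.labels[index], dtype=torch.long)}

texts = [f"sentence {i}" for i in range(10)]
labels = [i % 3 for i in range(10)]
loader = DataLoader(ToyTextDataset(texts, labels), batch_size=4, shuffle=False)

batches = list(loader)
for batch in batches:
    # Strings are collated into a list, tensors are stacked along dim 0.
    print(len(batch["text"]), batch["target"].shape)
```

With 10 items and `batch_size=4`, the last batch is smaller than the others, which is worth keeping in mind for any code that assumes a fixed batch size.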

In [None]:
# Defining some key variables that will be used later on in the training

NO_OF_CLASS= 7 # number of class/labels your data has
MAX_LEN = 400
TRAIN_BATCH_SIZE = 32
VALID_BATCH_SIZE = 16
EPOCHS = 5
LEARNING_RATE = 1e-05
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

In [None]:
#df = df.rename(columns={'TITLE':'sw_Text','ENCODE_CAT':'Emotion' })

In [None]:
class MyDataset(Dataset):
    def __init__(self, dataframe):
        self.len = len(dataframe)
        self.data = dataframe
        #self.tokenizer = tokenizer
        #self.max_len = max_len
        
    def __getitem__(self, index):
        text = str(self.data.sw_Text[index])
        text = " ".join(text.split())
        target = torch.tensor(self.data.Emotions[index].astype(int))
        return {'text' : text, 'target' : target}

    
    def __len__(self):
        return self.len

In [None]:
#test_dataset = df.sample(frac=0.2,random_state=200).reset_index(drop=True)
#testing_set = MyDataset(test_dataset)

In [None]:
#test_params = {'batch_size': VALID_BATCH_SIZE,
    #            'shuffle': True,
   #             'num_workers': 0
  #              }
 #               
#testing_loader = DataLoader(testing_set, **test_params)

In [None]:
# Creating the dataset and dataloader for the neural network

train_size = 1
train_dataset=df.sample(frac=train_size,random_state=200)
test_dataset=df.drop(train_dataset.index).reset_index(drop=True)
train_dataset = train_dataset.reset_index(drop=True)


print("FULL Dataset: {}".format(df.shape))
print("TRAIN Dataset: {}".format(train_dataset.shape))
print("TEST Dataset: {}".format(test_dataset.shape))

training_set = MyDataset(train_dataset)
testing_set = MyDataset(test_dataset)

FULL Dataset: (58568, 2)
TRAIN Dataset: (58568, 2)
TEST Dataset: (0, 2)
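
The recorded run sets `train_size = 1`, so every row goes to training and the test set is empty. An 80/20 split like the one described earlier in this section would follow the same pattern with `train_size = 0.8`. A minimal sketch on a toy frame (the frame and the 0.8 value are assumptions for illustration):

```python
import pandas as pd

# Toy frame standing in for the real data.
toy_df = pd.DataFrame({"sw_Text": [f"text {i}" for i in range(10)],
                       "Emotions": [i % 7 for i in range(10)]})

train_size = 0.8  # assumed split; the recorded run above uses 1
train_part = toy_df.sample(frac=train_size, random_state=200)
# Rows not sampled for training form the held-out set.
test_part = toy_df.drop(train_part.index).reset_index(drop=True)
train_part = train_part.reset_index(drop=True)

print(train_part.shape, test_part.shape)
```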


In [None]:
train_params = {'batch_size': TRAIN_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

test_params = {'batch_size': VALID_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

training_loader = DataLoader(training_set, **train_params)
#testing_loader = DataLoader(testing_set, **test_params)

<a id='section04'></a>
### Creating the Neural Network for Fine Tuning

#### Neural Network
 - We will be creating a neural network with the `DistillBERTClass`. 
 - This network will have the DistilBERT Language model followed by a `dropout` and finally a `Linear` layer to obtain the final outputs. 
 - The data will be fed to the DistilBERT Language model as defined in the dataset. 
 - The final layer's output is what will be compared to the `encoded category` to determine the accuracy of the model's predictions. 
 - We will initiate an instance of the network called `model`. This instance will be used for training and then to save the final trained model for future inference. 
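
The layers that sit on top of the language model can be sketched in isolation. The `ClassifierHead` class below is a hypothetical illustration, not the notebook's model: it takes a random tensor in place of the encoder output and returns one raw score per class.

```python
import torch

class ClassifierHead(torch.nn.Module):
    """Hypothetical head: first-token vector -> Linear -> ReLU -> Dropout -> Linear."""
    def __init__(self, hidden_size=768, num_classes=7, dropout=0.3):
        super().__init__()
        self.pre_classifier = torch.nn.Linear(hidden_size, hidden_size)
        self.dropout = torch.nn.Dropout(dropout)
        self.classifier = torch.nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_state):
        pooled = hidden_state[:, 0]                       # (batch, hidden): first token of each sequence
        pooled = torch.relu(self.pre_classifier(pooled))
        pooled = self.dropout(pooled)
        return self.classifier(pooled)                    # (batch, num_classes) raw scores

# Random tensor standing in for the encoder output: (batch=2, seq_len=15, hidden=768).
fake_hidden = torch.randn(2, 15, 768)
head = ClassifierHead().eval()
scores = head(fake_hidden)
print(scores.shape)
```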
 
#### Loss Function and Optimizer
 - The `Loss Function` and `Optimizer` are defined in the next cell.
 - The `Loss Function` is used to calculate the difference between the output created by the model and the actual output. 
 - `Optimizer` is used to update the weights of the neural network to improve its performance.
 
#### Further Reading
- You can refer to my [Pytorch Tutorials](https://github.com/abhimishra91/pytorch-tutorials) to get an intuition of Loss Function and Optimizer.
- [Pytorch Documentation for Loss Function](https://pytorch.org/docs/stable/nn.html#loss-functions)
- [Pytorch Documentation for Optimizer](https://pytorch.org/docs/stable/optim.html)
- Refer to the links provided at the top of the notebook to read more about DistilBERT. 

In [None]:
# Creating the customized model, by adding a drop out and a dense layer on top of distil bert to get the final output for the model. 

class DistillBERTClass(torch.nn.Module):
    def __init__(self):
        super(DistillBERTClass, self).__init__()
        self.l1 = DistilBertModel.from_pretrained("distilbert-base-uncased",return_dict=False)
        self.pre_classifier = torch.nn.Linear(768, 768)
        self.dropout = torch.nn.Dropout(0.3)
        self.classifier = torch.nn.Linear(768, NO_OF_CLASS)
        self.softmax = torch.nn.Softmax(dim=1)

    def forward(self, input_ids, attention_mask):
        output_1 = self.l1(input_ids=input_ids, attention_mask=attention_mask)
        hidden_state = output_1[0]
        pooler = hidden_state[:, 0]
        pooler = self.pre_classifier(pooler)
        pooler = torch.nn.ReLU()(pooler)
        pooler = self.dropout(pooler)
        pooler = self.classifier(pooler)
        output = self.softmax(pooler)
        return output

In [None]:
device

'cuda'

In [None]:
model = DistillBERTClass()
model.to(device)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


DistillBERTClass(
  (l1): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1): Linear(in_feat

In [None]:
# Creating the loss function and optimizer
loss_function = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params =  model.parameters(), lr=LEARNING_RATE)
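
One detail worth checking here: `torch.nn.CrossEntropyLoss` applies log-softmax internally, so it expects raw scores. The model class above already ends in a `Softmax` layer, which means the loss receives probabilities. The sketch below uses made-up scores to illustrate the effect: with probabilities as input, the loss cannot drop below `log(1 + (C - 1) / e)`, about 1.165 for 7 classes, even for perfect predictions. This may be one reason the recorded training loss stays above 1.4, though that is an inference and not something the notebook states.

```python
import math
import torch
import torch.nn.functional as F

loss_fn = torch.nn.CrossEntropyLoss()
num_classes = 7
targets = torch.tensor([0, 3])

# Raw scores (logits) that strongly favour the correct classes.
logits = torch.full((2, num_classes), -10.0)
logits[0, 0] = 10.0
logits[1, 3] = 10.0

# CrossEntropyLoss is log-softmax followed by negative log-likelihood.
loss_from_logits = loss_fn(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Feeding probabilities instead of logits applies softmax twice.
probs = torch.softmax(logits, dim=1)
loss_from_probs = loss_fn(probs, targets)

# Lower bound when the inputs are probabilities, even for perfect predictions.
floor = math.log(1 + (num_classes - 1) / math.e)

print(float(loss_from_logits), float(loss_from_probs), floor)
```

Returning the output of the final `Linear` layer directly would avoid the double softmax; the predicted class from `torch.max` stays the same either way, because softmax preserves the ordering of the scores.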

<a id='section05'></a>
### Fine Tuning the Model

After loading and preparing the data, creating the model, and defining its loss function and optimizer, this is probably the easiest step in the process. 

Here we define a training function that trains the model on the training dataset created above for a specified number of epochs (`EPOCHS`). An epoch is one complete pass of the training data through the network. 

Following events happen in this function to fine tune the neural network:
- The dataloader passes data to the model based on the batch size. 
- Subsequent output from the model and the actual category are compared to calculate the loss. 
- Loss value is used to optimize the weights of the neurons in the network.
- At the end of each epoch, the average loss and the accuracy are printed to the console.
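
The sequence of events in a single training step can be sketched on a tiny stand-in model. Everything prefixed with `toy_` is hypothetical; the real loop does the same thing with the DistilBERT-based model and tokenized text.

```python
import torch

torch.manual_seed(0)

# Tiny stand-in model and data; the real loop uses the DistilBERT-based model.
toy_model = torch.nn.Linear(4, 3)
toy_loss_fn = torch.nn.CrossEntropyLoss()
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.05)

features = torch.randn(16, 4)
labels = torch.randint(0, 3, (16,))

losses = []
toy_model.train()
for step in range(50):
    outputs = toy_model(features)          # forward pass
    loss = toy_loss_fn(outputs, labels)    # compare predictions with labels
    toy_optimizer.zero_grad()              # clear gradients from the previous step
    loss.backward()                        # compute new gradients
    toy_optimizer.step()                   # update the weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```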

In the run recorded below, the average training loss falls from about 1.63 in the first epoch to about 1.45 in the fifth, while training accuracy rises from about 54.5% to 71.8%.

In [None]:
# Function to count the correct predictions in a batch, used to calculate the accuracy of the model
def calcuate_accu(big_idx, targets):
    n_correct = (big_idx==targets).sum().item()
    return n_correct

In [None]:
# Defining the training function on the 80% of the dataset for tuning the distilbert model

def train(epoch):
    tr_loss = 0
    n_correct = 0
    nb_tr_steps = 0
    nb_tr_examples = 0
    model.train()
    for _,data in enumerate(training_loader, 0):
        #we are doing this to do batch tokenization so we have 
        # better gpu speed
        text = data['text']
        text = [[x] for x in text]
        inputs = tokenizer(
            text,
            None,
            add_special_tokens=True,
            max_length=MAX_LEN,
            padding = 'longest',
            return_token_type_ids=True,
            truncation=True,
            return_tensors='pt',
            is_split_into_words=True
            )
        # used return_tensors='pt' to return pytorch tensors
        ids = inputs['input_ids'].to(device, dtype = torch.long)
        #print(ids)
        mask = inputs['attention_mask'].to(device, dtype = torch.long)
        targets = data['target'].to(device, dtype = torch.long)
        outputs = model(ids, mask)
        loss = loss_function(outputs, targets)
        tr_loss += loss.item()
        big_val, big_idx = torch.max(outputs.data, dim=1)
        n_correct += calcuate_accu(big_idx, targets)

        nb_tr_steps += 1
        nb_tr_examples+=targets.size(0)

        optimizer.zero_grad()
        loss.backward()
        # # When using GPU
        optimizer.step()

    print(f'The Total Accuracy for Epoch {epoch}: {(n_correct*100)/nb_tr_examples}')
    epoch_loss = tr_loss/nb_tr_steps
    epoch_accu = (n_correct*100)/nb_tr_examples
    print(f"Training Loss Epoch: {epoch_loss}")
    print(f"Training Accuracy Epoch: {epoch_accu}")

    return 

In [None]:
#for epoch in range(EPOCHS):
 #   train(epoch)
    

<a id='section06'></a>
### Validating the Model


During the validation stage we pass the unseen data (Testing Dataset) to the model. This step determines how well the model performs on unseen data. 

This unseen data is the 20% of `newsCorpora.csv` that was separated during the Dataset creation stage. 
During the validation stage the weights of the model are not updated. Only the final output is compared to the actual value. This comparison is then used to calculate the accuracy of the model. 

In the run recorded below, the validation accuracy after the first epoch is about 60.2%.

In [None]:
from sklearn.metrics import  precision_score, recall_score, f1_score
import numpy as np

def tensor2numpy(t):
    #print(t)
    l = []
    for tens in t:
        for val in tens:
            l.append(int(val.cpu().numpy()))
    a = np.array(l)
    return a

In [None]:
def valid(model, testing_loader):
    model.eval()
    n_correct = 0; n_wrong = 0; total = 0
    tr_loss = 0
    nb_tr_examples = 0
    nb_tr_steps  = 0
    output_list = []
    target_list = []
    with torch.no_grad():
        for _, data in enumerate(testing_loader, 0):
            #we are doing this to do batch tokenization so we have 
            # better gpu speed
            text = data['text']
            text = [[x] for x in text]
            inputs = tokenizer(
                text,
                None,
                add_special_tokens=True,
                max_length=MAX_LEN,
                padding = 'longest',
                return_token_type_ids=True,
                truncation=True,
                return_tensors='pt',
                is_split_into_words=True
                )
            # used return_tensors='pt' to return pytorch tensors
            ids = inputs['input_ids'].to(device, dtype = torch.long)
            #print(ids)
            mask = inputs['attention_mask'].to(device, dtype = torch.long)
            targets = data['target'].to(device, dtype = torch.long)
            target_list.append(targets)
            outputs = model(ids, mask)  # shape (batch, classes); no squeeze, so a batch of one keeps dim 1
            big_val, big_idx = torch.max(outputs.data, dim=1)
            output_list.append(big_idx)
            loss = loss_function(outputs, targets)
            tr_loss += loss.item()
            n_correct += calcuate_accu(big_idx, targets)

            nb_tr_steps += 1
            nb_tr_examples+=targets.size(0)
    epoch_loss = tr_loss/nb_tr_steps
    epoch_accu = (n_correct*100)/nb_tr_examples
    o = tensor2numpy(output_list)
    t = tensor2numpy(target_list)
    
    f1 = f1_score(t,o,labels=[x for x in range(NO_OF_CLASS)],average=None)
    pre = precision_score(t,o,labels=[x for x in range(NO_OF_CLASS)],average=None)
    recall = recall_score(t,o,labels=[x for x in range(NO_OF_CLASS)],average=None)

    print(f"precision_score of this epoch is : {pre}")
    print(f"recall score of this epoch is : {recall}")
    print(f"f1 score of this epoch is: {f1}")
    print(f"Validation Loss Epoch: {epoch_loss}")
    print(f"Validation Accuracy Epoch: {epoch_accu}")
    
    return epoch_accu,f1, pre, recall


In [None]:
#final training with full Dataset
for epoch in range(EPOCHS):
    train(epoch)    
PATH = '/content/drive/MyDrive/NLP/temp_model/Final_finetuned_bert_with9k.pt'
torch.save(model.state_dict(), PATH)

The Total Accuracy for Epoch 0: 54.54002185493785
Training Loss Epoch: 1.6283260717345092
Training Accuracy Epoch: 54.54002185493785
The Total Accuracy for Epoch 1: 64.42596639803305
Training Loss Epoch: 1.5194337514500225
Training Accuracy Epoch: 64.42596639803305
The Total Accuracy for Epoch 2: 67.46004644174293
Training Loss Epoch: 1.4895430163956938
Training Accuracy Epoch: 67.46004644174293
The Total Accuracy for Epoch 3: 69.67798115011611
Training Loss Epoch: 1.4675364947462264
Training Accuracy Epoch: 69.67798115011611
The Total Accuracy for Epoch 4: 71.8037153394345
Training Loss Epoch: 1.446856635896838
Training Accuracy Epoch: 71.8037153394345


In [None]:
#stop_asdfg

In [None]:
#EPOCHS = 1
score_df  = pd.DataFrame()
for epoch in range(EPOCHS):
    train(epoch)
    PATH = '/content/drive/MyDrive/NLP/temp_model/Refined1_bert_with9k_e' + str(epoch)+'.pt'
    torch.save(model.state_dict(), PATH)
    print('Now we test')
    acc,f1, pre, recall = valid(model, testing_loader)
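    # DataFrame.append was removed in pandas 2.0; pd.concat adds the new row instead
    score_df = pd.concat([score_df, pd.DataFrame([{'epoch':epoch,'acc':acc,'f1':f1,'pre':pre,'recall':recall}])], ignore_index=True)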
    PATH1 = '/content/drive/MyDrive/NLP/temp_model/Refined1_bert_with9k_score.csv'
    score_df.to_csv(PATH1)
    print('score_saved')
    print("Accuracy on test data = %0.2f%%" % acc)

The Total Accuracy for Epoch 0: 52.82153071242583
Training Loss Epoch: 1.6496260023768037
Training Accuracy Epoch: 52.82153071242583
Now we test
precision_score of this epoch is : [0.5227859  0.45665399 0.75102041 0.75242131 0.52627841 0.72311396
 0.67318663]
recall score of this epoch is : [0.69287749 0.70771951 0.51026068 0.68864266 0.52665245 0.63183731
 0.45359692]
f1 score of this epoch is: [0.59593237 0.55511902 0.60766182 0.71912062 0.52646536 0.6744012
 0.54199475]
Validation Loss Epoch: 1.5631229196043652
Validation Accuracy Epoch: 60.167321154174495
score_saved
Accuracy on test data = 60.17%
The Total Accuracy for Epoch 1: 63.78964442737013
Training Loss Epoch: 1.52695078931164
Training Accuracy Epoch: 63.78964442737013
Now we test
precision_score of this epoch is : [0.59011329 0.49254367 0.71117358 0.76446991 0.68027888 0.69497523
 0.60326087]
recall score of this epoch is : [0.65299145 0.68120212 0.56128674 0.73905817 0.48542999 0.68863955
 0.60955519]
f1 score of this epoc

In [None]:
#print('This is the validation section to print the accuracy and see how it performs')
#print('Here we are leveraging the dataloader created for the validation dataset; the approach uses more of pytorch')

#acc = valid(model, testing_loader)
#print("Accuracy on test data = %0.2f%%" % acc)

<a id='section07'></a>
### Saving the Trained Model Artifacts for inference

This is the final step in the process of fine tuning the model. 

The model and its vocabulary are saved locally. These files are then used in the future to run inference on new news headlines.

Please remember that a trained neural network is only useful when it is used for actual inference after training. 

In the lifecycle of an ML project this is only half the job done. We will leave inference with these models for another day. 

In [None]:
# Saving the files for re-use
import os

output_model_file = '/content/drive/MyDrive/NLP/models/model_distilbert'
output_vocab_file = '/content/drive/MyDrive/NLP/models/vocab_distilbert'

# Make sure the target folder exists before writing into it
os.makedirs(os.path.dirname(output_model_file), exist_ok=True)

# torch.save needs a file path; passing an existing directory raises IsADirectoryError
model_to_save = model
torch.save(model_to_save, output_model_file)
tokenizer.save_vocabulary(output_vocab_file)

print('All files saved')
print('This tutorial is completed')

In [None]:
model.l1.save_pretrained('/content/drive/MyDrive/NLP/re_bert_model')

In [None]:
tokenizer.save_vocabulary('/content/drive/MyDrive/NLP/re_bert_model')

('/content/drive/MyDrive/NLP/re_bert_model/vocab.txt',)

In [None]:
PATH = '/content/drive/MyDrive/NLP/torch_reBERT1/Refined_bert_0.pt'
torch.save(model.state_dict(), PATH)
tokenizer.save_vocabulary('/content/drive/MyDrive/NLP/torch_reBERT1')

('/content/drive/MyDrive/NLP/torch_reBERT1/vocab.txt',)
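
Saving the `state_dict` and loading it back into a freshly built model is a round trip that can be checked on a small stand-in module. The sketch below uses a hypothetical file name inside a temporary folder; the notebook itself writes to Google Drive.

```python
import os
import tempfile
import torch

# Small stand-in module; the notebook saves the fine-tuned classifier the same way.
original = torch.nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, "weights.pt")  # hypothetical file name
    torch.save(original.state_dict(), weights_path)     # save only the parameters

    restored = torch.nn.Linear(4, 2)                    # rebuild the same architecture
    restored.load_state_dict(torch.load(weights_path))  # then load the parameters
    restored.eval()                                     # inference mode (affects dropout layers, if any)

# Compare every saved tensor with its restored counterpart.
same = all(torch.equal(a, b) for a, b in zip(original.state_dict().values(),
                                              restored.state_dict().values()))
print(same)
```

Because only the parameters are stored, the class definition has to be available when the weights are loaded.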

In [None]:
PATH = '/content/drive/MyDrive/NLP/torch_reBERT1/Refined_bert_0.pt'

In [None]:
model.load_state_dict(torch.load(PATH))

model.eval()

DistillBERTClass(
  (l1): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1): Linear(in_feat

In [None]:
def evaluate(model, testing_loader,tokenizer):
    model.eval()
    output_list = []
    target_list = []
    with torch.no_grad():
        for _, data in enumerate(testing_loader, 0):
            text = data['text']
            text = [[x] for x in text]
            inputs = tokenizer(
                text,
                None,
                add_special_tokens=True,
                max_length=400,
                padding = 'longest',
                return_token_type_ids=True,
                truncation=True,
                return_tensors='pt',
                is_split_into_words=True
                )
            # used return_tensors='pt' to return pytorch tensors
            ids = inputs['input_ids'].to(device, dtype = torch.long)
            #print(ids)
            mask = inputs['attention_mask'].to(device, dtype = torch.long)
            targets = data['target'].to(device, dtype = torch.long)
            target_list.append(targets)
            outputs = model(ids, mask)  # shape (batch, classes); no squeeze, so a batch of one keeps dim 1
            big_val, big_idx = torch.max(outputs.data, dim=1)
            
            #print(big_idx,targets)

            output_list.append(big_idx)

    return output_list, target_list

In [None]:
o, t = evaluate(model,testing_loader,tokenizer)

In [None]:
out = tensor2numpy(o)
tar = tensor2numpy(t)



In [None]:
f1_score(out,tar,average='weighted')

0.638909470343268

In [None]:
#inputs['input_ids']
inputs['attention_mask']

tensor([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
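
The mask above marks real tokens with 1 and padding with 0. As a rough illustration of what padding to the longest sequence in a batch produces, the sketch below builds the same kind of mask by hand. `pad_batch` is a hypothetical helper and the token ids are made up; in the notebook the tokenizer does this work.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest one and build matching attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = longest - len(seq)
        input_ids.append(list(seq) + [pad_id] * padding)
        attention_mask.append([1] * len(seq) + [0] * padding)
    return input_ids, attention_mask

# Made-up token ids with lengths 7 and 15, matching the shape of the output above.
short_ids = list(range(101, 108))
long_ids = list(range(201, 216))
ids, mask = pad_batch([short_ids, long_ids])
print(mask[0])
print(mask[1])
```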

In [None]:
t[:10]

[tensor([1, 2], device='cuda:0'),
 tensor([1, 0], device='cuda:0'),
 tensor([4, 1], device='cuda:0'),
 tensor([4, 3], device='cuda:0'),
 tensor([0, 3], device='cuda:0'),
 tensor([0, 4], device='cuda:0'),
 tensor([3, 0], device='cuda:0'),
 tensor([1, 4], device='cuda:0'),
 tensor([4, 4], device='cuda:0'),
 tensor([3, 1], device='cuda:0')]

In [None]:
import numpy as np

In [None]:
def tensor2numpy(t):
    # Concatenate the per-batch tensors into one flat array; this also handles a smaller final batch
    a = np.concatenate([x.cpu().numpy() for x in t])
    return a

In [None]:
output = tensor2numpy(o)
target = tensor2numpy(t)

In [None]:
from sklearn.metrics import f1_score,precision_score, precision_recall_curve,recall_score, confusion_matrix

In [None]:
f1_score(target,output,labels=[x for x in range(6)],average=None)

array([0.16300129, 0.0209205 , 0.05273834, 0.25693894, 0.25635359,
       0.        ])

In [None]:
precision_score(target,output,labels=[x for x in range(6)],average=None)

array([0.18208092, 0.12820513, 0.25      , 0.19975339, 0.16872727,
       0.        ])

In [None]:
recall_score(target,output,labels=[x for x in range(6)],average=None)

array([0.14754098, 0.01138952, 0.02947846, 0.36      , 0.53333333,
       0.        ])

In [None]:
confusion_matrix(target,output,labels=[x for x in range(6)])

array([[ 63,   6,   9, 147, 202,   0],
       [ 42,   5,   6, 130, 256,   0],
       [ 78,   7,  13, 165, 178,   0],
       [ 76,   6,   8, 162, 198,   0],
       [ 60,  13,  15, 114, 232,   1],
       [ 27,   2,   1,  93, 309,   0]])
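
The per-class scores printed earlier can be recovered from this confusion matrix, which helps when reading it. The sketch below copies the matrix from the output above and uses a hypothetical `safe_divide` helper to return 0 for a class that is never predicted. It assumes rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true class, columns: predicted class).
cm = np.array([
    [63, 6, 9, 147, 202, 0],
    [42, 5, 6, 130, 256, 0],
    [78, 7, 13, 165, 178, 0],
    [76, 6, 8, 162, 198, 0],
    [60, 13, 15, 114, 232, 1],
    [27, 2, 1, 93, 309, 0],
])

def safe_divide(numerator, denominator):
    """Element-wise division that returns 0 where the denominator is 0."""
    numerator = numerator.astype(float)
    result = np.zeros_like(numerator)
    np.divide(numerator, denominator, out=result, where=denominator != 0)
    return result

true_positives = np.diag(cm)
precision = safe_divide(true_positives, cm.sum(axis=0))  # correct / predicted as class
recall = safe_divide(true_positives, cm.sum(axis=1))     # correct / actually in class
f1 = safe_divide(2 * precision * recall, precision + recall)

print(np.round(precision, 4))
print(np.round(recall, 4))
print(np.round(f1, 4))
```

Most predictions fall into columns 3 and 4 regardless of the true class, which is why recall is comparatively high for those two classes and close to zero for the others.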

In [None]:
a = [0,0,0,0,0,0,0,0,0,0,0]
b = [0,1,2,3,4,5,0,1,3,4,5]


In [None]:
f1_score(b,a,labels=[x for x in range(6)],average=None)

array([0.30769231, 0.        , 0.        , 0.        , 0.        ,
       0.        ])

  _warn_prf(average, modifier, msg_start, len(result))


array([0.18181818, 0.        , 0.        , 0.        , 0.        ,
       0.        ])

In [None]:
df.shape

(13122, 2)

TypeError: ignored