## Model training

With the data prepared, the next step is to train the model. Before training, the BERT Large model needs a small modification, as described in the training and evaluation section of the paper. The BERT Large uncased model can be fine-tuned for specific tasks; in our case it is used as a classification model with 64 classes.

The code in the notebook will be performing the following steps:

-   Load the dataset

-   Load the BERT Large uncased model from Hugging Face's Transformers library

-   Modify the architecture of the model

-   Set up the metrics for evaluating the model

-   Train the model with three different datasets

This notebook tunes the learning rate only for the model trained on the largest (full) dataset. The best value found is then reused for the other two models, so all three models share the same hyperparameters.
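The tuning strategy described above (search over a few learning rates on one model, keep the best one) can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `select_learning_rate` and `fake_train_and_score` are hypothetical names, and the scores are made-up placeholders that stand in for real training runs.

```python
from typing import Callable, Iterable, Tuple


def select_learning_rate(
    candidates: Iterable[float],
    train_and_score: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (best_lr, best_score) over the candidate learning rates."""
    best_lr, best_score = None, float("-inf")
    for lr in candidates:
        score = train_and_score(lr)  # one full training run per candidate
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score


# Hypothetical stand-in for a real training run: it returns placeholder
# scores so the selection logic can be exercised without a GPU.
def fake_train_and_score(lr: float) -> float:
    placeholder_scores = {5e-5: 0.70, 4e-5: 0.72, 3e-5: 0.75, 2e-5: 0.71}
    return placeholder_scores[lr]


best_lr, best_score = select_learning_rate(
    [5e-5, 4e-5, 3e-5, 2e-5], fake_train_and_score
)
print(best_lr, best_score)
```

In the notebook, the role of `fake_train_and_score` is played by a full fine-tuning run, so each candidate learning rate costs one complete training pass.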

In [None]:
import pickle

import torch
import torch.nn as nn
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from torch.optim import AdamW  # replaces the deprecated transformers.AdamW
from transformers import AutoTokenizer, BertModel, Trainer, TrainingArguments
from transformers.modeling_outputs import SequenceClassifierOutput

### Loading the data

In [None]:
with open('train_dataset_tokenized.pkl', 'rb') as file:
    train_dataset = pickle.load(file)

with open('val_data_tokenized.pkl', 'rb') as file:
    val_dataset = pickle.load(file)

with open('test_data_tokenized.pkl', 'rb') as file:
    test_dataset = pickle.load(file)

with open('train_dataset_full_tokenized.pkl', 'rb') as file:
    train_dataset_full = [pickle.load(file)]

with open('augmented_train_dataset_tokenized.pkl', 'rb') as file:
    train_dataset_augmented = pickle.load(file)

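# pickle stores functions by reference, so the module that defines this helper
# must be importable when the file is loaded. The helper is bound to the name
# used by the training cells below.
with open('function_pickle.pkl', 'rb') as f:
    create_training_arguments = pickle.load(f)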

### Setting up the training arguments

In [None]:
learning_rates = [5e-5, 4e-5, 3e-5, 2e-5]

pre_trained_BERTmodel = 'bert-large-uncased'
BERT_tokenizer = AutoTokenizer.from_pretrained(pre_trained_BERTmodel)
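The helper that builds the training arguments is loaded from `function_pickle.pkl` and is not shown in this notebook. As a rough sketch of what such a helper might assemble, the function below builds keyword arguments that could be passed to `transformers.TrainingArguments`. The function name `sketch_training_kwargs` is hypothetical, and every value except the learning rate is a placeholder rather than a setting from the paper.

```python
def sketch_training_kwargs(learning_rate: float) -> dict:
    """Build keyword arguments for transformers.TrainingArguments.

    All values except learning_rate are placeholders, not the paper's settings.
    """
    return {
        "output_dir": "./results",          # placeholder path
        "learning_rate": learning_rate,
        "per_device_train_batch_size": 16,  # placeholder
        "per_device_eval_batch_size": 16,   # placeholder
        "num_train_epochs": 3,              # placeholder
        "weight_decay": 0.01,               # placeholder
    }


# One set of arguments per candidate learning rate.
kwargs_by_lr = {lr: sketch_training_kwargs(lr) for lr in [5e-5, 4e-5, 3e-5, 2e-5]}

# With transformers installed, each entry could be passed on as
# TrainingArguments(**kwargs_by_lr[lr]).
```

Only the learning rate varies between runs here, which matches how the training cells call the helper with a single `lr` argument.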

### Modifying BERT for our classification task

In [None]:
class BertModelWithCustomLossFunction(nn.Module):
    def __init__(self):
        super().__init__()
        self.num_labels = 64
        # Pre-trained BERT encoder without a task-specific head.
        self.bert = BertModel.from_pretrained(pre_trained_BERTmodel)
        self.dropout = nn.Dropout(0.1)
        # Linear head mapping the pooled [CLS] representation to the 64 classes
        # (hidden_size is 1024 for BERT Large).
        self.classifier = nn.Linear(self.bert.config.hidden_size, self.num_labels)

    def forward(self, input_ids, attention_mask, labels=None):
        outputs = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask,
        )

        output = self.dropout(outputs.pooler_output)
        logits = self.classifier(output)

        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels)

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
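To make the tensor shapes in the classification head concrete, here is a small sketch that runs the dropout, linear layer and cross-entropy loss on random data instead of real BERT outputs. The random `pooled` tensor is an assumption standing in for `outputs.pooler_output`, and the label values are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_size, num_labels, batch_size = 1024, 64, 2

# Stand-in for outputs.pooler_output: one 1024-dim vector per sequence.
pooled = torch.randn(batch_size, hidden_size)
labels = torch.tensor([3, 41])  # one class index per sequence (arbitrary)

dropout = nn.Dropout(0.1)
classifier = nn.Linear(hidden_size, num_labels)

logits = classifier(dropout(pooled))          # shape: (batch_size, num_labels)
loss = nn.CrossEntropyLoss()(logits, labels)  # scalar tensor

print(logits.shape, loss.dim())
```

`nn.CrossEntropyLoss` takes raw logits and integer class indices, so no softmax layer is needed in the model itself.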

### Create train_model function

In [None]:
def train_model(train_data, args, val_dataset, test_dataset):
    # Start every run from freshly loaded pre-trained weights.
    BERT_model = BertModelWithCustomLossFunction()
    optimizer = AdamW(
        BERT_model.parameters(),
        lr=args.learning_rate,
        weight_decay=args.weight_decay,
    )
    trainer = Trainer(
        model=BERT_model,
        args=args,
        train_dataset=train_data,
        eval_dataset=val_dataset,
        tokenizer=BERT_tokenizer,
        compute_metrics=compute_metrics,
        # Trainer expects an (optimizer, lr_scheduler) pair; passing None as the
        # scheduler lets Trainer create its default one.
        optimizers=(optimizer, None),
    )
    trainer.train()
    # predict() prefixes metric names with "test_" by default.
    evaluation_metrics = trainer.predict(test_dataset)
    accuracy = evaluation_metrics.metrics['test_accuracy']
    torch.cuda.empty_cache()
    return accuracy

### Setting up metrics for accuracy, precision, recall and f1

In [None]:
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    accuracy = accuracy_score(labels, preds)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='weighted')

    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }
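The `average='weighted'` option used above averages the per-class scores, weighting each class by its number of true examples. The sketch below reproduces that idea for precision in plain Python on a toy example; `weighted_precision` is a hypothetical helper written for illustration, and the labels are invented.

```python
from collections import Counter


def weighted_precision(labels, preds):
    """Support-weighted mean of per-class precision."""
    support = Counter(labels)   # number of true examples per class
    predicted = Counter(preds)  # number of predictions per class
    correct = Counter(t for t, p in zip(labels, preds) if t == p)
    total = len(labels)
    score = 0.0
    for cls, n_true in support.items():
        # Precision is taken as 0 when a class is never predicted.
        precision = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        score += precision * n_true / total
    return score


labels = [0, 0, 0, 1, 1, 2]
preds = [0, 0, 1, 1, 1, 2]

accuracy = sum(t == p for t, p in zip(labels, preds)) / len(labels)
print(accuracy, weighted_precision(labels, preds))
```

Here class 0 has precision 1, class 1 has precision 2/3 and class 2 has precision 1, so the weighted value is 3/6 · 1 + 2/6 · 2/3 + 1/6 · 1 = 8/9, while accuracy is 5/6.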

### Training the model

In [None]:
import warnings
warnings.filterwarnings("ignore")

#### Training Full dataset Model

In [None]:
for train_data in train_dataset_full:
    best_accuracy = 0
    best_lr = learning_rates[0]
    for lr in learning_rates:
        args = create_training_arguments(lr)
        accuracy = train_model(train_data, args, val_dataset, test_dataset)
        # Keep the learning rate that gives the highest test accuracy so far.
        if accuracy > best_accuracy:
            best_lr = lr
            best_accuracy = accuracy
print(f"Best Accuracy: {best_accuracy}\nBest Learning Rate: {best_lr}")

#### Training full few shot dataset model

In [None]:
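best_accuracy = 0
# train_dataset is assumed to be a single dataset, so it is wrapped in a list
# (as done for train_dataset_full) to train on it once instead of iterating
# over its individual examples.
for train_data in [train_dataset]:
    for lr in [best_lr]:  # reuse the learning rate selected on the full dataset
        args = create_training_arguments(lr)
        accuracy = train_model(train_data, args, val_dataset, test_dataset)
        if accuracy > best_accuracy:
            best_accuracy = accuracy
print(f"Best Accuracy: {best_accuracy}")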

#### Training Model on Full few shot + Augmented dataset

In [None]:
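best_accuracy = 0
# train_dataset_augmented is assumed to be a single dataset, so it is wrapped
# in a list (as done for train_dataset_full) to train on it once instead of
# iterating over its individual examples.
for train_data in [train_dataset_augmented]:
    for lr in [best_lr]:  # reuse the learning rate selected on the full dataset
        args = create_training_arguments(lr)
        accuracy = train_model(train_data, args, val_dataset, test_dataset)
        if accuracy > best_accuracy:
            best_accuracy = accuracy
print(f"Best Accuracy: {best_accuracy}")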

## Results

With training complete, it is time to compare your results with those reported in the paper and see where your choices have led you. Were you lucky enough to get exactly the same result?

If your results are different, how close are they to the original?