# Sentiment Experiment


In this notebook, we fine-tune a language model to predict the sentiment of a movie review. The data is sampled from the [IMDB dataset of 50k movie reviews](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews) and reduced to a smaller set (5,130 training, 270 validation, and 600 test reviews) so that it runs quickly on Colab. Each example contains a review and an associated _positive_ or _negative_ sentiment label. The training, validation, and test sets are used to fine-tune the models, select the best checkpoint during training, and measure final performance, respectively.

For this task we use a pre-trained DistilBERT model from Hugging Face. DistilBERT is a BERT-based language model that is 40% smaller and 60% faster than BERT while retaining 97% of its performance. The workflow is as follows:

1. Load a pre-trained DistilBERT model and its tokenizer using Hugging Face's `AutoTokenizer` and `AutoModelForSequenceClassification` classes, adapting the model to the task of sentiment analysis.
2. Tokenize the reviews with the `tokenizer.encode_plus` method, which takes a review and returns a dictionary containing the token IDs as a tensor, and convert each label to a numerical class using the `label_dict` dictionary.
3. Set the layers of the model to be trained based on the `training_type` by iterating over the named parameters of the model and setting `requires_grad=False` for the layers that are not to be trained. The classifier head on top of the final DistilBERT layer is always trained, so this layer is always set to be trainable.
4. Train the model, updating parameters by backpropagating the loss.
5. Validate and test the model by computing precision, recall, and F1.

The training types are:

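- `frozen_embeddings`: All DistilBERT layers are frozen; only the classification head is trained.
- `top_2_training`: Only the last two Transformer layers and the classification head are trained.
- `top_4_training`: Only the last four Transformer layers and the classification head are trained.
- `all_training`: All layers are trained.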
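
The intended mapping from training type to trainable parameters can be sketched without loading the model. The example below is a minimal illustration that assumes the Hugging Face parameter naming for DistilBERT (`distilbert.transformer.layer.<n>. ...` for the six Transformer layers, `pre_classifier` and `classifier` for the head); `THRESHOLDS`, `LAYER_PATTERN`, and `is_trainable` are hypothetical names introduced here, not part of the notebook.

```python
import re

# First Transformer layer index (0..5) that is trained for each type.
# None means that no Transformer layer is trained.
THRESHOLDS = {
    "frozen_embeddings": None,
    "top_2_training": 4,
    "top_4_training": 2,
    "all_training": 0,
}

# Assumed naming scheme: "distilbert.transformer.layer.<n>.<submodule>..."
LAYER_PATTERN = re.compile(r"\.layer\.(\d+)\.")


def is_trainable(param_name, training_type):
    """Return True if the named parameter should receive gradients."""
    if training_type not in THRESHOLDS:
        raise ValueError(f"Invalid training_type: {training_type}")
    if "classifier" in param_name:  # classification head: always trained
        return True
    if training_type == "all_training":
        return True
    threshold = THRESHOLDS[training_type]
    match = LAYER_PATTERN.search(param_name)
    if threshold is None or match is None:  # frozen body, or an embedding
        return False
    return int(match.group(1)) >= threshold


names = [
    "distilbert.embeddings.word_embeddings.weight",
    "distilbert.transformer.layer.2.attention.q_lin.weight",
    "distilbert.transformer.layer.4.ffn.lin1.weight",
    "classifier.weight",
]
for training_type in THRESHOLDS:
    print(training_type, [is_trainable(n, training_type) for n in names])
```

Matching the layer index with a pattern rather than a fixed position in the dotted name keeps the check independent of how many prefixes precede `layer`.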


## Setup


In [None]:
!pip install transformers

import torch
import numpy as np
import pandas as pd
from sklearn.metrics import precision_score, recall_score, f1_score
from torch.utils.data import Dataset, TensorDataset, DataLoader
from torch.nn.utils.rnn import pad_sequence
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from tqdm import tqdm

try:
    from google.colab import drive
    drive.mount("/content/drive")
    %cd "drive/MyDrive/M4L/distilbert"
except ImportError:
    # not running on Colab: use the current working directory
    pass

torch.manual_seed(42)
np.random.seed(42)

BATCH_SIZE = 16
EPOCHS = 3
SAVE_PATH = "models/DistilBERT"

train_data = pd.read_csv("data/train_data.csv")
val_data = pd.read_csv("data/val_data.csv")
test_data = pd.read_csv("data/test_data.csv")


## Loading


In [None]:
class DistillBERT:
    def __init__(
        self, model_name="distilbert-base-uncased", num_classes=2
    ):  # num_classes = 2 for binary classification
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=num_classes
        )

    def get_tokenizer_and_model(self):
        # note: despite the method name, the return order is (model, tokenizer)
        return self.model, self.tokenizer

## Tokenization


In [None]:
class DatasetLoader(Dataset):
    def __init__(self, data, tokenizer):
        self.data = data
        self.tokenizer = tokenizer

    def tokenize_data(self):
        print("Processing data..")
        tokens = []
        labels = []
        label_dict = {"positive": 1, "negative": 0}

        review_list = self.data["review"].to_list()
        label_list = self.data["sentiment"].to_list()

        for review, label in tqdm(zip(review_list, label_list), total=len(review_list)):
            # tokenize the review
            encoded_review = self.tokenizer.encode_plus(
                review,
                max_length=512,  # limit the max token length to 512
                padding="max_length",
                truncation=True,
                return_tensors="pt",
            )
            tokens.append(encoded_review["input_ids"].squeeze())
            # convert the labels to the corresponding numerical classes
            labels.append(label_dict[label])

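        # every review is already padded to 512 tokens, so this stacks the
        # sequences into a single (num_reviews, 512) tensor
        tokens = pad_sequence(tokens, batch_first=True)
        # keep labels on the CPU; each batch is moved to the device in the loop
        labels = torch.tensor(labels)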
        dataset = TensorDataset(tokens, labels)
        return dataset

    def get_data_loaders(self, batch_size=32, shuffle=True):
        processed_dataset = self.tokenize_data()

        data_loader = DataLoader(
            processed_dataset, shuffle=shuffle, batch_size=batch_size
        )

        return data_loader
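
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a toy re-implementation on plain lists. It is a sketch of the behaviour, not the tokenizer's actual code: the content IDs are made up, `pad_or_truncate` is a hypothetical helper, and the special-token IDs (0 for `[PAD]`, 101 for `[CLS]`, 102 for `[SEP]`) are those of the BERT uncased vocabulary. It also builds the attention mask that `encode_plus` returns alongside `input_ids`; the cell above keeps only `input_ids`, so padded positions are not masked when the model is called.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # special-token ids, BERT uncased vocabulary


def pad_or_truncate(content_ids, max_length):
    """Toy version of padding="max_length" combined with truncation=True."""
    # Reserve two positions for [CLS] and [SEP], cutting the content if needed.
    kept = content_ids[: max_length - 2]
    input_ids = [CLS_ID] + kept + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Right-pad to the fixed length; padded positions get a mask value of 0.
    n_pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad


# A short review (2 made-up content ids) and a long one (20 made-up ids).
short_ids, short_mask = pad_or_truncate([2023, 3185], max_length=8)
long_ids, long_mask = pad_or_truncate(list(range(1000, 1020)), max_length=8)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Passing the mask to the model as `attention_mask` is the usual way to keep padded positions from being attended to.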

## Training


In [None]:
class Trainer:
    def __init__(self, options):
        self.device = options["device"]
        self.train_data = options["train_data"]
        self.val_data = options["val_data"]
        self.batch_size = options["batch_size"]
        self.epochs = options["epochs"]
        self.save_path = options["save_path"]
        self.training_type = options["training_type"]
        transformer = DistillBERT()
        self.model, self.tokenizer = transformer.get_tokenizer_and_model()
        self.model.to(self.device)

    def get_performance_metrics(self, preds, labels):
        pred_flat = np.argmax(preds, axis=1).flatten()
        labels_flat = labels.flatten()
        precision = precision_score(labels_flat, pred_flat, zero_division=0)
        recall = recall_score(labels_flat, pred_flat, zero_division=0)
        f1 = f1_score(labels_flat, pred_flat, zero_division=0)
        return precision, recall, f1

    def set_training_parameters(self):
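        def set_requires_grad(layer_threshold=None):
            for name, layer in self.model.named_parameters():
                # the classification head ("pre_classifier", "classifier")
                # is left trainable; everything else follows layer_threshold
                if "classifier" not in name:
                    if layer_threshold is not None:
                        # Transformer parameters are named
                        # "distilbert.transformer.layer.<n>....", so the layer
                        # index is the fourth dot-separated component
                        parts = name.split(".")
                        layer_number = parts[3] if len(parts) > 3 else ""
                        if layer_number.isdigit():
                            layer.requires_grad = int(layer_number) >= layer_threshold
                        else:
                            # embeddings have no layer index and stay frozen
                            layer.requires_grad = False
                    else:
                        layer.requires_grad = False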

        # set the layers to be trained based on the training_type
        if self.training_type == "frozen_embeddings":
            set_requires_grad()
        elif self.training_type == "top_2_training":
            set_requires_grad(4)
        elif self.training_type == "top_4_training":
            set_requires_grad(2)
        elif self.training_type == "all_training":
            for layer in self.model.parameters():
                layer.requires_grad = True
        else:
            raise ValueError(f"Invalid training_type: {self.training_type}")

    def train(self, data_loader, optimizer):
        self.model.train()
        total_recall = 0
        total_precision = 0
        total_f1 = 0
        total_loss = 0

        for batch_idx, (reviews, labels) in enumerate(tqdm(data_loader)):
            self.model.zero_grad()
            reviews = reviews.to(self.device)
            labels = labels.to(self.device)

            # get outputs from the model
            outputs = self.model(reviews, labels=labels)
            loss = outputs.loss

            # backpropagate the loss
            loss.backward()
            optimizer.step()

            # calculate metrics
            logits = outputs.logits.detach().cpu().numpy()
            label_ids = labels.to("cpu").numpy()
            precision, recall, f1 = self.get_performance_metrics(logits, label_ids)

            total_precision += precision
            total_recall += recall
            total_f1 += f1
            total_loss += loss.item()

        precision = total_precision / len(data_loader)
        recall = total_recall / len(data_loader)
        f1 = total_f1 / len(data_loader)
        loss = total_loss / len(data_loader)

        return precision, recall, f1, loss

    def eval(self, data_loader):
        self.model.eval()
        total_recall = 0
        total_precision = 0
        total_f1 = 0
        total_loss = 0

        with torch.no_grad():
            for reviews, labels in tqdm(data_loader):
                reviews = reviews.to(self.device)
                labels = labels.to(self.device)

                # get outputs from the model
                outputs = self.model(reviews, labels=labels)
                loss = outputs.loss

                # don't backpropagate the loss
                logits = outputs.logits.detach().cpu().numpy()
                label_ids = labels.to("cpu").numpy()

                # calculate metrics
                precision, recall, f1 = self.get_performance_metrics(logits, label_ids)

                total_precision += precision
                total_recall += recall
                total_f1 += f1
                total_loss += loss.item()

        precision = total_precision / len(data_loader)
        recall = total_recall / len(data_loader)
        f1 = total_f1 / len(data_loader)
        loss = total_loss / len(data_loader)

        return precision, recall, f1, loss

    def save_transformer(self):
        self.model.save_pretrained(self.save_path)
        self.tokenizer.save_pretrained(self.save_path)

    def execute(self):
        last_best = 0
        train_dataset = DatasetLoader(self.train_data, self.tokenizer)
        train_data_loader = train_dataset.get_data_loaders(self.batch_size)
        val_dataset = DatasetLoader(self.val_data, self.tokenizer)
        val_data_loader = val_dataset.get_data_loaders(self.batch_size)
        optimizer = torch.optim.AdamW(self.model.parameters(), lr=3e-5, eps=1e-8)
        self.set_training_parameters()
        for epoch_i in range(0, self.epochs):
            train_precision, train_recall, train_f1, train_loss = self.train(
                train_data_loader, optimizer
            )
            print(
                f"Epoch {epoch_i + 1}: train_loss: {train_loss:.4f} train_precision: {train_precision:.4f} train_recall: {train_recall:.4f} train_f1: {train_f1:.4f}"
            )
            val_precision, val_recall, val_f1, val_loss = self.eval(val_data_loader)
            print(
                f"Epoch {epoch_i + 1}: val_loss: {val_loss:.4f} val_precision: {val_precision:.4f} val_recall: {val_recall:.4f} val_f1: {val_f1:.4f}"
            )

            if val_f1 > last_best:
                print("Saving model..")
                self.save_transformer()
                last_best = val_f1
                print("Model saved.")
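
The `train` and `eval` methods compute precision, recall, and F1 per batch and then average over batches. That average is not the same number as the metric computed once over all predictions, because F1 is not linear in the counts. The sketch below shows the difference on two made-up batches; `precision_recall_f1` is a hypothetical pure-Python stand-in for the scikit-learn functions used above, with the same convention of returning 0 when a denominator is zero.

```python
def precision_recall_f1(labels, preds):
    """Binary precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Two made-up batches of (labels, predictions).
batches = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 0], [1, 1, 1, 0]),
]

# Batch-averaged F1, as in the Trainer: mean of the per-batch scores.
batch_f1 = [precision_recall_f1(y, p)[2] for y, p in batches]
averaged_f1 = sum(batch_f1) / len(batch_f1)

# Pooled F1: concatenate all batches first, then score once.
all_labels = [y for ys, _ in batches for y in ys]
all_preds = [p for _, ps in batches for p in ps]
pooled_f1 = precision_recall_f1(all_labels, all_preds)[2]

print(f"batch-averaged F1: {averaged_f1:.4f}")
print(f"pooled F1:         {pooled_f1:.4f}")
```

With a batch size of 16, individual batches can be quite unbalanced, so the two numbers may differ noticeably; collecting all predictions and scoring once is one way to avoid that dependence on batching.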

### Frozen Embeddings


In [None]:
options = {}
options["batch_size"] = BATCH_SIZE
options["device"] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
options["train_data"] = train_data
options["val_data"] = val_data
options["save_path"] = SAVE_PATH + "_frozen_embeddings"
options["epochs"] = EPOCHS
options["training_type"] = "frozen_embeddings"
trainer = Trainer(options)
trainer.execute()

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier

Processing data..


100%|██████████| 5130/5130 [00:14<00:00, 363.78it/s]


Processing data..


100%|██████████| 270/270 [00:01<00:00, 167.94it/s]
100%|██████████| 321/321 [01:25<00:00,  3.75it/s]


Epoch 1: train_loss: 0.6780 train_precision: 0.5945 train_recall: 0.6750 train_f1: 0.5800


100%|██████████| 17/17 [00:04<00:00,  3.88it/s]


Epoch 1: val_loss: 0.6616 val_precision: 0.7134 val_recall: 0.4446 val_f1: 0.5381
Saving model..
Model saved.


100%|██████████| 321/321 [01:29<00:00,  3.60it/s]


Epoch 2: train_loss: 0.6436 train_precision: 0.7010 train_recall: 0.6986 train_f1: 0.6741


100%|██████████| 17/17 [00:04<00:00,  3.79it/s]


Epoch 2: val_loss: 0.6240 val_precision: 0.7708 val_recall: 0.6055 val_f1: 0.6690
Saving model..
Model saved.


100%|██████████| 321/321 [01:29<00:00,  3.57it/s]


Epoch 3: train_loss: 0.6024 train_precision: 0.7286 train_recall: 0.7228 train_f1: 0.7023


100%|██████████| 17/17 [00:04<00:00,  3.79it/s]


Epoch 3: val_loss: 0.5843 val_precision: 0.7337 val_recall: 0.7996 val_f1: 0.7524
Saving model..
Model saved.


### Top 2 Layers


In [None]:
options = {}
options["batch_size"] = BATCH_SIZE
options["device"] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
options["train_data"] = train_data
options["val_data"] = val_data
options["save_path"] = SAVE_PATH + "_top_2_training"
options["epochs"] = EPOCHS
options["training_type"] = "top_2_training"
trainer = Trainer(options)
trainer.execute()

Processing data..


100%|██████████| 5130/5130 [00:09<00:00, 546.97it/s]


Processing data..


100%|██████████| 270/270 [00:00<00:00, 785.98it/s]
100%|██████████| 321/321 [01:30<00:00,  3.57it/s]


Epoch 1: train_loss: 0.6793 train_precision: 0.6029 train_recall: 0.6540 train_f1: 0.5854


100%|██████████| 17/17 [00:04<00:00,  3.81it/s]


Epoch 1: val_loss: 0.6582 val_precision: 0.6471 val_recall: 0.7163 val_f1: 0.6726
Saving model..
Model saved.


100%|██████████| 321/321 [01:29<00:00,  3.57it/s]


Epoch 2: train_loss: 0.6397 train_precision: 0.7000 train_recall: 0.7267 train_f1: 0.6861


100%|██████████| 17/17 [00:04<00:00,  3.76it/s]


Epoch 2: val_loss: 0.6200 val_precision: 0.7257 val_recall: 0.7208 val_f1: 0.7083
Saving model..
Model saved.


100%|██████████| 321/321 [01:29<00:00,  3.57it/s]


Epoch 3: train_loss: 0.6014 train_precision: 0.7329 train_recall: 0.7329 train_f1: 0.7116


100%|██████████| 17/17 [00:04<00:00,  3.78it/s]


Epoch 3: val_loss: 0.5842 val_precision: 0.7117 val_recall: 0.8149 val_f1: 0.7556
Saving model..
Model saved.


### Top 4 Layers


In [None]:
options = {}
options["batch_size"] = BATCH_SIZE
options["device"] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
options["train_data"] = train_data
options["val_data"] = val_data
options["save_path"] = SAVE_PATH + "_top_4_training"
options["epochs"] = EPOCHS
options["training_type"] = "top_4_training"
trainer = Trainer(options)
trainer.execute()

Processing data..


100%|██████████| 5130/5130 [00:09<00:00, 549.94it/s]


Processing data..


100%|██████████| 270/270 [00:00<00:00, 802.19it/s]
100%|██████████| 321/321 [01:30<00:00,  3.56it/s]


Epoch 1: train_loss: 0.6788 train_precision: 0.6080 train_recall: 0.6644 train_f1: 0.5997


100%|██████████| 17/17 [00:04<00:00,  3.81it/s]


Epoch 1: val_loss: 0.6587 val_precision: 0.6181 val_recall: 0.9123 val_f1: 0.7295
Saving model..
Model saved.


100%|██████████| 321/321 [01:30<00:00,  3.57it/s]


Epoch 2: train_loss: 0.6422 train_precision: 0.6790 train_recall: 0.7100 train_f1: 0.6575


100%|██████████| 17/17 [00:04<00:00,  3.79it/s]


Epoch 2: val_loss: 0.6210 val_precision: 0.7169 val_recall: 0.8036 val_f1: 0.7498
Saving model..
Model saved.


100%|██████████| 321/321 [01:30<00:00,  3.56it/s]


Epoch 3: train_loss: 0.6043 train_precision: 0.7282 train_recall: 0.7324 train_f1: 0.7083


100%|██████████| 17/17 [00:04<00:00,  3.76it/s]

Epoch 3: val_loss: 0.5873 val_precision: 0.7543 val_recall: 0.6898 val_f1: 0.7105
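
No `Saving model..` line follows epoch 3 in this run because validation F1 dropped from 0.7498 to 0.7105, so the checkpoint saved after epoch 2 is the one that remains on disk. A minimal replay of that selection rule, using the validation F1 values logged above (the variable names are illustrative):

```python
# Validation F1 per epoch, copied from the Top 4 Layers log.
val_f1_per_epoch = [0.7295, 0.7498, 0.7105]

last_best = 0.0
best_epoch = None
for epoch, val_f1 in enumerate(val_f1_per_epoch, start=1):
    if val_f1 > last_best:  # same comparison as in Trainer.execute
        last_best = val_f1
        best_epoch = epoch
        print(f"Epoch {epoch}: saving checkpoint (val_f1={val_f1:.4f})")
    else:
        print(f"Epoch {epoch}: keeping earlier checkpoint (val_f1={val_f1:.4f})")

print(f"Checkpoint on disk comes from epoch {best_epoch}")
```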





### All Layers


In [None]:
options = {}
options["batch_size"] = BATCH_SIZE
options["device"] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
options["train_data"] = train_data
options["val_data"] = val_data
options["save_path"] = SAVE_PATH + "_all_training"
options["epochs"] = EPOCHS
options["training_type"] = "all_training"
trainer = Trainer(options)
trainer.execute()

Processing data..


100%|██████████| 5130/5130 [00:06<00:00, 780.28it/s]


Processing data..


100%|██████████| 270/270 [00:00<00:00, 758.50it/s]
100%|██████████| 321/321 [04:09<00:00,  1.29it/s]


Epoch 1: train_loss: 0.3464 train_precision: 0.8441 train_recall: 0.8614 train_f1: 0.8362


100%|██████████| 17/17 [00:04<00:00,  3.79it/s]


Epoch 1: val_loss: 0.2042 val_precision: 0.9003 val_recall: 0.9356 val_f1: 0.9138
Saving model..
Model saved.


100%|██████████| 321/321 [04:08<00:00,  1.29it/s]


Epoch 2: train_loss: 0.1723 train_precision: 0.9364 train_recall: 0.9322 train_f1: 0.9282


100%|██████████| 17/17 [00:04<00:00,  3.78it/s]


Epoch 2: val_loss: 0.2358 val_precision: 0.9245 val_recall: 0.8968 val_f1: 0.9018


100%|██████████| 321/321 [04:08<00:00,  1.29it/s]


Epoch 3: train_loss: 0.0714 train_precision: 0.9768 train_recall: 0.9813 train_f1: 0.9774


100%|██████████| 17/17 [00:04<00:00,  3.77it/s]


Epoch 3: val_loss: 0.2174 val_precision: 0.9181 val_recall: 0.9480 val_f1: 0.9272
Saving model..
Model saved.


## Testing


In [None]:
class Tester:
    def __init__(self, options):
        self.save_path = options["save_path"]
        self.device = options["device"]
        self.test_data = options["test_data"]
        self.batch_size = options["batch_size"]
        transformer = DistillBERT(self.save_path)
        self.model, self.tokenizer = transformer.get_tokenizer_and_model()
        self.model.to(self.device)

    def get_performance_metrics(self, preds, labels):
        pred_flat = np.argmax(preds, axis=1).flatten()
        labels_flat = labels.flatten()
        precision = precision_score(labels_flat, pred_flat, zero_division=0)
        recall = recall_score(labels_flat, pred_flat, zero_division=0)
        f1 = f1_score(labels_flat, pred_flat, zero_division=0)
        return precision, recall, f1

    def test(self, data_loader):
        self.model.eval()
        total_recall = 0
        total_precision = 0
        total_f1 = 0
        total_loss = 0

        with torch.no_grad():
            for reviews, labels in tqdm(data_loader):
                reviews = reviews.to(self.device)
                labels = labels.to(self.device)

                # Forward pass
                output = self.model(reviews)
                logits = output.logits
                loss = torch.nn.functional.cross_entropy(logits, labels)
                total_loss += loss.item()

                # Calculate metrics
                logits = logits.detach().cpu().numpy()
                labels = labels.to("cpu").numpy()
                precision, recall, f1 = self.get_performance_metrics(logits, labels)

                total_precision += precision
                total_recall += recall
                total_f1 += f1

        precision = total_precision / len(data_loader)
        recall = total_recall / len(data_loader)
        f1 = total_f1 / len(data_loader)
        loss = total_loss / len(data_loader)

        return precision, recall, f1, loss

    def execute(self):
        test_dataset = DatasetLoader(self.test_data, self.tokenizer)
        test_data_loader = test_dataset.get_data_loaders(self.batch_size)

        test_precision, test_recall, test_f1, test_loss = self.test(test_data_loader)

        print()
        print(
            f"test_loss: {test_loss:.4f} test_precision: {test_precision:.4f} test_recall: {test_recall:.4f} test_f1: {test_f1:.4f}"
        )
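
The tester takes the arg-max over the two logits to obtain a class ID. To turn a single pair of logits into a readable label with a probability, something like the following could be used. This is a sketch on made-up logits: `ID_TO_LABEL` is simply the inverse of the notebook's `label_dict`, and `softmax` and `predict_label` are hypothetical helpers written in plain Python rather than PyTorch.

```python
import math

# Inverse of label_dict = {"positive": 1, "negative": 0} from the tokenization step.
ID_TO_LABEL = {0: "negative", 1: "positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return ID_TO_LABEL[class_id], probs[class_id]


label, confidence = predict_label([-1.2, 2.3])  # made-up logits for one review
print(label, round(confidence, 3))
```

Softmax does not change which class wins, so the label agrees with the arg-max used in `get_performance_metrics`; it only adds a probability that can be reported alongside it.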

### Frozen Embeddings


In [None]:
options = {}
options["batch_size"] = BATCH_SIZE
options["device"] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
options["test_data"] = test_data
options["save_path"] = SAVE_PATH + "_frozen_embeddings"
tester = Tester(options)
tester.execute()

Processing data..


100%|██████████| 600/600 [00:00<00:00, 856.25it/s]
100%|██████████| 38/38 [00:09<00:00,  4.18it/s]


test_loss: 0.5830 test_precision: 0.7100 test_recall: 0.7734 test_f1: 0.7290





### Top 2 Layers


In [None]:
options = {}
options["batch_size"] = BATCH_SIZE
options["device"] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
options["test_data"] = test_data
options["save_path"] = SAVE_PATH + "_top_2_training"
tester = Tester(options)
tester.execute()

Processing data..


100%|██████████| 600/600 [00:00<00:00, 851.76it/s]
100%|██████████| 38/38 [00:09<00:00,  4.18it/s]


test_loss: 0.5865 test_precision: 0.6777 test_recall: 0.8467 test_f1: 0.7405





### Top 4 Layers


In [None]:
options = {}
options["batch_size"] = BATCH_SIZE
options["device"] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
options["test_data"] = test_data
options["save_path"] = SAVE_PATH + "_top_4_training"
tester = Tester(options)
tester.execute()

Processing data..


100%|██████████| 600/600 [00:00<00:00, 697.47it/s]
100%|██████████| 38/38 [00:09<00:00,  4.14it/s]


test_loss: 0.6227 test_precision: 0.6865 test_recall: 0.7750 test_f1: 0.7139





### All Layers


In [None]:
options = {}
options["batch_size"] = BATCH_SIZE
options["device"] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
options["test_data"] = test_data
options["save_path"] = SAVE_PATH + "_all_training"
tester = Tester(options)
tester.execute()

Processing data..


100%|██████████| 600/600 [00:01<00:00, 500.96it/s]
100%|██████████| 38/38 [00:09<00:00,  4.08it/s]


test_loss: 0.3341 test_precision: 0.8590 test_recall: 0.8894 test_f1: 0.8662





## Analysis


| Model             | Phase      | Precision | Recall | F1 Score |
| ----------------- | ---------- | --------- | ------ | -------- |
| Frozen Embeddings | Testing    | 0.7100    | 0.7734 | 0.7290   |
| Top 2 Training    | Testing    | 0.6777    | 0.8467 | 0.7405   |
| Top 4 Training    | Testing    | 0.6865    | 0.7750 | 0.7139   |
| All Training      | Testing    | 0.8590    | 0.8894 | 0.8662   |
| Frozen Embeddings | Validation | 0.7337    | 0.7996 | 0.7524   |
| Top 2 Training    | Validation | 0.7117    | 0.8149 | 0.7556   |
| Top 4 Training    | Validation | 0.7543    | 0.6898 | 0.7105   |
| All Training      | Validation | 0.9181    | 0.9480 | 0.9272   |

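Freezing layers speeds up training and can reduce overfitting, because the frozen weights are never updated: each epoch took about 1.5 minutes with frozen layers versus about 4 minutes when all layers were trained. This is useful when the goal is to preserve the pre-trained knowledge in those layers, especially with smaller datasets, but it also limits how far the model can adapt to the sentiment task. After three epochs the partially frozen models still had a training loss of around 0.60, and their test F1 stayed between 0.71 and 0.74.

On the test set, the Top 2 Training model has a higher F1 score than the Top 4 Training model (0.7405 vs. 0.7139) because its recall is higher (0.8467 vs. 0.7750); its precision is slightly lower (0.6777 vs. 0.6865). The difference between the two models is small, and the three partially frozen runs report almost identical training losses and training speed, so these gaps may reflect factors such as the random initialization of the classifier head, the number of epochs, and the batch size rather than the number of trainable layers. Note also that the Top 4 Training validation row shows the final epoch, whereas the checkpoint evaluated on the test set was saved after epoch 2 (validation F1 0.7498).

Training all layers lets the model adapt best (test F1 0.8662), at the cost of more computation and a higher risk of overfitting, especially with smaller datasets: training loss fell to 0.0714 while validation loss stayed near 0.22, and test F1 is lower than validation F1 (0.9272).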
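
To put the computational cost in perspective, the number of trainable parameters under each training type can be estimated from the model dimensions. The sketch below assumes the standard `distilbert-base-uncased` configuration (vocabulary of 30,522, 512 positions, hidden size 768, feed-forward size 3,072, 6 layers) and counts parameters for the intended behaviour of each type; it is back-of-the-envelope arithmetic, not a measurement taken from the trained models.

```python
# Dimensions assumed from the distilbert-base-uncased configuration.
VOCAB, MAX_POS, HIDDEN, FFN, N_LAYERS, N_CLASSES = 30522, 512, 768, 3072, 6, 2


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors
embeddings = VOCAB * HIDDEN + MAX_POS * HIDDEN + layer_norm
per_layer = (
    4 * linear(HIDDEN, HIDDEN)                    # q, k, v and output projections
    + linear(HIDDEN, FFN) + linear(FFN, HIDDEN)   # feed-forward block
    + 2 * layer_norm                              # two layer norms per layer
)
head = linear(HIDDEN, HIDDEN) + linear(HIDDEN, N_CLASSES)  # pre_classifier + classifier
total = embeddings + N_LAYERS * per_layer + head

trainable = {
    "frozen_embeddings": head,
    "top_2_training": head + 2 * per_layer,
    "top_4_training": head + 4 * per_layer,
    "all_training": total,
}
for name, count in trainable.items():
    print(f"{name:18s} {count:>11,d} ({count / total:.1%})")
```

Under these assumptions the classification head alone accounts for under 1% of the roughly 67 million parameters, which is consistent with the much shorter epochs observed when the DistilBERT body is frozen.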


## Credits

This code was written with the help of Dhruv Verma.
