In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch

#set device to cuda, if not available check mps else use cpu
if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")
    if torch.cuda.is_available():
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')
else:
    device = torch.device('mps')

    
#print device type
print("Current Device", device)
torch.manual_seed(0)
KAGGLE = 1



MPS not available because the current PyTorch install was not built with MPS enabled.
Current Device cuda


# Summary of Different Model Configurations:
General: BERT_Base (Both cased and uncased) is used as the base pre-trained model for all configurations. 
Batch Size is 32 unless otherwise specified.
lr is 1e-4 unless otherwise specified. 

Note: 
- In all configurations, the classification head (added on top of the CLS token embedding from BERT) is unfrozen
- I have experimented with cased vs uncased BERT, various levels of unfreezing of layers and even gradual unfreezing with adjusted learning rate (lowered to 1e-5)
- Accuracy refers to the number of individual labels correctly predicted/total predictions (14 \* num_examples)

## Experimentation
### Frozen Parameters
First I explored cased vs uncased models by freezing the bert model entirely. 
I concluded that cased vs uncased did not make much difference in accuracy but did make a 5-8 minute difference (per epoch) in training time. So I chose uncased (less training time).

### One Encoder Layer Unfrozen (Adjusted Learning Rate)
I trained the model for 2 epochs in this case:
- Epoch 1: Loss = 0.43, Accuracy = 83.8
- Epoch 2: Loss = 0.37, Accuracy = 84.4, lr = 2e-5

This improved accuracy so over frozen so I decided to unfreeze some encoder layers. 

## Final Configurations \[Embeddings Finetuned\]
After for cased vs uncased, level of unfreezing beneficial I decided the final 
training type: *Encoder Layer(s) and Embeddings Unfrozen*
UNCASED, Batch size = 16

- Epoch 1: Loss = 0.2703, Accuracy = 87.75, 1 layer unfrozen
- Epoch 2: lr = 4e-15, 2 layers unfrozen \[Did not save Accuracy data\]
- (Final) Epoch 3: Loss =  0.2167, Accuracy = 88., lr = 2e-15, 2 layers unfrozen

Final Test Accuracy: 89.5%
Final Validation Accuracy: 89.8%

The aggregate metrics can be found below:
```
TEST DATA
Label: A | Precision: 0.8407833120476799 | Recall: 0.836155800169348 | F1-Score: 0.8384631713012098
Label: B | Precision: 0.9690526315789474 | Recall: 0.9911714039621017 | F1-Score: 0.9799872258888653
Label: C | Precision: 0.8568953568953569 | Recall: 0.9435330026707364 | F1-Score: 0.8981296531686943
Label: D | Precision: 0.9467146126185028 | Recall: 0.9243536546441111 | F1-Score: 0.9354005167958656
Label: E | Precision: 0.8542356838618078 | Recall: 0.9228016359918201 | F1-Score: 0.8871958712214303
Label: F | Precision: 0.8127250900360145 | Recall: 0.8166465621230398 | F1-Score: 0.8146811070998796
Label: G | Precision: 0.8505970563732297 | Recall: 0.9043401240035429 | F1-Score: 0.8766456783056669
Label: H | Precision: 0.6428571428571429 | Recall: 0.11707317073170732 | F1-Score: 0.19807427785419532
Label: I | Precision: 0.753393665158371 | Recall: 0.6294896030245747 | F1-Score: 0.685890834191555
Label: J | Precision: 0.7595628415300546 | Recall: 0.556 | F1-Score: 0.6420323325635104
Label: L | Precision: 0.8072289156626506 | Recall: 0.44966442953020136 | F1-Score: 0.5775862068965518
Label: M | Precision: 0.8943694741740345 | Recall: 0.9218225419664269 | F1-Score: 0.9078885214926783
Label: N | Precision: 0.834759117514633 | Recall: 0.8362652232746955 | F1-Score: 0.8355114916629112
Label: Z | Precision: 0.8485294117647059 | Recall: 0.7513020833333334 | F1-Score: 0.7969613259668509
Macro Precision: 0.8336931651480807 | Macro Recall: 0.7571870882446886 | Macro F1-Score: 0.7767463010292762
Micro Precision: 0.8782085513902239 | Micro Recall: 0.8702155430909796 | Micro F1-Score: 0.8741937770217592


VALIDATION DATA:
Label: A | Precision: 0.8407833120476799 | Recall: 0.836155800169348 | F1-Score: 0.8384631713012098
Label: B | Precision: 0.9690526315789474 | Recall: 0.9911714039621017 | F1-Score: 0.9799872258888653
Label: C | Precision: 0.8568953568953569 | Recall: 0.9435330026707364 | F1-Score: 0.8981296531686943
Label: D | Precision: 0.9467146126185028 | Recall: 0.9243536546441111 | F1-Score: 0.9354005167958656
Label: E | Precision: 0.8542356838618078 | Recall: 0.9228016359918201 | F1-Score: 0.8871958712214303
Label: F | Precision: 0.8127250900360145 | Recall: 0.8166465621230398 | F1-Score: 0.8146811070998796
Label: G | Precision: 0.8505970563732297 | Recall: 0.9043401240035429 | F1-Score: 0.8766456783056669
Label: H | Precision: 0.6428571428571429 | Recall: 0.11707317073170732 | F1-Score: 0.19807427785419532
Label: I | Precision: 0.753393665158371 | Recall: 0.6294896030245747 | F1-Score: 0.685890834191555
Label: J | Precision: 0.7595628415300546 | Recall: 0.556 | F1-Score: 0.6420323325635104
Label: L | Precision: 0.8072289156626506 | Recall: 0.44966442953020136 | F1-Score: 0.5775862068965518
Label: M | Precision: 0.8943694741740345 | Recall: 0.9218225419664269 | F1-Score: 0.9078885214926783
Label: N | Precision: 0.834759117514633 | Recall: 0.8362652232746955 | F1-Score: 0.8355114916629112
Label: Z | Precision: 0.8485294117647059 | Recall: 0.7513020833333334 | F1-Score: 0.7969613259668509
Macro Precision: 0.8336931651480807 | Macro Recall: 0.7571870882446886 | Macro F1-Score: 0.7767463010292762
Micro Precision: 0.8782085513902239 | Micro Recall: 0.8702155430909796 | Micro F1-Score: 0.8741937770217592
```
### Adding Dropout Layer
As a final experiment, I added dropout layers in the classification head and shortened it.

Results are as follows:
```
TESTING DATA:
Label: A | Precision: 0.8210930828351836 | Recall: 0.8207426376440461 | F1-Score: 0.8209178228388474
Label: B | Precision: 0.9713916971391697 | Recall: 0.9764324324324324 | F1-Score: 0.9739055423765366
Label: C | Precision: 0.8890631125049 | Recall: 0.8692985818321196 | F1-Score: 0.8790697674418604
Label: D | Precision: 0.9225642653125992 | Recall: 0.9314322332585709 | F1-Score: 0.9269770408163265
Label: E | Precision: 0.8126901347240331 | Recall: 0.9572562068082928 | F1-Score: 0.8790692208250087
Label: F | Precision: 0.8943089430894309 | Recall: 0.6307339449541285 | F1-Score: 0.7397444519166106
Label: G | Precision: 0.8169051404345522 | Recall: 0.9099763872491146 | F1-Score: 0.8609327003630272
Label: H | Precision: 0.5253164556962026 | Recall: 0.13651315789473684 | F1-Score: 0.21671018276762402
Label: I | Precision: 0.7130620985010707 | Recall: 0.6098901098901099 | F1-Score: 0.6574531095755183
Label: J | Precision: 0.7045454545454546 | Recall: 0.5646630236794171 | F1-Score: 0.6268958543983822
Label: L | Precision: 0.6530303030303031 | Recall: 0.5693527080581242 | F1-Score: 0.6083274523641496
Label: M | Precision: 0.8991470145509283 | Recall: 0.847682119205298 | F1-Score: 0.8726564402240078
Label: N | Precision: 0.8277010947168015 | Recall: 0.7763392857142857 | F1-Score: 0.8011978806726561
Label: Z | Precision: 0.8527272727272728 | Recall: 0.6051612903225806 | F1-Score: 0.7079245283018867
Macro Precision: 0.8073961478434216 | Macro Recall: 0.7289624370673756 | Macro F1-Score: 0.7551272853487457
Micro Precision: 0.8606566142658539 | Micro Recall: 0.8485274478105012 | Micro F1-Score: 0.8545489939299555
Test Accuracy 0.8825857142857143


VALIDATION DATA
Label: A | Precision: 0.8199152542372882 | Recall: 0.8255119453924915 | F1-Score: 0.8227040816326532
Label: B | Precision: 0.9751712328767124 | Recall: 0.9735042735042735 | F1-Score: 0.9743370402053038
Label: C | Precision: 0.9089093088294047 | Recall: 0.8627227910504361 | F1-Score: 0.8852140077821012
Label: D | Precision: 0.9135802469135802 | Recall: 0.925 | F1-Score: 0.9192546583850932
Label: E | Precision: 0.817765168048887 | Recall: 0.9563552833078101 | F1-Score: 0.8816470588235295
Label: F | Precision: 0.8991869918699187 | Recall: 0.6415313225058005 | F1-Score: 0.7488151658767772
Label: G | Precision: 0.8150470219435737 | Recall: 0.9217134416543574 | F1-Score: 0.8651046721197837
Label: H | Precision: 0.5297619047619048 | Recall: 0.15724381625441697 | F1-Score: 0.24250681198910082
Label: I | Precision: 0.7215777262180975 | Recall: 0.5791433891992551 | F1-Score: 0.6425619834710744
Label: J | Precision: 0.703962703962704 | Recall: 0.5666041275797373 | F1-Score: 0.6278586278586279
Label: L | Precision: 0.65625 | Recall: 0.5976714100905562 | F1-Score: 0.6255924170616113
Label: M | Precision: 0.8966550174737893 | Recall: 0.8499763369616659 | F1-Score: 0.8726919339164237
Label: N | Precision: 0.8230297310051912 | Recall: 0.7625710537822474 | F1-Score: 0.7916477530640037
Label: Z | Precision: 0.8250428816466552 | Recall: 0.6222509702457956 | F1-Score: 0.7094395280235988
Macro Precision: 0.8075610849848361 | Macro Recall: 0.7315571543949175 | Macro F1-Score: 0.7578125528721201
Micro Precision: 0.8615215229436 | Micro Recall: 0.8502874369040943 | Micro F1-Score: 0.8558676169642228
Validation Accuracy 0.8832857142857143
```
# Further Direction
- Domain specific vocabulary can be added to the models (tokenizer.add_token() for some unique words in corpus that are not present in BERT_Base vocabulary)

- Different versions of BERT (BERT_Large, roBERTa, ALBERT, etc.) or other models can be experimented with. 




In [2]:

dataset_path = ['data/Multi-Label Text Classification Dataset.csv', '/kaggle/input/multi-label-text-cls/Multi-Label Text Classification Dataset.csv'][KAGGLE]
interrupt_save_folder = ['interrupt', '/kaggle/working'][KAGGLE]
save_folder = ['saved', '/kaggle/working'][KAGGLE]

In [3]:
data_df = pd.read_csv(dataset_path)
labels = "A,B,C,D,E,F,G,H,I,J,L,M,N,Z".split(',')
num_labels = len(labels)
print(labels, num_labels)

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'L', 'M', 'N', 'Z'] 14


### Testing

In [4]:
from transformers import BertTokenizer, BertModel

# tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# bert_model = BertModel.from_pretrained('bert-base-uncased').to(device)

In [5]:
# tokens = tokenizer.encode("Hello, my [MASK] is John.")
# mask_pos = tokens.index(tokenizer.mask_token_id)
# print(mask_pos)
# out = bert_model(torch.tensor([tokens]).to(device))
# print(out.last_hidden_state.shape)

## Class Definitions

In [25]:
import time
from tqdm import tqdm

class Dataset(torch.utils.data.Dataset):
    def __init__(self, sequences, labels:torch.Tensor, tokenizer):
        self.sequences = sequences
        self.labels = labels
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.sequences)
    
    def __getitem__(self, idx):
        char = self.tokenizer(self.sequences[idx], add_special_tokens = True,return_tensors='pt', padding='max_length', truncation=True)
        encoded_seq = char['input_ids']
        attention_mask = char['attention_mask']
        #reshape the pytorch tensor to be flattened because single element
        return encoded_seq[0], attention_mask[0], self.labels[idx]
        # return self.encoded_seqs[idx],self.attention_masks[idx], self.labels[idx]

class BERT_Base_Multilabel(torch.nn.Module):
    def __init__(self, num_labels): 
        """num_labels: number of labels to classify
           database: tuple of (X, Y) where X is a list of sentences and Y is a tensor of labels
        """
        super().__init__()
        print("Initializing BERT_Base_Multilabel...")

        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
        self.cls_head = torch.nn.Sequential(
            torch.nn.Linear(768, 1024),
            torch.nn.ReLU(),
            torch.nn.Linear(1024, num_labels),
            torch.nn.Sigmoid()
        )
        self.loss_fn = torch.nn.BCELoss()
        print("Initialized.")
    
    def forward(self, encoded_seqs, attention_masks):
        """Input: sequence (str) of shape (batch_size, seq_len)"""
        bert_out = self.bert(encoded_seqs, attention_mask=attention_masks)
        clshead_output = self.cls_head(bert_out.last_hidden_state[:, 0, :]) #use the first token to classify
        return clshead_output
    
    def predict(self, sequence):
        with torch.no_grad():
            self.eval()
            return self.forward(sequence)
    
    def save(self, path):
        torch.save(self.state_dict(), path) #save the model state dict

    def load(self, path):
        self.load_state_dict(torch.load(path))

    def fit(self, epochs, batch_size, lr, dataset:torch.utils.data.Dataset, epochs_done = 0):
        self.train()
        
        optimizer = torch.optim.AdamW(self.parameters(), lr=lr)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
        completed = epochs_done
        try:
            for epoch in range(epochs_done, epochs):
                print(f"Epoch {epoch}")
                pbar = tqdm(dataloader)
                for batch in pbar:
                    # print("here1")
                    optimizer.zero_grad()
                    encoded_seqs, attention_masks, labels = batch
                    encoded_seqs = encoded_seqs.to(device)
                    attention_masks = attention_masks.to(device)
                    labels = labels.to(device)
                    # print("here2")
                    output = self.forward(encoded_seqs, attention_masks)
                    loss = self.loss_fn(output, labels)
                    # print("here3")
                    loss.backward()
                    optimizer.step()
                    # print("here4")
                    pbar.set_description(f"Loss: {loss.item()}")
                print(f"Epoch {epoch+1} completed. Training Loss: {loss.item()}")
                completed += 1
            self.save(f"{save_folder}/bertbaseuncased_{completed}_{time.strftime('%Y-%m-%d_%H:%M:%S')}.pt")
            
        except KeyboardInterrupt:
            print("Training interrupted.")
            #save the model by date and time of interruption
            self.save(f"{interrupt_save_folder}/bertbaseuncased_interrupt_{epoch}_{time.strftime('%Y-%m-%d_%H:%M:%S')}.pt")

class BERT_BaseCased_Multilabel(torch.nn.Module):
    def __init__(self, num_labels): 
        """num_labels: number of labels to classify
           database: tuple of (X, Y) where X is a list of sentences and Y is a tensor of labels
        """
        super().__init__()
        print("Initializing BERT_BaseCased_Multilabel...")

        self.bert = BertModel.from_pretrained('bert-base-cased')
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
        self.cls_head = torch.nn.Sequential(
            torch.nn.Linear(768, 1024),
            torch.nn.ReLU(),
            torch.nn.Linear(1024, num_labels),
            torch.nn.Sigmoid()
        )
        self.loss_fn = torch.nn.BCELoss()
        print("Initialized.")
    
    def forward(self, encoded_seqs, attention_masks):
        """Input: sequence (str) of shape (batch_size, seq_len)"""
        bert_out = self.bert(encoded_seqs, attention_mask=attention_masks)
        clshead_output = self.cls_head(bert_out.last_hidden_state[:, 0, :]) #use the first token to classify
        return clshead_output
    
    def predict(self, sequence):
        with torch.no_grad():
            self.eval()
            return self.forward(sequence)
    
    def save(self, path):
        torch.save(self.state_dict(), path) #save the model state dict

    def load(self, path):
        self.load_state_dict(torch.load(path))

    def fit(self, epochs, batch_size, lr, dataset:torch.utils.data.Dataset, epochs_done = 0):
        self.train()

        optimizer = torch.optim.AdamW(self.parameters(), lr=lr)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
        completed = epochs_done
        try:
            for epoch in range(epochs_done, epochs):
                print(f"Epoch {epoch}")
                pbar = tqdm(dataloader)
                for batch in pbar:
                    # print("here1")
                    optimizer.zero_grad()
                    encoded_seqs, attention_masks, labels = batch
                    encoded_seqs = encoded_seqs.to(device)
                    attention_masks = attention_masks.to(device)
                    labels = labels.to(device)
                    # print("here2")
                    output = self.forward(encoded_seqs, attention_masks)
                    loss = self.loss_fn(output, labels)
                    # print("here3")
                    loss.backward()
                    optimizer.step()
                    # print("here4")
                    pbar.set_description(f"Loss: {loss.item()}")
                print(f"Epoch {epoch+1} completed. Training Loss: {loss.item()}")
                completed += 1
            self.save(f"{save_folder}/bertbasecased_{completed}_{time.strftime('%Y-%m-%d_%H:%M:%S')}.pt")
            
        except KeyboardInterrupt:
            print("Training interrupted.")
            #save the model by date and time of interruption
            self.save(f"{interrupt_save_folder}/bertbasecased_interrupt_{epoch}_{time.strftime('%Y-%m-%d_%H:%M:%S')}.pt")

class BERT_Base_Multilabel2(torch.nn.Module):
    def __init__(self, num_labels): 
        """num_labels: number of labels to classify
           database: tuple of (X, Y) where X is a list of sentences and Y is a tensor of labels
        """
        super().__init__()
        print("Initializing BERT_Base_Multilabel...")

        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
        self.cls_head = torch.nn.Sequential(
            torch.nn.Dropout(0.3),
            torch.nn.Linear(768, num_labels),
            torch.nn.Sigmoid()
        )
        self.loss_fn = torch.nn.BCELoss()
        print("Initialized.")
    
    def forward(self, encoded_seqs, attention_masks):
        """Input: sequence (str) of shape (batch_size, seq_len)"""
        bert_out = self.bert(encoded_seqs, attention_mask=attention_masks)
        clshead_output = self.cls_head(bert_out.last_hidden_state[:, 0, :]) #use the first token to classify
        return clshead_output
    
    def predict(self, sequence):
        with torch.no_grad():
            self.eval()
            return self.forward(sequence)
    
    def save(self, path):
        torch.save(self.state_dict(), path) #save the model state dict

    def load(self, path):
        self.load_state_dict(torch.load(path))

    def fit(self, epochs, batch_size, lr, dataset:torch.utils.data.Dataset, epochs_done = 0):
        self.train()
        
        optimizer = torch.optim.AdamW(self.parameters(), lr=lr)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
        completed = epochs_done
        try:
            for epoch in range(epochs_done, epochs):
                print(f"Epoch {epoch}")
                pbar = tqdm(dataloader)
                for batch in pbar:
                    # print("here1")
                    optimizer.zero_grad()
                    encoded_seqs, attention_masks, labels = batch
                    encoded_seqs = encoded_seqs.to(device)
                    attention_masks = attention_masks.to(device)
                    labels = labels.to(device)
                    # print("here2")
                    output = self.forward(encoded_seqs, attention_masks)
                    loss = self.loss_fn(output, labels)
                    # print("here3")
                    loss.backward()
                    optimizer.step()
                    # print("here4")
                    pbar.set_description(f"Loss: {loss.item()}")
                print(f"Epoch {epoch+1} completed. Training Loss: {loss.item()}")
                completed += 1
            self.save(f"{save_folder}/bertbaseuncased2_{completed}_{time.strftime('%Y-%m-%d_%H:%M:%S')}.pt")
            
        except KeyboardInterrupt:
            print("Training interrupted.")
            #save the model by date and time of interruption
            self.save(f"{interrupt_save_folder}/bertbaseuncased2_interrupt_{epoch}_{time.strftime('%Y-%m-%d_%H:%M:%S')}.pt")


In [8]:
#Prepare database
concat_text = []
for i in range(data_df.shape[0]):
    concat_text.append(f"Title : {data_df.iloc[i].Title}; Abstract : {data_df.iloc[i].abstractText}")
data_df['text'] = pd.Series(concat_text)


In [9]:
print("Number of sequences:", len(data_df))
texts = list(data_df['text'].values)
target = torch.tensor(data_df[labels].values).float()

Number of sequences: 50000


In [10]:
# cls_model = BERT_Base_Multilabel(num_labels, (texts, data_df[labels].values))
# short = texts[1]
# encoded = tokenizer(short,add_special_tokens=True ,return_tensors='pt', padding='max_length', truncation=True)
# out = bert_model(encoded['input_ids'].to(device), encoded['attention_mask'].to(device))
# print(out[1].shape)

# short_dataset = Dataset(texts[:10], target[:10], tokenizer)


# Trying Variations

## Dataset Prep

In [11]:
def construct_dataset(texts, target, tokenizer):
    """
    Input: texts: list of strings, target: tensor of shape (num_samples, num_labels)
    Output: full_dataset, train_dataset, val_dataset, test_dataset
    """
    full_dataset = Dataset(texts, target, tokenizer)
    train_size = int(0.8 * len(full_dataset))
    val_size = int(0.1 * len(full_dataset))
    test_size = len(full_dataset) - train_size - val_size
    train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(full_dataset, [train_size, val_size, test_size])
    train_dataset.tokenizer = tokenizer
    val_dataset.tokenizer = tokenizer
    test_dataset.tokenizer = tokenizer
    return full_dataset, train_dataset, val_dataset, test_dataset

In [19]:
def evaluate(model, dataset):
    # Evaluate Accuracy
    data_loader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=False)
    model.eval()
    metrics = {}
    fin_targets=[]
    fin_outputs=[]
    with torch.no_grad():
        for _, batch in tqdm(enumerate(data_loader), total=len(data_loader)):
            encoded_seqs, attention_masks, targets = batch
            encoded_seqs = encoded_seqs.to(device)
            attention_masks = attention_masks.to(device)
            targets = targets.to(device)

            outputs = model(encoded_seqs, attention_masks)
            
            fin_targets.extend(targets.cpu().detach().numpy().tolist())
            fin_outputs.extend(outputs.cpu().detach().numpy().tolist())
    accuracy = 0
    for i in range(len(fin_outputs)):
        accuracy += np.sum(np.round(fin_outputs[i]) == fin_targets[i])
    accuracy = accuracy/(len(fin_outputs)*len(fin_outputs[0]))
    # Evaluate Precision, Recall and F1-Score for each class separately
    fin_outputs = np.round(fin_outputs)
    fin_targets = np.array(fin_targets)
    fin_outputs = np.array(fin_outputs)
    fin_targets = np.array(fin_targets)
    precisions = {}
    recalls = {}
    f1_scores = {}
    for label in range(len(labels)):
        precision = np.sum((fin_outputs[:, label] == 1) & (fin_targets[:, label] == 1)) / np.sum(fin_outputs[:, label] == 1)
        recall = np.sum((fin_outputs[:, label] == 1) & (fin_targets[:, label] == 1)) / np.sum(fin_targets[:, label] == 1)
        f1_score = 2 * precision * recall / (precision + recall)
        precisions[label] = precision
        recalls[label] = recall
        f1_scores[label] = f1_score
        print(f"Label: {labels[label]} | Precision: {precision} | Recall: {recall} | F1-Score: {f1_score}")
    # Evaluate Macro Average and Micro Average Precision, Recall and F1-Score
    macro_precision = np.mean(list(precisions.values()))
    macro_recall = np.mean(list(recalls.values()))
    macro_f1_score = np.mean(list(f1_scores.values()))
    micro_precision = np.sum([np.sum((fin_outputs[:, label] == 1) & (fin_targets[:, label] == 1)) for label in range(len(labels))]) / np.sum(fin_outputs == 1)
    micro_recall = np.sum([np.sum((fin_outputs[:, label] == 1) & (fin_targets[:, label] == 1)) for label in range(len(labels))]) / np.sum(fin_targets == 1)
    micro_f1_score = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
    print(f"Macro Precision: {macro_precision} | Macro Recall: {macro_recall} | Macro F1-Score: {macro_f1_score}")
    print(f"Micro Precision: {micro_precision} | Micro Recall: {micro_recall} | Micro F1-Score: {micro_f1_score}")
    metrics['accuracy'] = accuracy
    metrics['precisions'] = precisions
    metrics['recalls'] = recalls
    metrics['f1_scores'] = f1_scores
    metrics['macro_precision'] = macro_precision
    metrics['macro_recall'] = macro_recall
    metrics['macro_f1_score'] = macro_f1_score
    metrics['micro_precision'] = micro_precision
    metrics['micro_recall'] = micro_recall
    metrics['micro_f1_score'] = micro_f1_score
    return fin_outputs, fin_targets, metrics


## BERT Base Uncased [BERT Freezed]

In [14]:
cls_model = BERT_Base_Multilabel(num_labels).to(device)
full_dataset, train_dataset, val_dataset, test_dataset = construct_dataset(texts, target, cls_model.tokenizer)


Initializing BERT_Base_Multilabel...
Initialized.


In [None]:
#freeze all BERT params
for param in cls_model.bert.parameters():
    param.requires_grad = False

In [None]:
#continue training
cls_model.fit(1, 32, 1e-4, train_dataset)

In [None]:
# evaluate(cls_model, test_dataset)

## BERT Base Cased - [Bert Freezed]

In [None]:
# Trying BERT_cased for 1 epoch

cls_bertcased = BERT_BaseCased_Multilabel(num_labels).to(device)
full_dataset, train_dataset, val_dataset, test_dataset = construct_dataset(texts, target, cls_bertcased.tokenizer)
print(cls_bertcased)
#freeze all BERT params
for params in cls_bertcased.bert.parameters():
    params.requires_grad = False

In [None]:
cls_bertcased.fit(1, 32, 1e-4, train_dataset)

In [None]:
test_accuracy = evaluate(cls_bertcased, test_dataset)

## Uncased BERTBase [Last Encoder Layer Unfrozen]

In [None]:
base_unfreeze1 =  BERT_BaseCased_Multilabel(num_labels).to(device)
full_dataset, train_dataset, val_dataset, test_dataset = construct_dataset(texts, target, base_unfreeze1.tokenizer)

cnt = 0
for param in base_unfreeze1.bert.parameters():
    param.requires_grad = False
    
for layer in base_unfreeze1.bert.encoder.layer:
    cnt += 1
    if cnt >= 12:
        for param in layer.parameters():
            param.required_grad = True

print("Last 1 Encoder Layer Unfreezed")


In [None]:
base_unfreeze1.fit(1, 32, 1e-4, train_dataset)

In [None]:
test_accuracy_base_unfreeze1 = evaluate(base_unfreeze1, test_dataset)

In [None]:
# Lower the learning rate and train for some more time
base_unfreeze1.load("/kaggle/working/bertbasecased_1_2024-04-14_15:14:06.pt")

In [None]:
base_unfreeze1.fit(2, 32, 2e-5, train_dataset, epochs_done=1)

In [None]:
accuracy_3 = evaluate(base_unfreeze1, test_dataset)

## Unfreezing Embedding Layer

In [None]:

uncased_uf_emb_enc = BERT_Base_Multilabel(num_labels).to(device)
full_dataset, train_dataset, val_dataset, test_dataset = construct_dataset(texts, target, uncased_uf_emb_enc.tokenizer)

cnt = 0
for param in uncased_uf_emb_enc.bert.parameters():
    param.requires_grad = False
    
for layer in uncased_uf_emb_enc.bert.encoder.layer:
    cnt += 1
    if cnt >= 12:
        for param in layer.parameters():
            param.required_grad = True

print("Last 1 Encoder Layer Unfreezed")
for param in uncased_uf_emb_enc.bert.embeddings.parameters():
    param.requires_grad = True

print("Embedding Layer Unfreezed")

In [None]:
uncased_uf_emb_enc.fit(1,16,1e-4, train_dataset)

In [None]:
evaluate(uncased_uf_emb_enc, test_dataset)

In [None]:
# unfreeze one more layer and do 2 more epochs
uncased_uf_emb_enc.load("/kaggle/working/bertbaseuncased_interrupt_1_2024-04-14_17:29:54.pt")
cnt = 0
for param in uncased_uf_emb_enc.bert.parameters():
    param.requires_grad = False
    
for layer in uncased_uf_emb_enc.bert.encoder.layer:
    cnt += 1
    if cnt >= 11:
        for param in layer.parameters():
            param.required_grad = True

print("Last 2 Encoder Layer Unfreezed")
for param in uncased_uf_emb_enc.bert.embeddings.parameters():
    param.requires_grad = True

print("Embedding Layer Unfreezed")

In [None]:
uncased_uf_emb_enc.fit(2,16,2e-5, train_dataset, epochs_done=1)

In [None]:
evaluate(uncased_uf_emb_enc, test_dataset)

## Trying the above (uncased, unfreezed embeddings) without unfreezing the encoder layer
To confirm whether unfreezing encoder actually makes a difference or not

In [None]:
uncased_uf_emb = BERT_Base_Multilabel(num_labels).to(device)
full_dataset, train_dataset, val_dataset, test_dataset = construct_dataset(texts, target, uncased_uf_emb.tokenizer)

cnt = 0
for param in uncased_uf_emb.bert.parameters():
    param.requires_grad = False
    
# for layer in uncased_uf_emb.bert.encoder.layer:
#     cnt += 1
#     if cnt >= 12:
#         for param in layer.parameters():
#             param.required_grad = True

# print("Last 1 Encoder Layer Unfreezed")
for param in uncased_uf_emb.bert.embeddings.parameters():
    param.requires_grad = True

print("Embedding Layer Unfreezed")

In [None]:
uncased_uf_emb.fit(1,16,1e-4, train_dataset)

## Detailed Evaluation

In [15]:
uncased_uf_emb = BERT_Base_Multilabel(num_labels).to(device)
full_dataset, train_dataset, val_dataset, test_dataset = construct_dataset(texts, target, uncased_uf_emb.tokenizer)

uncased_uf_emb.load("/kaggle/input/bert_base_multilabel_classifier/pytorch/uf_emb_enc/1/e3_2024-04-14_18_37_33.pt")


Initializing BERT_Base_Multilabel...
Initialized.


In [20]:
_, _, test_report = evaluate(uncased_uf_emb, test_dataset)

100%|██████████| 1250/1250 [02:17<00:00,  9.12it/s]


Label: A | Precision: 0.8285097192224622 | Recall: 0.8382867132867133 | F1-Score: 0.8333695416033022
Label: B | Precision: 0.9719195305951384 | Recall: 0.9906023067065357 | F1-Score: 0.9811719906917709
Label: C | Precision: 0.8559498956158664 | Recall: 0.9439754412893323 | F1-Score: 0.8978102189781022
Label: D | Precision: 0.9496774193548387 | Recall: 0.9278285534194768 | F1-Score: 0.9386258568468037
Label: E | Precision: 0.8502923976608188 | Recall: 0.9318123558062036 | F1-Score: 0.889187866927593
Label: F | Precision: 0.8434886499402628 | Recall: 0.800453514739229 | F1-Score: 0.8214077952297848
Label: G | Precision: 0.8388533259114945 | Recall: 0.8880377136122569 | F1-Score: 0.8627450980392156
Label: H | Precision: 0.6534653465346535 | Recall: 0.10361067503924647 | F1-Score: 0.17886178861788615
Label: I | Precision: 0.7377049180327869 | Recall: 0.5833333333333334 | F1-Score: 0.6514994829369183
Label: J | Precision: 0.7937853107344632 | Recall: 0.5146520146520146 | F1-Score: 0.6244444

In [21]:
print(test_report['accuracy'])

0.8951428571428571


In [22]:
_, _, val_report = evaluate(uncased_uf_emb, val_dataset)

100%|██████████| 1250/1250 [02:16<00:00,  9.17it/s]

Label: A | Precision: 0.8407833120476799 | Recall: 0.836155800169348 | F1-Score: 0.8384631713012098
Label: B | Precision: 0.9690526315789474 | Recall: 0.9911714039621017 | F1-Score: 0.9799872258888653
Label: C | Precision: 0.8568953568953569 | Recall: 0.9435330026707364 | F1-Score: 0.8981296531686943
Label: D | Precision: 0.9467146126185028 | Recall: 0.9243536546441111 | F1-Score: 0.9354005167958656
Label: E | Precision: 0.8542356838618078 | Recall: 0.9228016359918201 | F1-Score: 0.8871958712214303
Label: F | Precision: 0.8127250900360145 | Recall: 0.8166465621230398 | F1-Score: 0.8146811070998796
Label: G | Precision: 0.8505970563732297 | Recall: 0.9043401240035429 | F1-Score: 0.8766456783056669
Label: H | Precision: 0.6428571428571429 | Recall: 0.11707317073170732 | F1-Score: 0.19807427785419532
Label: I | Precision: 0.753393665158371 | Recall: 0.6294896030245747 | F1-Score: 0.685890834191555
Label: J | Precision: 0.7595628415300546 | Recall: 0.556 | F1-Score: 0.6420323325635104
Labe




In [23]:
print(val_report['accuracy'])

0.8985714285714286


## Dropout Layers

In [26]:
uncased2_uf_emb = BERT_Base_Multilabel2(num_labels).to(device)
full_dataset, train_dataset, val_dataset, test_dataset = construct_dataset(texts, target, uncased2_uf_emb.tokenizer)


Initializing BERT_Base_Multilabel...
Initialized.


In [27]:
uncased2_uf_emb.fit(1,16,1e-4, train_dataset)

Epoch 0


Loss: 0.29174670577049255: 100%|██████████| 2500/2500 [41:06<00:00,  1.01it/s]


Epoch 1 completed. Training Loss: 0.29174670577049255


In [28]:
_, _, test_report = evaluate(uncased2_uf_emb, test_dataset)
print("Test Accuracy", test_report['accuracy'])


100%|██████████| 1250/1250 [02:17<00:00,  9.09it/s]


Label: A | Precision: 0.8210930828351836 | Recall: 0.8207426376440461 | F1-Score: 0.8209178228388474
Label: B | Precision: 0.9713916971391697 | Recall: 0.9764324324324324 | F1-Score: 0.9739055423765366
Label: C | Precision: 0.8890631125049 | Recall: 0.8692985818321196 | F1-Score: 0.8790697674418604
Label: D | Precision: 0.9225642653125992 | Recall: 0.9314322332585709 | F1-Score: 0.9269770408163265
Label: E | Precision: 0.8126901347240331 | Recall: 0.9572562068082928 | F1-Score: 0.8790692208250087
Label: F | Precision: 0.8943089430894309 | Recall: 0.6307339449541285 | F1-Score: 0.7397444519166106
Label: G | Precision: 0.8169051404345522 | Recall: 0.9099763872491146 | F1-Score: 0.8609327003630272
Label: H | Precision: 0.5253164556962026 | Recall: 0.13651315789473684 | F1-Score: 0.21671018276762402
Label: I | Precision: 0.7130620985010707 | Recall: 0.6098901098901099 | F1-Score: 0.6574531095755183
Label: J | Precision: 0.7045454545454546 | Recall: 0.5646630236794171 | F1-Score: 0.62689585

In [29]:
_, _, val_report = evaluate(uncased2_uf_emb, val_dataset)
print("Validation Accuracy", val_report['accuracy'])


100%|██████████| 1250/1250 [02:18<00:00,  9.00it/s]


Label: A | Precision: 0.8199152542372882 | Recall: 0.8255119453924915 | F1-Score: 0.8227040816326532
Label: B | Precision: 0.9751712328767124 | Recall: 0.9735042735042735 | F1-Score: 0.9743370402053038
Label: C | Precision: 0.9089093088294047 | Recall: 0.8627227910504361 | F1-Score: 0.8852140077821012
Label: D | Precision: 0.9135802469135802 | Recall: 0.925 | F1-Score: 0.9192546583850932
Label: E | Precision: 0.817765168048887 | Recall: 0.9563552833078101 | F1-Score: 0.8816470588235295
Label: F | Precision: 0.8991869918699187 | Recall: 0.6415313225058005 | F1-Score: 0.7488151658767772
Label: G | Precision: 0.8150470219435737 | Recall: 0.9217134416543574 | F1-Score: 0.8651046721197837
Label: H | Precision: 0.5297619047619048 | Recall: 0.15724381625441697 | F1-Score: 0.24250681198910082
Label: I | Precision: 0.7215777262180975 | Recall: 0.5791433891992551 | F1-Score: 0.6425619834710744
Label: J | Precision: 0.703962703962704 | Recall: 0.5666041275797373 | F1-Score: 0.6278586278586279
Lab