# Sentiment Analysis with Deep Learning using BERT

## Project Summary: Sentiment Analysis with Deep Learning Using BERT

This project implements **sentiment analysis** using the **BERT** (Bidirectional Encoder Representations from Transformers) model. It leverages modern **deep learning** techniques and natural language processing (NLP) methodologies to classify the sentiment of textual data. The main goal is to train and evaluate a model using the **SMILE Twitter dataset** to predict the sentiment of various tweets.

## Key Techniques and Technologies Used:

1. **Hugging Face's BERT Model**:
   - **BERT** is a transformer-based model pre-trained on large datasets. This project uses the **`bert-base-uncased`** pre-trained model from Hugging Face for **sequence classification** tasks.

2. **PyTorch Framework**:
   - The project utilizes **PyTorch** to handle model training, data loading, and tensor operations. PyTorch is the main deep learning framework used for building and fine-tuning the BERT model.

3. **Tokenizer and Data Preparation**:
   - **BERT Tokenizer** is used to preprocess the text data by encoding the tweets into tokenized format, including adding special tokens like `[CLS]` and `[SEP]`, padding, and truncating sequences to a maximum length.
   - **Tokenization** is followed by creating **attention masks** to differentiate between padded and actual data during model training.

4. **TensorDataset and DataLoader**:
   - The project creates **TensorDataset** objects for the training and validation sets after tokenization. These are then fed into **PyTorch DataLoaders**, which manage batch sampling for efficient training and validation.

5. **Model Training and Evaluation**:
   - **AdamW Optimizer** is used to update the model weights during training, with a **learning rate scheduler** (`get_linear_schedule_with_warmup`) that decays the learning rate linearly over all training steps.
   - The model is trained using a **training loop** that handles:
     - Forward propagation, backward propagation, and gradient clipping.
     - **Loss accumulation** and saving the model after each epoch.
   
6. **Evaluation Metrics**:
   - **F1 Score, Accuracy, Precision, and Recall** are used to evaluate model performance. These metrics are computed using functions from the **scikit-learn** (`sklearn`) library.
   - The overall performance is tracked across both the **training** and **validation datasets**.

7. **Validation Loop**:
   - During validation, the model is set to **evaluation mode** (`model.eval()`), which disables dropout layers. Gradient computation is switched off separately with `torch.no_grad()`, which makes inference cheaper in memory and time.
   - Predictions and true labels are moved from GPU to CPU to free up memory, and metrics are calculated based on these values.

8. **Performance Metrics Computation**:
   - Custom functions are defined to compute **accuracy per class** and overall **metrics** (precision, recall, F1 score) using the **scikit-learn metrics** library.

## Conclusion:
This project demonstrates how to fine-tune a **pre-trained BERT model** for sentiment analysis using **PyTorch** and Hugging Face's **Transformers** library. It showcases key techniques such as tokenization, dataset handling, and implementing custom training/evaluation loops. The final model is evaluated using comprehensive metrics like **accuracy**, **precision**, **recall**, and **F1 score**, allowing for a detailed understanding of its performance.


### Project Outline

**Task 1**: Introduction (this section)

**Task 2**: Exploratory Data Analysis and Preprocessing

**Task 3**: Training/Validation Split

**Task 4**: Loading Tokenizer and Encoding our Data

**Task 5**: Setting up BERT Pretrained Model

**Task 6**: Creating Data Loaders

**Task 7**: Setting Up Optimizer and Scheduler

**Task 8**: Defining our Performance Metrics

**Task 9**: Creating our Training Loop

**Task 10**: Loading and Evaluating our Model

## Task 1: Introduction

### What is BERT

BERT is a large-scale transformer-based language model that can be fine-tuned for a variety of tasks.

For more information, the original paper can be found [here](https://arxiv.org/abs/1810.04805).

[HuggingFace documentation](https://huggingface.co/transformers/model_doc/bert.html)

[Bert documentation](https://characters.fandom.com/wiki/Bert_(Sesame_Street)) ;)

## Task 2: Exploratory Data Analysis and Preprocessing

We will use the SMILE Twitter dataset.

_Wang, Bo; Tsakalidis, Adam; Liakata, Maria; Zubiaga, Arkaitz; Procter, Rob; Jensen, Eric (2016): SMILE Twitter Emotion dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.3187909.v2_

In [1]:
import torch
import pandas as pd
from tqdm.notebook import tqdm

In [2]:
url ='https://raw.githubusercontent.com/pandapanda3/Dataset_Machine_Learning/refs/heads/main/Sentiment%20Analysis%20with%20Deep%20Learning%20using%20BERT/smile-annotations-final.csv'
df = pd.read_csv(url, names = ['id','text','category'])
# use the id column as index. The 'inplace=True' modifies the DataFrame in place without creating a new object
df.set_index('id', inplace = True)
df.head(10)

id,text,category
611857364396965889,@aandraous @britishmuseum @AndrewsAntonio Merc...,nocode
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...,happy
614877582664835073,@Sofabsports thank you for following me back. ...,happy
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...,happy
611570404268883969,@NationalGallery @ThePoldarkian I have always ...,happy
614456889863208960,"@britishmuseum say wot, mate?",nocode
614016385442807809,Two workshops on evaluating audience engagemen...,nocode
610916556751642624,"A Forest Road, by Thomas Gainsborough 1750 Oil...",nocode
614499696015503361,Lucky @FitzMuseum_UK! Good luck @MirandaStearn...,happy


### iloc and loc

#### iloc
`df.iloc[1:5, 0:1]`:
* uses **integer-based indexing** to slice the DataFrame.
* `1:5` (row slice): selects rows from position **1 up to, but not including, position 5** (i.e., rows 1, 2, 3, and 4).
* `0:1` (column slice): selects the column at position 0 (the first column) and keeps the result as a DataFrame, because a slice rather than a single integer is used.
* Returns a DataFrame.

#### loc
`df.loc[df.index[1:5], df.columns[0]]`:
* uses **label-based indexing** and selects a single column by name (not by integer position).
* Returns a Series, since a single column label is selected.
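The difference in return type can be checked on a small, made-up DataFrame. The sketch below assumes only pandas; the `toy` frame and its values are illustrative and are not taken from the SMILE data.

```python
import pandas as pd

# Hypothetical toy frame standing in for the tweet data.
toy = pd.DataFrame(
    {"text": ["a", "b", "c", "d", "e", "f"], "category": ["x"] * 6},
    index=[10, 11, 12, 13, 14, 15],
)

# Positional slice on both axes: rows 1..4, column 0 kept as a one-column DataFrame.
by_position = toy.iloc[1:5, 0:1]

# Label-based selection of the same rows with a single column label, so a Series comes back.
by_label = toy.loc[toy.index[1:5], toy.columns[0]]

print(type(by_position).__name__, by_position.shape)
print(type(by_label).__name__, by_label.shape)
```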


In [3]:
# Access the value from the 'text' column at the second row (index 1)
df.text.iloc[1]

'Dorian Gray with Rainbow Scarf #LoveWins (from @britishmuseum http://t.co/Q4XSwL0esu) http://t.co/h0evbTBWRq'

In [4]:
# Select rows at positions 1 through 4 and only the first column (position 0); the result is a DataFrame
df.iloc[1:5, 0:1]

id,text
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...
614877582664835073,@Sofabsports thank you for following me back. ...
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...


In [5]:
df.loc[df.index[1:5], df.columns[0]]

id,text
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...
614877582664835073,@Sofabsports thank you for following me back. ...
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...


In [6]:
df.category.value_counts()

category,count
nocode,1572
happy,1137
not-relevant,214
angry,57
surprise,35
sad,32
happy|surprise,11
happy|sad,9
disgust|angry,7
disgust,6


In [7]:
df.category.unique()

array(['nocode', 'happy', 'not-relevant', 'angry', 'disgust|angry',
       'disgust', 'happy|surprise', 'sad', 'surprise', 'happy|sad',
       'sad|disgust', 'sad|angry', 'sad|disgust|angry'], dtype=object)

In [8]:
# ignore the categories that contain "|" (multi-label annotations); the raw string keeps the regex escape intact
df = df[~df.category.str.contains(r'\|')]
df.category.value_counts()

category,count
nocode,1572
happy,1137
not-relevant,214
angry,57
surprise,35
sad,32
disgust,6
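
The backslash in the filter above matters because `str.contains` treats its pattern as a regular expression by default, and an unescaped `|` means alternation. A minimal sketch using Python's `re` module (the sample labels are illustrative):

```python
import re

labels = ["happy", "happy|sad", "nocode"]

# Unescaped: "|" separates two empty alternatives, so the pattern matches every string.
unescaped = [bool(re.search("|", s)) for s in labels]

# Escaped: matches only strings that contain a literal pipe character.
escaped = [bool(re.search(r"\|", s)) for s in labels]

print(unescaped)
print(escaped)
```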


In [9]:
# ignore the category that contains 'nocode'
df = df[df.category != 'nocode']
df.category.value_counts()

category,count
happy,1137
not-relevant,214
angry,57
surprise,35
sad,32
disgust,6


### Map each category to a numerical representation

In [10]:
# Map each category to a numerical representation
labels = df.category.unique()
label_dict={}
for index, label in enumerate(labels):
    label_dict[label]=index
label_dict

{'happy': 0,
 'not-relevant': 1,
 'angry': 2,
 'disgust': 3,
 'sad': 4,
 'surprise': 5}
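
The same mapping can be built with a dictionary comprehension, and inverting it recovers the category name from a numeric label. A small sketch; `label_dict_inverse` is a name introduced here for illustration, and the list of labels is copied from the output above.

```python
labels = ["happy", "not-relevant", "angry", "disgust", "sad", "surprise"]

# Category name -> integer id, in order of first appearance.
label_dict = {label: index for index, label in enumerate(labels)}

# Integer id -> category name, useful when reporting predictions.
label_dict_inverse = {index: label for label, index in label_dict.items()}

print(label_dict["surprise"])
print(label_dict_inverse[0])
```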

In [11]:
# Map the values in the 'category' column to numerical labels using 'label_dict' and store them in a new 'label' column
df['label'] = df.category.map(label_dict)
df.head(10)


id,text,category,label
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy,0
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...,happy,0
614877582664835073,@Sofabsports thank you for following me back. ...,happy,0
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...,happy,0
611570404268883969,@NationalGallery @ThePoldarkian I have always ...,happy,0
614499696015503361,Lucky @FitzMuseum_UK! Good luck @MirandaStearn...,happy,0
613601881441570816,Yr 9 art students are off to the @britishmuseu...,happy,0
613696526297210880,@RAMMuseum Please vote for us as @sainsbury #s...,not-relevant,1
610746718641102848,#AskTheGallery Have you got plans to privatise...,not-relevant,1
612648200588038144,@BarbyWT @britishmuseum so beautiful,happy,0


## Task 3: Training/Validation Split

In [12]:
from sklearn.model_selection import train_test_split

In [13]:
df.label.value_counts()

label,count
0,1137
1,214
2,57
5,35
4,32
3,6


In [14]:
df.index

Index([614484565059596288, 614746522043973632, 614877582664835073,
       611932373039644672, 611570404268883969, 614499696015503361,
       613601881441570816, 613696526297210880, 610746718641102848,
       612648200588038144,
       ...
       611227963976253440, 612242969035411456, 614900716960915456,
       614053885733412864, 610405281604993024, 611258135270060033,
       612214539468279808, 613678555935973376, 615246897670922240,
       613016084371914753],
      dtype='int64', name='id', length=1481)

In [15]:
# Split the data into training and validation sets, with 15% held out for validation.
# Note: `stratify` is commented out below, so the label distribution is not guaranteed to be preserved
# in both sets; uncomment it to stratify the split by label.
X_train,X_val,y_train,y_val = train_test_split(
    df.index.values,
    df.label.values,
    test_size=0.15,
    random_state=17,
    # stratify=df.label.values
)

In [16]:
# Print the first 5 values of X_train, X_val, y_train, and y_val
print("X_train (first 5):", X_train[:5])
print("X_val (first 5):", X_val[:5])
print("y_train (first 5):", y_train[:5])
print("y_val (first 5):", y_val[:5])


X_train (first 5): [614770792627372032 614113686316281856 611174977853743104
 613727438800056320 611667572602368000]
X_val (first 5): [614852295055028224 611796541204860928 611849399203930112
 612341092646801408 610756005908086784]
y_train (first 5): [0 1 0 0 0]
y_val (first 5): [0 0 0 0 2]


### Assign 'train' to the 'data_type' column for rows with indices in X_train



In [44]:
df.shape

(1481, 4)

In [17]:
# Create a new column 'data_type' and initialize all its values to 'not_set' for every row in the DataFrame
df['data_type'] = ['not_set']*df.shape[0]

In [18]:
df.head(10)

id,text,category,label,data_type
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy,0,not_set
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...,happy,0,not_set
614877582664835073,@Sofabsports thank you for following me back. ...,happy,0,not_set
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...,happy,0,not_set
611570404268883969,@NationalGallery @ThePoldarkian I have always ...,happy,0,not_set
614499696015503361,Lucky @FitzMuseum_UK! Good luck @MirandaStearn...,happy,0,not_set
613601881441570816,Yr 9 art students are off to the @britishmuseu...,happy,0,not_set
613696526297210880,@RAMMuseum Please vote for us as @sainsbury #s...,not-relevant,1,not_set
610746718641102848,#AskTheGallery Have you got plans to privatise...,not-relevant,1,not_set
612648200588038144,@BarbyWT @britishmuseum so beautiful,happy,0,not_set


In [19]:
# Assign 'train' to the 'data_type' column for rows with indices in X_train
df.loc[X_train,'data_type'] = 'train'
# Assign 'val' to the 'data_type' column for rows with indices in X_val
df.loc[X_val,'data_type'] = 'val'

In [20]:
df.groupby(['category','label','data_type']).count()

category,label,data_type,text
angry,2,train,48
angry,2,val,9
disgust,3,train,6
happy,0,train,964
happy,0,val,173
not-relevant,1,train,183
not-relevant,1,val,31
sad,4,train,26
sad,4,val,6
surprise,5,train,31


## Task 4: Loading Tokenizer and Encoding our Data

In [21]:
from transformers import BertTokenizer
from torch.utils.data import TensorDataset

In [22]:
tokenizer = BertTokenizer.from_pretrained(
    'bert-base-uncased',
    do_lower_case=True
)




### Preparing the Encoded Data

The encoded training and validation data (from BERT tokenization) are prepared for model training by creating PyTorch tensors and combining them into `TensorDataset` objects.



In [23]:
# Encode training data
encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type=='train'].text.values.tolist(),
    # Add special tokens such as [CLS] and [SEP]
    add_special_tokens=True,
    # Return attention mask
    return_attention_mask=True,
    # Pad every sequence to max_length (replaces the deprecated pad_to_max_length=True)
    padding='max_length',
    max_length=256,
    # Explicitly enable truncation
    truncation=True,
    return_tensors='pt'
)

# Encode validation data with the same settings
encoded_data_val = tokenizer.batch_encode_plus(
    df[df.data_type=='val'].text.values.tolist(),
    add_special_tokens=True,
    return_attention_mask=True,
    padding='max_length',
    max_length=256,
    truncation=True,
    return_tensors='pt'
)
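
What padding to a fixed length and the attention mask look like can be sketched without the tokenizer. The token IDs below are placeholders, except for the special-token IDs, which are assumed to be 101 for `[CLS]`, 102 for `[SEP]` and 0 for `[PAD]` as in the `bert-base-uncased` vocabulary. `pad_and_mask` is a hypothetical helper, not part of the Transformers API.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # assumed special-token ids for bert-base-uncased

def pad_and_mask(token_ids, max_length):
    """Add [CLS]/[SEP], truncate, pad to max_length, and build the attention mask."""
    # Leave room for the two special tokens when truncating.
    body = token_ids[: max_length - 2]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)              # 1 marks real tokens
    n_pad = max_length - len(ids)
    ids += [PAD_ID] * n_pad
    mask += [0] * n_pad                # 0 marks padding the model should ignore
    return ids, mask

ids, mask = pad_and_mask([7592, 2088], max_length=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```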




In [24]:
# The tokenizer already returns PyTorch tensors (return_tensors='pt') for the input IDs and attention masks.
input_ids_train = encoded_data_train['input_ids']
attention_masks_train = encoded_data_train['attention_mask']
# Filter the training rows, extract the label column as a NumPy array, and convert it to a PyTorch tensor.
# PyTorch tensors are used for deep learning computations and can be moved to the GPU if needed.
labels_train = torch.tensor(df[df.data_type == 'train'].label.values)

input_ids_val = encoded_data_val['input_ids']
attention_masks_val = encoded_data_val['attention_mask']
labels_val = torch.tensor(df[df.data_type == 'val'].label.values)

In [25]:
dataset_train = TensorDataset(input_ids_train,
                              attention_masks_train,labels_train)
dataset_val = TensorDataset(input_ids_val,
                              attention_masks_val,labels_val)

In [26]:
dataset_train

<torch.utils.data.dataset.TensorDataset at 0x7ee2a74f4c10>

In [27]:
len(dataset_train)

1258

In [28]:
len(dataset_val)

223

## Task 5: Setting up BERT Pretrained Model

In [29]:
from transformers import BertForSequenceClassification

In [30]:
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels = len(label_dict),
    output_attentions = False,
    output_hidden_states=False
)




Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Task 6: Creating Data Loaders

In [31]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

In [32]:
batch_size = 4
# It iterates over the dataset and returns batches of data (input and labels) to the model during training.
dataloader_train = DataLoader(
    dataset_train,
    sampler = RandomSampler(dataset_train),
    # This defines how many samples are processed at once (in each batch)
    batch_size= batch_size
)

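dataloader_val = DataLoader(
    dataset_val,
    # Shuffling is not required for validation; SequentialSampler(dataset_val) would work equally well here.
    sampler = RandomSampler(dataset_val),
    batch_size= 32
)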
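
The number of batches per epoch follows from the dataset sizes and batch sizes. A quick check, using the dataset lengths reported earlier (1258 training and 223 validation examples):

```python
import math

n_train, n_val = 1258, 223          # len(dataset_train), len(dataset_val)
train_batch_size, val_batch_size = 4, 32

# DataLoader keeps the last, smaller batch by default (drop_last=False), hence the ceiling.
train_batches = math.ceil(n_train / train_batch_size)
val_batches = math.ceil(n_val / val_batch_size)

print(train_batches, val_batches)  # 315 7
```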

## Task 7: Setting Up Optimizer and Scheduler

In [33]:
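# `transformers.AdamW` is deprecated, so the optimizer is imported from PyTorch instead.
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

In [34]:
optimizer = AdamW(
    model.parameters(),
    # learning rate
    lr = 1e-5,
    # Epsilon: a small constant added to the denominator of the update to prevent division by zero and keep it numerically stable.
    eps= 1e-8,
    # torch.optim.AdamW defaults to weight_decay=0.01; 0.0 matches the default of the deprecated transformers.AdamW.
    weight_decay = 0.0
)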



In [35]:
epochs = 10

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    # No warmup period, the learning rate starts from the initial value immediately.
    num_warmup_steps = 0,
    # total number of training steps across all epochs.
    num_training_steps = len(dataloader_train)*epochs
)
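
The shape of this schedule can be sketched in a few lines. The function below mirrors the linear warmup-then-decay rule as a multiplier on the base learning rate. `linear_schedule_factor` is an illustrative name, and the step count assumes 315 batches per epoch (1258 training examples at batch size 4) over 10 epochs.

```python
def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

base_lr = 1e-5
total_steps = 315 * 10  # batches per epoch * epochs

for step in (0, total_steps // 2, total_steps):
    print(step, base_lr * linear_schedule_factor(step, 0, total_steps))
```

With zero warmup steps, the learning rate starts at its full value and reaches zero at the final step.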

## Task 8: Defining our Performance Metrics

Accuracy metric approach originally used in accuracy function in [this tutorial](https://mccormickml.com/2019/07/22/BERT-fine-tuning/#41-bertforsequenceclassification).

### `np.argmax(preds, axis=1)`

For each row in `preds`, it selects the index of the largest value, i.e. the class with the highest predicted score.

```
preds = [[0.1, 0.6, 0.3], [0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
np.argmax(preds, axis=1)
```
output

```
[1, 0, 1]  # the prediction category is  1, 0, 1 respectively
```
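
Strictly speaking, the classification model returns raw scores (logits) rather than probabilities. Because softmax is monotonic, taking the argmax of the logits gives the same class as taking the argmax of the probabilities, so no softmax is needed before `np.argmax`. A small sketch with made-up logits:

```python
import numpy as np

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])

# Numerically stable softmax along the class axis.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(np.argmax(logits, axis=1))
print(np.argmax(probs, axis=1))
```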

### Why Use the Weighted F1 Score?
The weighted F1 score is used because the classes are **imbalanced**. It computes the F1 score for each class separately and then averages these scores, weighting each class by its number of true instances (its support). Every class therefore contributes to the final score in proportion to its size. Note that this lets large classes dominate the average; a macro average, which weights all classes equally, is more sensitive to performance on minority classes.
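
To make the weighting concrete, the weighted F1 can be computed by hand. The sketch below uses made-up labels and plain Python. It follows the usual definition (per-class F1 averaged with weights equal to each class's number of true instances) and includes a macro average, where every class counts equally, for comparison. It is meant as an illustration, not a replacement for `sklearn.metrics.f1_score`.

```python
from collections import Counter

def per_class_f1(y_true, y_pred, label):
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    return sum(per_class_f1(y_true, y_pred, c) * n / total for c, n in support.items())

def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, c) for c in classes) / len(classes)

# Hypothetical imbalanced example: class 0 dominates, class 1 is always missed.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(round(weighted_f1(y_true, y_pred), 3))
print(round(macro_f1(y_true, y_pred), 3))
```

In this toy case the weighted score stays fairly high even though the minority class is never predicted, while the macro score drops sharply.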

In [36]:
import numpy as np

In [77]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score

In [38]:
def f1_score_func(preds, labels):
    # Convert the predicted probabilities into class indices by selecting the highest value along axis 1,
    # then flatten the result into a 1D array of predicted classes.
    preds_flat = np.argmax(preds, axis = 1).flatten()
    labels_flat = labels.flatten()
    return f1_score(labels_flat, preds_flat, average = 'weighted')

In [60]:
label_dict['surprise']

5

In [71]:
def accuracy_per_class(preds, labels):
    # inverse the dict
    # label_dict_inverse = {v: k for k, v in label_dict.items()}
    # print(f' label_dict_inverse is {label_dict_inverse}')
    preds_flat = np.argmax(preds, axis = 1).flatten()
    labels_flat = labels.flatten()
    print(f'preds_flat is {preds_flat}')
    print(f'labels_flat is {labels_flat}')


    # compute the accuracy for each class
    for label in np.unique(labels_flat):
        y_preds = preds_flat[labels_flat == label]
        y_true = labels_flat[labels_flat == label]
        print(f'y_preds is {y_preds}')
        print(f'y_true is {y_true}')
        print(f'Class: {label}')
        print(f'Accuracy: {len(y_preds[y_preds == label])}/{len(y_true)}\n')
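
A variant that reports category names instead of numeric labels, and returns the counts rather than printing them, might look like the sketch below. It assumes the `label_dict` built earlier (reproduced here so the block runs on its own); `per_class_counts` is a hypothetical helper name.

```python
import numpy as np

label_dict = {"happy": 0, "not-relevant": 1, "angry": 2,
              "disgust": 3, "sad": 4, "surprise": 5}

def per_class_counts(preds, labels):
    """Return {category: (correct, total)} for every class present in labels."""
    inverse = {v: k for k, v in label_dict.items()}
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    result = {}
    for label in np.unique(labels_flat):
        mask = labels_flat == label
        correct = int((preds_flat[mask] == label).sum())
        result[inverse[int(label)]] = (correct, int(mask.sum()))
    return result

# Made-up scores for four examples over six classes.
toy_preds = np.array([[3, 0, 0, 0, 0, 0],
                      [0, 2, 0, 0, 0, 0],
                      [1, 0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 0, 4]], dtype=float)
toy_labels = np.array([0, 1, 1, 5])

print(per_class_counts(toy_preds, toy_labels))
```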

In [78]:
def compute_metrics(preds, labels):
    # Flatten the predictions and labels
    preds_flat = np.argmax(preds, axis=1).flatten()  # Convert predicted probabilities to class predictions
    labels_flat = labels.flatten()  # Flatten the true labels

    # Compute accuracy
    accuracy = accuracy_score(labels_flat, preds_flat)

    # Compute precision, recall, and F1 score (weighted average for multi-class)
    precision = precision_score(labels_flat, preds_flat, average='weighted')
    recall = recall_score(labels_flat, preds_flat, average='weighted')
    f1 = f1_score(labels_flat, preds_flat, average='weighted')

    # Print or return the computed metrics
    print(f'Accuracy: {accuracy:.4f}')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')
    print(f'F1 Score: {f1:.4f}')

    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }


## Task 9: Creating our Training Loop

Approach adapted from an older version of HuggingFace's `run_glue.py` script. Accessible [here](https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128).


### Why Move Data from GPU to CPU?

#### Freeing GPU Memory
GPU memory is a scarce resource during model training and evaluation. Keeping data on the GPU after it is no longer needed there (such as the logits and labels of batches that have already been processed) consumes memory and can lead to out-of-memory (OOM) errors, especially with large models like BERT and large datasets.  
After the model finishes its forward pass and generates the logits (predictions), the GPU computation for that batch is done. The logits are no longer needed on the GPU, so we move them to the CPU to save memory for the next batch.

#### Performing Further Processing on the CPU
Once the logits and labels are on the CPU and converted to NumPy arrays, post-processing becomes straightforward:
- Collecting all predictions across batches.
- Computing evaluation metrics like accuracy or F1 score, which is typically done with NumPy or other CPU-based libraries (such as sklearn).

Appending predictions to a list (`predictions.append(logits)`) and running NumPy operations on them are computationally cheap, so the CPU handles them easily without using up GPU resources.
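
The hand-off from a device tensor to a NumPy array can be sketched as follows. The tensor here is random stand-in data rather than real model output, and the code falls back to the CPU when no GPU is available.

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for a batch of logits produced by the model (4 examples, 6 classes).
logits = torch.randn(4, 6, device=device, requires_grad=True)

# detach() drops the autograd graph, cpu() copies to host memory,
# numpy() exposes the result as an array that scikit-learn can consume.
logits_np = logits.detach().cpu().numpy()

collected = [logits_np, logits_np]          # one entry per batch
all_logits = np.concatenate(collected, axis=0)
print(type(logits_np).__name__, all_logits.shape)
```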


In [40]:
import random
# Fixing the seed makes the results reproducible across different runs.
seed_val = 17
# Seed Python's built-in random module.
random.seed(seed_val)
# Seed NumPy so that its random operations are deterministic and reproducible.
np.random.seed(seed_val)
# Seed PyTorch's CPU random number generator.
torch.manual_seed(seed_val)
# Seed all GPU devices in PyTorch when using CUDA (for GPU acceleration).
torch.cuda.manual_seed_all(seed_val)

In [41]:
# The model's weights and biases (i.e., the model's parameters) will be moved to the specified device.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
print(device)

cuda


In [42]:
def evaluate(dataloader_val):
    # In evaluation mode, dropout is disabled. BERT uses LayerNorm rather than BatchNorm, so there are no running batch statistics involved.
    model.eval()

    loss_val_total = 0
    predictions, true_vals = [], []

    for batch in tqdm(dataloader_val):
        # Move each tensor in the batch to the specified device
        batch = tuple(b.to(device) for b in batch)

        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

        # Disable gradient computation, reduce memory consumption, and improve computational efficiency.
        with torch.no_grad():
            # Pass the input data to the model for forward propagation and return the model's output.
            outputs = model(**inputs)

        # get the loss value
        loss = outputs[0]
        # get the predict value from the model
        logits = outputs[1]
        loss_val_total += loss.item()

        # Move the predictions and true labels from the GPU to the CPU and convert them to NumPy arrays to free up GPU memory.
        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)

    loss_val_avg = loss_val_total/len(dataloader_val)

    # Axis 0 represents the row direction; when concatenating along axis 0, the number of rows in the new array will increase.
    # Example:
    #   array1 = np.array([[1, 2, 3],
    #                      [4, 5, 6]])
    #   array2 = np.array([[7, 8, 9],
    #                      [10, 11, 12]])
    #   result = np.concatenate([array1, array2], axis=0)
    #   output: [[ 1  2  3]
    #            [ 4  5  6]
    #            [ 7  8  9]
    #            [10 11 12]]
    predictions = np.concatenate(predictions, axis=0)

    true_vals = np.concatenate(true_vals, axis=0)

    return loss_val_avg, predictions, true_vals


In [47]:
import os
# Make sure the checkpoint directory exists before the first torch.save call.
os.makedirs('Models', exist_ok=True)

for epoch in tqdm(range(1, epochs+1)):
    # Training mode enables dropout, which randomly ignores some neurons to prevent the network from overfitting to the training data.
    # BERT uses LayerNorm rather than BatchNorm, so there are no running batch statistics to update here.
    model.train()
    loss_train_total = 0
    # Created an iterator with a progress bar.
    progress_bar = tqdm(dataloader_train,
                        desc='Epoch {:1d}'.format(epoch),
                        # After the progress bar iteration is complete, it will be cleared and not remain in the output.
                        leave = False,
                        # Enable progress bar
                        disable = False
                       )
    for batch in progress_bar:
        # Clears the old gradients from the previous step.
        model.zero_grad()
        # Moves each tensor in the batch to the specified device (CPU or GPU).
        batch = tuple(b.to(device) for b in batch)

        inputs = {
            'input_ids': batch[0],
            'attention_mask': batch[1],
            'labels': batch[2]
        }

        outputs = model(**inputs)
        loss = outputs[0]
        loss_train_total += loss.item()
        # Computes the gradient of the loss with respect to the model parameters.
        loss.backward()

        # Clips the gradients to prevent exploding gradients.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # Updates the model parameters based on the computed gradients.
        optimizer.step()
        # Updates the learning rate based on the learning rate schedule.
        scheduler.step()
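        # loss.item() extracts the scalar value of the loss tensor as a Python float.
        # The loss is already averaged over the batch, so it is reported as-is
        # (len(batch) is the number of tensors in the tuple, not the batch size).
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item())})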

    torch.save(model.state_dict(),f'Models/Bert_ft_epoch{epoch}.model')

    tqdm.write(f'\nEpoch {epoch}')

    loss_train_avg = loss_train_total/len(dataloader_train)
    tqdm.write(f'The average loss train is {loss_train_avg}')
    # Evaluate the model's performance on the validation set.
    val_loss, predictions, true_vals = evaluate(dataloader_val)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score weighted: {val_f1}')

Epoch 1
The average loss train is 0.1760167605554064
Validation loss: 0.36939870566129684
F1 Score weighted: 0.8933691495350687

Epoch 2
The average loss train is 0.14820945916460856
Validation loss: 0.39445937105587553
F1 Score weighted: 0.8967110244016074

Epoch 3
The average loss train is 0.09639013568050273
Validation loss: 0.4368599033249276
F1 Score weighted: 0.9126769418929586

Epoch 4
The average loss train is 0.058789553780788706
Validation loss: 0.4388943381075348
F1 Score weighted: 0.9140380098227632

Epoch 5
The average loss train is 0.03912910555788715
Validation loss: 0.4783846464207662
F1 Score weighted: 0.9153013472454252

Epoch 6
The average loss train is 0.029667860072379607
Validation loss: 0.47662870426263126
F1 Score weighted: 0.9111036348671643

Epoch 7
The average loss train is 0.02543116855908126
Validation loss: 0.47504886984825134
F1 Score weighted: 0.9114025885293467

Epoch 8
The average loss train is 0.02278923869543221
Validation loss: 0.47775343449653257
F1 Score weighted: 0.9114025885293467

Epoch 9
The average loss train is 0.023623919364486243
Validation loss: 0.47805115793432507
F1 Score weighted: 0.9114025885293467

Epoch 10
The average loss train is 0.022375169896431978
Validation loss: 0.4750709448541914
F1 Score weighted: 0.9114025885293467
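
Gradient clipping by global norm, as used in the training loop above with a maximum norm of 1.0, rescales all gradients by a common factor whenever their combined L2 norm exceeds the threshold. A NumPy sketch of that rule follows; it illustrates the idea with made-up gradients and is not PyTorch's implementation.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    # The factor is capped at 1 so small gradients are left untouched.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]   # joint norm = 5
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(float((g ** 2).sum()) for g in clipped))

print(norm_before, round(norm_after, 4))
```

The direction of the overall gradient is preserved; only its length is limited.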


## Task 10: Loading and Evaluating our Model

In [48]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [49]:
model.to(device)

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

In [73]:
model.load_state_dict(
    torch.load('Models/Bert_ft_epoch10.model',
              map_location = torch.device('cpu'))
)



<All keys matched successfully>

In [74]:
_, predictions,true_vals = evaluate(dataloader_val)


In [75]:
true_vals

array([0, 1, 4, 0, 2, 0, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 2, 1, 1, 0, 0, 5, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 4, 1, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 1, 0, 0, 0, 0, 0, 5, 0, 0,
       0, 0, 0, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0,
       0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 4, 0, 0, 5, 0, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 1, 0, 0, 2, 4, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 1, 0])

In [76]:
accuracy_per_class(predictions, true_vals)

preds_flat is [0 1 0 0 2 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 2 1 1 0 0 5 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 1 0 0 0 0 0 0 4 0 0 0 0 0 0 0 4 0 0 1 0 0 0 1 0 0 0 0 0 0 0
 0 0 5 0 0 1 0 0 0 0 0 5 0 0 0 0 0 2 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 5 0 0 1 0 0 0 0
 0 0 0 0 1 0 0 0 1 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 0 0
 0 0 4 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0 2 2 0 0 0 0 0 0 0 1 0 2 1 0 0 0 0 1
 0]
labels_flat is [0 1 4 0 2 0 0 0 4 1 0 0 0 0 0 0 1 0 0 0 0 0 1 2 1 1 0 0 5 0 0 0 0 0 2 0 0
 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 4 1 0 0 0 1 0 0 0 0 0 0 0
 0 0 5 0 0 1 0 0 0 0 0 5 0 0 0 0 0 2 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 1 0 0 0 1
 0 0 0 0 1 0 0 0 1 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 0 1
 0 0 4 0 0 5 0 0 0 0 0 0 1 1 0 0 1 0 0 2 4 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 1
 0]
y_preds is [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

In [80]:
# compute_metrics returns a dictionary, so keep it as one object and read the values by key
metrics = compute_metrics(predictions, true_vals)

Accuracy: 0.9148
Precision: 0.9094
Recall: 0.9148
F1 Score: 0.9114
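
Since `compute_metrics` returns a dictionary, the individual values are best read by key. Tuple-unpacking a dictionary iterates over its keys, so it yields the metric names rather than the numbers. A short illustration using the values printed above:

```python
metrics = {"accuracy": 0.9148, "precision": 0.9094, "recall": 0.9148, "f1": 0.9114}

# Unpacking a dict yields its keys, not its values.
a, p, r, f = metrics
print(a, p, r, f)

# Reading by key gives the numbers.
accuracy, f1 = metrics["accuracy"], metrics["f1"]
print(accuracy, f1)
```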
