# Emotion analysis with BERT

Using transformers with the distilled bert-base model on the go-emotions dataset, to perform emotion analysis based on the circumplex model.

Written by Luc Bijl.

Retrieving the go-emotions training and testing dataset from the datasets directory.

In [1]:
import os
import pandas as pd

go_emotions_train_dataset = "../../datasets/go-emotions/train.tsv"
go_emotions_test_dataset = "../../datasets/go-emotions/test.tsv"
go_emotions_labels = "../../datasets/go-emotions/emotions-labeled.csv"

column_names = ['Text', 'Label', 'ID']

df_train_raw = pd.read_csv(go_emotions_train_dataset, delimiter='\t', header=None, names=column_names)
df_test_raw = pd.read_csv(go_emotions_test_dataset, delimiter='\t', header=None, names=column_names)
df_labels = pd.read_csv(go_emotions_labels, header=0, names=['Label', 'V', 'A', 'D'])

print("Train Data:")
print(df_train_raw.head())
print("\nTest Data:")
print(df_test_raw.head())
print("\nLabels")
print(df_labels.head())

Train Data:
                                                Text Label       ID
0  My favourite food is anything I didn't have to...    27  eebbqej
1  Now if he does off himself, everyone will thin...    27  ed00q6i
2                     WHY THE FUCK IS BAYLESS ISOING     2  eezlygj
3                        To make her feel threatened    14  ed7ypvh
4                             Dirty Southern Wankers     3  ed0bdzj

Test Data:
                                                Text Label       ID
0  I’m really sorry about your situation :( Altho...    25  eecwqtt
1    It's wonderful because it's awful. At not with.     0  ed5f85d
2  Kings fan here, good luck to you guys! Will be...    13  een27c3
3  I didn't know that, thank you for teaching me ...    15  eelgwd1
4  They got bored from haunting earth for thousan...    27  eem5uti

Labels
        Label     V     A     D
0  admiration  0.80  0.50  0.70
1   amusement  0.70  0.80  0.50
2       anger -0.43  0.67  0.34
3   annoyance -0.60  0.6

Converting the labels in the training and testing dataset to the circumplex model with valence, arousal and dominance.

In [2]:
import numpy as np

mapping_train = []
mapping_test = []

df_train = df_train_raw[['Text']].copy()
df_test = df_test_raw[['Text']].copy()

for index, row in df_train_raw.iterrows():
    matrix = []
    labels = [int(label) for label in row['Label'].split(',')]
    
    for label in labels:
        matrix.extend([list(df_labels.loc[label][['V', 'A', 'D']])])
    
    mapping_train.extend([np.dot(np.array(matrix).reshape(3, len(labels)), np.ones(len(labels))) / len(labels)])

df_train[['V', 'A', 'D']] = mapping_train

for index, row in df_test_raw.iterrows():
    matrix = []
    labels = [int(label) for label in row['Label'].split(',')]
    
    for label in labels:
        matrix.extend([list(df_labels.loc[label][['V', 'A', 'D']])])
    
    mapping_test.extend([np.dot(np.array(matrix).reshape(3, len(labels)), np.ones(len(labels))) / len(labels)])

df_test[['V', 'A', 'D']] = mapping_test

Evaluating the summary statistics of the training dataset.

In [4]:
df_train[['V', 'A', 'D']].describe()

Unnamed: 0,V,A,D
count,43410.0,43410.0,43410.0
mean,0.208837,0.322031,0.204614
std,0.493538,0.325608,0.314346
min,-0.7,-0.665,-0.7
25%,0.0,0.0,0.0
50%,0.0,0.4,0.2
75%,0.7,0.55,0.5
max,0.9,0.9,0.8


Printing the most extreme sentences in the training set in either of the three dimensions.

In [5]:
for i in ['V','A','D']:
    print("Min {}:\n{}".format(i, df_train.loc[df_train[i].argmin()]))
    print()
    print("Max {}:\n{}".format(i, df_train.loc[df_train[i].argmax()]))
    print()
    print()

Min V:
Text    He was off by 5 minutes, not impressed. 
V                                           -0.7
A                                            0.5
D                                           -0.6
Name: 87, dtype: object

Max V:
Text    Very interesting. Thx
V                         0.9
A                         0.7
D                         0.4
Name: 54, dtype: object


Min A:
Text    I wasn't meaning it as an insult or anything, ...
V                                                    -0.2
A                                                  -0.665
D                                                   -0.03
Name: 399, dtype: object

Max A:
Text    This...has 9k upvotes. Wow.
V                               0.9
A                               0.9
D                               0.6
Name: 63, dtype: object


Min D:
Text    Apologies, I take it all back as I’ve just see...
V                                                    -0.6
A                                                     

In [8]:
for i in ['V','A','D']:
    print("Min {}:\n{}".format(i, df_test.loc[df_test[i].argmin()]))
    print()
    print("Max {}:\n{}".format(i, df_test.loc[df_test[i].argmax()]))
    print()
    print()

Min V:
Text    Crap. I need more Excedrin. STAT.
V                                    -0.7
A                                     0.5
D                                    -0.6
Name: 68, dtype: object

Max V:
Text    Kings fan here, good luck to you guys! Will be...
V                                                     0.9
A                                                     0.9
D                                                     0.6
Name: 2, dtype: object


Min A:
Text    And [NAME], would again like to apologize for ...
V                                                    -0.2
A                                                  -0.665
D                                                   -0.03
Name: 182, dtype: object

Max A:
Text    Kings fan here, good luck to you guys! Will be...
V                                                     0.9
A                                                     0.9
D                                                     0.6
Name: 2, dtype: object


Min D:


Determining the length of the training and testing dataset, to set a proper batch size.

In [6]:
print(f"Length training set: {len(df_train)}\nLength testing set: {len(df_test)}")

Length training set: 43410
Length testing set: 5427


Preparing the data for BERT, this includes tokenization, encoding and creating dataloaders for both training and testing datasets.

In [9]:
import torch
from transformers import DistilBertTokenizerFast
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification

model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)

# Tokenizing and encoding the text data
train_encodings = tokenizer(df_train['Text'].tolist(), truncation=True, padding=True, return_tensors='pt')
test_encodings = tokenizer(df_test['Text'].tolist(), truncation=True, padding=True, return_tensors='pt')

# Creating data loaders
train_dataset = torch.utils.data.TensorDataset(
    train_encodings['input_ids'], 
    train_encodings['attention_mask'], 
    torch.tensor(df_train[['V', 'A', 'D']].values, dtype=torch.float32)
)
train_dataloader = DataLoader(train_dataset, batch_size=30, shuffle=True)

test_dataset = torch.utils.data.TensorDataset(
    test_encodings['input_ids'], 
    test_encodings['attention_mask'], 
    torch.tensor(df_test[['V', 'A', 'D']].values, dtype=torch.float32)
)
test_dataloader = DataLoader(test_dataset, batch_size=27, shuffle=False)

  torch.utils._pytree._register_pytree_node(


Defining the model: distilbert.

In [10]:
model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=3)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier.weight', 'pre_classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Defining the optimizer and loss function.

In [11]:
from torch.optim import Adam
from torch.nn import L1Loss

optimizer = Adam(model.parameters(), lr=1e-5)
loss_fn = L1Loss()

Defining the training loop, here BERT will be trained with the training dataset and validated with the test dataset.

In [12]:
from torch.utils.tensorboard import SummaryWriter

log_dir = 'bert-go-emotion-1/logs'
writer = SummaryWriter(log_dir)
global_step = 0

num_epochs = 30

early_stop_patience = 2
best_validation_loss = float('inf')
no_improvement_counter = 0

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    total_loss_v = 0
    total_loss_a = 0
    total_loss_d = 0
    num_batches = 0

    for batch in train_dataloader:
        global_step += 1
        num_batches += 1
        input_ids, attention_mask, target_scores = batch

        # Forward pass
        output = model(input_ids=input_ids, attention_mask=attention_mask)
        predicted_scores = output.logits

        # Calculating the loss for each dimensions
        loss_v = loss_fn(predicted_scores[:,0], target_scores[:,0])
        loss_a = loss_fn(predicted_scores[:,1], target_scores[:,1])
        loss_d = loss_fn(predicted_scores[:,2], target_scores[:,2])

        # The main loss is defined as the sum of the individual losses
        loss = loss_v + loss_a + loss_d

        # The total loss per epoch
        total_loss_v += loss_v.item()
        total_loss_a += loss_a.item()
        total_loss_d += loss_d.item()
        total_loss += loss.item()

        # Determining the average loss in the epoch
        average_loss_v = total_loss_v / num_batches
        average_loss_a = total_loss_a / num_batches
        average_loss_d = total_loss_d / num_batches
        average_loss = total_loss / num_batches

        # Tensorboard logging
        writer.add_scalar('Batch-loss-train-valence', average_loss_v, global_step)
        writer.add_scalar('Batch-loss-train-arousal', average_loss_a, global_step)
        writer.add_scalar('Batch-loss-train-dominance', average_loss_d, global_step)
        writer.add_scalar('Batch-loss-train', average_loss, global_step)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Determining the average loss for the epoch
    average_loss_v = total_loss_v / len(train_dataloader)
    average_loss_a = total_loss_a / len(train_dataloader)
    average_loss_d = total_loss_d / len(train_dataloader)
    average_loss = total_loss / len(train_dataloader)
    
    # Logging
    writer.add_scalar('Epoch-loss-train-valence', average_loss_v, epoch + 1)
    writer.add_scalar('Epoch-loss-train-arousal', average_loss_a, epoch + 1)
    writer.add_scalar('Epoch-loss-train-dominance', average_loss_d, epoch + 1)
    writer.add_scalar('Epoch-loss-train', average_loss, epoch + 1)
    print(f"Epoch {epoch + 1}/{num_epochs}, Training Loss: {average_loss:.4f}")
    
    # Validation
    model.eval()
    total_loss = 0
    total_loss_v = 0
    total_loss_a = 0
    total_loss_d = 0

    for batch in test_dataloader:
        with torch.no_grad():
            input_ids, attention_mask, target_scores = batch

            # Obtaining the scores
            output = model(input_ids=input_ids, attention_mask=attention_mask)
            predicted_scores = output.logits

            # Calculating the loss for each dimensions
            loss_v = loss_fn(predicted_scores[:,0], target_scores[:,0])
            loss_a = loss_fn(predicted_scores[:,1], target_scores[:,1])
            loss_d = loss_fn(predicted_scores[:,2], target_scores[:,2])
            
            # The main loss is defined as the sum of the individual losses
            loss = loss_v + loss_a + loss_d

            # The total loss per epoch
            total_loss_v += loss_v.item()
            total_loss_a += loss_a.item()
            total_loss_d += loss_d.item()
            total_loss += loss.item()

    # Determining the average loss for the epoch
    average_loss_v = total_loss_v / len(test_dataloader)
    average_loss_a = total_loss_a / len(test_dataloader)
    average_loss_d = total_loss_d / len(test_dataloader)
    average_loss = total_loss / len(test_dataloader)

    # Logging  
    writer.add_scalar('Epoch-loss-validation-valence', average_loss_v, epoch + 1)
    writer.add_scalar('Epoch-loss-validation-arousal', average_loss_a, epoch + 1)
    writer.add_scalar('Epoch-loss-validation-dominance', average_loss_d, epoch + 1)
    writer.add_scalar('Epoch-loss-validation', average_loss, epoch + 1)
    print(f"Epoch {epoch + 1}/{num_epochs}, Validation Loss: {average_loss:.4f}\n")

    # Saving the model
    torch.save(model, f'bert-go-emotion-1/{epoch + 1}.pth')

    # Early stopping check
    if average_loss < best_validation_loss:
        best_validation_loss = average_loss
        no_improvement_counter = 0
    else:
        no_improvement_counter += 1

    if no_improvement_counter >= early_stop_patience:
        break

writer.close()

Epoch 1/30, Training Loss: 0.6840
Epoch 1/30, Validation Loss: 0.6063

Epoch 2/30, Training Loss: 0.5961
Epoch 2/30, Validation Loss: 0.5880

Epoch 3/30, Training Loss: 0.5556
Epoch 3/30, Validation Loss: 0.5794

Epoch 4/30, Training Loss: 0.5192
Epoch 4/30, Validation Loss: 0.5826

Epoch 5/30, Training Loss: 0.4942
Epoch 5/30, Validation Loss: 0.5743

Epoch 6/30, Training Loss: 0.4677
Epoch 6/30, Validation Loss: 0.5806

Epoch 7/30, Training Loss: 0.4501
Epoch 7/30, Validation Loss: 0.5746



Loading a version of the model.

In [19]:
model = torch.load('bert-go-emotion-1/7.pth')

Evaluating the model, with as output the MAE, MSE and R-value.

In [20]:
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, mean_absolute_error

model.eval()
list_predicted_scores = {'V': [], 'A': [], 'D': []}

for batch in test_dataloader:
    with torch.no_grad():
        input_ids, attention_mask, target_scores = batch

        # Obtaining the scores
        output = model(input_ids=input_ids, attention_mask=attention_mask)
        predicted_scores = output.logits

        # Writing the scores to the list
        list_predicted_scores['V'].extend(predicted_scores[:, 0].tolist())
        list_predicted_scores['A'].extend(predicted_scores[:, 1].tolist())
        list_predicted_scores['D'].extend(predicted_scores[:, 2].tolist())

# Inserting the scores in df_test
for i,j in zip(['V', 'A', 'D'],['V-p', 'A-p', 'D-p']):
    df_test[j] = list_predicted_scores[i]

# Computing the R, MSE and MAE values.
for i,j in zip(['V', 'A', 'D'],['V-p', 'A-p', 'D-p']):

    correlation, _ = pearsonr(df_test[i], df_test[j])

    print(f"Pearson Correlation Coefficient (R) {i}: {correlation:.4f}")
    print(f"Mean Squared Error (MSE) {i}: {mean_squared_error(df_test[i], df_test[j]):.4f}")
    print(f"Mean Absolute Error (MAE) {i}: {mean_absolute_error(df_test[i], df_test[j]):.4f}")
    print()

Pearson Correlation Coefficient (R) V: 0.6871
Mean Squared Error (MSE) V: 0.1450
Mean Absolute Error (MAE) V: 0.2278

Pearson Correlation Coefficient (R) A: 0.5471
Mean Squared Error (MSE) A: 0.0838
Mean Absolute Error (MAE) A: 0.1825

Pearson Correlation Coefficient (R) D: 0.5982
Mean Squared Error (MSE) D: 0.0696
Mean Absolute Error (MAE) D: 0.1643



Evaluating the summary statistics of the testing dataset and the predicted values.

In [15]:
df_test[['V', 'V-p', 'A', 'A-p', 'D', 'D-p']].describe()

Unnamed: 0,V,V-p,A,A-p,D,D-p
count,5427.0,5427.0,5427.0,5427.0,5427.0,5427.0
mean,0.199616,0.217029,0.317785,0.315409,0.204603,0.203345
std,0.497983,0.459614,0.320525,0.28404,0.313527,0.268033
min,-0.7,-0.742525,-0.665,-0.341613,-0.7,-0.634617
25%,0.0,7e-06,0.0,1.2e-05,0.0,-1.6e-05
50%,0.0,0.000998,0.4,0.395903,0.2,0.139291
75%,0.7,0.720704,0.5,0.521588,0.5,0.418724
max,0.9,0.941922,0.9,0.938188,0.75,0.753028


Printing the most extreme sentences in the test set in either of the six dimensions.

In [16]:
for i in ['V', 'V-p', 'A', 'A-p', 'D', 'D-p']:
    print("Min {}:\n{}".format(i, df_test.loc[df_test[i].argmin()]))
    print()
    print("Max {}:\n{}".format(i, df_test.loc[df_test[i].argmax()]))
    print()
    print()

Min V:
Text    Crap. I need more Excedrin. STAT.
V                                    -0.7
A                                     0.5
D                                    -0.6
V-p                             -0.238292
A-p                              0.348867
D-p                              0.091098
Name: 68, dtype: object

Max V:
Text    Kings fan here, good luck to you guys! Will be...
V                                                     0.9
A                                                     0.9
D                                                     0.6
V-p                                              0.765529
A-p                                              0.597058
D-p                                              0.513681
Name: 2, dtype: object


Min V-p:
Text    I'm a little disappointed that the tier markin...
V                                                    -0.7
A                                                     0.5
D                                                    