# Sentiment Analysis with Deep Learning using BERT

### Prerequisites

- Intermediate-level knowledge of Python 3 (NumPy and Pandas preferably, but not required)
- Exposure to PyTorch usage
- Basic understanding of Deep Learning and Language Models (BERT specifically)

### Project Outline

**Task 1**: Introduction (this section)

**Task 2**: Exploratory Data Analysis and Preprocessing

**Task 3**: Training/Validation Split

**Task 4**: Loading Tokenizer and Encoding our Data

**Task 5**: Setting up BERT Pretrained Model

**Task 6**: Creating Data Loaders

**Task 7**: Setting Up Optimizer and Scheduler

**Task 8**: Defining our Performance Metrics

**Task 9**: Creating our Training Loop

## Introduction

### What is BERT

BERT is a large-scale, transformer-based language model that can be fine-tuned for a variety of tasks.

For more information, the original paper can be found [here](https://arxiv.org/abs/1810.04805). 

[HuggingFace documentation](https://huggingface.co/transformers/model_doc/bert.html)

[Bert documentation](https://characters.fandom.com/wiki/Bert_(Sesame_Street)) ;)

<img src="Images/BERT_diagrams.pdf" width="1000">

## Exploratory Data Analysis and Preprocessing

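This tutorial was written around the SMILE Twitter dataset (cited below). The code in this notebook, however, loads `circa-data.tsv`, a tab-separated file with one row per question/answer pair: a `context`, a question (`questionX`), an answer (`answerY`), and two gold-standard labels (`goldstandard1`, `goldstandard2`).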

_Wang, Bo; Tsakalidis, Adam; Liakata, Maria; Zubiaga, Arkaitz; Procter, Rob; Jensen, Eric (2016): SMILE Twitter Emotion dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.3187909.v2_

In [None]:
import torch
import pandas as pd
from tqdm.notebook import tqdm
import csv

In [None]:
# Load only the columns we need; QUOTE_NONE keeps quote characters in the text as-is
df = pd.read_csv('circa-data.tsv',
                 delimiter="\t",
                 encoding='utf-8',
                 quoting=csv.QUOTE_NONE,
                 usecols=['context','questionX','answerY','goldstandard1','goldstandard2'])

In [None]:
df.dropna(subset=["goldstandard1"], inplace=True)
df = df[df.goldstandard1 != 'Other']
df = df[df.goldstandard1 != 'I am not sure how X will interpret Y’s answer']
print(len(df))

30958


In [None]:
df.sample(5)

Unnamed: 0,context,questionX,answerY,goldstandard1,goldstandard2
1655,X wants to know what sorts of books Y likes to...,Do you like books by American authors usually?,It makes no difference to me.,"In the middle, neither yes nor no","In the middle, neither yes nor no"
15479,X and Y are childhood neighbours who unexpecte...,Did you stay in the same neighborhood?,I moved when I went to college.,No,No
20271,X wants to know about Y's food preferences.,Would you like to eat local cuisine?,If we can have something spicy.,"Yes, subject to some conditions","Yes, subject to some conditions"
16329,Y has just moved into a neighbourhood and meet...,Did you move for work?,I relocated for this job.,Yes,Yes
29477,X wants to know what sorts of books Y likes to...,How about Stephen King?,I like Stephen King,Yes,Yes


In [None]:
df.goldstandard1.value_counts()

Yes                                              14504
No                                               10829
Yes, subject to some conditions                   2583
Probably yes / sometimes yes                      1244
Probably no                                       1160
In the middle, neither yes nor no                  638
I am not sure how X will interpret Y’s answer       63
Name: goldstandard1, dtype: int64

In [None]:
df.goldstandard2.value_counts()

Yes                                  15748
No                                   11989
Yes, subject to some conditions       2583
In the middle, neither yes nor no      701
Other                                  504
Name: goldstandard2, dtype: int64

In [None]:
possible_labels = df.goldstandard1.unique()

In [None]:
possible_labels

array(['Yes', 'No', 'In the middle, neither yes nor no',
       'Probably yes / sometimes yes', 'Probably no',
       'Yes, subject to some conditions'], dtype=object)

In [None]:
label_dict = {}
for index, possible_label in enumerate(possible_labels):
    label_dict[possible_label] = index

In [None]:
label_dict

{'In the middle, neither yes nor no': 2,
 'No': 1,
 'Probably no': 4,
 'Probably yes / sometimes yes': 3,
 'Yes': 0,
 'Yes, subject to some conditions': 5}
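
As a small illustration of the mapping step above, the sketch below builds the same kind of label-to-index dictionary from a toy list and also constructs the inverse mapping, which is handy later for turning predicted indices back into label strings. The `toy_labels` values and the `toy_*` names are made up for this example and are not part of the notebook's data.

```python
# Toy labels standing in for a column such as df.goldstandard1 (illustrative values only)
toy_labels = ['Yes', 'No', 'Yes', 'Probably no', 'No', 'Yes']

# dict.fromkeys keeps first-seen order, similar to the order returned by pandas' unique()
unique_labels = list(dict.fromkeys(toy_labels))

# Map each label string to an integer index
toy_label_dict = {label: index for index, label in enumerate(unique_labels)}

# Inverse mapping: integer index back to label string
toy_label_dict_inverse = {index: label for label, index in toy_label_dict.items()}

# Encode the labels as integers, then decode them again as a round-trip check
encoded = [toy_label_dict[label] for label in toy_labels]
decoded = [toy_label_dict_inverse[index] for index in encoded]

print(toy_label_dict)
print(encoded)
```

Because the indices depend on the order in which labels first appear, the mapping should be built once and reused for every split.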

In [None]:
df['goldstandard1'] = df.goldstandard1.replace(label_dict)

In [None]:
df.head()

Unnamed: 0,context,questionX,answerY,goldstandard1,goldstandard2
0,Y has just travelled from a different city to ...,Are you employed?,I'm a veterinary technician.,0,Yes
1,X wants to know about Y's food preferences.,Are you a fan of Korean food?,I wouldn't say so,1,No
2,Y has just told X that he/she is thinking of b...,Are you bringing any pets into the flat?,I do not own any pets,1,No
3,X wants to know what activities Y likes to do ...,Would you like to get some fresh air in your f...,I am desperate to get out of the city.,0,Yes
4,X and Y are childhood neighbours who unexpecte...,Is your family still living in the neighborhood?,My parents are snowbirds now.,2,"In the middle, neither yes nor no"


## Training/Validation Split

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# First split: 40% train, 60% held out, stratified on the label column used for training
X_train, X_val_test, y_train, y_val_test = train_test_split(df.index.values, 
                                                  df.goldstandard1.values, 
                                                  test_size=0.6, 
                                                  random_state=17, 
                                                  stratify=df.goldstandard1.values)
# Second split: divide the held-out 60% evenly into test and validation sets
X_test, X_val, y_test, y_val = train_test_split(X_val_test, 
                                                  y_val_test, 
                                                  test_size=0.5, 
                                                  random_state=17, 
                                                  stratify=y_val_test)
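
To make the effect of the two-stage split above easier to see, here is a minimal sketch on toy data. It assumes 100 made-up sample ids with an imbalanced binary label (the `toy_*` and `ids_*` names are illustrative only) and applies the same `test_size=0.6` followed by `test_size=0.5`, so the data ends up split roughly 40/30/30 while each part keeps the original class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 sample ids with an imbalanced binary label (80 vs 20); illustrative only
toy_ids = np.arange(100)
toy_labels = np.array([0] * 80 + [1] * 20)

# Stage 1: keep 40% for training, hold out 60%
ids_train, ids_rest, y_tr, y_rest = train_test_split(
    toy_ids, toy_labels, test_size=0.6, random_state=17, stratify=toy_labels)

# Stage 2: split the held-out part evenly into test and validation
ids_test, ids_val, y_te, y_va = train_test_split(
    ids_rest, y_rest, test_size=0.5, random_state=17, stratify=y_rest)

# Report the size of each part and its share of the minority class
for name, y in [('train', y_tr), ('test', y_te), ('val', y_va)]:
    print(name, len(y), 'share of class 1:', y.mean())
```

Stratifying both stages on the label keeps rare classes represented in every part, which matters when some classes have only a few hundred examples.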

In [None]:
df['data_type'] = ['not_set']*df.shape[0]

In [None]:
df.loc[X_train, 'data_type'] = 'train'
df.loc[X_val, 'data_type'] = 'val'
df.loc[X_test,'data_type'] = 'test'

In [None]:
df.sample(5)

Unnamed: 0,context,questionX,answerY,goldstandard1,goldstandard2,data_type
4309,Y has just travelled from a different city to ...,Would you like to meet my new boyfriend?,I would really enjoy getting to know him.,0,Yes,test
14003,Y has just told X that he/she is thinking of b...,Is New York a nice place?,It's my favorite city.,0,Yes,train
18544,X wants to know what sorts of books Y likes to...,Are you interested in short stories?,some of them are okay,3,Yes,val
22981,Y has just moved into a neighbourhood and meet...,Do you work in the area?,I worl close by,0,Yes,val
11487,X and Y are colleagues who are leaving work on...,Do you know if it's raining outside?,I can see it out the window.,0,Yes,val


In [None]:
ans = df.groupby(['context']).count()

In [None]:
ans

Unnamed: 0_level_0,questionX,answerY,goldstandard1,goldstandard2,data_type
context,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
X and Y are childhood neighbours who unexpectedly run into each other at a cafe.,3047,3047,3047,3047,3047
X and Y are colleagues who are leaving work on a Friday at the same time.,3155,3155,3155,3155,3155
X wants to know about Y's food preferences.,2920,2920,2920,2920,2920
X wants to know about Y's music preferences.,3183,3183,3183,3183,3183
X wants to know what activities Y likes to do during weekends.,3203,3203,3203,3203,3203
X wants to know what sorts of books Y likes to read.,3139,3139,3139,3139,3139
Y has just moved into a neighbourhood and meets his/her new neighbour X.,3003,3003,3003,3003,3003
Y has just told X that he/she is considering switching his/her job.,3063,3063,3063,3063,3063
Y has just told X that he/she is thinking of buying a flat in New York.,3068,3068,3068,3068,3068
Y has just travelled from a different city to meet X.,3177,3177,3177,3177,3177


## Loading Tokenizer and Encoding our Data

In [None]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.13.0-py3-none-any.whl (3.3 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.2.1-py3-none-any.whl (61 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.46-py3-none-any.whl (895 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers

In [None]:
from transformers import BertTokenizer
from torch.utils.data import TensorDataset

In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', 
                                          do_lower_case=True)


In [None]:
# All three splits encode the same field (questionX) with the same settings,
# so evaluation inputs match what the model sees during training.
encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type=='train'].questionX.tolist(), 
    add_special_tokens=True, 
    return_attention_mask=True, 
    padding='max_length', 
    truncation=True, 
    max_length=15, 
    return_tensors='pt'
)

encoded_data_val = tokenizer.batch_encode_plus(
    df[df.data_type=='val'].questionX.tolist(), 
    add_special_tokens=True, 
    return_attention_mask=True, 
    padding='max_length', 
    truncation=True, 
    max_length=15, 
    return_tensors='pt'
)

encoded_data_test = tokenizer.batch_encode_plus(
    df[df.data_type=='test'].questionX.tolist(), 
    add_special_tokens=True, 
    return_attention_mask=True, 
    padding='max_length', 
    truncation=True, 
    max_length=15, 
    return_tensors='pt'
)

# Token ids, attention masks, and integer labels for each split
input_ids_train = encoded_data_train['input_ids']
attention_masks_train = encoded_data_train['attention_mask']
labels_train = torch.tensor(df[df.data_type=='train'].goldstandard1.values)

input_ids_val = encoded_data_val['input_ids']
attention_masks_val = encoded_data_val['attention_mask']
labels_val = torch.tensor(df[df.data_type=='val'].goldstandard1.values)

input_ids_test = encoded_data_test['input_ids']
attention_masks_test = encoded_data_test['attention_mask']
labels_test = torch.tensor(df[df.data_type=='test'].goldstandard1.values)



In [None]:
df[df.data_type=='train'].questionX.values  + '[SEP]' + df[df.data_type=='train'].answerY.values

array(["Are you employed?[SEP]I'm a veterinary technician.",
       'Are you bringing any pets into the flat?[SEP]I do not own any pets',
       'Is your family still living in the neighborhood?[SEP]My parents are snowbirds now.',
       ..., 'Do you drink beer?[SEP]All alcohol is great.',
       'Do you like pie?[SEP]My favorite pie is pecan.',
       "Want to go to a concert with me?[SEP]I'd rather do something else."],
      dtype=object)
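
The concatenation above hints at feeding the question and the answer to BERT together. BERT's input format for a sentence pair is `[CLS] A [SEP] B [SEP]`, with segment ids marking which tokens belong to which part. The sketch below only illustrates that layout: `build_pair_input` is a hypothetical helper that splits on whitespace rather than using WordPiece, and it is not a library function. In practice, HuggingFace tokenizers produce this layout when given a second text sequence, and a `max_length` larger than 15 would probably be needed for pairs.

```python
def build_pair_input(question, answer, max_length):
    """Lay out a question/answer pair in BERT's sentence-pair format.

    Illustrative only: tokens come from a whitespace split, not WordPiece.
    """
    tokens_a = question.lower().split()
    tokens_b = answer.lower().split()
    tokens = ['[CLS]'] + tokens_a + ['[SEP]'] + tokens_b + ['[SEP]']
    # Segment 0 covers [CLS], the question and its [SEP]; segment 1 covers the answer and final [SEP]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # Naive truncation from the right (a real tokenizer would keep the final [SEP])
    tokens = tokens[:max_length]
    segment_ids = segment_ids[:max_length]
    # Real tokens get attention 1, padding gets attention 0
    attention_mask = [1] * len(tokens)
    padding = max_length - len(tokens)
    tokens += ['[PAD]'] * padding
    segment_ids += [0] * padding
    attention_mask += [0] * padding
    return tokens, segment_ids, attention_mask


tokens, segment_ids, attention_mask = build_pair_input(
    'Do you like pie?', 'My favorite pie is pecan.', max_length=14)
print(tokens)
print(segment_ids)
print(attention_mask)
```

Encoding the pair jointly lets the model attend across the question and the answer, which a question-only input cannot do.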

In [None]:
dataset_train = TensorDataset(input_ids_train, attention_masks_train, labels_train)
dataset_val = TensorDataset(input_ids_val, attention_masks_val, labels_val)
dataset_test = TensorDataset(input_ids_test, attention_masks_test, labels_test)

In [None]:
len(dataset_train)

12610

In [None]:
len(dataset_val)

9458

In [None]:
len(dataset_test)

9457

## Setting up BERT Pretrained Model

In [None]:
from transformers import BertForSequenceClassification

In [None]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)
#"ishan/bert-base-uncased-mnli"



Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

## Creating Data Loaders

In [None]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

In [None]:
batch_size = 32

dataloader_train = DataLoader(dataset_train, 
                              sampler=RandomSampler(dataset_train), 
                              batch_size=batch_size)

dataloader_validation = DataLoader(dataset_val, 
                                   sampler=SequentialSampler(dataset_val), 
                                   batch_size=batch_size)

dataloader_test = DataLoader(dataset_test, 
                                   sampler=SequentialSampler(dataset_test), 
                                   batch_size=batch_size)
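
As a quick check of what these loaders yield, the following sketch wraps toy tensors (10 made-up examples of length 4; all `toy_*` names are illustrative) in a `TensorDataset` and iterates over it sequentially. Each batch is a tuple of three tensors in the order they were passed to `TensorDataset`, and the last batch is smaller when the dataset size is not a multiple of the batch size.

```python
import torch
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset

# Toy stand-ins for input ids, attention masks and labels (illustrative values only)
toy_input_ids = torch.arange(10 * 4).reshape(10, 4)
toy_masks = torch.ones(10, 4, dtype=torch.long)
toy_labels = torch.zeros(10, dtype=torch.long)

toy_dataset = TensorDataset(toy_input_ids, toy_masks, toy_labels)
toy_loader = DataLoader(toy_dataset,
                        sampler=SequentialSampler(toy_dataset),
                        batch_size=4)

# Each batch is (input_ids, attention_mask, labels); collect the input id shapes
batch_shapes = [tuple(batch[0].shape) for batch in toy_loader]
print(len(toy_loader), batch_shapes)
```

A random sampler is typically used for training so that batches differ between epochs, while a sequential sampler keeps evaluation order stable.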

## Setting Up Optimizer and Scheduler

In [None]:
from transformers import AdamW, get_linear_schedule_with_warmup

In [None]:
optimizer = AdamW(model.parameters(),
                  lr=2e-5, 
                  eps=1e-8)

In [None]:
epochs = 3

scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps=0,
                                            num_training_steps=len(dataloader_train)*epochs)
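
To give a feel for what a linear schedule with warmup does to the learning rate, here is a small pure-Python sketch. The function `linear_schedule_multiplier` is a hypothetical helper written for this example, intended to mirror the general shape of such a schedule (linear ramp up during warmup, then linear decay to zero); it is not the library's implementation. The step counts are toy values.

```python
def linear_schedule_multiplier(step, num_warmup_steps, num_training_steps):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to 1
        return step / max(1, num_warmup_steps)
    # Decay: fall linearly from 1 to 0 by the last training step
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 2e-5

# No warmup (as with num_warmup_steps=0): the rate starts at base_lr and decays to 0
no_warmup = [linear_schedule_multiplier(s, 0, 10) for s in (0, 5, 10)]

# Two warmup steps out of ten: ramp up first, then decay
with_warmup = [linear_schedule_multiplier(s, 2, 10) for s in (0, 1, 2, 6, 10)]

print([base_lr * m for m in no_warmup])
print(with_warmup)
```

With zero warmup steps the first update already uses the full base learning rate, so a small nonzero warmup is a common variation when fine-tuning is unstable.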

## Defining our Performance Metrics

The accuracy metric is adapted from the accuracy function in [this tutorial](https://mccormickml.com/2019/07/22/BERT-fine-tuning/#41-bertforsequenceclassification).

In [None]:
import numpy as np

In [None]:
from sklearn.metrics import f1_score

In [None]:
def f1_score_func(preds, labels):
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return f1_score(labels_flat, preds_flat, average='weighted')
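
To show what `average='weighted'` means, the sketch below computes a weighted F1 score by hand with NumPy: one F1 score per class, averaged with weights proportional to each class's number of true examples. The `weighted_f1` function and the toy labels are illustrative assumptions, not part of the notebook.

```python
import numpy as np


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for cls in np.unique(y_true):
        tp = np.sum((y_pred == cls) & (y_true == cls))  # true positives
        fp = np.sum((y_pred == cls) & (y_true != cls))  # false positives
        fn = np.sum((y_pred != cls) & (y_true == cls))  # false negatives
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        support = np.sum(y_true == cls)
        total += f1 * support
    return total / len(y_true)


# Toy example: class 2 is never predicted, so its F1 is 0
toy_true = [0, 0, 0, 1, 1, 2]
toy_pred = [0, 0, 1, 1, 1, 0]
toy_score = weighted_f1(toy_true, toy_pred)
print(toy_score)
```

Because the weights follow class frequency, a model that ignores rare classes can still obtain a fairly high weighted F1; a macro average would penalize that more.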

In [None]:
def accuracy_per_class(preds, labels):
    # Map integer class ids back to their label strings
    label_dict_inverse = {v: k for k, v in label_dict.items()}
    
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    total_correct = 0
    total_samples = 0
    for label in np.unique(labels_flat):
        # Restrict to the samples whose true class is `label`
        y_preds = preds_flat[labels_flat==label]
        y_true = labels_flat[labels_flat==label]
        print(f'Class: {label_dict_inverse[label]}')
        print(f'Accuracy: {len(y_preds[y_preds==label])}/{len(y_true)}')
        print('Correct Predictions ', str(label) , len(y_preds[y_preds==label])/len(y_true))
        total_correct += len(y_preds[y_preds==label])
        total_samples += len(y_true)
    print('Total Correct Predictions ',total_correct/total_samples)
    

## Creating our Training Loop

The approach is adapted from an older version of HuggingFace's `run_glue.py` script, available [here](https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128).

In [None]:
import random

seed_val = 17
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

print(device)

cuda


In [None]:
def evaluate(dataloader_val):

    model.eval()
    
    loss_val_total = 0
    predictions, true_vals = [], []
    
    for batch in dataloader_val:
        
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

        with torch.no_grad():        
            outputs = model(**inputs)
            
        loss = outputs[0]
        logits = outputs[1]
        loss_val_total += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)
    
    loss_val_avg = loss_val_total/len(dataloader_val) 
    
    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)
            
    return loss_val_avg, predictions, true_vals

In [None]:
for epoch in tqdm(range(1, epochs+1)):
    
    model.train()
    
    loss_train_total = 0

    progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)
    for batch in progress_bar:

        model.zero_grad()
        
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }       

        outputs = model(**inputs)
        
        loss = outputs[0]
        loss_train_total += loss.item()
        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        optimizer.step()
        scheduler.step()
        
        # The loss is already averaged over the batch, so report it directly
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item())})
         
        
    torch.save(model.state_dict(), f'finetuned_BERT_epoch_{epoch}.model')
        
    tqdm.write(f'\nEpoch {epoch}')
    
    loss_train_avg = loss_train_total/len(dataloader_train)            
    tqdm.write(f'Training loss: {loss_train_avg}')
    
    val_loss, predictions, true_vals = evaluate(dataloader_validation)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (Weighted): {val_f1}')


In [None]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

model.to(device)

In [None]:
# Load the weights saved after the final training epoch
model.load_state_dict(torch.load(f'finetuned_BERT_epoch_{epochs}.model', map_location=device))

<All keys matched successfully>

In [None]:
_, predictions, true_vals = evaluate(dataloader_validation)

In [None]:
accuracy_per_class(predictions, true_vals)

Class: Yes
Accuracy: 3475/4725
Correct Predictions  0 0.7354497354497355
Class: No
Accuracy: 1645/3597
Correct Predictions  1 0.45732554906866835
Class: In the middle, neither yes nor no
Accuracy: 0/210
Correct Predictions  2 0.0
Class: Yes, subject to some conditions
Accuracy: 18/775
Correct Predictions  3 0.023225806451612905
Class: Other
Accuracy: 133/151
Correct Predictions  4 0.8807947019867549
Total Correct Predictions  0.5573059843518714


In [None]:
val_f1 = f1_score_func(predictions, true_vals)

In [None]:
predictions

array([[ 1.514964  ,  1.2869047 , -0.9304254 ,  1.1301137 , -2.9246495 ],
       [ 2.8078253 ,  1.6857846 , -0.8008237 , -1.7124525 , -2.4481955 ],
       [ 1.6388955 ,  1.7505739 , -1.0769817 ,  0.8898702 , -3.096721  ],
       ...,
       [ 2.0944319 ,  2.7399504 , -1.3699355 , -0.4011945 , -3.3582625 ],
       [ 1.7346531 ,  1.3869745 , -1.1197953 ,  1.0050094 , -2.9913623 ],
       [ 1.2897934 , -0.30967528, -0.19036706, -0.06392112, -0.12975693]],
      dtype=float32)
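
The `predictions` array holds raw logits, one row per example and one column per class. As a minimal sketch of how these become class predictions, the code below applies a softmax to two rows copied from the output above and takes the argmax of each row; the variable names are illustrative.

```python
import numpy as np

# Two rows of logits copied from the output above
logits = np.array([[1.514964, 1.2869047, -0.9304254, 1.1301137, -2.9246495],
                   [1.6388955, 1.7505739, -1.0769817, 0.8898702, -3.096721]])

# Numerically stable softmax: subtract each row's maximum before exponentiating
shifted = logits - logits.max(axis=1, keepdims=True)
exp_shifted = np.exp(shifted)
probs = exp_shifted / exp_shifted.sum(axis=1, keepdims=True)

# The predicted class is the column with the highest probability (or logit)
pred_classes = probs.argmax(axis=1)

print(probs.round(3))
print(pred_classes)
```

Softmax preserves the ordering within each row, so taking the argmax of the logits directly gives the same predicted classes.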

In [None]:
val_f1

0.5239151713681761

In [None]:
preds = np.argmax(predictions, axis=1).flatten()


In [None]:
from sklearn.metrics import classification_report

In [None]:
# Order the class names by their integer label so the report rows line up with the classes
target_names = [name for name, _ in sorted(label_dict.items(), key=lambda item: item[1])]
print(classification_report(true_vals, preds, labels=list(range(len(target_names))), target_names=target_names))

In [None]:
_, predictions, true_vals = evaluate(dataloader_test)

In [None]:
accuracy_per_class(predictions, true_vals)

Class: Yes
Accuracy: 4305/4724
Correct Predictions  0 0.9113039796782387
Class: No
Accuracy: 204/3596
Correct Predictions  1 0.05672969966629588
Class: In the middle, neither yes nor no
Accuracy: 0/211
Correct Predictions  2 0.0
Class: Yes, subject to some conditions
Accuracy: 0/775
Correct Predictions  3 0.0
Class: Other
Accuracy: 7/151
Correct Predictions  4 0.046357615894039736
Total Correct Predictions  0.4775298720524479


In [None]:
preds = np.argmax(predictions, axis=1).flatten()

In [None]:
# Order the class names by their integer label so the report rows line up with the classes
target_names = [name for name, _ in sorted(label_dict.items(), key=lambda item: item[1])]
print(classification_report(true_vals, preds, labels=list(range(len(target_names))), target_names=target_names))

                                   precision    recall  f1-score   support

                              Yes       0.57      0.74      0.64      4725
                               No       0.52      0.46      0.49      3597
In the middle, neither yes nor no       0.00      0.00      0.00       210
  Yes, subject to some conditions       0.33      0.02      0.04       775
                            Other       0.92      0.88      0.90       151

                         accuracy                           0.56      9458
                        macro avg       0.47      0.42      0.41      9458
                     weighted avg       0.52      0.56      0.52      9458



