# Sentiment Analysis with Deep Learning using BERT

### Prerequisites

- Intermediate-level knowledge of Python 3 (NumPy and Pandas preferably, but not required)
- Exposure to PyTorch usage
- Basic understanding of Deep Learning and Language Models (BERT specifically)

## Introduction

### What is BERT

BERT is a large-scale transformer-based Language Model that can be finetuned for a variety of tasks.

For more information, the original paper can be found [here](https://arxiv.org/abs/1810.04805). 

[HuggingFace documentation](https://huggingface.co/transformers/model_doc/bert.html)

[Bert documentation](https://characters.fandom.com/wiki/Bert_(Sesame_Street) ;)

##  Exploratory Data Analysis and Preprocessing

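This notebook is based on the SMILE Twitter dataset (cited below); the cells that follow load `BTC_tweets_daily_example.csv`, a file of Bitcoin-related tweets labelled positive, neutral, or negative.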

_Wang, Bo; Tsakalidis, Adam; Liakata, Maria; Zubiaga, Arkaitz; Procter, Rob; Jensen, Eric (2016): SMILE Twitter Emotion dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.3187909.v2_

In [1]:
import torch
import pandas as pd
from tqdm.notebook import trange, tqdm


In [2]:
# tqdm is a fast, extensible progress bar for Python and the CLI

In [3]:

for i in trange(10):
    print(i)

  0%|          | 0/10 [00:00<?, ?it/s]

0
1
2
3
4
5
6
7
8
9


In [60]:
torch.cuda.is_available()

False

In [61]:
df = pd.read_csv('BTC_tweets_daily_example.csv')


In [62]:
df.columns

Index(['Unnamed: 0', 'Date', 'Tweet', 'Screen_name', 'Source', 'Link',
       'Sentiment', 'sent_score', 'New_Sentiment_Score',
       'New_Sentiment_State'],
      dtype='object')

In [63]:
df.rename(columns = {'Unnamed: 0':'id'}, inplace = True)

In [64]:
df = df[['id', 'Tweet', 'Sentiment']]
df.shape

(50873, 3)

In [50]:
df.set_index('id', inplace=True)

In [65]:
df.Tweet.iloc[0]

"RT @ALXTOKEN: Paul Krugman, Nobel Luddite. I had to tweak the nose of this Bitcoin enemy. He says such foolish things. Here's the link: htt…"

In [73]:
df = df[df['Sentiment'] != '0.0']

In [74]:
df.Sentiment.unique()

array(["['neutral']", "['positive']", "['negative']", nan], dtype=object)

In [75]:
df.Sentiment.value_counts()

['positive']    22937
['neutral']     21932
['negative']     5983
Name: Sentiment, dtype: int64

In [76]:
df.shape

(50866, 3)

In [77]:
df = df[df.Sentiment.notnull()]
df.shape

(50852, 3)

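In [7]:
# Stale cell from the SMILE dataset: the DataFrame used here has no `category` column.
# df = df[~df.category.str.contains(r'\|')]

In [8]:
# Stale cell (SMILE dataset only): drops rows labelled 'nocode'.
# df = df[df.category != 'nocode']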

In [78]:
df.Sentiment.value_counts()

['positive']    22937
['neutral']     21932
['negative']     5983
Name: Sentiment, dtype: int64

The classes are imbalanced: negative tweets are far fewer than positive or neutral ones.
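The imbalance can be quantified directly from the counts above. The sketch below hard-codes the `value_counts()` output and derives each class's share plus an inverse-frequency weight. The weighting formula (`total / (n_classes * count)`) is one common heuristic and an assumption here, not something this notebook uses.

```python
# Class counts copied from the value_counts() output above.
counts = {"positive": 22937, "neutral": 21932, "negative": 5983}

total = sum(counts.values())
n_classes = len(counts)

# Share of each class in the dataset.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights (illustrative heuristic): rarer classes get larger weights.
class_weights = {label: total / (n_classes * n) for label, n in counts.items()}

for label in counts:
    print(f"{label:>8}: share={shares[label]:.3f}  weight={class_weights[label]:.3f}")
```

Weights like these could, for example, be passed to a weighted loss function, but the training loop in this notebook relies on the model's built-in unweighted loss.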

In [79]:
possible_labels = df.Sentiment.unique()

In [80]:
label_dict = {}
for index, possible_label in enumerate(possible_labels):
    label_dict[possible_label]= index

In [81]:
label_dict

{"['neutral']": 0, "['positive']": 1, "['negative']": 2}
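The raw `Sentiment` values are string representations of one-element lists (for example `"['neutral']"`), which makes the dictionary keys awkward to read. A minimal sketch of how they could be normalised with the standard library's `ast.literal_eval` follows; `clean_label` and `clean_label_dict` are hypothetical names, not part of the notebook.

```python
import ast

def clean_label(raw):
    """Turn a string such as "['neutral']" into the plain label 'neutral'."""
    parsed = ast.literal_eval(raw)  # safely parses the list literal
    return parsed[0]

raw_labels = ["['neutral']", "['positive']", "['negative']"]

# Build the same label -> index mapping as above, but with readable keys.
clean_label_dict = {clean_label(raw): index for index, raw in enumerate(raw_labels)}
print(clean_label_dict)
```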

In [83]:
df['label'] = df.Sentiment.replace(label_dict)
df.head()

|   | id | Tweet | Sentiment | label |
|---|---|---|---|---|
| 0 | 0 | RT @ALXTOKEN: Paul Krugman, Nobel Luddite. I h... | ['neutral'] | 0 |
| 1 | 1 | @lopp @_Kevin_Pham @psycho_sage @naval But @Pr... | ['neutral'] | 0 |
| 2 | 2 | RT @tippereconomy: Another use case for #block... | ['positive'] | 1 |
| 3 | 3 | free coins https://t.co/DiuoePJdap | ['positive'] | 1 |
| 4 | 4 | RT @payvxofficial: WE are happy to announce th... | ['positive'] | 1 |


In [85]:
df['Tweet'].iloc[0]

"RT @ALXTOKEN: Paul Krugman, Nobel Luddite. I had to tweak the nose of this Bitcoin enemy. He says such foolish things. Here's the link: htt…"

## Training/Validation Split

In [86]:
from sklearn.model_selection import train_test_split

In [87]:
df.index.values

array([    0,     1,     2, ..., 50870, 50871, 50872], dtype=int64)

In [88]:
df.label.values

array([0, 0, 1, ..., 0, 1, 1], dtype=int64)

In [89]:
X_train, X_val, y_train, y_val = train_test_split(
    df.index.values,
    df.label.values,
    test_size=0.15,
    random_state=17,
    stratify=df.label.values
)

In [90]:
X_train

array([40003, 22660,   170, ..., 44471,  2570, 32508], dtype=int64)

In [91]:
y_train

array([2, 0, 1, ..., 0, 1, 0], dtype=int64)

In [92]:
df['data_type']= ['no_set']*df.shape[0]

In [94]:
df.loc[X_train, 'data_type']='train'
df.loc[X_val, 'data_type']='val'

In [96]:
df.groupby(['Sentiment','label','data_type']).count()

| Sentiment | label | data_type | id | Tweet |
|---|---|---|---|---|
| ['negative'] | 2 | train | 5086 | 5086 |
| ['negative'] | 2 | val | 897 | 897 |
| ['neutral'] | 0 | train | 18642 | 18642 |
| ['neutral'] | 0 | val | 3290 | 3290 |
| ['positive'] | 1 | train | 19496 | 19496 |
| ['positive'] | 1 | val | 3441 | 3441 |
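As a quick sanity check on stratification, the counts in the table above should give roughly the same validation share (`test_size=0.15`) for every class. The sketch below simply redoes that arithmetic on numbers copied from the table; it does not call `train_test_split` itself.

```python
# (train, val) counts per class, copied from the groupby output above.
split_counts = {
    "negative": (5086, 897),
    "neutral": (18642, 3290),
    "positive": (19496, 3441),
}

val_shares = {}
for label, (n_train, n_val) in split_counts.items():
    # Fraction of this class that ended up in the validation set.
    val_shares[label] = n_val / (n_train + n_val)
    print(f"{label:>8}: validation share = {val_shares[label]:.4f}")
```

Every class lands within a fraction of a percent of 0.15, which is what `stratify=df.label.values` is meant to achieve.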


##  Loading Tokenizer and Encoding our Data

In [98]:
!pip install transformers

In [99]:
from transformers import BertTokenizer
from torch.utils.data import TensorDataset

In [100]:
tokenizer = BertTokenizer.from_pretrained(
    'bert-base-uncased',
    do_lower_case=True
)

In [101]:
df.data_type=='train'

0         True
1         True
2         True
3         True
4         True
         ...  
50868     True
50869     True
50870     True
50871     True
50872    False
Name: data_type, Length: 50852, dtype: bool

In [102]:
df[df.data_type=='train']

|   | id | Tweet | Sentiment | label | data_type |
|---|---|---|---|---|---|
| 0 | 0 | RT @ALXTOKEN: Paul Krugman, Nobel Luddite. I h... | ['neutral'] | 0 | train |
| 1 | 1 | @lopp @_Kevin_Pham @psycho_sage @naval But @Pr... | ['neutral'] | 0 | train |
| 2 | 2 | RT @tippereconomy: Another use case for #block... | ['positive'] | 1 | train |
| 3 | 3 | free coins https://t.co/DiuoePJdap | ['positive'] | 1 | train |
| 4 | 4 | RT @payvxofficial: WE are happy to announce th... | ['positive'] | 1 | train |
| ... | ... | ... | ... | ... | ... |
| 50867 | 50853 | RT @PhotoCoin_io: 2,000,000 PHT TOKEN #airdrop... | ['neutral'] | 0 | train |
| 50868 | 50854 | RT @fixy_app: Fixy Network brings popular cryp... | ['positive'] | 1 | train |
| 50869 | 50855 | RT @bethereumteam: After a successful launch o... | ['positive'] | 1 | train |
| 50870 | 50856 | RT @GymRewards: Buy #GYMRewards Tokens, Bonus ... | ['neutral'] | 0 | train |


In [103]:
df[df.data_type=='train'].Tweet.values

array(["RT @ALXTOKEN: Paul Krugman, Nobel Luddite. I had to tweak the nose of this Bitcoin enemy. He says such foolish things. Here's the link: htt…",
       '@lopp @_Kevin_Pham @psycho_sage @naval But @ProfFaustus (dum b a ss) said you know nothing about #Bitcoin ... 😂😂😂 https://t.co/SBAMFQ2Yiy',
       'RT @tippereconomy: Another use case for #blockchain and #Tipper. The #TipperEconomy  can unseat Facebook and change everything! ICO Live No…',
       ...,
       "RT @bethereumteam: After a successful launch of our Bounty campaign, we've managed to filter out the Bounty related questions to: https://t…",
       'RT @GymRewards: Buy #GYMRewards Tokens, Bonus Time is ending! https://t.co/HDvhoZrz2J, #ICO #cryptocurrency #mobile #app #mining #exercisin…',
       'I added a video to a @YouTube playlist https://t.co/ntFJrNvSvZ How To Bitcoin Cloud Mining Free For Lifetime Urdu / Hindi'],
      dtype=object)

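Next, we encode the tweets with [tokenizer.batch_encode_plus](https://huggingface.co/transformers/internal/tokenization_utils.html#transformers.tokenization_utils_base.PreTrainedTokenizerBase.batch_encode_plus), which tokenizes, pads, and truncates a batch of texts and returns the input IDs and attention masks as PyTorch tensors.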

In [105]:
encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type=='train'].Tweet.values,
    add_special_tokens=True,
    return_attention_mask=True,
    # pad_to_max_length=True is deprecated in favour of the `padding` argument
    padding=True,
    truncation=True,
    max_length=256,
    return_tensors='pt'
)

In [106]:
encoded_data_val = tokenizer.batch_encode_plus(
    df[df.data_type=='val'].Tweet.values,
    add_special_tokens=True,
    return_attention_mask=True,
    # pad_to_max_length=True is deprecated in favour of the `padding` argument
    padding=True,
    truncation=True,
    max_length=256,
    return_tensors='pt'
)
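To make the `padding`, `truncation`, and `return_attention_mask` options concrete, here is a toy re-implementation in plain Python. It only illustrates the idea: the token IDs are made up, `pad_and_mask` is a hypothetical helper, and the real tokenizer does considerably more (WordPiece splitting, special-token handling, tensor conversion).

```python
def pad_and_mask(sequences, max_length, pad_id=0):
    """Truncate to max_length, pad to the longest remaining sequence, build masks."""
    truncated = [seq[:max_length] for seq in sequences]
    target_len = max(len(seq) for seq in truncated)

    input_ids, attention_mask = [], []
    for seq in truncated:
        n_pad = target_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

# Made-up token IDs for three "tweets" of different lengths.
toy_batch = [[101, 7592, 102], [101, 2023, 2003, 1037, 102], [101, 102]]
ids, mask = pad_and_mask(toy_batch, max_length=4)
print(ids)
print(mask)
```

Because every row ends up the same length, the batch can be stacked into a single tensor, which is what `return_tensors='pt'` relies on.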

Extract the input IDs, attention masks, and labels for the training set:

In [107]:
input_ids_train = encoded_data_train['input_ids']
attention_masks_train = encoded_data_train['attention_mask']
labels_train = torch.tensor(df[df.data_type=='train'].label.values) 

And the same for the validation set:

In [108]:
input_ids_val = encoded_data_val['input_ids']
attention_masks_val = encoded_data_val['attention_mask']
labels_val = torch.tensor(df[df.data_type=='val'].label.values) 

We then wrap these tensors in a `TensorDataset` for the training set and another for the validation set, in the form BERT expects.

In [109]:
dataset_train = TensorDataset(
    input_ids_train,
    attention_masks_train,
    labels_train
)

In [110]:
dataset_val = TensorDataset(input_ids_val,
                            attention_masks_val,
                            labels_val
)

In [111]:
len(dataset_train)

43224

In [112]:
len(dataset_val)

7628

## Setting up BERT Pretrained Model

In [113]:
from transformers import BertForSequenceClassification

In [115]:
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', 
     num_labels=len(label_dict),
     output_attentions=False,
     output_hidden_states=False)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

## Creating Data Loaders

In [116]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

In [117]:
#In Google Colab -- GPU Instance (k80)
#batch_size =32
#epoch =10

In [118]:
batch_size = 4 #32
dataloader_train = DataLoader(
    dataset_train,
    sampler=RandomSampler(dataset_train),
    batch_size=batch_size
)


dataloader_val = DataLoader(
    dataset_val,
    sampler=SequentialSampler(dataset_val),
    batch_size=batch_size 
)
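With `batch_size = 4`, the number of batches per epoch follows from the dataset sizes reported earlier, assuming the `DataLoader` default `drop_last=False` so that a final, smaller batch would be kept. A short sketch of the arithmetic:

```python
import math

batch_size = 4
n_train, n_val = 43224, 7628  # len(dataset_train), len(dataset_val)

# With drop_last=False (the DataLoader default) a last partial batch is kept,
# so the batch count is the ceiling of size / batch_size.
train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
print(train_batches, val_batches)
```

These are the values `len(dataloader_train)` and `len(dataloader_val)` would return, and `len(dataloader_train)` is what the scheduler uses to compute the total number of training steps.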

## Setting Up Optimizer and Scheduler

In [119]:
from transformers import AdamW, get_linear_schedule_with_warmup

In [120]:
optimizer = AdamW(
    model.parameters(),
    lr=1e-5,  # the BERT paper recommends learning rates between 2e-5 and 5e-5
    eps=1e-8
)



In [121]:
epochs = 10

scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=0,
        num_training_steps=len(dataloader_train)*epochs
)
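To see what a linear schedule with warmup does to the learning rate, the sketch below re-creates the multiplier in plain Python. It is an illustration of the idea rather than the library's code; `linear_schedule_multiplier` is a hypothetical name, and the step counts assume 10806 batches per epoch over 10 epochs as configured above.

```python
def linear_schedule_multiplier(step, num_warmup_steps, num_training_steps):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 up to 1 during warmup.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

base_lr = 1e-5
total_steps = 10806 * 10  # batches per epoch * epochs

for step in (0, total_steps // 2, total_steps):
    factor = linear_schedule_multiplier(step, 0, total_steps)
    print(f"step {step:>6}: lr = {base_lr * factor:.2e}")
```

With `num_warmup_steps=0` there is no ramp-up phase: the learning rate starts at its full value and decays linearly to zero by the last step.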

## Defining our Performance Metrics

The accuracy metric follows the approach of the accuracy function in [this tutorial](https://mccormickml.com/2019/07/22/BERT-fine-tuning/#41-bertforsequenceclassification).

In [122]:
import numpy as np

In [123]:
from sklearn.metrics import f1_score

In [124]:
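# Example: the model's scores for one sample, e.g. preds = [0.9, 0.05, 0.05, 0, 0, 0],
# are reduced to the most likely class, i.e. the one-hot form preds = [1, 0, 0, 0, 0, 0]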

In [125]:
def f1_score_func(preds, labels):
    # Convert the scores to predicted class ids before computing the metric
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return f1_score(labels_flat, preds_flat, average='weighted')
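To show what `average='weighted'` means, the following sketch computes a support-weighted F1 score by hand on a tiny made-up example. `weighted_f1` is a hypothetical helper written for illustration, not scikit-learn's implementation, and the labels and predictions are invented.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted mean of the per-class F1 scores."""
    total = len(y_true)
    score = 0.0
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true examples of this class
        score += f1 * support / total
    return score

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class contributes in proportion to its support, frequent classes dominate the score, which is worth keeping in mind given the class imbalance in this dataset.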

In [126]:
def accuracy_per_class(preds, labels):
    # Map numeric ids back to the original label strings
    label_dict_inverse = {v: k for k, v in label_dict.items()}
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()

    for label in np.unique(labels_flat):
        # Keep only the samples whose true class is `label`
        y_pred = preds_flat[labels_flat == label]
        y_true = labels_flat[labels_flat == label]
        print(f'Class:{label_dict_inverse[label]}')
        print(f'Accuracy:{len(y_pred[y_pred==label])}/{len(y_true)}\n')

##  Creating our Training Loop

Approach adapted from an older version of HuggingFace's `run_glue.py` script. Accessible [here](https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128).

In [127]:
import random

seed_val = 17
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

In [128]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
print(device)

cpu


Assuming `valX` is a tensor holding the complete validation data, the usual approach is to wrap it in a `Dataset` and `DataLoader` and compute the predictions batch by batch.

To save memory during evaluation and testing, you can also wrap the validation and test code in a `with torch.no_grad()` block.

For the validation set, the code would look like this:
```python

with torch.no_grad():
    model.eval()
    y_pred = model(valX)
    val_loss = criterion(y_pred, valY)
```

and, for the test set:
```python

with torch.no_grad():
    model.eval()
    y_pred = model(test)
    test_loss = criterion(y_pred, testY)
```

In [129]:
def evaluate(dataloader_val):
    model.eval()
    loss_val_total = 0
    predictions, true_vals = [], []
    for batch in tqdm(dataloader_val):
        batch = tuple(b.to(device) for b in batch)
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

        with torch.no_grad():        
            outputs = model(**inputs)
            
        loss = outputs[0]
        logits = outputs[1]
        loss_val_total += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)
    
    loss_val_avg = loss_val_total/len(dataloader_val) 
    
    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)
            
    return loss_val_avg, predictions, true_vals


In [130]:
for epoch in tqdm(range(1, epochs+1)):
    
    model.train()
    
    loss_train_total = 0
    
    progress_bar = tqdm(dataloader_train, 
                        desc='Epoch {:1d}'.format(epoch),
                        leave=False,
                        disable=False)
    for batch in progress_bar:
        model.zero_grad()
        batch = tuple(b.to(device) for b in batch)
        inputs ={
            'input_ids'    :batch[0],
            'attention_mask':batch[1],
            'labels'        :batch[2]
        }
        outputs = model(**inputs)
        loss = outputs[0]
        loss_train_total += loss.item()
        loss.backward()
    
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        # The loss is already averaged over the batch, so report it as-is
        progress_bar.set_postfix(
            {'training_loss': '{:.3f}'.format(loss.item())})
        
    #torch.save(model.state_dict(),f'Models/BERT_ft_epoch{epoch}.model')
    tqdm.write(f'\nEpoch {epoch}')
    
    loss_train_avg = loss_train_total/len(dataloader_train)
    tqdm.write(f'Training loss: {loss_train_avg}')
    
    val_loss, predictions, true_vals = evaluate(dataloader_val)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (weighted): {val_f1}')

# Save the final weights once training has finished (the Models/ directory must already exist)
torch.save(model.state_dict(), f'Models/BERT_ft_epoch{epoch}.model')

  0%|          | 0/10 [00:00<?, ?it/s]

Epoch 1:   0%|          | 0/10806 [00:00<?, ?it/s]


Epoch 1
Training loss: 0.26445183036617576


  0%|          | 0/1907 [00:00<?, ?it/s]

Validation loss: 0.1380689826670701
F1 Score (weighted): 0.9776306484148939


Epoch 2:   0%|          | 0/10806 [00:00<?, ?it/s]


Epoch 2
Training loss: 0.08285939587430981


  0%|          | 0/1907 [00:00<?, ?it/s]

Validation loss: 0.10676112664527498
F1 Score (weighted): 0.9832201381387757


Epoch 3:   0%|          | 0/10806 [00:00<?, ?it/s]

KeyboardInterrupt: 

When saving a general checkpoint, to be used for either inference or resuming training, you must save more than just the model’s state_dict. It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains. Other items that you may want to save are the epoch you left off on, the latest recorded training loss, external torch.nn.Embedding layers, etc. As a result, such a checkpoint is often 2~3 times larger than the model alone.

To save multiple components, organize them in a dictionary and use torch.save() to serialize the dictionary. A common PyTorch convention is to save these checkpoints using the .tar file extension.
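A minimal sketch of such a general checkpoint is shown below. It uses a tiny `nn.Linear` stand-in instead of the BERT model so that it runs quickly; the dictionary keys, the file name, and the stored epoch and loss values are illustrative placeholders rather than anything this notebook saves.

```python
import os
import tempfile

import torch
from torch import nn

# Tiny stand-in model and optimizer (assumptions for illustration only).
toy_model = nn.Linear(4, 3)
toy_optimizer = torch.optim.AdamW(toy_model.parameters(), lr=1e-5)

checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.tar")

# Save everything needed to resume training in a single dictionary.
torch.save(
    {
        "epoch": 2,                      # placeholder: last completed epoch
        "model_state_dict": toy_model.state_dict(),
        "optimizer_state_dict": toy_optimizer.state_dict(),
        "loss": 0.0829,                  # placeholder: latest training loss
    },
    checkpoint_path,
)

# Restore: rebuild the objects first, then load their saved states.
restored_model = nn.Linear(4, 3)
restored_optimizer = torch.optim.AdamW(restored_model.parameters(), lr=1e-5)

checkpoint = torch.load(checkpoint_path, map_location="cpu")
restored_model.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1
```

For the fine-tuned BERT model the same pattern would apply, with `model`, `optimizer`, and optionally `scheduler.state_dict()` taking the place of the toy objects.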

##  Loading and Evaluating our Model

In [54]:
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(label_dict),
    output_attentions=False,
    output_hidden_states=False)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

This warning appears because we are loading the `bert-base-uncased` checkpoint, which was trained with an architecture similar to `BertForPreTraining`, into a `BertForSequenceClassification` model.

This means that:

- The layers that `BertForPreTraining` has but `BertForSequenceClassification` does not have are discarded.
- The layers that `BertForSequenceClassification` has but `BertForPreTraining` does not have are randomly initialized.

This is expected, and tells you that you won't have good performance with your `BertForSequenceClassification` model before you fine-tune it 🙂.

If a similar warning mentions the pooler weights, it means the pooler is not used to compute the loss during training; when your fine-tuning setup does not use the pooler layer, there is no need to worry about that warning.

In [55]:
len(label_dict)

6

In PyTorch, the learnable parameters (i.e. weights and biases) of a `torch.nn.Module` model are contained in the model’s parameters (accessed with `model.parameters()`). A `state_dict` is simply a Python dictionary object that maps each layer to its parameter tensor.

In [56]:
# Print model's state_dict
#print("Model's state_dict:")
#for param_tensor in model.state_dict():
#    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

In [57]:
# Print optimizer's state_dict
#print("Optimizer's state_dict:")
#for var_name in optimizer.state_dict():
#    print(var_name, "\t", optimizer.state_dict()[var_name])

In [58]:
# Fall back to the CPU when no GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [59]:
_ = model.to(device)  # assign to _ to suppress the long model printout
# Make sure to call input = input.to(device) on any input tensors that you feed to the model

In [60]:
PATH='./Models/BERT_ft_epoch10.model'

In [61]:
# map_location remaps the saved tensors onto the device selected above
model.load_state_dict(torch.load(PATH, map_location=device))

<All keys matched successfully>

When loading a model on a GPU that was trained and saved on GPU, simply convert the initialized model to a CUDA optimized model using model.to(torch.device('cuda')). Also, be sure to use the .to(torch.device('cuda')) function on all model inputs to prepare the data for the model. Note that calling my_tensor.to(device) returns a new copy of my_tensor on GPU. It does NOT overwrite my_tensor. Therefore, remember to manually overwrite tensors: my_tensor = my_tensor.to(torch.device('cuda')).

In [62]:
_, predictions, true_vals = evaluate(dataloader_val)

  0%|          | 0/56 [00:00<?, ?it/s]

In [63]:
accuracy_per_class(predictions, true_vals)

Class:happy
Accuracy:161/171

Class:not-relevant
Accuracy:20/32

Class:angry
Accuracy:8/9

Class:disgust
Accuracy:0/1

Class:sad
Accuracy:2/5

Class:surprise
Accuracy:2/5

