## The main code for training the sentiment classifier

### Note: packages installed via pip (pandas, scikit-learn, transformers, etc.) are not listed explicitly. Please install the required packages yourself.

### This code was executed in SageMaker Studio Lab

This code was built by combining code from several tutorials...

- https://mccormickml.com/2019/07/22/BERT-fine-tuning/
- https://towardsdatascience.com/multi-class-text-classification-with-deep-learning-using-bert-b59ca2f5c613
- https://huggingface.co/docs/transformers/master/en/custom_datasets#seq_imdb

... as well as official documentation and other solutions found online for smaller problems encountered along the way.

In [7]:
import tensorflow as tf
import torch
import numpy as np
from transformers import BertTokenizer
from tqdm.notebook import tqdm
from torch.utils.data import TensorDataset
from transformers import BertForSequenceClassification
import pandas as pd

In [8]:
# Check if a GPU is available
device_name = tf.test.gpu_device_name()
if device_name == '/device:GPU:0':
    print('Found GPU at: {}'.format(device_name))
else:
    raise SystemError('GPU device not found')

Found GPU at: /device:GPU:0


In [9]:
# Use the GPU if available, otherwise fall back to the CPU. Note: I did not test whether this works on a CPU, but it might.
# Training on a CPU is not recommended, as it will be very slow.
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
if device.type == 'cuda':
    print('We will use the GPU:', torch.cuda.get_device_name(0))
else:
    print('We will use the CPU.')

We will use the GPU: Tesla T4


In [10]:
# Read the given dataset with a tab separator and no header.
# Name the columns "Emotion" and "Text".
df = pd.read_csv("train.csv", sep='\t', header=None, names=['Emotion', 'Text'])

In [11]:
# Visual help cell, not needed for execution
# From now on I will just mark these as "Visual cell", but might still explain the process
df.head()

|   | Emotion | Text |
|---|---------|------|
| 0 | anger | @USERNAME A little [#TRIGGERWORD#] that I am n... |
| 1 | disgust | @USERNAME @USERNAME It's pretty [#TRIGGERWORD#... |
| 2 | fear | Apparently I've been black mailing my brother ... |
| 3 | fear | Republicans are so [#TRIGGERWORD#] that people... |
| 4 | sad | Katy once felt so [#TRIGGERWORD#] that she bar... |


In [12]:
# Visual cell
# Check the number of samples per emotion
df['Emotion'].value_counts()

joy         27361
anger       25016
fear        25009
surprise    24994
disgust     24962
sad         22625
Name: Emotion, dtype: int64

In [13]:
# Iterate through unique labels and create a dict based on them
possible_labels = df.Emotion.unique()

label_dict = {}
for index, possible_label in enumerate(possible_labels):
    label_dict[possible_label] = index
label_dict

{'anger': 0, 'disgust': 1, 'fear': 2, 'sad': 3, 'surprise': 4, 'joy': 5}

In [14]:
# Replace the labels with numerical values through our defined dict
df['label'] = df.Emotion.replace(label_dict)
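The mapping above gives each emotion an integer id in order of first appearance, which is the order `df.Emotion.unique()` returns. A minimal plain-Python sketch of the same idea, including the inverse mapping that `accuracy_per_class` rebuilds later; `build_label_maps` is a hypothetical helper and the sample labels are made up for illustration:

```python
def build_label_maps(labels):
    """Map each distinct label to an integer id, in order of first appearance."""
    label_to_id = {}
    for label in labels:
        if label not in label_to_id:
            label_to_id[label] = len(label_to_id)
    # Inverse mapping: id -> label, used to turn predictions back into names
    id_to_label = {i: label for label, i in label_to_id.items()}
    return label_to_id, id_to_label


# Illustrative labels, not taken from train.csv
sample = ["anger", "disgust", "fear", "anger", "sad"]
label_to_id, id_to_label = build_label_maps(sample)
encoded = [label_to_id[label] for label in sample]
print(label_to_id)  # {'anger': 0, 'disgust': 1, 'fear': 2, 'sad': 3}
print(encoded)      # [0, 1, 2, 0, 3]
```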

In [17]:
# Use train_test_split from sklearn to split the data 85%/15% into training and validation sets
# Stratify by label so that both sets keep the same class proportions
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(df.index.values, 
                                                  df.label.values, 
                                                  test_size=0.15, 
                                                  random_state=42, 
                                                  stratify=df.label.values)

# Add a new column that records which split each row belongs to
df['data_type'] = ['not_set']*df.shape[0]

df.loc[X_train, 'data_type'] = 'train'
df.loc[X_val, 'data_type'] = 'val'

# Check if the split was successful
df.groupby(['Emotion', 'label', 'data_type']).count()

| Emotion | label | data_type | Text |
|---------|-------|-----------|------|
| anger | 0 | train | 21263 |
| anger | 0 | val | 3753 |
| disgust | 1 | train | 21218 |
| disgust | 1 | val | 3744 |
| fear | 2 | train | 21257 |
| fear | 2 | val | 3752 |
| joy | 5 | train | 23257 |
| joy | 5 | val | 4104 |
| sad | 3 | train | 19231 |
| sad | 3 | val | 3394 |
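Passing `stratify=df.label.values` makes `train_test_split` hold out roughly the same share of every class, which is why each emotion has about 15% of its rows in `val`. A simplified NumPy sketch of that idea; `stratified_split` is a hypothetical helper and not how scikit-learn implements it internally, and the toy labels are made up:

```python
import numpy as np


def stratified_split(labels, test_size=0.15, seed=42):
    """Hypothetical helper: per class, shuffle the row indices and hold out test_size of them."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # rows belonging to this class
        rng.shuffle(idx)
        n_val = int(round(len(idx) * test_size))
        val_idx.extend(idx[:n_val].tolist())
        train_idx.extend(idx[n_val:].tolist())
    return np.sort(np.array(train_idx)), np.sort(np.array(val_idx))


# Toy labels: 100 rows of class 0 and 60 rows of class 1
labels = np.array([0] * 100 + [1] * 60)
train_idx, val_idx = stratified_split(labels)
print(np.bincount(labels[val_idx]))  # [15  9]
```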


In [18]:
# Use the BertTokenizer to tokenize the training and validation datasets.
# Truncation is enabled and max_length has been set generously. It could have been shorter (given the string cleaning done beforehand)...
# ... but I was not sure whether the tokenizer needed additional space apart from the padding (max_length already includes the [CLS] and [SEP] tokens).
# As we use PyTorch, tensors are returned with return_tensors='pt'.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', 
                                          do_lower_case=True)
                                          
encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type=='train'].Text.values, 
    add_special_tokens=True, 
    truncation=True,
    return_attention_mask=True,  
    max_length=80, 
    padding='max_length',
    return_tensors='pt'
)

encoded_data_val = tokenizer.batch_encode_plus(
    df[df.data_type=='val'].Text.values, 
    add_special_tokens=True, 
    truncation=True,
    return_attention_mask=True, 
    max_length=80, 
    padding='max_length',
    return_tensors='pt'
)

# Create training and validation datasets from the tokenizer output
# input_ids hold the token ids; attention masks mark real tokens with 1 and padding with 0
# Labels are just the numerically converted emotion values
input_ids_train = encoded_data_train['input_ids']
attention_masks_train = encoded_data_train['attention_mask']
labels_train = torch.tensor(df[df.data_type=='train'].label.values)

input_ids_val = encoded_data_val['input_ids']
attention_masks_val = encoded_data_val['attention_mask']
labels_val = torch.tensor(df[df.data_type=='val'].label.values)

# Initialize a TensorDataset for PyTorch with the created datasets.
dataset_train = TensorDataset(input_ids_train, attention_masks_train, labels_train)
dataset_val = TensorDataset(input_ids_val, attention_masks_val, labels_val)
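With `padding='max_length'` and `truncation=True`, every encoded tweet becomes exactly `max_length` token ids, and the attention mask marks real tokens with 1 and padding with 0. A plain-Python sketch of that final step, assuming the text has already been split into WordPiece ids; 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` ids of `bert-base-uncased`, while the word ids and the helper `pad_and_truncate` are illustrative:

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token ids in the bert-base-uncased vocabulary


def pad_and_truncate(token_ids, max_length):
    """Add [CLS]/[SEP], cut to max_length, pad with [PAD] and build the attention mask."""
    body = token_ids[: max_length - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)               # 1 = real token
    padding = max_length - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding  # 0 = padding


ids, mask = pad_and_truncate([7592, 2088], max_length=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

Longer inputs are cut at the end, so the sequence still ends with `[SEP]` and the mask contains no zeros.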

In [19]:
# Load the pretrained BertForSequenceClassification. Since we have multi-class labels, "num_labels" must be set to the number of emotions
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)
model.to(device)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

In [20]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

# Initialize the batch size as explained in the report.
batch_size = 32

# A DataLoader combines a Dataset with a sampler into an iterable over batches.
# Training uses a RandomSampler (shuffled batches), validation a SequentialSampler (fixed order).
dataloader_train = DataLoader(dataset_train, 
                              sampler=RandomSampler(dataset_train), 
                              batch_size=batch_size)

dataloader_validation = DataLoader(dataset_val, 
                                   sampler=SequentialSampler(dataset_val), 
                                   batch_size=batch_size)

In [21]:
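# transformers.AdamW is deprecated; torch.optim.AdamW with weight_decay=0.0 matches its defaults.
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Use the learning rate, epsilon and number of epochs based on the research in the report.
optimizer = AdamW(model.parameters(),
                  lr=2e-5,
                  eps=1e-8,
                  weight_decay=0.0)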
                  
epochs = 3

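# This creates a linear schedule: the learning rate rises to the set value during the warmup period, then decreases linearly to 0.
# With num_warmup_steps=0 there is no warmup, so the learning rate simply decays from 2e-5 to 0 over all training steps.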
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps=0,
                                            num_training_steps=len(dataloader_train)*epochs)
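`get_linear_schedule_with_warmup` multiplies the base learning rate by a factor that rises linearly from 0 to 1 during warmup and then falls linearly to 0 at `num_training_steps`. A sketch of that factor as a plain function (`linear_lr` is a hypothetical name); the step count assumes the 3984 batches per epoch reported in the training log:

```python
def linear_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate after `step` scheduler steps under linear warmup followed by linear decay."""
    if step < num_warmup_steps:
        factor = step / max(1, num_warmup_steps)
    else:
        remaining = num_training_steps - step
        factor = max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))
    return base_lr * factor


total_steps = 3984 * 3  # batches per epoch times 3 epochs
for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, 2e-5, 0, total_steps))
```

Since `scheduler.step()` runs once per batch, `num_training_steps` must be the number of batches times the number of epochs, as in the cell above.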



In [22]:
from sklearn.metrics import f1_score

# Helper functions to compute the weighted F1 score and the accuracy per emotion.
def f1_score_func(preds, labels):
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return f1_score(labels_flat, preds_flat, average='weighted')

def accuracy_per_class(preds, labels):
    label_dict_inverse = {v: k for k, v in label_dict.items()}
    
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()

    for label in np.unique(labels_flat):
        y_preds = preds_flat[labels_flat==label]
        y_true = labels_flat[labels_flat==label]
        print(f'Class: {label_dict_inverse[label]}')
        print(f'Accuracy: {len(y_preds[y_preds==label])}/{len(y_true)}\n')
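With `average='weighted'`, `f1_score` computes the F1 score of each class separately and averages them, weighting each class by its number of true samples (its support). A NumPy sketch of that computation on class ids (after the `argmax`); `weighted_f1` is a hypothetical helper, not part of scikit-learn, and the toy labels are chosen for illustration:

```python
import numpy as np


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of the per-class F1 scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s, supports = [], []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
        supports.append(np.sum(y_true == c))
    return float(np.average(f1s, weights=supports))


y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
print(weighted_f1(y_true, y_pred))
```

The per-class "accuracy" printed by `accuracy_per_class` is the same quantity as the per-class recall used here.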

In [23]:
# IMPORTANT:
# The trained model is saved to the folder "sentiment_model" after each epoch. The folder is created below if it does not exist yet,
# because it has happened before that a full epoch of training finished before running into that trivial error :)

import os
import random

os.makedirs('sentiment_model', exist_ok=True)

seed_val = 47
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

# Runs the model over a DataLoader to predict and evaluate a dataset.
# Returns the average loss, the raw logits and the true labels.
def evaluate(dataloader_val):

    # Put model in eval mode
    model.eval()
    
    loss_val_total = 0
    predictions, true_vals = [], []
    
    # Iterate through the validation set (or whatever should be evaluated)
    for batch in dataloader_val:
        
        batch = tuple(b.to(device) for b in batch)
        
        # Unpack the data from the dataloader
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

# Evaluation only needs a forward pass, so torch.no_grad() skips building the computation graph.
# The outputs contain the loss and the logit predictions of the model.
        with torch.no_grad():        
            outputs = model(**inputs)
            
        # Extract the specific values and calculate the total validation loss.
        loss = outputs[0]
        logits = outputs[1]
        loss_val_total += loss.item()

        # Move the logits and labels to the CPU and collect them as NumPy arrays.
        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)
    
    # Average loss per batch
    loss_val_avg = loss_val_total/len(dataloader_val) 
    
    # Concatenate the per-batch results into single NumPy arrays
    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)
            
    return loss_val_avg, predictions, true_vals

# This loop trains the model for the defined number of epochs.
# Most of these steps are analogous to the ones in evaluate() above.
for epoch in tqdm(range(1, epochs+1)):
    
    # Put the model in training mode
    model.train()
    
    loss_train_total = 0

    # Progress bar over the batches of the current epoch
    progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)
    for batch in progress_bar:

        # Clear the previously calculated gradients
        model.zero_grad()
        
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }       

        outputs = model(**inputs)
        
        loss = outputs[0]
        loss_train_total += loss.item()
        loss.backward()

        # Cited from: https://mccormickml.com/2019/07/22/BERT-fine-tuning/
        # Clip the norm of the gradients to 1.0.
        # This is to help prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # The optimizer updates the parameters using the computed gradients.
        # The scheduler then updates the learning rate.
        optimizer.step()
        scheduler.step()
        
        # loss is already the mean over the batch; len(batch) is only the number of tensors in the tuple (3)
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item())})
         
    # Save the model after each epoch    
    torch.save(model.state_dict(), f'sentiment_model/finetuned_BERT_epoch_{epoch}.model')
        
    # Show visual outputs for f1-score, accuracy and validation loss after each epoch
    tqdm.write(f'\nEpoch {epoch}')
    
    loss_train_avg = loss_train_total/len(dataloader_train)            
    tqdm.write(f'Training loss: {loss_train_avg}')
    
    val_loss, predictions, true_vals = evaluate(dataloader_validation)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (Weighted): {val_f1}')

  0%|          | 0/3 [00:00<?, ?it/s]

Epoch 1:   0%|          | 0/3984 [00:00<?, ?it/s]


Epoch 1
Training loss: 0.9843949908876873
Validation loss: 0.8222518500528837
F1 Score (Weighted): 0.6991553968615907


Epoch 2:   0%|          | 0/3984 [00:00<?, ?it/s]


Epoch 2
Training loss: 0.6967255929597171
Validation loss: 0.7950210174066075
F1 Score (Weighted): 0.7131743345603304


Epoch 3:   0%|          | 0/3984 [00:00<?, ?it/s]


Epoch 3
Training loss: 0.5410770274077853
Validation loss: 0.833258919174098
F1 Score (Weighted): 0.7139919390894979
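The `clip_grad_norm_` call in the training loop treats all parameter gradients as one long vector: if its L2 norm exceeds `max_norm`, every gradient is scaled by the same factor so the total norm becomes (about) `max_norm`. A NumPy sketch of that rule with made-up gradients; the small epsilon added to the norm mirrors PyTorch's implementation and should be treated as an assumption, and `clip_by_global_norm` is a hypothetical name:

```python
import numpy as np


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale a list of gradient arrays so that their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))  # never scales gradients up
    return [g * scale for g in grads], total_norm


grads = [np.array([3.0, 0.0]), np.array([4.0])]  # combined norm = 5
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm_before, norm_after)
```

Because one common factor is used, the direction of the overall update is preserved; only its size is limited.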


In [24]:
# Initialize the model again
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

model.to(device)

# Load the checkpoint with the highest weighted F1 score (epoch 3)
model.load_state_dict(torch.load('sentiment_model/finetuned_BERT_epoch_3.model', map_location=torch.device('cpu')))

# Check the accuracy for each class
_, predictions, true_vals = evaluate(dataloader_validation)
accuracy_per_class(predictions, true_vals)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Class: anger
Accuracy: 2437/3753

Class: disgust
Accuracy: 2546/3744

Class: fear
Accuracy: 2805/3752

Class: sad
Accuracy: 2299/3394

Class: surprise
Accuracy: 2659/3749

Class: joy
Accuracy: 3301/4104



In [None]:
from sklearn import metrics

In [None]:
# Convert the logits into predicted class ids
pred_np = np.argmax(predictions, axis=1)

In [None]:
# Show the confusion matrix (rows: true labels, columns: predicted labels)
conf_mx = metrics.confusion_matrix(true_vals, pred_np)
conf_mx
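In scikit-learn's `confusion_matrix`, entry `[i, j]` counts the samples whose true class is `i` and whose predicted class is `j`, so the diagonal holds the correct predictions and each row sums to the class size. A NumPy sketch of the same count on toy data; `confusion_counts` is a hypothetical helper:

```python
import numpy as np


def confusion_counts(y_true, y_pred, num_classes):
    """Rows = true class, columns = predicted class."""
    mx = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 at (true, predicted) for every sample; np.add.at handles repeated index pairs
    np.add.at(mx, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return mx


y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
mx = confusion_counts(y_true, y_pred, num_classes=3)
print(mx)
print(np.diag(mx) / mx.sum(axis=1))  # per-class recall, the same ratio accuracy_per_class prints
```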

***

### Now we are using the trained model to predict the sentiments on our second dataset.

In [25]:
# Read the second dataset
pred_df = pd.read_csv("experiment.tsv", sep='\t', header=None, names=['Text', 'label'])

In [26]:
# Visual cell
pred_df.head()

|   | Text | label |
|---|------|-------|
| 0 | Drug challenge idea: take shrooms or some LSD ... | 0 |
| 1 | Austin pulliam making rude gestures toward me.... | 0 |
| 2 | Got my mind thinking, pretty girls drinking ta... | 0 |
| 3 | "@XXX: I think white and have nigger lips" mad... | 0 |
| 4 | You need to swallow some make-up or something ... | 0 |


In [27]:
# Same steps as before: Tokenize the data
encoded_pred = tokenizer.batch_encode_plus(
                      pred_df.Text.values,                      
                      add_special_tokens = True, 
                      truncation = True,
                      max_length = 80,           
                      padding='max_length',
                      return_attention_mask = True,  
                      return_tensors = 'pt',  
                  )

In [28]:
# Extract the tensors for input_ids, attention masks and labels.
input_ids_pred = encoded_pred['input_ids']
attention_masks_pred = encoded_pred['attention_mask']
labels_pred = torch.tensor(pred_df.label.values)

In [30]:
# Create TensorDataset
dataset_pred = TensorDataset(input_ids_pred, attention_masks_pred, labels_pred)

In [31]:
# DataLoader with a SequentialSampler, so the predictions keep the row order of the file
dataloader_pred = DataLoader(dataset_pred, 
                                sampler=SequentialSampler(dataset_pred), 
                                batch_size=batch_size)

In [32]:
# We only need the predictions from the evaluation.
# The labels here are the privacy labels, so the returned loss is meaningless and discarded.
_, predictions, _ = evaluate(dataloader_pred)

In [None]:
# Visual cell
predictions

array([[ 0.17669497,  2.5332584 ,  0.5245931 ,  0.19869341, -0.75338733,
        -2.7886083 ],
       [ 1.0019441 ,  2.7872298 , -0.83420616, -1.0796018 ,  0.05279772,
        -2.1156585 ],
       [ 0.04759236,  2.6465518 , -0.41091102,  0.36313918, -1.309263  ,
        -1.805133  ],
       ...,
       [-0.22940406, -1.1568662 , -1.4875146 ,  0.72091216, -1.0040518 ,
         4.1983314 ],
       [-0.5771077 ,  2.4314728 , -0.53765196,  0.2757519 , -0.65303725,
        -0.8418206 ],
       [-0.70699555, -1.3317142 ,  0.6421219 ,  0.3783986 , -1.4312243 ,
         3.5257003 ]], dtype=float32)
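Each row of `predictions` holds raw logits, one column per emotion id from `label_dict`. Only the `argmax` is used below, but a softmax turns a row into probabilities if a confidence value is wanted. A NumPy sketch on the first row printed above (`softmax` here is a local helper, not a library call):

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


first_row = np.array([[0.17669497, 2.5332584, 0.5245931, 0.19869341, -0.75338733, -2.7886083]])
probs = softmax(first_row)
print(probs.round(3))
print(probs.argmax(axis=1))  # [1] -> 'disgust' in label_dict
```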

In [33]:
# This is redundant, but I didn't prepare the dataset for the predictions, so we read the data again.
# The second column holds the privacy label; a new column named "sent_label" is added for the predictions.
result_df = pd.read_csv("experiment.tsv", sep='\t', header=None, names=['Text', 'privacy'])
result_df['sent_label'] = 0

In [34]:
# Visual cell
result_df.head()

|   | Text | privacy | sent_label |
|---|------|---------|------------|
| 0 | Drug challenge idea: take shrooms or some LSD ... | 0 | 0 |
| 1 | Austin pulliam making rude gestures toward me.... | 0 | 0 |
| 2 | Got my mind thinking, pretty girls drinking ta... | 0 | 0 |
| 3 | "@XXX: I think white and have nigger lips" mad... | 0 | 0 |
| 4 | You need to swallow some make-up or something ... | 0 | 0 |


In [35]:
# Write the index of the highest logit of every prediction into the new column.
# Assigning the whole column at once avoids pandas' chained-assignment warning (SettingWithCopyWarning).
result_df['sent_label'] = np.argmax(predictions, axis=1)


In [36]:
# Visual cell (it is just a coincidence that the first labels were all predicted with 1s)
result_df.head()

|   | Text | privacy | sent_label |
|---|------|---------|------------|
| 0 | Drug challenge idea: take shrooms or some LSD ... | 0 | 1 |
| 1 | Austin pulliam making rude gestures toward me.... | 0 | 1 |
| 2 | Got my mind thinking, pretty girls drinking ta... | 0 | 1 |
| 3 | "@XXX: I think white and have nigger lips" mad... | 0 | 1 |
| 4 | You need to swallow some make-up or something ... | 0 | 1 |


In [37]:
# Save the result
result_df.to_csv("experimentResult.tsv", sep='\t', header=None, index=False)

In [38]:
# Visual cell
# Here you can see that not everything was a 1 :)
result_df.groupby(['privacy', 'sent_label']).count()

| privacy | sent_label | Text |
|---------|------------|------|
| 0 | 0 | 113 |
| 0 | 1 | 570 |
| 0 | 2 | 224 |
| 0 | 3 | 136 |
| 0 | 4 | 114 |
| 0 | 5 | 278 |
| 1 | 0 | 58 |
| 1 | 1 | 208 |
| 1 | 2 | 218 |
| 1 | 3 | 168 |


In [39]:
# Visual cell
# Overall number of predictions per sentiment
result_df['sent_label'].value_counts(ascending=True)

0    171
4    270
3    304
2    442
1    778
5    905
Name: sent_label, dtype: int64

In [40]:
# I could have continued with the existing DataFrame, but wanted to be able to start from scratch using the saved file.
# Read the dataset with the newly acquired sentiment column
df = pd.read_csv("experimentResult.tsv", sep='\t', header=None, names=['Text', 'Privacy', 'Sentiment'])

In [43]:
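# Define a dictionary and replace the numbers with sentiment words again.
# A new name is used so that the label_dict of the first model is not overwritten.
sentiment_words = {0: 'angry', 1: 'disgusted', 2: 'fearful', 3: 'sad', 4: 'surprised', 5: 'joyful'}
df['Sentiment'] = df.Sentiment.replace(sentiment_words)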

In [44]:
# Visual cell
df.head()

|   | Text | Privacy | Sentiment |
|---|------|---------|-----------|
| 0 | Drug challenge idea: take shrooms or some LSD ... | 0 | disgusted |
| 1 | Austin pulliam making rude gestures toward me.... | 0 | disgusted |
| 2 | Got my mind thinking, pretty girls drinking ta... | 0 | disgusted |
| 3 | "@XXX: I think white and have nigger lips" mad... | 0 | disgusted |
| 4 | You need to swallow some make-up or something ... | 0 | disgusted |


In [45]:
# Save an intermediate file that contains the sentiment words
df.to_csv("experiment_withSentiment19_02.tsv", sep='\t')

In [46]:
# Create the new tweets by prepending the sentiment in text form, e.g. "This is a joyful tweet. <original text>"
df['Text'] = "This is a " + df['Sentiment'] + " tweet. " + df['Text']

In [47]:
# Drop the Sentiment column; its information is now part of the text
df.drop("Sentiment", axis=1, inplace=True)

In [49]:
# Save the dataset with the added features.
df.to_csv("experiment_withFeats19_02.tsv", sep='\t')

In [50]:
# From here on, everything is analogous to the first model unless stated otherwise.
X_train, X_val, y_train, y_val = train_test_split(df.index.values, 
                                                  df.Privacy.values, 
                                                  test_size=0.15, 
                                                  random_state=42, 
                                                  stratify=df.Privacy.values)

df['data_type'] = ['not_set']*df.shape[0]

df.loc[X_train, 'data_type'] = 'train'
df.loc[X_val, 'data_type'] = 'val'

df.groupby(['Privacy', 'data_type']).count()

| Privacy | data_type | Text |
|---------|-----------|------|
| 0 | train | 1219 |
| 0 | val | 216 |
| 1 | train | 1220 |
| 1 | val | 215 |


In [51]:
# Tokenize like in the model before
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', 
                                          do_lower_case=True)
                                          
encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type=='train'].Text.values, 
    add_special_tokens=True, 
    truncation=True,
    return_attention_mask=True,  
    max_length=80, 
    padding='max_length',
    return_tensors='pt'
)

encoded_data_val = tokenizer.batch_encode_plus(
    df[df.data_type=='val'].Text.values, 
    add_special_tokens=True, 
    truncation=True,
    return_attention_mask=True, 
    max_length=80, 
    padding='max_length',
    return_tensors='pt'
)


input_ids_train = encoded_data_train['input_ids']
attention_masks_train = encoded_data_train['attention_mask']
labels_train = torch.tensor(df[df.data_type=='train'].Privacy.values)

input_ids_val = encoded_data_val['input_ids']
attention_masks_val = encoded_data_val['attention_mask']
labels_val = torch.tensor(df[df.data_type=='val'].Privacy.values)

dataset_train = TensorDataset(input_ids_train, attention_masks_train, labels_train)
dataset_val = TensorDataset(input_ids_val, attention_masks_val, labels_val)

In [85]:
# This is a binary classification task, so num_labels is set to 2
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2,
                                                      output_attentions=False,
                                                      output_hidden_states=False)
model.to(device)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

In [86]:
batch_size = 32

dataloader_train = DataLoader(dataset_train, 
                              sampler=RandomSampler(dataset_train), 
                              batch_size=batch_size)

dataloader_validation = DataLoader(dataset_val, 
                                   sampler=SequentialSampler(dataset_val), 
                                   batch_size=batch_size)

In [87]:
# transformers.AdamW is deprecated; torch.optim.AdamW with weight_decay=0.0 matches its defaults.
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

optimizer = AdamW(model.parameters(),
                  lr=1e-5,
                  eps=1e-8,
                  weight_decay=0.0)

# I chose 8 epochs because this dataset is small and training is fast.
# Since the model is saved after each epoch anyway, this made the best use of my limited time in Google Colab and SageMaker.
epochs = 8

scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps=0,
                                            num_training_steps=len(dataloader_train)*epochs)



In [88]:
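# IMPORTANT
# This time the model is saved to the folder "sensitivity_model", which is created here if it does not exist yet.
# Everything else is analogous to the first model.
import os

os.makedirs('sensitivity_model', exist_ok=True)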

seed_val = 47
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

def evaluate(dataloader_val):

    model.eval()
    
    loss_val_total = 0
    predictions, true_vals = [], []
    
    for batch in dataloader_val:
        
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

        with torch.no_grad():        
            outputs = model(**inputs)
            
        loss = outputs[0]
        logits = outputs[1]
        loss_val_total += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)
    
    loss_val_avg = loss_val_total/len(dataloader_val) 
    
    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)
            
    return loss_val_avg, predictions, true_vals
    
for epoch in tqdm(range(1, epochs+1)):
    
    model.train()
    
    loss_train_total = 0

    progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)
    for batch in progress_bar:

        model.zero_grad()
        
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }       

        outputs = model(**inputs)
        
        loss = outputs[0]
        loss_train_total += loss.item()
        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        optimizer.step()
        scheduler.step()
        
        # loss is already the mean over the batch; len(batch) is only the number of tensors in the tuple (3)
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item())})
         
        
    torch.save(model.state_dict(), f'sensitivity_model/finetuned_BERT_epoch_{epoch}.model')
        
    tqdm.write(f'\nEpoch {epoch}')
    
    loss_train_avg = loss_train_total/len(dataloader_train)            
    tqdm.write(f'Training loss: {loss_train_avg}')
    
    val_loss, predictions, true_vals = evaluate(dataloader_validation)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (Weighted): {val_f1}')

  0%|          | 0/8 [00:00<?, ?it/s]

Epoch 1:   0%|          | 0/153 [00:00<?, ?it/s]


Epoch 1
Training loss: 0.5543305965420467
Validation loss: 0.40381859794810965
F1 Score (Weighted): 0.8370606490624514


Epoch 2:   0%|          | 0/153 [00:00<?, ?it/s]


Epoch 2
Training loss: 0.3650708786802354
Validation loss: 0.2798691996269756
F1 Score (Weighted): 0.8816667115697782


Epoch 3:   0%|          | 0/153 [00:00<?, ?it/s]


Epoch 3
Training loss: 0.27133655448267663
Validation loss: 0.2802765986157788
F1 Score (Weighted): 0.8908866671662925


Epoch 4:   0%|          | 0/153 [00:00<?, ?it/s]


Epoch 4
Training loss: 0.19794302106244502
Validation loss: 0.3355070071777812
F1 Score (Weighted): 0.8696134287234207


Epoch 5:   0%|          | 0/153 [00:00<?, ?it/s]


Epoch 5
Training loss: 0.131150781152636
Validation loss: 0.3209110410815036
F1 Score (Weighted): 0.9024829135652119


Epoch 6:   0%|          | 0/153 [00:00<?, ?it/s]


Epoch 6
Training loss: 0.10119444871735242
Validation loss: 0.3520169951435592
F1 Score (Weighted): 0.8769705522343187


Epoch 7:   0%|          | 0/153 [00:00<?, ?it/s]


Epoch 7
Training loss: 0.0735911515730074
Validation loss: 0.37441471015551575
F1 Score (Weighted): 0.8955882749145101


Epoch 8:   0%|          | 0/153 [00:00<?, ?it/s]


Epoch 8
Training loss: 0.048232248075781206
Validation loss: 0.38536370059268343
F1 Score (Weighted): 0.8955849020614858
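The checkpoint loaded in the next cell is chosen by the highest validation F1 score. Using the values logged above, a small sketch shows that the two usual selection criteria disagree here: the weighted F1 peaks at epoch 5, while the validation loss is lowest at epoch 2 and trends upward afterwards as the training loss keeps falling, which suggests some overfitting.

```python
# Validation metrics copied from the training log above (epochs 1-8)
val_loss = [0.40381859794810965, 0.2798691996269756, 0.2802765986157788, 0.3355070071777812,
            0.3209110410815036, 0.3520169951435592, 0.37441471015551575, 0.38536370059268343]
val_f1 = [0.8370606490624514, 0.8816667115697782, 0.8908866671662925, 0.8696134287234207,
          0.9024829135652119, 0.8769705522343187, 0.8955882749145101, 0.8955849020614858]

# Epochs are 1-based in the log, list indices are 0-based
best_f1_epoch = max(range(len(val_f1)), key=lambda i: val_f1[i]) + 1
lowest_loss_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i]) + 1
print(best_f1_epoch, lowest_loss_epoch)  # 5 2
```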


In [89]:
# Training more epochs was not really necessary, but as I said, I wanted to make use of the resources.
# Load the checkpoint with the best weighted F1 score (epoch 5)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2,
                                                      output_attentions=False,
                                                      output_hidden_states=False)

model.to(device)

model.load_state_dict(torch.load('sensitivity_model/finetuned_BERT_epoch_5.model', map_location=torch.device('cpu')))

_, predictions, true_vals = evaluate(dataloader_validation)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [90]:
# Compute the accuracy separately for private (label 0) and public (label 1) tweets.
# Just iterate through the predictions and count right and wrong predictions per class. I could have used an array but was lazy.
pr_right, pr_wrong, pu_right, pu_wrong = 0, 0, 0, 0
for i, pred in enumerate(predictions):
    correct = np.argmax(pred) == true_vals[i]
    if true_vals[i] == 0:
        if correct:
            pr_right += 1
        else:
            pr_wrong += 1
    else:
        if correct:
            pu_right += 1
        else:
            pu_wrong += 1

# Calculate the accuracies
pr_acc = round(100*(pr_right/(pr_right+pr_wrong)), 2)
pu_acc = round(100*(pu_right/(pu_right+pu_wrong)), 2)
print("Private Tweets Accuracy:")
print("{}/{} {}%".format(pr_right,pr_wrong+pr_right,pr_acc))
print("Public Tweets Accuracy:")
print("{}/{} {}%".format(pu_right,pu_wrong+pu_right,pu_acc))

Private Tweets Accuracy:
189/216 87.5%
Public Tweets Accuracy:
200/215 93.02%
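The counting loop above computes the per-class accuracy (recall) for the two privacy labels, with 0 = private and 1 = public as the print statements assume. A vectorized NumPy sketch that works for any number of classes; `per_class_accuracy` is a hypothetical helper and the toy logits are illustrative:

```python
import numpy as np


def per_class_accuracy(logits, true_vals):
    """Return {class: (correct, total)} using argmax predictions."""
    pred = np.argmax(logits, axis=1)
    result = {}
    for c in np.unique(true_vals):
        mask = true_vals == c  # samples whose true label is c
        result[int(c)] = (int(np.sum(pred[mask] == c)), int(np.sum(mask)))
    return result


logits = np.array([[2.0, -1.0], [0.5, 1.5], [-0.3, 0.9], [0.1, 1.2]])
true_vals = np.array([0, 0, 1, 1])
for c, (right, total) in per_class_accuracy(logits, true_vals).items():
    print(f"class {c}: {right}/{total} {round(100 * right / total, 2)}%")
```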
