<a href="https://colab.research.google.com/github/sachaRfd/Sentiment-Analysis-NLP/blob/main/Sentiment_Analysis_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple Sentiment Analysis using IMDB PyTorch Dataset and simple LSTM:

All Imports:

In [1]:
!pip install torchdata  # Install Torch Datasets
!pip install nltk  # Import the Natural Language Toolkit --> Most Common

import nltk  # Download key files
nltk.download('punkt')  # Sequence Tokeniser
nltk.download('stopwords')  # List of Most Common StopWords
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords 
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import string 



import numpy as np
import pandas as pd

import gc
from tqdm import tqdm

import torch
from torch import nn
from torch.nn.functional import pad
import torch.nn.functional as F
from torchtext.data import to_map_style_dataset
from torch.utils.data import DataLoader
from torch.optim import RMSprop


# Set Device to GPU is available - otherwise set to CPU: 
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Your Current Device is {device}')  # Check the Colab Device we are using

from torchtext import data, datasets  # Import the datasets
from sklearn.model_selection import train_test_split  # Import splitting function
from sklearn.metrics import accuracy_score
import torchdata

from torchtext.vocab import GloVe  # Import the Glove Embedding

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torchdata
  Downloading torchdata-0.5.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)
[K     |████████████████████████████████| 4.6 MB 35.7 MB/s 
[?25hCollecting urllib3>=1.25
  Downloading urllib3-1.26.13-py2.py3-none-any.whl (140 kB)
[K     |████████████████████████████████| 140 kB 72.1 MB/s 
[?25hCollecting portalocker>=2.0.0
  Downloading portalocker-2.6.0-py2.py3-none-any.whl (15 kB)
Collecting torch==1.13.1
  Downloading torch-1.13.1-cp38-cp38-manylinux1_x86_64.whl (887.4 MB)
[K     |██████████████████████████████  | 834.1 MB 1.1 MB/s eta 0:00:48tcmalloc: large alloc 1147494400 bytes == 0x3aa3c000 @  0x7f5425f61615 0x5d6f4c 0x51edd1 0x51ef5b 0x4f750a 0x4997a2 0x4fd8b5 0x4997c7 0x4fd8b5 0x49abe4 0x4f5fe9 0x55e146 0x4f5fe9 0x55e146 0x4f5fe9 0x55e146 0x5d8868 0x5da092 0x587116 0x5d8d8c 0x55dc1e 0x55cd91 0x5d8941 0x49abe4 0x55cd91 0x5d8941 0x4990ca 0x

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


Your Current Device is cuda


In [19]:
import random

def set_seed(seed):
    """
    Use this to set ALL the random seeds to a fixed value and take out any randomness from cuda kernels
    """

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

Let's get the Train, Validation and Training Sets ready: 

In [2]:
# Get the train and test splits form the IMDB Dataset
train_dataset, test_dataset  = datasets.IMDB(root = '.data', split = ('train', 'test'))

# Let's now split the test set into a test and validation set: 
test_dataset, valid_dataset = train_test_split(list(test_dataset), train_size=.8)


## Understanding the Dataset: 
IMDB Reviews


In [3]:
print(f'The shape of the Train set is {len(list(train_dataset))}')  #  Have to Convert to List
print(f'The shape of the Validation set is {len(valid_dataset)}')
print(f'The shape of the Test set is {len(test_dataset)}')

The shape of the Train set is 25000
The shape of the Validation set is 5000
The shape of the Test set is 20000


Shape of the training Dataset is 25_000. 

Let's Check if our data is balanced in the training set: 

In [4]:
# Code to check for balanced dataset
dataframe_ = pd.DataFrame(list(train_dataset), columns=['Y', 'x'])
dataframe_.Y.value_counts()

1    12500
2    12500
Name: Y, dtype: int64

### Let's now visualise some of the reviews: 

In [5]:
# Plotting the First 2 Reviews
list(train_dataset)[:5]

[(1,
  'I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far betw

## To summarise the dataset: 
- The dataset consists of Movie reviews taken from IMDB
- The train set is formed of 25_000 reviews
- The validation set of 5_000 reviews
- and the Test set is of 20_000 reviews. 
- In the Y variable, a 1 consists of a Negative Review and 2 a Positive Review
- We can also see that our dataset is BALANCED, with 12_500 bad reviews and 12_5000 good reviews.

# Data Preprocessing: 

For simlpe NLP applications, the data has to be processed in a certain manner: 
- Things to Check For:
  - All lower-case text
  - No Numbers in text
  - No Punctutation - Good for generalisation - eventhough some people use punctuation to show sentiment
- Transformation of the sentences into list of tokens - Therefore the sentence becomes a list of words
- We have to tokenise the words - We will be using words from the GloVe library.
  - We want to get the Index of our words in the GloVe library.
- Padding of the sentences is also required as some of the reviews are very long or relatively short. Let's use a maximum padding of 150 here - For no reason.

# List of Functions: 
- To Remove Numbers in the sentences
- To Remove Punctuation
- To Tokenise the sentences
- To Remove Unwanted Stopwords
- To Get the Index of the words in the GloVe library
- To Pad the Sentences
- A final Function which transforms the inputted test by using all the above functions and converting the sentences to lowercase.

In [6]:
def remove_numbers(text):
  '''Function to Remove Numbers from inputted text'''
  
  text = ''.join(word for word in text if not word.isdigit())
  return text

def remove_punctuation(text):
  '''Function to Remove all Punctuation from inputted text'''

  for punctuation in string.punctuation:
     text = text.replace(punctuation, '')  # Replace the Punctuation with empty space
  return text

def tokenize(text):
  '''Function to Tokenise any inputted text using NLTK tokenise'''

  word_tokens = word_tokenize(text)  # Tokenise Using the NLTK Tokenise Function
  return word_tokens

def remove_stopwords(word_tokens, language='english'):
  '''Function to remove all stopwords in given language from the inputted words tokens'''

  stop_words = set(stopwords.words(language))  # Most common English Stopwords
  word_tokens = [w for w in word_tokens if not w in stop_words]  # Get list of words if they are not stopwords
  return word_tokens

glove = GloVe(dim='50', name='6B', max_vectors=20000)  # Get the Glove with 50 dimension vector with a vocabulary size of 20_000

def get_index(text, vocab=glove):
  '''Function that gets the index of each token in a text from the GloVe Library'''

  embedded_text = []
  for word in text:
     try:
         embedded_text.append(glove.stoi[word])  # Get String to Integer
     except:
         pass
  return embedded_text  # return list of the indices of the tokenised words in the GloVe library


def pad_sentence(text, MAX_LENGTH = 100):
  ''' Function that Pads a sentence to a given length'''

  if text.shape[0]>=MAX_LENGTH:
      return text[:MAX_LENGTH]
  else:
      return pad(text, (0, MAX_LENGTH-text.shape[0]), 'constant',0).long()


# Final Transform Function: 

def transform_text(text):
  '''Function that applies all the Data-Preprocessing Functions'''
  
  text = text.lower()
  text = remove_numbers(text)
  text = remove_punctuation(text)
  text = tokenize(text)
  text = remove_stopwords(text)
  text = torch.tensor(get_index(text)).long()
  return pad_sentence(text)

.vector_cache/glove.6B.zip: 862MB [02:41, 5.35MB/s]                           
100%|█████████▉| 19999/20000 [00:00<00:00, 46678.59it/s]


Now that we have setup our data-preprocessing, let's test it out on an example from our training dataset: 

In [7]:
example_train = list(train_dataset)[5][1]  # Get a random train data
transform_text(example_train)

tensor([   54,   339,   220,   674,  1588,  2891,  8560,  1588,   978,  1607,
          921,  2001, 12073,   117,  2837,   219,  1739, 11184, 10487,   122,
          151, 12425,   175,  2782,  1378,  6959,   152,   164,  9797,  4629,
          364,   319,  1607,  1380,   570,  6801,  5412,   521,   298,  3468,
         1254,   492,  1797,  7582,   151,  1507,   978,    69,   580,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0])

Now that our pre-processing function seem to work as we want - Let's finalise the dataset with a dataloader. 

To not apply our transform function to all the data at once, let'd just apply it batch by batch from the dataloader using transform_batch function: 

In [8]:
train_y = torch.tensor([item[0] for item in list(train_dataset)])-1
train_x = torch.stack([transform_text(item[1]) for item in list(train_dataset)])

val_y = torch.tensor([item[0] for item in list(test_dataset)])-1
val_x = torch.stack([transform_text(item[1]) for item in list(test_dataset)])

test_y = torch.tensor([item[0] for item in list(test_dataset)])-1
test_x = torch.stack([transform_text(item[1]) for item in list(test_dataset)])

In [9]:
def transform_batch(batch):
    Y, X = list(zip(*batch))
    
    X_embedded = torch.stack([transform_text(txt) for txt in X])  # Get the transformed - embedded text
    
    return X_embedded, torch.tensor(Y).long()-1  # Return the Embedded text and the Y variable as .long() as it is a categorical label

train_dataset=  to_map_style_dataset(train_dataset)  # We will be using the to_map_style_dataset as it CHECK WHAT IT DOES

train_loader = DataLoader(train_dataset, batch_size=512, collate_fn=transform_batch, shuffle=True)  # Make sure to have shuffle on true for best training

In [None]:
# Checking the batch sizes
# for X, Y in train_loader:
#     print(X.shape, Y.shape)
#     break

torch.Size([256, 150]) torch.Size([256])


Let's use the GloVe vectors in our embedding layer, where we inputted 50 dimensions for each vector and a vocabulary size of 20_000.

Here is an implementation of a embedded layer taken from my lecture in DL:

In [10]:
def create_emb_layer(weights_matrix, non_trainable=True):
    num_embeddings, embedding_dim = weights_matrix.size()
    emb_layer = nn.Embedding(num_embeddings, embedding_dim,padding_idx=0)
    emb_layer.load_state_dict({'weight': weights_matrix})
    if non_trainable:
        emb_layer.weight.requires_grad = False

    return emb_layer, num_embeddings, embedding_dim

## Now that we have all of the preprocessing, the embedding, and the data loaders ready, we can start thinking of Deep Learning Models.

In [11]:
class LSTM(nn.Module):
    def __init__(self, hid_dim, output_dim,drop_out=0., num_layers = 2):
        super(LSTM, self).__init__()
        
        self.hid_dim = hid_dim
        
        self.embedding, num_embeddings, embedding_dim = create_emb_layer(glove.vectors, False)
        
        n_layers = num_layers

        self.lstm = nn.LSTM(embedding_dim, hid_dim, n_layers,dropout=drop_out, batch_first=True)
        self.linear = nn.Linear(hid_dim,100)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(100, output_dim)
        self.dropout = nn.Dropout(.5)
        
        self.reset_parameters()
        
    def reset_parameters(self):
        std= 1.0 / np.sqrt(self.hid_dim)
        
        for w in self.parameters():
            w.data.uniform_(-std, std)
        

    def forward(self, text):

        embedded = self.embedding(text)


        batch_size, seq_len,  _ = embedded.shape
        hid_dim = self.lstm.hidden_size
            
        outputs, (hidden, cell) = self.lstm(embedded)

        outputs = outputs[:, -1]
        
        prediction = self.fc(self.dropout(self.relu(self.linear(outputs))))


        return prediction

In [12]:
def CalcValLossAndAccuracy(model, loss_fn, val_X, val_Y):
    
    #print(f'Calculating Epoch Loss and Accuracy:')
    
    losses = []
    accuracies = []
    model.eval()
    with torch.no_grad():
        X, Y, title = (val_x, val_y,'Validation')
        X = val_X.to(device)
        Y = val_Y.to(device)
            
        outputs = model(X).squeeze()
        loss = loss_fn(outputs, Y.float())
            
        preds = [1 if p>=.5 else 0 for p in torch.sigmoid(outputs)]
        accuracy = accuracy_score(Y.detach().cpu().numpy().tolist(),preds)
            
        accuracies.append(accuracy)
        losses.append(loss)

        
        print(f'{title} Loss : {loss:.3f}')
        print(f"{title} Accuracy  : {accuracy:.3f}")
    
    return losses, accuracies


def TrainModel(model, loss_fn, optimizer, train_loader, epochs=10):
    train_losses = []
    train_accuracy = []
    val_losses = []
    val_accuracy = []
    
    for i in range(1, epochs+1):
        
        print('-'*100)
        print(f'EPOCH {i}')
        print('-'*100)
        
        epoch_losses = []

        model.train()

        
        for X, Y in tqdm(train_loader, colour='BLUE'):

            X = X.to(device)
            Y = Y.to(device)
            
            Y_preds = model(X).squeeze()
            loss = loss_fn(Y_preds, Y.float())
            
            epoch_losses.append(loss.item())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print("Train Loss : {:.3f}".format(torch.tensor(epoch_losses).mean()))
        
        losses, acc = CalcValLossAndAccuracy(model, loss_fn, X, Y)
        train_losses.append(losses[0])
        train_accuracy.append(losses[0])
        val_losses.append(acc[0])
        val_accuracy.append(acc[0])
        
    return train_losses, val_losses, train_accuracy, val_accuracy

def evaluateModel(model, loss_fn, x, Y):

  model.eval()
  with torch.no_grad():
    x = x.to(device)
    Y = Y.to(device)

    outputs = model(x).squeeze()

    preds = [1 if p>=.5 else 0 for p in torch.sigmoid(outputs)]
    accuracy = accuracy_score(Y.detach().cpu().numpy().tolist(),preds)

  return accuracy

In [20]:
set_seed(42)


epochs = 2
learning_rate = 1e-3

loss_fn = nn.BCEWithLogitsLoss()
text_classifier_2_epochs = LSTM(20,1).to(device)

optimizer = RMSprop(text_classifier_2_epochs.parameters(), lr=learning_rate)

print("STARTING TRAINING")
print("MODEL ARCHITECTURE:")
print(text_classifier_2_epochs)
print(" ")


TrainModel(text_classifier_2_epochs, loss_fn, optimizer, train_loader, epochs)

STARTING TRAINING
MODEL ARCHITECTURE:
LSTM(
  (embedding): Embedding(20000, 50, padding_idx=0)
  (lstm): LSTM(50, 20, num_layers=2, batch_first=True)
  (linear): Linear(in_features=20, out_features=100, bias=True)
  (relu): ReLU()
  (fc): Linear(in_features=100, out_features=1, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)
 
----------------------------------------------------------------------------------------------------
EPOCH 1
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.79it/s]


Train Loss : 0.695
Validation Loss : 0.692
Validation Accuracy  : 0.521
----------------------------------------------------------------------------------------------------
EPOCH 2
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.79it/s]

Train Loss : 0.622
Validation Loss : 0.490
Validation Accuracy  : 0.804





([tensor(0.6917, device='cuda:0'), tensor(0.4898, device='cuda:0')],
 [0.5212264150943396, 0.8042452830188679],
 [tensor(0.6917, device='cuda:0'), tensor(0.4898, device='cuda:0')],
 [0.5212264150943396, 0.8042452830188679])

Let's Checkout the model we have trained: 

In [21]:
test_dataset[10]

(2,
 "The Man Who Knew Too Much{1956}is a remake of a film that Alfred Hitchcock made in England in 1934 with the same name. In my opinion, his later effort is far superior. Many critics and fans of Alfred Hitchcock will argue that the remake is mediocre and doesn't have the spine tingling suspense of the original with Peter Lorre. In both films the plot is essentially the same, except the original is set in Switzerland and the remake in Marrakech . It tells the story of a married couple {James Stewart and Doris Day}vacationing with their young son and meeting a suspicious man, that is very curious about their past. It just so happens, he's an agent that's looking for a couple involved in a plot to assassinate a world leader.Then he gets stabbed in a Marrakeck market because of it being found out that he's a spy,and proceeds to fall into Stewart's arms.Dying,he tells him the whole story of the assassination plot.Stewart and Day then find out that another couple they met were the couple

In [22]:
text_classifier_2_epochs.eval()
with torch.no_grad():
    print('Predicted values with only 2 epochs: ')
    print(torch.sigmoid(text_classifier_2_epochs(test_x[:5].to(device))))
    print(test_y[:5])
    print(f'The Accuracy of our model with 2 epochs is of {evaluateModel(text_classifier_2_epochs, loss_fn, test_x, test_y)}')

Predicted values with only 2 epochs: 
tensor([[0.1600],
        [0.3135],
        [0.1773],
        [0.7810],
        [0.1773]], device='cuda:0')
tensor([1, 0, 0, 1, 0])
The Accuracy of our model with 2 epochs is of 0.76305


# Let's now try training the model with 10 epochs:

In [23]:
set_seed(42)

epochs = 10
learning_rate = 1e-3


loss_fn = nn.BCEWithLogitsLoss()
text_classifier_10_epochs = LSTM(20,1).to(device)

optimizer = RMSprop(text_classifier_10_epochs.parameters(), lr=learning_rate)

print("STARTING TRAINING")
print("MODEL ARCHITECTURE:")
print(text_classifier_10_epochs)
print(" ")


TrainModel(text_classifier_10_epochs, loss_fn, optimizer, train_loader, epochs)

STARTING TRAINING
MODEL ARCHITECTURE:
LSTM(
  (embedding): Embedding(20000, 50, padding_idx=0)
  (lstm): LSTM(50, 20, num_layers=2, batch_first=True)
  (linear): Linear(in_features=20, out_features=100, bias=True)
  (relu): ReLU()
  (fc): Linear(in_features=100, out_features=1, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)
 
----------------------------------------------------------------------------------------------------
EPOCH 1
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.78it/s]


Train Loss : 0.695
Validation Loss : 0.692
Validation Accuracy  : 0.521
----------------------------------------------------------------------------------------------------
EPOCH 2
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.80it/s]


Train Loss : 0.622
Validation Loss : 0.490
Validation Accuracy  : 0.804
----------------------------------------------------------------------------------------------------
EPOCH 3
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.81it/s]


Train Loss : 0.458
Validation Loss : 0.352
Validation Accuracy  : 0.880
----------------------------------------------------------------------------------------------------
EPOCH 4
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:28<00:00,  1.69it/s]


Train Loss : 0.396
Validation Loss : 0.335
Validation Accuracy  : 0.889
----------------------------------------------------------------------------------------------------
EPOCH 5
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.79it/s]


Train Loss : 0.334
Validation Loss : 0.234
Validation Accuracy  : 0.920
----------------------------------------------------------------------------------------------------
EPOCH 6
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.80it/s]


Train Loss : 0.302
Validation Loss : 0.234
Validation Accuracy  : 0.925
----------------------------------------------------------------------------------------------------
EPOCH 7
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.81it/s]


Train Loss : 0.263
Validation Loss : 0.255
Validation Accuracy  : 0.892
----------------------------------------------------------------------------------------------------
EPOCH 8
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:26<00:00,  1.82it/s]


Train Loss : 0.247
Validation Loss : 0.208
Validation Accuracy  : 0.939
----------------------------------------------------------------------------------------------------
EPOCH 9
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.80it/s]


Train Loss : 0.224
Validation Loss : 0.241
Validation Accuracy  : 0.910
----------------------------------------------------------------------------------------------------
EPOCH 10
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:29<00:00,  1.64it/s]

Train Loss : 0.209
Validation Loss : 0.118
Validation Accuracy  : 0.974





([tensor(0.6917, device='cuda:0'),
  tensor(0.4898, device='cuda:0'),
  tensor(0.3518, device='cuda:0'),
  tensor(0.3346, device='cuda:0'),
  tensor(0.2336, device='cuda:0'),
  tensor(0.2339, device='cuda:0'),
  tensor(0.2550, device='cuda:0'),
  tensor(0.2081, device='cuda:0'),
  tensor(0.2414, device='cuda:0'),
  tensor(0.1183, device='cuda:0')],
 [0.5212264150943396,
  0.8042452830188679,
  0.8797169811320755,
  0.8891509433962265,
  0.9198113207547169,
  0.9245283018867925,
  0.8915094339622641,
  0.9386792452830188,
  0.910377358490566,
  0.9740566037735849],
 [tensor(0.6917, device='cuda:0'),
  tensor(0.4898, device='cuda:0'),
  tensor(0.3518, device='cuda:0'),
  tensor(0.3346, device='cuda:0'),
  tensor(0.2336, device='cuda:0'),
  tensor(0.2339, device='cuda:0'),
  tensor(0.2550, device='cuda:0'),
  tensor(0.2081, device='cuda:0'),
  tensor(0.2414, device='cuda:0'),
  tensor(0.1183, device='cuda:0')],
 [0.5212264150943396,
  0.8042452830188679,
  0.8797169811320755,
  0.88915094

Let's evaluate the model: 

In [26]:
text_classifier_10_epochs.eval()
print(f'The Accuracy of our model with 10 epochs is of {evaluateModel(text_classifier_10_epochs, loss_fn, test_x, test_y)}')

The Accuracy of our model with 10 epochs is of 0.8148


### Now let's use the next cell for hyper-parameter tuning: 

In [29]:
set_seed(42)

epochs = 10
learning_rate = 1e-3

loss_fn = nn.BCEWithLogitsLoss()
text_classifier_hyper = LSTM(20,1, 1).to(device)

optimizer = RMSprop(text_classifier_hyper.parameters(), lr=learning_rate)

print("STARTING TRAINING")
print("MODEL ARCHITECTURE:")
print(text_classifier_hyper)
print(" ")


TrainModel(text_classifier_hyper, loss_fn, optimizer, train_loader, epochs)
evaluateModel(text_classifier_hyper, loss_fn, test_x, test_y)

STARTING TRAINING
MODEL ARCHITECTURE:
LSTM(
  (embedding): Embedding(20000, 50, padding_idx=0)
  (lstm): LSTM(50, 20, batch_first=True)
  (linear): Linear(in_features=20, out_features=100, bias=True)
  (relu): ReLU()
  (fc): Linear(in_features=100, out_features=1, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)
 
----------------------------------------------------------------------------------------------------
EPOCH 1
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:28<00:00,  1.74it/s]


Train Loss : 0.693
Validation Loss : 0.691
Validation Accuracy  : 0.498
----------------------------------------------------------------------------------------------------
EPOCH 2
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:26<00:00,  1.83it/s]


Train Loss : 0.664
Validation Loss : 0.543
Validation Accuracy  : 0.757
----------------------------------------------------------------------------------------------------
EPOCH 3
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:26<00:00,  1.83it/s]


Train Loss : 0.473
Validation Loss : 0.381
Validation Accuracy  : 0.868
----------------------------------------------------------------------------------------------------
EPOCH 4
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.80it/s]


Train Loss : 0.393
Validation Loss : 0.347
Validation Accuracy  : 0.875
----------------------------------------------------------------------------------------------------
EPOCH 5
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:26<00:00,  1.82it/s]


Train Loss : 0.336
Validation Loss : 0.215
Validation Accuracy  : 0.941
----------------------------------------------------------------------------------------------------
EPOCH 6
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:26<00:00,  1.82it/s]


Train Loss : 0.317
Validation Loss : 0.310
Validation Accuracy  : 0.901
----------------------------------------------------------------------------------------------------
EPOCH 7
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:27<00:00,  1.81it/s]


Train Loss : 0.264
Validation Loss : 0.218
Validation Accuracy  : 0.939
----------------------------------------------------------------------------------------------------
EPOCH 8
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:35<00:00,  1.36it/s]


Train Loss : 0.253
Validation Loss : 0.226
Validation Accuracy  : 0.929
----------------------------------------------------------------------------------------------------
EPOCH 9
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:29<00:00,  1.65it/s]


Train Loss : 0.229
Validation Loss : 0.237
Validation Accuracy  : 0.915
----------------------------------------------------------------------------------------------------
EPOCH 10
----------------------------------------------------------------------------------------------------


100%|[34m██████████[0m| 49/49 [00:29<00:00,  1.65it/s]


Train Loss : 0.205
Validation Loss : 0.212
Validation Accuracy  : 0.934


0.8215

In [30]:
text_classifier_hyper.eval()
with torch.no_grad():
    print('Predicted values with only 10 epochs: ')
    print(torch.sigmoid(text_classifier_hyper(test_x[:10].to(device))))
    print(test_y[:10])
    print(f'The Accuracy of our model whith these hyper parameters is of {evaluateModel(text_classifier_hyper, loss_fn, test_x, test_y)}')

Predicted values with only 10 epochs: 
tensor([[0.2351],
        [0.0915],
        [0.0650],
        [0.9899],
        [0.0670],
        [0.0536],
        [0.9788],
        [0.9686],
        [0.9752],
        [0.4494]], device='cuda:0')
tensor([1, 0, 0, 1, 0, 0, 0, 0, 1, 1])
The Accuracy of our model whith these hyper parameters is of 0.8215


Let's Try using the same above model but with the Mish activation function - This one seemed to work very well with generative models as it is self-regularising. 

Let's see how it performs here:

In [46]:
class LSTM_Mish(nn.Module):
    def __init__(self, hid_dim, output_dim,drop_out=0., num_layers = 2):
        super(LSTM, self).__init__()
        
        self.hid_dim = hid_dim
        
        self.embedding, num_embeddings, embedding_dim = create_emb_layer(glove.vectors, False)
        
        n_layers = num_layers

        self.lstm = nn.LSTM(embedding_dim, hid_dim, n_layers,dropout=drop_out, batch_first=True)
        self.linear = nn.Linear(hid_dim,100)
        self.relu = nn.Mish()
        self.fc = nn.Linear(100, output_dim)
        self.dropout = nn.Dropout(.5)
        
        self.reset_parameters()
        
    def reset_parameters(self):
        std= 1.0 / np.sqrt(self.hid_dim)
        
        for w in self.parameters():
            w.data.uniform_(-std, std)
        

    def forward(self, text):

        embedded = self.embedding(text)


        batch_size, seq_len,  _ = embedded.shape
        hid_dim = self.lstm.hidden_size
            
        outputs, (hidden, cell) = self.lstm(embedded)

        outputs = outputs[:, -1]
        
        prediction = self.fc(self.dropout(self.relu(self.linear(outputs))))


        return prediction

In [47]:
set_seed(42)

epochs = 10
learning_rate = 1e-3

loss_fn = nn.BCEWithLogitsLoss()
text_classifier_Mish = LSTM_Mish(20,1, 1).to(device)

optimizer = RMSprop(text_classifier_Mish.parameters(), lr=learning_rate)

print("STARTING TRAINING")
print("MODEL ARCHITECTURE:")
print(text_classifier_Mish)
print(" ")


TrainModel(text_classifier_Mish, loss_fn, optimizer, train_loader, epochs)
evaluateModel(text_classifier_Mish, loss_fn, test_x, test_y)

TypeError: ignored

-0.53915 with 2 epochs and 20 hidden dim and 2 lstm layers

-0.54175 with 2 epochs and 20 hidden dim and 3 lstm layers

-0.5472 with 2 epochs and 20 hidden dim and 5 lstm layers

-0.61775 with 2 epochs and 40 hidden dim and 5 lstm layers

-0.50055 with 10 epochs and 40 hidden dim and 5 lstm layers