<a href="https://colab.research.google.com/github/gupta24789/sentiment-analysis/blob/main/sentiment_lstm_lighting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

There are many variations of RNN


- Single Layer RNN

![](https://github.com/gupta24789/sentiment-analysis/blob/main/imgaes/rnn.jpg?raw=1)

- Multi Layer RNN

Multi-layer RNNs (also called *deep RNNs*) are another simple concept. The idea is that we add additional RNNs on top of the initial standard RNN, where each RNN added is another *layer*. The hidden state output by the first (bottom) RNN at time-step $t$ will be the input to the RNN above it at time step $t$. The prediction is then made from the final hidden state of the final (highest) layer.

The image below shows a multi-layer unidirectional RNN. Also note that each layer needs their own initial hidden state, $h_0^L$.

![](https://github.com/gupta24789/sentiment-analysis/blob/main/imgaes/multi_layer.jpg?raw=1)

- Bidirectional RNN

The concept behind a bidirectional RNN is simple. As well as having an RNN processing the words in the sentence from the first to the last (a forward RNN), we have a second RNN processing the words in the sentence from the **last to the first** (a backward RNN). At time step $t$, the forward RNN is processing word $x_t$, and the backward RNN is processing word $x_{T-t+1}$.

In PyTorch, the hidden state (and cell state) tensors returned by the forward and backward RNNs are stacked on top of each other in a single tensor.

We make our sentiment prediction using a concatenation of the last hidden state from the forward RNN (obtained from final word of the sentence), $h_T^\rightarrow$, and the last hidden state from the backward RNN (obtained from the first word of the sentence), $h_T^\leftarrow$, i.e. $\hat{y}=f(h_T^\rightarrow, h_T^\leftarrow)$   

The image below shows a bi-directional RNN.

![](https://github.com/gupta24789/sentiment-analysis/blob/main/imgaes/bidirectional.jpg?raw=1)


---


## LSTM

We'll be using a different RNN architecture called a Long Short-Term Memory (LSTM). Why is an LSTM better than a standard RNN? Standard RNNs suffer from the vanishing gradient problem. LSTMs overcome this by having an extra recurrent state called a _cell_, $c$ - which can be thought of as the "memory" of the LSTM - and the use multiple _gates_ which control the flow of information into and out of the memory. We can simply think of the LSTM as a function of $x_t$, $h_t$ and $c_t$, instead of just $x_t$ and $h_t$.

$$(h_t, c_t) = \text{LSTM}(x_t, h_t, c_t)$$

Thus, the model using an LSTM looks something like:


![](https://github.com/gupta24789/sentiment-analysis/blob/main/imgaes/lstm.jpg?raw=1)


The initial cell state, $c_0$, like the initial hidden state is initialized to a tensor of all zeros. The sentiment prediction is still, however, only made using the final hidden state, not the final cell state, i.e. $\hat{y}=f(h_T)$.


In [9]:
!pip install -q lightning
!pip install -q neattext

In [10]:
import pandas as pd
import numpy as np
import neattext as nt
import itertools

import warnings
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.nn.utils.rnn import  pad_sequence
from torch.utils.data import Dataset, DataLoader, TensorDataset


from torchmetrics import Accuracy
import pytorch_lightning as pl
from lightning.pytorch.loggers import CSVLogger
from lightning.pytorch.callbacks import TQDMProgressBar

warnings.filterwarnings('ignore')

## Set Seed

In [11]:
SEED = 1234
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

## Utilities

In [12]:
## Clean the data
def custom_clean_text(x):
  x = nt.TextFrame(x)
  x = x.remove_stopwords().remove_urls().remove_emails().remove_dates().remove_puncts().remove_numbers().remove_userhandles().remove_multiple_spaces()
  x = x.text.lower()
  return x

def convert_token_to_number(tweet, verbose = False):
  unk_token_id = token2idx['__UNK__']
  encoded_tweet = []

  if verbose:
    print(f"UNK TOKEN ID : {unk_token_id}")
    print(f"RAW TWEET : {tweet}")

  for w in tweet.split(" "):
    if w in token2idx:
      encoded_tweet.append(token2idx[w])
    else:
      encoded_tweet.append(unk_token_id)

  return encoded_tweet

## Load Data

In [13]:
train_df = pd.read_csv("https://raw.githubusercontent.com/gupta24789/sentiment-analysis/main/data/train.csv")
val_df = pd.read_csv("https://raw.githubusercontent.com/gupta24789/sentiment-analysis/main/data/val.csv")
train_df = train_df[['raw_tweet', 'label']].dropna().reset_index(drop = True)
val_df = val_df[['raw_tweet', 'label']].dropna().reset_index(drop = True)

## clean data
train_df['processed_text'] = train_df.raw_tweet.apply(lambda x: custom_clean_text(x))
val_df['processed_text'] = val_df.raw_tweet.apply(lambda x: custom_clean_text(x))

## Train & val data
X_train = train_df.processed_text
y_train = train_df.label
X_val = val_df.processed_text
y_val = val_df.label

In [14]:
train_df.head(4)

Unnamed: 0,raw_tweet,label,processed_text
0,Want to say a huge thanks to @WarriorAssaultS ...,1.0,want huge thanks #ff thanks support :)
1,@jaynehh_ you just need a job and get a letter...,1.0,need job letter work place saying work letter...
2,"@knhillrocks HA yes, make it quick tho :D",1.0,ha yes quick tho :d
3,@shartyboy Thanks for texting me back :)) I'm ...,1.0,thanks texting :)) im texting tomorrow :))


## Create Vocab

In [15]:
special_tokens = ['__PAD__','__UNK__']
words = list(set(itertools.chain.from_iterable(train_df.processed_text.apply(lambda x: x.split(" ")))))
words = special_tokens +  words
token2idx = {w:i for i,w in enumerate(words)}
idx2tokens = {i:w for i,w in enumerate(words)}
print(f"vocab size : {len(token2idx)}")

vocab size : 11332


## Convert words to numbers

In [16]:
X_train_encoded = X_train.apply(lambda x: convert_token_to_number(x))
X_val_encoded = X_val.apply(lambda x: convert_token_to_number(x))

In [17]:
X_train_encoded[:2]

0           [2601, 8306, 2615, 5743, 2615, 5196, 3209]
1    [2, 1884, 950, 7196, 10194, 3110, 11005, 10194...
Name: processed_text, dtype: object

## Data Loader

- use **pad_sequence** function to get the same length of all tweet inside the batch.
- length of tweets could be different across the batches as it calculates the max length per batch and converts very tweet to that length.

In [21]:
def data_collator(batch):
  features = [torch.tensor(item[0]) for item in batch]
  labels = torch.tensor([item[1] for item in batch], dtype = torch.float32)
  features = pad_sequence(features, batch_first=True, padding_value= token2idx['__PAD__'])

  return (features, labels)

In [22]:
batch_size = 32
train_dl = DataLoader(list(zip(X_train_encoded, y_train)), batch_size = batch_size, collate_fn = data_collator)
val_dl = DataLoader(list(zip(X_val_encoded, y_val)), batch_size = batch_size, collate_fn = data_collator)

In [23]:
example = next(iter(train_dl))
features, labels = example[0], example[1]
features.shape, labels.shape

(torch.Size([32, 15]), torch.Size([32]))

## Model

In [36]:
class LSTMModel(pl.LightningModule):

  def __init__(self, vocab_size , embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout, learning_rate, pad_idx):
    super().__init__()
    self.learning_rate = learning_rate

    ## loss & metrics
    self.loss_fn = nn.BCELoss()
    num_classes = 2 if output_dim == 1 else output_dim
    self.train_accuracy = Accuracy(task = 'binary', num_classes= num_classes, threshold = 0.5)
    self.val_accuracy = Accuracy(task = 'binary', num_classes = num_classes, threshold= 0.5)

    ## define layers
    self.embedding = nn.Embedding(num_embeddings= vocab_size, embedding_dim= embedding_dim, padding_idx=pad_idx)
    self.lstm = nn.LSTM(embedding_dim, hidden_dim,
                        batch_first = True,
                        num_layers= n_layers,
                        bidirectional= bidirectional,
                        dropout=dropout)
    self.linear = nn.Linear(hidden_dim * 2 , output_dim)
    self.sigmoid = nn.Sigmoid()
    self.dropout = nn.Dropout(dropout)


  def forward(self, features, verbose = False):

    # features : [batch size, sent len]

    embedded = self.dropout(self.embedding(features))

    ## embedded : [batch size, sent len, embedding dim]

    output, (hidden, cell) = self.lstm(embedded)

    #output = [batch size, sent len, hid dim * num directions]
    #output over padding tokens are zero tensors

    #hidden = [batch_size, num layers * num directions, hidden dim]
    #cell = [batch size, num layers * num directions, hidden dim]

    #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers and apply dropout
    final_hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))

    #final_hidden = [batch size, hid dim * num directions]
    linear_out = self.linear(final_hidden)
    sigmoid_out = self.sigmoid(linear_out)

    if verbose:
      print(f"input shape : {features.shape}")
      print(f"emb shape : {embedded.shape}")
      print(f"lstm output shape : {output.shape}")
      print(f"lstm hidden shape : {hidden.shape}")
      print(f"lstm cell shape : {cell.shape}")

      print(f"final_hidden shape : {final_hidden.shape}")
      print(f"linear_out shape : {linear_out.shape}")
      print(f"sigmoid_out shape : {sigmoid_out.shape}")

    return sigmoid_out


  def training_step(self, batch, batch_idx):
    features, labels = batch[0], batch[1]
    logits = self(features)
    logits = logits.squeeze(dim=1)
    loss = self.loss_fn(logits, labels)
    self.train_accuracy(logits, labels)
    self.log_dict({"train_loss": loss, "train_acc": self.train_accuracy}, on_step = False, on_epoch = True, prog_bar = True)

  def validation_step(self,batch, batch_idx):
    features, labels = batch[0], batch[1]
    logits = self(features)
    logits = logits.squeeze(dim=1)
    loss = self.loss_fn(logits, labels)
    self.val_accuracy(logits, labels)
    self.log_dict({"val_loss": loss, "val_acc":  self.val_accuracy}, on_step = False, on_epoch = True, prog_bar = True)

  def on_train_epoch_end(self):
    self.train_accuracy.reset()

  def on_validation_epoch_end(self):
    print(f"Epoch : {self.current_epoch} val accuracy : {self.val_accuracy.compute()}")
    self.val_accuracy.reset()

  def configure_optimizers(self):
    optimizer = optim.Adam(self.parameters(), lr = self.learning_rate)
    return optimizer

In [37]:
# ## Test the model
# model = LSTMModel(vocab_size = len(token2idx),
#                   embedding_dim = 100,
#                   hidden_dim = 256,
#                   output_dim = 1,
#                   n_layers = 1,
#                   bidirectional = True,
#                   dropout = 0.25,
#                   learning_rate = 1e-3)


# logits = model(features, verbose = True)
# logits = logits.squeeze(dim=1)
# loss = model.loss_fn(logits, labels)
# print(f"\nLoss: {loss}")

In [38]:
## Build Trainer
model = LSTMModel(vocab_size = len(token2idx),
                  embedding_dim = 100,
                  hidden_dim = 256,
                  output_dim = 1,
                  n_layers = 2,
                  bidirectional = True,
                  dropout = 0.5,
                  learning_rate = 1e-3,
                  pad_idx= token2idx['__PAD__']
                  )

## logger
logger = pl.loggers.CSVLogger("logs", name="sentiment_analysis")

## checkpoints
checkpoint_callback  = pl.callbacks.ModelCheckpoint(
                                                filename='{epoch}-{val_loss:.2f}-{val_accuracy:.2f}',
                                                every_n_epochs = 2,
                                                save_top_k = -1,
                                                monitor='val_loss',
                                                )


trainer = pl.Trainer(accelerator="cpu",
                     max_epochs = 5,
                     check_val_every_n_epoch=1,
                     callbacks=[checkpoint_callback],
                     logger=logger

                    )

## Train the Model
trainer.fit(model, train_dl, val_dl)

INFO: GPU available: False, used: False
INFO:lightning.pytorch.utilities.rank_zero:GPU available: False, used: False
INFO: TPU available: False, using: 0 TPU cores
INFO:lightning.pytorch.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO:lightning.pytorch.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO:lightning.pytorch.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.callbacks.model_summary:
  | Name           | Type           | Params
--------------------------------------------------
0 | loss_fn        | BCELoss        | 0     
1 | train_accuracy | BinaryAccuracy | 0     
2 | val_accuracy   | BinaryAccuracy | 0     
3 | embedding      | Embedding      | 1.1 M 
4 | lstm           | LSTM           | 733 K 
5 | linear         | Linear         | 513   
6 | sigmoid        | Sigmoid        | 0     
7 | dropout        | Dropout        | 0     
------

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Epoch : 0 val accuracy : 0.625


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 0 val accuracy : 0.47099998593330383


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 1 val accuracy : 0.47099998593330383


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 2 val accuracy : 0.47099998593330383


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 3 val accuracy : 0.47099998593330383


Validation: |          | 0/? [00:00<?, ?it/s]

INFO: `Trainer.fit` stopped: `max_epochs=5` reached.
INFO:lightning.pytorch.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.


Epoch : 4 val accuracy : 0.47099998593330383
