<a href="https://colab.research.google.com/github/gupta24789/sentiment-analysis/blob/main/sentiment_lstm_lighting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

There are many variations of RNN


- Single Layer RNN

![](https://github.com/gupta24789/sentiment-analysis/blob/main/imgaes/rnn.jpg?raw=1)

- Multi Layer RNN

Multi-layer RNNs (also called *deep RNNs*) are another simple concept. The idea is that we add additional RNNs on top of the initial standard RNN, where each RNN added is another *layer*. The hidden state output by the first (bottom) RNN at time-step $t$ will be the input to the RNN above it at time step $t$. The prediction is then made from the final hidden state of the final (highest) layer.

The image below shows a multi-layer unidirectional RNN. Also note that each layer needs their own initial hidden state, $h_0^L$.

![](https://github.com/gupta24789/sentiment-analysis/blob/main/imgaes/multi_layer.jpg?raw=1)

- Bidirectional RNN

The concept behind a bidirectional RNN is simple. As well as having an RNN processing the words in the sentence from the first to the last (a forward RNN), we have a second RNN processing the words in the sentence from the **last to the first** (a backward RNN). At time step $t$, the forward RNN is processing word $x_t$, and the backward RNN is processing word $x_{T-t+1}$.

In PyTorch, the hidden state (and cell state) tensors returned by the forward and backward RNNs are stacked on top of each other in a single tensor.

We make our sentiment prediction using a concatenation of the last hidden state from the forward RNN (obtained from final word of the sentence), $h_T^\rightarrow$, and the last hidden state from the backward RNN (obtained from the first word of the sentence), $h_T^\leftarrow$, i.e. $\hat{y}=f(h_T^\rightarrow, h_T^\leftarrow)$   

The image below shows a bi-directional RNN.

![](https://github.com/gupta24789/sentiment-analysis/blob/main/imgaes/bidirectional.jpg?raw=1)


---


## LSTM

We'll be using a different RNN architecture called a Long Short-Term Memory (LSTM). Why is an LSTM better than a standard RNN? Standard RNNs suffer from the vanishing gradient problem. LSTMs overcome this by having an extra recurrent state called a _cell_, $c$ - which can be thought of as the "memory" of the LSTM - and the use multiple _gates_ which control the flow of information into and out of the memory. We can simply think of the LSTM as a function of $x_t$, $h_t$ and $c_t$, instead of just $x_t$ and $h_t$.

$$(h_t, c_t) = \text{LSTM}(x_t, h_t, c_t)$$

Thus, the model using an LSTM looks something like:


![](https://github.com/gupta24789/sentiment-analysis/blob/main/imgaes/lstm.jpg?raw=1)


The initial cell state, $c_0$, like the initial hidden state is initialized to a tensor of all zeros. The sentiment prediction is still, however, only made using the final hidden state, not the final cell state, i.e. $\hat{y}=f(h_T)$.


In [1]:
!pip install -q pytorch-lightning

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m777.7/777.7 kB[0m [31m6.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m840.2/840.2 kB[0m [31m53.0 MB/s[0m eta [36m0:00:00[0m
[?25h

In [2]:
import pandas as pd
import numpy as np
import itertools
import warnings

import re
import string
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer
from nltk.corpus import stopwords

nltk.download('stopwords')

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.nn.utils.rnn import  pad_sequence
from torch.utils.data import Dataset, DataLoader

from torchmetrics import Accuracy
import pytorch_lightning as pl

warnings.filterwarnings('ignore')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


## Set Seed

In [3]:
SEED = 1234
np.random.seed(SEED)
torch.manual_seed(SEED)
pl.seed_everything(SEED)
torch.backends.cudnn.deterministic = True

INFO:lightning_fabric.utilities.seed:Seed set to 1234


## Utilities

In [4]:
def process_tweet(tweet):
    """Process tweet function.
    Input:
        tweet: a string containing a tweet
    Output:
        tweets_clean: a list of words containing the processed tweet

    """
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')
    # remove stock market tickers like $GE
    tweet = re.sub(r'$\w*', '', tweet)
    # remove old style retweet text "RT"
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    # remove hyperlinks
    tweet = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)
    # remove hashtags
    # only removing the hash # sign from the word
    tweet = re.sub(r'#', '', tweet)
    # tokenize tweets
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True,reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)

    tweets_clean = []
    for word in tweet_tokens:
        if (word not in stopwords_english and  # remove stopwords
                word not in string.punctuation):  # remove punctuation
            # tweets_clean.append(word)
            stem_word = stemmer.stem(word)  # stemming word
            tweets_clean.append(stem_word)

    return tweets_clean

## Load Data

In [5]:
train_df = pd.read_csv("https://raw.githubusercontent.com/gupta24789/sentiment-analysis/main/data/train.csv")
val_df = pd.read_csv("https://raw.githubusercontent.com/gupta24789/sentiment-analysis/main/data/val.csv")

train_df.processed_tweet = train_df.processed_tweet.fillna('[]').apply(lambda x: eval(x) if x is not None else [])
val_df.processed_tweet = val_df.processed_tweet.fillna('[]').apply(lambda x: eval(x) if x is not None else [])

## remove blank
train_df = train_df[train_df.processed_tweet.str.len()!=0]
val_df = val_df[val_df.processed_tweet.str.len()!=0]

train_df = train_df.dropna()
val_df = val_df.dropna()

## reset index
train_df = train_df.reset_index(drop = True)
val_df = val_df.reset_index(drop = True)

In [6]:
train_df.label.value_counts()

0.0    3999
1.0    3987
Name: label, dtype: int64

In [7]:
val_df.label.value_counts()

0    1000
1     999
Name: label, dtype: int64

In [8]:
train_df.head(4)

Unnamed: 0,raw_tweet,processed_tweet,label
0,Want to say a huge thanks to @WarriorAssaultS ...,"[want, say, huge, thank, ff, thank, support, :)]",1.0
1,@jaynehh_ you just need a job and get a letter...,"[need, job, get, letter, work, place, say, wor...",1.0
2,"@knhillrocks HA yes, make it quick tho :D","[ha, ye, make, quick, tho, :d]",1.0
3,@shartyboy Thanks for texting me back :)) I'm ...,"[thank, text, back, :), i'm, text, tomorrow, :)]",1.0


## Create Vocab

In [10]:
special_words = ['__PAD__','__UNK__','</e>']
unique_words = list(set(itertools.chain.from_iterable(train_df.processed_tweet.tolist())))
vocab = special_words + unique_words
vocab = {w:i for i,w in enumerate(vocab)}
print(f"Number of words in vocab : {len(vocab)}")

Number of words in vocab : 9092


## Convert tweet to numbers

In [11]:
def tweet_to_tensor(processed_tweet_list, unk_token = '__UNK__'):
  to_tensor_list = []
  unk_token_id = vocab[unk_token]

  for w in processed_tweet_list:
    to_tensor_list.append(vocab.get(w,unk_token_id))

  to_tensor = torch.tensor(to_tensor_list)
  return to_tensor

In [12]:
train_df['tensor_tweet'] = [tweet_to_tensor(tweet) for tweet in train_df.processed_tweet]
val_df['tensor_tweet'] = [tweet_to_tensor(tweet) for tweet in val_df.processed_tweet]

In [13]:
train_df.head(3)

Unnamed: 0,raw_tweet,processed_tweet,label,tensor_tweet
0,Want to say a huge thanks to @WarriorAssaultS ...,"[want, say, huge, thank, ff, thank, support, :)]",1.0,"[tensor(4373), tensor(5484), tensor(674), tens..."
1,@jaynehh_ you just need a job and get a letter...,"[need, job, get, letter, work, place, say, wor...",1.0,"[tensor(2447), tensor(3754), tensor(2050), ten..."
2,"@knhillrocks HA yes, make it quick tho :D","[ha, ye, make, quick, tho, :d]",1.0,"[tensor(538), tensor(3929), tensor(863), tenso..."


In [15]:
train_df['lengths'] = train_df.processed_tweet.apply(lambda x: len(x))
val_df['lengths'] = val_df.processed_tweet.apply(lambda x: len(x))

In [16]:
train_data = train_df[['tensor_tweet','lengths', 'label']].reset_index(drop = True).to_dict("records")
val_data = val_df[['tensor_tweet','lengths', 'label']].reset_index(drop = True).to_dict("records")

## Data Loader

- use **pad_sequence** function to get the same length of all tweet inside the batch.
- length of tweets could be different across the batches as it calculates the max length per batch and converts very tweet to that length.

In [17]:
def custom_collate(data):
  features = [d['tensor_tweet'] for d in data]
  labels = [d['label'] for d in data]
  lengths = [d['lengths'] for d in data]

  padded_features = pad_sequence(features, batch_first=True, padding_value= vocab['__PAD__'])
  labels = torch.tensor(labels, dtype = torch.float32)
  lengths = torch.tensor(lengths, dtype = torch.long)

  batch = {"features": padded_features,"labels": labels, "lengths": lengths}
  return batch

In [20]:
train_dl = DataLoader(train_data, batch_size = 2, collate_fn = custom_collate, shuffle = True)
example = next(iter(train_dl))
feature, label, length = example['features'], example['labels'], example['lengths']

In [24]:
example = next(iter(train_dl))
example['features'].shape, example['lengths'].shape, example['labels'].shape

(torch.Size([2, 11]), torch.Size([2]), torch.Size([2]))

In [25]:
feature

tensor([[1590, 7576, 9023,    0,    0],
        [7038, 1347, 4063, 2404, 9023]])

In [26]:
label

tensor([1., 1.])

In [27]:
length

tensor([3, 5])

In [28]:
### dataloader
BATCH_SIZE = 64
train_dl = DataLoader(train_data, batch_size = BATCH_SIZE, collate_fn = custom_collate, shuffle = True)
val_dl = DataLoader(val_data, batch_size = BATCH_SIZE, collate_fn = custom_collate, shuffle = False)

## Load Glove Embedding

In [None]:
# !wget https://nlp.stanford.edu/data/glove.6B.zip
# !unzip glove.6B.zip -d glove/

In [None]:
# def load_glove_embedding(filedir, emb_dim = 100):
#   filepath = f"{filedir}/glove.6B.{emb_dim}d.txt"

#   embedding = open(filepath,"r").readlines()
#   embedding_dict = {}
#   for line in embedding:
#     line = line.strip()
#     w, emb = line.split(" ",1)
#     embedding_dict[w] = np.array(emb.split(" ")).astype(float)

#   embedding_list = []
#   for w in token2idx:
#     if w in embedding_dict:
#       embedding_list.append(torch.tensor(embedding_dict[w]))
#     else:
#       embedding_list.append(torch.normal(mean = 0, std = 1, size = (1,emb_dim)).squeeze(0))
#   glove_embedding = torch.stack(embedding_list)
#   return glove_embedding


# glove_embedding = load_glove_embedding("glove", emb_dim = 100)
# print(f"GLove Embedding shape : {glove_embedding.shape}")
# print(glove_embedding)

## Model

- we are not going to learn the embedding for the **<pad>** token. This is because we want to explitictly tell our model that padding tokens are irrelevant to determining the sentiment of a sentence. This means the embedding for the pad token will remain at what it is initialized to (we initialize it to all zeros later). We do this by passing the index of our pad token as the **padding_idx** argument to the nn.Embedding layer.

- use an LSTM instead of the standard RNN, we use nn.LSTM instead of nn.RNN. Also, note that the LSTM returns the output and a tuple of the final hidden state and the final cell state, whereas the standard RNN only returned the output and final hidden state.

- In birectional LSTM , As the final hidden state of our LSTM has both a forward and a backward component, which will be concatenated together, the size of the input to the nn.Linear layer is twice that of the hidden dimension size.

- Before we pass our embeddings to the RNN, we need to pack them, which we do with nn.utils.rnn.packed_padded_sequence. This will cause our RNN to only process the non-padded elements of our sequence. The RNN will then return packed_output (a packed sequence) as well as the hidden and cell states (both of which are tensors). Without packed padded sequences, hidden and cell are tensors from the last element in the sequence, which will most probably be a pad token, however when using packed padded sequences they are both from the last non-padded element in the sequence. Note that the lengths argument of packed_padded_sequence must be a CPU tensor so we explicitly make it one by using .to('cpu').


- We then unpack the output sequence, with nn.utils.rnn.pad_packed_sequence, to transform it from a packed sequence to a tensor. The elements of output from padding tokens will be zero tensors (tensors where every element is zero). Usually, we only have to unpack output if we are going to use it later on in the model. Although we aren't in this case, we still unpack the sequence just to show how it is done.

- The final hidden state, hidden, has a shape of **[batch size, num layers * num directions,hid dim]** . These are ordered: [forward_layer_0, backward_layer_0, forward_layer_1, backward_layer 1, ..., forward_layer_n, backward_layer n]. As we want the final (top) layer forward and backward hidden states, we get the top two hidden layers from the first dimension, **hidden[-2,:,:] and hidden[-1,:,:]**, and concatenate them together before passing them to the linear layer

In [41]:
## Model
class LSTMModel(pl.LightningModule):

  def __init__(self, num_embeddings, embedding_dim, hidden_dim, learning_rate, num_layers, bidirectional):
    super().__init__()
    self.learning_rate = learning_rate
    self.bidirectional = bidirectional

    ## define loss & metrics
    self.loss_fn = nn.BCELoss()
    self.train_accuracy = Accuracy(task = "binary", num_classes = 2, threshold= 0.5)
    self.val_accuracy = Accuracy(task = "binary", num_classes = 2, threshold= 0.5)

    ## Define Model
    self.embed_layer = nn.Embedding(num_embeddings= num_embeddings, embedding_dim=embedding_dim, padding_idx= vocab['__PAD__'])
    self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first = True, num_layers = num_layers, bidirectional=bidirectional)
    self.relu = nn.ReLU()
    self.linear = nn.Linear(in_features= hidden_dim * 2 if bidirectional else hidden_dim, out_features= 1)
    self.sigmoid = nn.Sigmoid()


  def forward(self,features,lengths, verbose = False):

    # feature : [batch size, sent len]

    embedded = self.embed_layer(features)

    ## embedded : [batch size, sent len, embedding dim]

    #pack sequence
    # lengths need to be on CPU!
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, lengths.to('cpu'), batch_first = True, enforce_sorted = False)
    packed_output, (hidden, cell) = self.lstm(packed_embedded)

    #unpack sequence
    output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output, batch_first = True)

    #output = [batch size, sent len, hidden dim * num directions]
    #output over padding tokens are zero tensors

    #hidden = [batch_size, num layers * num directions, hidden dim]
    #cell = [batch size, num layers * num directions, hidden dim]

    if self.bidirectional:
       ## concatnate last hidden layer of forward & backward
      hidden_squeezed = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1)
    else:
      hidden_squeezed = hidden[-1,:,:].squeeze(0)

    out = self.relu(hidden_squeezed)
    out_linear = self.linear(out)
    out_sigmoid = self.sigmoid(out_linear)
    final_output = torch.squeeze(out_sigmoid, dim = 1)

    if verbose:
      print(f"features : {features.shape}")
      print(f"embedded : {embedded.shape}")
      print(f"hidden : {hidden.shape}")
      print(f"cell : {cell.shape}")
      print(f"output : {output.shape}")
      print(f"hidden_squeezed : {hidden_squeezed.shape}")
      print(f"linear : {out_linear.shape}")
      print(f"final_output : {final_output.shape}")

    return final_output

  def _shared_step(self, batch):
    features, labels, lengths = batch['features'],batch['labels'], batch['lengths']
    logits = self(features, lengths)
    loss = self.loss_fn(logits, labels)
    return logits, loss, labels

  def training_step(self, batch, batch_idx):
    logits, loss, label = self._shared_step(batch)
    self.train_accuracy(logits,label)
    self.log_dict({"train_loss": loss, "train_accuracy": self.train_accuracy}, on_step = False, on_epoch = True, prog_bar=True)
    return loss

  def validation_step(self, batch, batch_idx):
    logits, loss, label = self._shared_step(batch)
    self.val_accuracy(logits,label)
    self.log_dict({"val_loss": loss,  "val_accuracy": self.val_accuracy}, on_step = False, on_epoch = True, prog_bar = True)
    return loss

  def on_train_epoch_end(self):
    self.train_accuracy.reset()

  def on_validation_epoch_end(self):
    if self.current_epoch!=0:
      print(f"Epoch : {self.current_epoch} Validation Accuracy : {self.val_accuracy.compute()}")
    self.val_accuracy.reset()

  def configure_optimizers(self):
    optimizer = optim.Adam(self.parameters(), lr =self.learning_rate)
    return optimizer


In [42]:
## test model
model = LSTMModel(num_embeddings= len(vocab), embedding_dim=100, hidden_dim= 32, learning_rate=0.001, num_layers=2, bidirectional=True)
logits = model(feature,length, verbose = True)
print(f"Logits : {logits}")
print(f"Loss : {model.loss_fn(logits, label)}")

features : torch.Size([2, 5])
embedded : torch.Size([2, 5, 100])
hidden : torch.Size([4, 2, 32])
cell : torch.Size([4, 2, 32])
output : torch.Size([2, 5, 64])
hidden_squeezed : torch.Size([2, 64])
linear : torch.Size([2, 1])
final_output : torch.Size([2])
Logits : tensor([0.5302, 0.5183], grad_fn=<SqueezeBackward1>)
Loss : 0.6458166241645813


## Single Layer LSTM

In [43]:
## Build Trainer
model = LSTMModel(num_embeddings= len(vocab),
                  embedding_dim=100,
                  hidden_dim= 32,
                  learning_rate=0.001,
                  num_layers=1,
                  bidirectional=False)


## Copy the pretrained glove embedding to embedding weight
# model.embedding.weight.data.copy_(glove_embedding)

## logger
logger = pl.loggers.CSVLogger("logs", name="sentiment_analysis")

## checkpoints
checkpoint_callback  = pl.callbacks.ModelCheckpoint(
                                                filename='{epoch}-{val_loss:.2f}-{val_accuracy:.2f}',
                                                every_n_epochs = 2,
                                                save_top_k = -1,
                                                monitor='val_loss',
                                                )



trainer = pl.Trainer(accelerator="cpu",
                     max_epochs = 5,
                     check_val_every_n_epoch=1,
                     callbacks=[checkpoint_callback],
                     logger=logger
                    )

## Train the Model
trainer.fit(model, train_dl, val_dl)

INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.callbacks.model_summary:
  | Name           | Type           | Params
--------------------------------------------------
0 | loss_fn        | BCELoss        | 0     
1 | train_accuracy | BinaryAccuracy | 0     
2 | val_accuracy   | BinaryAccuracy | 0     
3 | embed_layer    | Embedding      | 909 K 
4 | lstm           | LSTM           | 17.2 K
5 | relu           | ReLU           | 0     
6 | linear         | Linear         | 33    
7 | sigmoid        | Sigmoid        | 0     
--------------------------------------------------
926 K     Trainable params
0         Non-trainable params
926 K     Total params
3.706     Total estimated model params size (

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 1 Validation Accuracy : 0.9914957284927368


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 2 Validation Accuracy : 0.9929965138435364


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 3 Validation Accuracy : 0.9934967756271362


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.


Epoch : 4 Validation Accuracy : 0.9939969778060913


## Multi Layer LSTM

In [44]:
## Build Trainer
model = LSTMModel(num_embeddings= len(vocab),
                  embedding_dim=100,
                  hidden_dim= 32,
                  learning_rate=0.001,
                  num_layers=2,
                  bidirectional=False)


## Copy the pretrained glove embedding to embedding weight
# model.embedding.weight.data.copy_(glove_embedding)

## logger
logger = pl.loggers.CSVLogger("logs", name="sentiment_analysis")

## checkpoints
checkpoint_callback  = pl.callbacks.ModelCheckpoint(
                                                filename='{epoch}-{val_loss:.2f}-{val_accuracy:.2f}',
                                                every_n_epochs = 2,
                                                save_top_k = -1,
                                                monitor='val_loss',
                                                )



trainer = pl.Trainer(accelerator="cpu",
                     max_epochs = 5,
                     check_val_every_n_epoch=1,
                     callbacks=[checkpoint_callback],
                     logger=logger
                    )

## Train the Model
trainer.fit(model, train_dl, val_dl)

INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.callbacks.model_summary:
  | Name           | Type           | Params
--------------------------------------------------
0 | loss_fn        | BCELoss        | 0     
1 | train_accuracy | BinaryAccuracy | 0     
2 | val_accuracy   | BinaryAccuracy | 0     
3 | embed_layer    | Embedding      | 909 K 
4 | lstm           | LSTM           | 25.6 K
5 | relu           | ReLU           | 0     
6 | linear         | Linear         | 33    
7 | sigmoid        | Sigmoid        | 0     
--------------------------------------------------
934 K     Trainable params
0         Non-trainable params
934 K     Total params
3.739     Total estimated model params size (

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 1 Validation Accuracy : 0.9919959902763367


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 2 Validation Accuracy : 0.9924962520599365


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 3 Validation Accuracy : 0.9929965138435364


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.


Epoch : 4 Validation Accuracy : 0.9929965138435364


## Single Layer Bidirectional LSTM

In [45]:
## Build Trainer
model = LSTMModel(num_embeddings= len(vocab),
                  embedding_dim=100,
                  hidden_dim= 32,
                  learning_rate=0.001,
                  num_layers=1,
                  bidirectional=True)


## Copy the pretrained glove embedding to embedding weight
# model.embedding.weight.data.copy_(glove_embedding)

## logger
logger = pl.loggers.CSVLogger("logs", name="sentiment_analysis")

## checkpoints
checkpoint_callback  = pl.callbacks.ModelCheckpoint(
                                                filename='{epoch}-{val_loss:.2f}-{val_accuracy:.2f}',
                                                every_n_epochs = 2,
                                                save_top_k = -1,
                                                monitor='val_loss',
                                                )



trainer = pl.Trainer(accelerator="cpu",
                     max_epochs = 5,
                     check_val_every_n_epoch=1,
                     callbacks=[checkpoint_callback],
                     logger=logger
                    )

## Train the Model
trainer.fit(model, train_dl, val_dl)

INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.callbacks.model_summary:
  | Name           | Type           | Params
--------------------------------------------------
0 | loss_fn        | BCELoss        | 0     
1 | train_accuracy | BinaryAccuracy | 0     
2 | val_accuracy   | BinaryAccuracy | 0     
3 | embed_layer    | Embedding      | 909 K 
4 | lstm           | LSTM           | 34.3 K
5 | relu           | ReLU           | 0     
6 | linear         | Linear         | 65    
7 | sigmoid        | Sigmoid        | 0     
--------------------------------------------------
943 K     Trainable params
0         Non-trainable params
943 K     Total params
3.774     Total estimated model params size (

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 1 Validation Accuracy : 0.9904952645301819


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 2 Validation Accuracy : 0.989995002746582


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch : 3 Validation Accuracy : 0.9914957284927368


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.


Epoch : 4 Validation Accuracy : 0.9914957284927368


## Predict

In [65]:
model.eval()

LSTMModel(
  (loss_fn): BCELoss()
  (train_accuracy): BinaryAccuracy()
  (val_accuracy): BinaryAccuracy()
  (embed_layer): Embedding(9092, 100, padding_idx=0)
  (lstm): LSTM(100, 32, batch_first=True, bidirectional=True)
  (relu): ReLU()
  (linear): Linear(in_features=64, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)

In [72]:
tweet = "I love this movies"
procesed_tweet = process_tweet(tweet)
tensor_tweet = tweet_to_tensor(procesed_tweet)
tensor_tweet = tensor_tweet.view(1,-1)
# print(tensor_tweet.shape)
lengths = torch.tensor([tensor_tweet.shape[1]], dtype = torch.long)
preds = model(tensor_tweet, lengths)[0].item()
int(preds>0.5)

1

In [73]:
tweet = "I hate this movies"
procesed_tweet = process_tweet(tweet)
tensor_tweet = tweet_to_tensor(procesed_tweet)
tensor_tweet = tensor_tweet.view(1,-1)
# print(tensor_tweet.shape)
lengths = torch.tensor([tensor_tweet.shape[1]], dtype = torch.long)
preds = model(tensor_tweet, lengths)[0].item()
int(preds>0.5)

0