# Implementing an RNN for Sentiment Classification

> Many-to-one recurrent neural networks can be used for sentiment classification. They take in an input sequence, then output a probability distribution over possible classes.

![](./images/RNN%20Text%20Classifier.png)


You'll need some libraries before we start implementing this architecture.

In [8]:
!pip install datasets
!pip install transformers
!pip install torch



We can find a classic sentiment classification dataset [here](https://huggingface.co/datasets/carblacac/twitter-sentiment-analysis) on HuggingFace and start using it as shown in the snippet on that page, copied below:

In [9]:
from datasets import load_dataset

dataset = load_dataset("carblacac/twitter-sentiment-analysis", "None")

Using custom data configuration None
Found cached dataset twitter-sentiment-analysis (/Users/ice/.cache/huggingface/datasets/carblacac___twitter-sentiment-analysis/None/1.0.0/cd65e23e456de6a4f7264e305380b0ffe804d6f5bfd361c0ec0f68d8d1fab95b)
100%|██████████| 3/3 [00:00<00:00, 68.40it/s]


Let's take a look at the dataset.

In [10]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'feeling'],
        num_rows: 119988
    })
    validation: Dataset({
        features: ['text', 'feeling'],
        num_rows: 29997
    })
    test: Dataset({
        features: ['text', 'feeling'],
        num_rows: 61998
    })
})


It is a dictionary-like object with three keys, one for each of the train, validation, and test splits of the dataset. We'll work with the training split for now.

In [11]:
train_set = dataset["train"]
print(train_set)

Dataset({
    features: ['text', 'feeling'],
    num_rows: 119988
})


Let's print a single example.

In [12]:
for example in train_set:
  print(example)
  break


{'text': '@fa6ami86 so happy that salman won.  btw the 14sec clip is truely a teaser', 'feeling': 0}


You can see that it's a dictionary with two keys `text` and `feeling`.

As described in the [documentation](https://huggingface.co/datasets/carblacac/twitter-sentiment-analysis), the `feeling` is a binary value, zero or one. A zero indicates that the text of the tweet is negative. A one indicates that the text of the tweet is positive.

In [13]:
idx_to_sentiment = {
    0: "negative",
    1: "positive"
}

example_tweet = example["text"]
example_sentiment = example["feeling"]

print(f" When the dataset was being created, someone manually labelled '{example_tweet}' as {idx_to_sentiment[example_sentiment]}")

 When the dataset was being created, someone manually labelled '@fa6ami86 so happy that salman won.  btw the 14sec clip is truely a teaser' as negative


Now that we have our raw text and integer sentiment labels, we need to tokenise the text. We can do this using a pre-trained tokeniser.

In [14]:
# TOKENISE
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sentence = "This is so exciting!"

print(tokenizer.encode(sentence))


Downloading: 100%|██████████| 466k/466k [00:00<00:00, 1.28MB/s]


[101, 2023, 2003, 2061, 10990, 999, 102]
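
The first and last ids in that output, 101 and 102, are the special `[CLS]` and `[SEP]` tokens that BERT tokenisers wrap around every sequence. The sketch below illustrates the idea with a toy lookup table. The `toy_vocab` dictionary and `toy_encode` function are hypothetical, the ids are simply read off the output above, and the real tokeniser uses WordPiece subword splitting rather than this naive word split.

```python
import re

# Hypothetical toy vocabulary: the ids are read off the tokeniser output above.
toy_vocab = {
    "[CLS]": 101,
    "[SEP]": 102,
    "this": 2023,
    "is": 2003,
    "so": 2061,
    "exciting": 10990,
    "!": 999,
}

def toy_encode(sentence):
    # Lowercase (the model is "uncased"), then split into words and punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    # Wrap the sequence in the special start and end tokens.
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    # Look up the integer id of every token.
    return [toy_vocab[token] for token in tokens]

print(toy_encode("This is so exciting!"))
```

Any word missing from `toy_vocab` raises a `KeyError` here, whereas a real subword tokeniser would break it into smaller known pieces.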


## Text Preprocessing Pipeline


In [18]:
import torch

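def preprocess_text(text):
    # NOTE: Twitter handles are not removed yet; they are tokenised along with the rest of the tweet
    text = tokenizer.encode(text) # tokenise into a list of integer token ids
    text = torch.tensor(text) # cast to a 1D torch tensor of size (T,)
    return text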


example_text = example["text"]
print(example_text)
processed_text = preprocess_text(example_text)
print(processed_text)

@fa6ami86 so happy that salman won.  btw the 14sec clip is truely a teaser
tensor([  101,  1030,  6904,  2575, 10631, 20842,  2061,  3407,  2008, 28542,
         2180,  1012, 18411,  2860,  1996,  2403,  3366,  2278, 12528,  2003,
         2995,  2135,  1037, 27071,   102])
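
The tokenised output above still contains the handle `@fa6ami86`, which is split into several subword ids and probably carries little sentiment information. One possible way to strip handles before tokenising is a regular expression. `remove_handles` is a hypothetical helper, not part of the pipeline above, and the pattern assumes a handle is `@` followed by letters, digits, or underscores.

```python
import re

def remove_handles(text):
    # Delete "@" followed by word characters (assumed to be a Twitter handle).
    without_handles = re.sub(r"@\w+", "", text)
    # Collapse the leftover runs of whitespace into single spaces.
    return " ".join(without_handles.split())

tweet = "@fa6ami86 so happy that salman won.  btw the 14sec clip is truely a teaser"
print(remove_handles(tweet))
```

This pattern would also eat part of an email address, so it is only a starting point. It could be called on `text` at the top of `preprocess_text`, before `tokenizer.encode`.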


Now that our data is ready, let's build the RNN classification model.

Before diving in, it's important to understand that PyTorch's RNN layer works a little differently to other layers:
- It outputs two tensors in a tuple `(out, hidden)`
    - `out` contains:
        - the hidden state of the last RNN layer, for every timestep
    - `hidden` contains:
        - the hidden state of all RNN layers, for the last timestep
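
A quick way to check this is to pass a random tensor through a small RNN and inspect the shapes. This is a minimal sketch; the sizes `T`, `B`, `D`, and `H` are arbitrary values chosen for illustration.

```python
import torch

# Arbitrary sizes for illustration: timesteps, batch size, input size, hidden size.
T, B, D, H, num_layers = 5, 3, 8, 16, 2

rnn = torch.nn.RNN(input_size=D, hidden_size=H, num_layers=num_layers)
x = torch.randn(T, B, D)  # time-first input, the default layout

out, hidden = rnn(x)

print(out.shape)     # (T, B, H): last layer's hidden state at every timestep
print(hidden.shape)  # (num_layers, B, H): every layer's hidden state at the last timestep

# For a unidirectional RNN, the last timestep of `out` matches
# the last layer's entry in `hidden`.
print(torch.allclose(out[-1], hidden[-1]))
```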

Why does the RNN layer have that output?
- Depending on the problem you're tackling, you may need different things
- Problems where the RNN outputs a sequence need the hidden state for every timestep
- Problems where you want the internal state to represent something, like an embedding of some sequence data, can use the final hidden states as that representation

> RNNs break one of my favourite PyTorch rules: "the first dimension is the batch dimension". Instead, the first dimension is by default the time (sequence position) dimension
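
If you prefer the batch dimension first, `torch.nn.RNN` accepts a `batch_first=True` argument. The sketch below compares the two layouts, with arbitrary sizes chosen for illustration. Note that only the input and `out` change layout; `hidden` keeps the layer dimension first either way.

```python
import torch

# Arbitrary sizes for illustration: timesteps, batch size, input size, hidden size.
T, B, D, H = 5, 3, 8, 16

x_time_first = torch.randn(T, B, D)

default_rnn = torch.nn.RNN(D, H)                        # expects (T, B, D)
batch_first_rnn = torch.nn.RNN(D, H, batch_first=True)  # expects (B, T, D)

out_default, hidden_default = default_rnn(x_time_first)
out_bf, hidden_bf = batch_first_rnn(x_time_first.transpose(0, 1))

print(out_default.shape)  # (T, B, H)
print(out_bf.shape)       # (B, T, H)
print(hidden_bf.shape)    # (1, B, H): not affected by batch_first
```

The rest of this walkthrough keeps the default time-first layout.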

![](./images/PyTorch%20RNN%20Outputs.png)

Taking a look at the input and output parameter sizes for each layer in the [docs](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) will be useful here.

In [65]:
class RNN(torch.nn.Module):
  def __init__(self, vocab_size, hidden_size, num_layers=2):
    """
    Parameters
    ----------
    vocab_size : int
      The number of different tokens that the RNN needs to be able to embed

    hidden_size : int
      The size of the internal vector representation in each layer of the RNN,
      also used here as the embedding size

    num_layers : int
      The number of stacked RNN layers
    """
    super().__init__()
    self.embedding = torch.nn.Embedding(vocab_size, hidden_size)
    self.rnn = torch.nn.RNN(hidden_size, hidden_size, num_layers=num_layers, bidirectional=False)
    self.linear = torch.nn.Linear(hidden_size, 1)

  def forward(self, X):
    """
    Parameters
    ----------
    X : torch.Tensor of token indices, size (T,) or (T, B)
      The input sequence tokens for a single unbatched example, size (T,),
      or for a time-first batch of examples, size (T, B)
    """
    embedding = self.embedding(X) # embed each token: (T, D) or (T, B, D)

    outputs, final_hidden = self.rnn(embedding) # outputs: last layer's hidden state at every timestep

    final_output = outputs[-1] # last layer's hidden state at the final timestep
    logit = self.linear(final_output)
    probability = torch.sigmoid(logit) # squash the logit into the probability of the positive class
    return probability


vocab_size = len(tokenizer.get_vocab())

test_example = preprocess_text(example["text"])

rnn = RNN(vocab_size, hidden_size=128)
prediction = rnn(test_example)

print(prediction)

tensor([0.4992], grad_fn=<SigmoidBackward0>)


Now it's time to implement the training loop.

In [66]:
import torch
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter

def train(model, dataset, epochs=10):

  # INITIALISE TRACKING
  writer = SummaryWriter()
  batch_idx = 0

  optimiser = torch.optim.SGD(model.parameters(), lr=0.01) # Define optimiser and set learning rate
  
  for epoch in range(epochs): # repeat for a number of passes over the dataset
    for example in dataset: # iterate through the dataset one example at a time
      
      # UNPACK EXAMPLE
      tweet = example["text"]
      sentiment = example["feeling"]
      
      # PREPROCESS TEXT
      token_idxs = preprocess_text(tweet)
      
      # CAST LABEL TO TORCH TENSOR
      sentiment = torch.tensor(sentiment)
      
      # MAKE PREDICTION
      token_idxs = token_idxs.unsqueeze(1) # (T,) -> (T, B) with a batch dimension of size 1
      prediction = model(token_idxs) # make prediction
      
      # CALCULATE LOSS
      loss = F.binary_cross_entropy(prediction.squeeze(), sentiment.float()) # calculate loss
      
      # CALCULATE GRADIENTS
      loss.backward()
      
      # MOVE PARAMETERS
      optimiser.step()

      # ZERO GRAD
      optimiser.zero_grad()

      # TRACK PROGRESS
      writer.add_scalar("Train/Loss", loss.item(), batch_idx)
      batch_idx += 1

rnn = RNN(vocab_size, hidden_size=128, num_layers=2)
train(rnn, train_set)

KeyboardInterrupt: 

It runs! But the training curves don't look great. 
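
Alongside the loss curve, a rough way to measure progress is to threshold the predicted probabilities and compute accuracy on held-out examples, such as the validation split. The sketch below is one way this might look. The `accuracy` function and the `ConstantModel` stand-in are hypothetical names introduced for illustration, and the examples are assumed to be already preprocessed into `(token_idxs, label)` pairs.

```python
import torch

def accuracy(model, examples, threshold=0.5):
    """Fraction of (token_idxs, label) pairs the model classifies correctly."""
    was_training = model.training
    model.eval()  # switch off training-only behaviour while evaluating
    correct = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for token_idxs, label in examples:
            # Add a batch dimension of size 1, as in the training loop: (T,) -> (T, 1)
            probability = model(token_idxs.unsqueeze(1))
            predicted = int(probability.item() > threshold)
            correct += int(predicted == label)
    model.train(was_training)  # restore the previous mode
    return correct / len(examples)

# Hypothetical stand-in model that always predicts "positive" with probability 0.9.
class ConstantModel(torch.nn.Module):
    def forward(self, X):
        return torch.tensor([[0.9]])

toy_examples = [(torch.tensor([101, 2061, 102]), 1), (torch.tensor([101, 999, 102]), 0)]
print(accuracy(ConstantModel(), toy_examples))  # 0.5
```

With the real model, `examples` could be built by applying `preprocess_text` to each tweet in `dataset["validation"]`.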

There are a few key things we can do to improve quickly:
- Batch examples
    - Currently each parameter update is computed from a single example, so it is a noisy estimate of the update that would be best across all of the examples in general
- Use pre-trained word embeddings
- Explore other hyperparameters including:
    - Increase the hidden size
    - Tune the optimiser and its learning rate
    - Tune the embedding size
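
Batching is the fiddliest of these, because tweets have different lengths. One common approach, sketched below under stated assumptions, is to pad every sequence in a batch to the same length with `torch.nn.utils.rnn.pad_sequence`, then pick out each sequence's last real timestep rather than `outputs[-1]`. The token ids, the sizes, and the choice of `0` as the padding id are all made up for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three made-up token id sequences of different lengths.
sequences = [
    torch.tensor([1, 5, 6, 2]),
    torch.tensor([1, 7, 2]),
    torch.tensor([1, 8, 9, 3, 2]),
]
lengths = torch.tensor([len(sequence) for sequence in sequences])

# Pad to the longest sequence with the assumed padding id 0: result is (T_max, B).
padded = pad_sequence(sequences, padding_value=0)

vocab_size, hidden_size = 10, 16  # arbitrary sizes for illustration
embedding = torch.nn.Embedding(vocab_size, hidden_size, padding_idx=0)
rnn = torch.nn.RNN(hidden_size, hidden_size)

outputs, _ = rnn(embedding(padded))  # (T_max, B, H)

# For each sequence, take the output at its last real (non-padding) timestep.
batch_indices = torch.arange(len(sequences))
last_outputs = outputs[lengths - 1, batch_indices]  # (B, H)

print(padded.shape)
print(last_outputs.shape)
```

Because a unidirectional RNN only looks backwards in time, trailing padding does not change the outputs at earlier timesteps. `torch.nn.utils.rnn.pack_padded_sequence` is an alternative that skips the padded positions entirely.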


The details of these improvements are rather intricate, and models such as the transformer have made them less useful in practice. If you're interested in improving the performance here, the transformer is what you should check out next.

Nevertheless, understanding how RNNs can be used for classification is an essential foundation to build upon.