<a href="https://colab.research.google.com/github/ferngonzalezp/deep_learning_lab/blob/main/session3/DL_Lab_Session3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Session 3: Introduction to sequence modeling

In this session we will work with sequences: we will see how different kinds of problems can be modeled as sequences and how to handle them from a machine learning perspective.

## Sequences in real life

<img src="https://fr.mathworks.com/help/examples/matlab/win64/GetDataFromAudioRecorderObjectExample_01.png" height=300>

<img src="https://www.quickanddirtytips.com/sites/default/files/images/490/sentence-length.jpg" height=300>

<img src="https://www.sigmaaldrich.com/content/dam/sigma-aldrich/articles/biology/marketing-assets/sanger-sequencing_dna-structure.png" height=300>

<img src="https://d2gk6qz8djobw9.cloudfront.net/slider/15949868021163832.jpg" height=300>

## Application of sequences

Depending on the type of sequence problem, there are several ways to model it in terms of inputs and outputs, as shown in the image below:

<img src="https://drive.google.com/uc?export=view&id=1iyG0kjLo2Nbj6Y7zHTTLX2CMxwqu1Vd0" height=400>

Standard (feed-forward) neural networks can also be applied to these problems, but the tasks we have tackled so far had no time dependence between samples. Using such networks on time series has two drawbacks: they must grow very large to cover many time steps at once, and they may fail to capture dependencies between time steps that are far apart. Recurrent Neural Networks (RNNs) are the model family that broke through these barriers in sequence modeling and advanced the state of the art in fields such as Natural Language Processing.

<img src="https://www.drive.google.com/uc?export=view&id=10adlM-TdQNRfl4Tjx_6Wxtrcaqt1_Tox" height=400>

## Recurrent Neural Networks




Recurrent neural networks are a family of neural network architectures for sequential data built on the notion of recurrence. They come in many forms depending on the application, but the core idea is the same: the parameters are shared across all time steps, and at each step the network takes as input both the current element of the sequence and its own state from the previous time step.

<img src="https://www.drive.google.com/uc?export=view&id=1ZU-vB8owBqqhRGo2PBnv5AWn4v3Cram8" height=200>
<img src="https://www.drive.google.com/uc?export=view&id=1ELNhN9RTrIHAM22rzBJA8MlnV09CGd3D" height=200>
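To make the recurrence concrete, here is a minimal pure-Python sketch of a vanilla (Elman-style) RNN forward pass, $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$. The weights are arbitrary toy values chosen for illustration and no output layer is included; the point is that one set of parameters is reused at every time step, whatever the sequence length.

```python
import math

def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v (list)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Apply h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) over a whole sequence.

    The same W_xh, W_hh and b_h are used at every step (parameter sharing).
    Returns the hidden states h_1 .. h_T.
    """
    h = h0
    hs = []
    for x in xs:
        pre = [a + b + c for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        hs.append(h)
    return hs

# Toy example (illustrative values): 2-d inputs, 3-d hidden state, 4 time steps
W_xh = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
W_hh = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, 0.1, 0.0]]
b_h = [0.0, 0.1, -0.1]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
hs = rnn_forward(xs, W_xh, W_hh, b_h, h0=[0.0, 0.0, 0.0])
print(len(hs), [round(v, 3) for v in hs[-1]])
```

Running the same function on a longer `xs` needs no new parameters; only the number of loop iterations changes.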

LSTMs
_______________
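A common problem with RNNs is vanishing and exploding gradients. During backpropagation through time, the gradient that reaches an early time step is a product of one factor per intermediate step. If these factors are consistently smaller than 1 in magnitude, the gradient shrinks exponentially with the sequence length and the weights effectively stop updating (vanishing gradients); if they are consistently larger than 1, it grows exponentially and training diverges (exploding gradients). This is why the problem usually shows up in tasks with long-range dependencies.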
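A minimal way to see the effect, assuming the simplest possible recurrence: for a scalar linear RNN $h_t = w\,h_{t-1}$, the chain rule gives $\partial h_T / \partial h_0 = w^T$, one factor of $w$ per step. The sketch below (plain Python, toy values) computes that product for $T = 50$.

```python
def gradient_through_time(w, T):
    """d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}.

    Each step contributes one chain-rule factor d h_t / d h_{t-1} = w,
    so the result equals w ** T.
    """
    grad = 1.0
    for _ in range(T):
        grad *= w
    return grad

for w in (0.5, 1.0, 1.5):
    print(w, gradient_through_time(w, T=50))
```

With $w = 0.5$ the gradient is on the order of $10^{-15}$, and with $w = 1.5$ it is above $10^{8}$. Real RNNs multiply Jacobian matrices and also include the derivative of the nonlinearity, but the same kind of repeated product drives the behavior.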

One way to mitigate these problems and capture long-range dependencies is to add gates to the RNN cell. One of the most effective gated architectures is the Long Short-Term Memory network, or LSTM.

<img src="https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" width=1200>

\begin{equation}
\begin{array}{l}
i_{t}=\sigma\left(W_{i i} x_{t}+b_{i i}+W_{h i} h_{t-1}+b_{h i}\right) \\
f_{t}=\sigma\left(W_{i f} x_{t}+b_{i f}+W_{h f} h_{t-1}+b_{h f}\right) \\
g_{t}=\tanh \left(W_{i g} x_{t}+b_{i g}+W_{h g} h_{t-1}+b_{h g}\right) \\
o_{t}=\sigma\left(W_{i o} x_{t}+b_{i o}+W_{h o} h_{t-1}+b_{h o}\right) \\
c_{t}=f_{t} \odot c_{t-1}+i_{t} \odot g_{t} \\
h_{t}=o_{t} \odot \tanh \left(c_{t}\right)
\end{array}
\end{equation}

At each time step the LSTM computes four gate activations from the current input $x_t$ and the previous hidden state $h_{t-1}$:

*   The forget gate $f_t$ determines how much of the previous cell state $c_{t-1}$ to keep.
*   The input gate $i_t$ controls how much of the candidate values $g_t$ is written into the cell state.
*   The cell gate $g_t$ (the candidate) proposes new values, in $[-1, 1]$, to add to the cell state.
*   The output gate $o_t$ controls how much of the cell state, through $\tanh(c_t)$, is exposed as the hidden state $h_t$ that is passed to the next time step and used as the output.

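The LSTM also introduces a cell state $c_t$ that acts as the memory of the cell. Because it is updated additively ($c_t = f_t \odot c_{t-1} + i_t \odot g_t$) instead of being passed through a squashing nonlinearity at every step, information and gradients can flow across many time steps while the forget gate stays close to 1.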
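The equations above can be traced with a scalar LSTM cell (hidden size 1) in plain Python. Parameter names mirror the equations ($W_{ii}$, $b_{hf}$, ...); `make_params` is a hypothetical helper for building toy parameters, not part of any library. The demo first saturates the forget gate open and the input gate closed, so the cell state is carried almost unchanged across 90 steps, then closes the forget gate to show the memory being erased in a single step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_params(value=0.0, **overrides):
    """Hypothetical helper: keys like 'W_ii', 'b_hf' for gates i, f, g, o."""
    p = {f'{kind}_{src}{gate}': value
         for gate in 'ifgo' for kind in ('W', 'b') for src in ('i', 'h')}
    p.update(overrides)
    return p

def lstm_cell(x, h_prev, c_prev, p):
    """One step of the LSTM equations for scalar x, h and c."""
    i = sigmoid(p['W_ii'] * x + p['b_ii'] + p['W_hi'] * h_prev + p['b_hi'])
    f = sigmoid(p['W_if'] * x + p['b_if'] + p['W_hf'] * h_prev + p['b_hf'])
    g = math.tanh(p['W_ig'] * x + p['b_ig'] + p['W_hg'] * h_prev + p['b_hg'])
    o = sigmoid(p['W_io'] * x + p['b_io'] + p['W_ho'] * h_prev + p['b_ho'])
    c = f * c_prev + i * g   # additive cell-state update
    h = o * math.tanh(c)     # hidden state exposed through the output gate
    return h, c

# Forget gate ~1, input gate ~0: the cell state is preserved over many steps
keep = make_params(0.0, b_if=20.0, b_ii=-20.0)
h, c = 0.0, 1.0
for x in [0.3, -1.2, 0.8] * 30:   # 90 time steps
    h, c = lstm_cell(x, h, c, keep)
print('after 90 steps with f ~ 1:', c)

# Forget gate ~0: the previous cell state is discarded immediately
forget = make_params(0.0, b_if=-20.0)
h2, c2 = lstm_cell(0.0, 0.0, 1.0, forget)
print('after 1 step with f ~ 0:', c2)
```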

# Generating Names with a Character-Level RNN

This part is adapted from the PyTorch [tutorial](https://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html) on generating names with a character-level RNN.

<img src="https://i.imgur.com/jzVrf7f.png">

In [None]:
!gdown 1ccN7lWQTyrH27NsB__7zNqqSVVFzOCe4
!unzip ./data.zip

In [None]:
from __future__ import unicode_literals, print_function, division
from io import open
import glob
import os
import unicodedata
import string

all_letters = string.ascii_letters + " .,;'-"
n_letters = len(all_letters) + 1 # Plus EOS marker

def findFiles(path): return glob.glob(path)

# Turn a Unicode string to plain ASCII, thanks to https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# Read a file and split into lines
def readLines(filename):
    with open(filename, encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

# Build the category_lines dictionary, a list of lines per category
category_lines = {}
all_categories = []
for filename in findFiles('data/names/*.txt'):
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

n_categories = len(all_categories)

if n_categories == 0:
    raise RuntimeError('Data not found. Make sure that you downloaded data '
        'from https://download.pytorch.org/tutorial/data.zip and extract it to '
        'the current directory.')

print('# categories:', n_categories, all_categories)
print(unicodeToAscii("O'Néàl"))

In [None]:
import torch
import torch.nn as nn

class RNN(nn.Module):
    """Character-level RNN conditioned on a category (the language of the name)."""
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size

        # Both layers see [category one-hot, current letter one-hot, previous hidden state]
        self.i2h = nn.Linear(n_categories + input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(n_categories + input_size + hidden_size, output_size)
        # Second layer combines the new hidden state with the first output
        self.o2o = nn.Linear(hidden_size + output_size, output_size)
        self.dropout = nn.Dropout(0.1)
        self.softmax = nn.LogSoftmax(dim=1)  # log-probabilities, used with NLLLoss

    def forward(self, category, input, hidden):
        input_combined = torch.cat((category, input, hidden), 1)
        hidden = self.i2h(input_combined)  # next hidden state
        output = self.i2o(input_combined)
        output_combined = torch.cat((hidden, output), 1)
        output = self.o2o(output_combined)
        output = self.dropout(output)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        # All-zero hidden state for a batch containing a single name
        return torch.zeros(1, self.hidden_size)

In [None]:
import random

# Random item from a list
def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

# Get a random category and random line from that category
def randomTrainingPair():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    return category, line

In [None]:
# One-hot vector for category
def categoryTensor(category):
    li = all_categories.index(category)
    tensor = torch.zeros(1, n_categories)
    tensor[0][li] = 1
    return tensor

# One-hot matrix of first to last letters (not including EOS) for input
def inputTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li in range(len(line)):
        letter = line[li]
        tensor[li][0][all_letters.find(letter)] = 1
    return tensor

# LongTensor of second letter to end (EOS) for target
def targetTensor(line):
    letter_indexes = [all_letters.find(line[li]) for li in range(1, len(line))]
    letter_indexes.append(n_letters - 1) # EOS
    return torch.LongTensor(letter_indexes)

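The input and target tensors above are offset by one character: at step $t$ the network receives letter $t$ and is trained to predict letter $t+1$, and the last target is the EOS marker (index `n_letters - 1`). A pure-Python sketch of the same indexing, reusing the notebook's `all_letters` definition; `input_indices` and `target_indices` are hypothetical names used only for this illustration.

```python
import string

all_letters = string.ascii_letters + " .,;'-"
EOS = len(all_letters)  # same as n_letters - 1 in the notebook

def input_indices(line):
    """Letter indices fed to the network: first to last letter."""
    return [all_letters.find(ch) for ch in line]

def target_indices(line):
    """Letter indices to predict: second letter to last, then EOS."""
    return [all_letters.find(ch) for ch in line[1:]] + [EOS]

name = "Kim"
for x, y in zip(input_indices(name), target_indices(name)):
    print(f"input {x:2d} -> target {y:2d}")
```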
In [None]:
# Make category, input, and target tensors from a random category, line pair
def randomTrainingExample():
    category, line = randomTrainingPair()
    category_tensor = categoryTensor(category)
    input_line_tensor = inputTensor(line)
    target_line_tensor = targetTensor(line)
    return category_tensor, input_line_tensor, target_line_tensor

In [None]:
criterion = nn.NLLLoss()

learning_rate = 0.0005

def train(category_tensor, input_line_tensor, target_line_tensor):
    target_line_tensor.unsqueeze_(-1)
    hidden = rnn.initHidden()

    rnn.zero_grad()

    loss = 0

    for i in range(input_line_tensor.size(0)):
        output, hidden = rnn(category_tensor, input_line_tensor[i], hidden)
        l = criterion(output, target_line_tensor[i])
        loss += l

    loss.backward()

    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item() / input_line_tensor.size(0)

In [None]:
import time
import math

def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

In [None]:
rnn = RNN(n_letters, 128, n_letters)

n_iters = 100000
print_every = 5000
plot_every = 500
all_losses = []
total_loss = 0 # Reset every plot_every iters

start = time.time()

for iter in range(1, n_iters + 1):
    output, loss = train(*randomTrainingExample())
    total_loss += loss

    if iter % print_every == 0:
        print('%s (%d %d%%) %.4f' % (timeSince(start), iter, iter / n_iters * 100, loss))

    if iter % plot_every == 0:
        all_losses.append(total_loss / plot_every)
        total_loss = 0

In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

plt.figure()
plt.plot(all_losses)

In [None]:
max_length = 20

# Sample from a category and starting letter
def sample(category, start_letter='A'):
    with torch.no_grad():  # no need to track history in sampling
        category_tensor = categoryTensor(category)
        input = inputTensor(start_letter)
        hidden = rnn.initHidden()

        output_name = start_letter

        for i in range(max_length):
            output, hidden = rnn(category_tensor, input[0], hidden)
            topv, topi = output.topk(1)
            topi = topi[0][0]
            if topi == n_letters - 1:
                break
            else:
                letter = all_letters[topi]
                output_name += letter
            input = inputTensor(letter)

        return output_name

# Get multiple samples from one category and multiple starting letters
def samples(category, start_letters='ABC'):
    for start_letter in start_letters:
        print(sample(category, start_letter))

samples('Russian', 'RUS')

samples('German', 'GER')

samples('Spanish', 'SPA')

samples('Chinese', 'CHI')

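`sample` decodes greedily: `output.topk(1)` picks the single most likely next character until it produces EOS or reaches `max_length`. The sketch below shows the same loop in plain Python, with a hypothetical toy bigram table standing in for the trained network. Note that the notebook's `RNN` applies `nn.Dropout(0.1)` to its output and is never switched to `eval()` mode (`torch.no_grad()` does not disable dropout), so repeated calls to `sample` with the same start letter can still produce different names.

```python
EOS = "<EOS>"

def greedy_decode(next_char_probs, start, max_length=20):
    """Repeatedly append the most probable next character.

    next_char_probs(prefix) returns a dict {character or EOS: probability}.
    """
    out = start
    for _ in range(max_length):
        probs = next_char_probs(out)
        best = max(probs, key=probs.get)  # plays the role of output.topk(1)
        if best == EOS:
            break
        out += best
    return out

# Hypothetical toy "model": next-character probabilities keyed on the last character
BIGRAMS = {
    "A": {"n": 0.6, "l": 0.3, EOS: 0.1},
    "n": {"a": 0.7, EOS: 0.3},
    "a": {EOS: 0.8, "n": 0.2},
}

def toy_model(prefix):
    return BIGRAMS.get(prefix[-1], {EOS: 1.0})

print(greedy_decode(toy_model, "A"))
```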
# Sentiment Analysis using an LSTM

Network Architecture
_______________________

<img src="https://raw.githubusercontent.com/bentrevett/pytorch-sentiment-analysis/2b666b3cba7d629a2f192c7d9c66fadcc9f0c363/assets/sentiment3.png" width=400>

Bidirectional LSTM

<img src="https://raw.githubusercontent.com/bentrevett/pytorch-sentiment-analysis/2b666b3cba7d629a2f192c7d9c66fadcc9f0c363/assets/sentiment4.png" width=400>

Multi-Layer LSTM
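For a multi-layer bidirectional `nn.LSTM`, the final hidden state `hidden` has shape `(num_layers * 2, batch, hidden_dim)` (`batch_first=True` does not apply to the hidden and cell states). PyTorch documents that it can be viewed as `(num_layers, num_directions, batch, hidden_size)`, so index `layer * 2 + direction` holds each state. The plain-Python sketch below lists that layout to show why the model concatenates `hidden[-2]` and `hidden[-1]`, giving `2 * hidden_dim` features for the final linear layer; `h_n_layout` is a hypothetical helper.

```python
def h_n_layout(num_layers, bidirectional=True):
    """(layer, direction) stored at each index of an LSTM's final hidden state,
    assuming the documented (num_layers, num_directions, ...) ordering."""
    directions = ("forward", "backward") if bidirectional else ("forward",)
    return [(layer, d) for layer in range(num_layers) for d in directions]

layout = h_n_layout(num_layers=2)
for idx, entry in enumerate(layout):
    print(idx, entry)
print("hidden[-2] ->", layout[-2])  # last layer, forward direction
print("hidden[-1] ->", layout[-1])  # last layer, backward direction
```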

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchtext import datasets
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
from torchtext.data.utils import get_tokenizer
from collections import Counter
from torchtext.vocab import Vocab
from torch.utils.data.dataset import random_split
from torch.nn.utils.rnn import pad_sequence

In [None]:
train_iter, test_iter = datasets.IMDB()
max_vocab_size = 25000
tokenizer = get_tokenizer('spacy')
counter = Counter()
for (label, line) in train_iter:
    counter.update(tokenizer(line))
vocab = Vocab(counter, min_freq=1, max_size=max_vocab_size, vectors="glove.6B.100d",
              unk_init =torch.Tensor.normal_)

text_pipeline = lambda x: [vocab[token] for token in tokenizer(x)]
def label_pipeline(x):
  if x=='pos':
    return 1
  else:
    return 0

def collate_batch(batch):
    label_list, text_list, text_lengths = [], [], []
    for (_label, _text) in batch:
        label_list.append(label_pipeline(_label))
        processed_text = torch.tensor(text_pipeline(_text), dtype=torch.int64)
        text_list.append(processed_text)
        text_lengths.append(len(processed_text))
    label_list = torch.tensor(label_list, dtype=torch.int64)
    text_lengths = torch.tensor(text_lengths, dtype=torch.int64)
    # Pad with the vocabulary's '<pad>' index rather than the default 0
    text_list = pad_sequence(text_list, batch_first=True, padding_value=vocab.stoi['<pad>'])
    return label_list.to(device), text_list.to(device), text_lengths.to(device)

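`collate_batch` right-pads every review in a batch to the length of the longest one and records the true lengths. The lengths matter because, without packing, the LSTM would also run over the padding, and the state at the last padded position is not the state after the last real token. A pure-Python sketch of the padding step; the pad value 1 and the token ids are arbitrary, and `pad_batch` is a hypothetical stand-in for `pad_sequence(..., batch_first=True)`.

```python
def pad_batch(seqs, pad_value):
    """Right-pad integer sequences to the longest length and return the true lengths."""
    max_len = max(len(s) for s in seqs)
    padded = [s + [pad_value] * (max_len - len(s)) for s in seqs]
    lengths = [len(s) for s in seqs]
    return padded, lengths

batch = [[4, 9, 2], [7, 5], [3, 8, 6, 10]]
padded, lengths = pad_batch(batch, pad_value=1)

naive_last = [row[-1] for row in padded]                       # may be padding
true_last = [row[n - 1] for row, n in zip(padded, lengths)]    # last real token
print(padded)
print("naive last:", naive_last, "true last:", true_last)
```

`pack_padded_sequence` uses the same lengths so that each sequence's final hidden state corresponds to its own last real token.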
In [None]:
class RNN(nn.Module):
  def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers, pad_idx):
    super(RNN, self).__init__()
    self.num_layers = num_layers
    self.hidden_dim = hidden_dim
    # The embedding row at pad_idx does not contribute to the gradient
    self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)

    # Multi-layer bidirectional LSTM; inputs are (batch, seq_len, embed_dim)
    self.rnn = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
                       num_layers=num_layers, bidirectional=True, batch_first=True)

    # Forward and backward final states are concatenated -> 2 * hidden_dim features
    self.fc = nn.Sequential(nn.Dropout(0.3), nn.Linear(2*hidden_dim, 1), nn.Sigmoid())

  def forward(self, x, h0, text_lengths):
    x = self.embed(x)
    # Pack so the LSTM skips padded positions; lengths must be on the CPU
    x = nn.utils.rnn.pack_padded_sequence(x, text_lengths.to('cpu'), batch_first=True, enforce_sorted=False)
    x, (hidden, cell) = self.rnn(x, h0)
    x, output_lengths = nn.utils.rnn.pad_packed_sequence(x, batch_first=True)
    # hidden: (num_layers * 2, batch, hidden_dim); [-2] and [-1] are the last layer's forward and backward states
    hidden = self.fc(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1))
    return hidden

  def init_hidden(self, batch_size):
    ''' Initializes the hidden and cell states with zeros '''
    h0 = torch.zeros((self.num_layers*2, batch_size, self.hidden_dim)).to(device)
    c0 = torch.zeros((self.num_layers*2, batch_size, self.hidden_dim)).to(device)
    return (h0, c0)

In [None]:
def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """

    #round predictions to the closest integer
    rounded_preds = torch.round((preds))
    correct = (rounded_preds == y).float() #convert into float for division 
    acc = correct.sum() / len(correct)
    return acc

In [None]:
def train(epochs, model, train_dataloader, val_dataloader=None):
  learn_rate = 1e-3

  # Loss function and optimizer
  criterion = nn.BCELoss()
  optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)

  # Training loop
  train_loss = []
  val_loss = []
  metric = []
  for epoch in range(epochs):  # loop over the dataset multiple times
      model.train()  # enable dropout for training
      epoch_loss = 0.0
      running_loss = 0.0
      for i, (label, text, text_lengths) in enumerate(train_dataloader):
          h = model.init_hidden(label.shape[0])
          # zero the parameter gradients
          optimizer.zero_grad()
          # forward + backward + optimize
          # squeeze(1) drops only the output dimension, so a batch of size 1 keeps its batch axis
          predicted_label = model(text, h, text_lengths).squeeze(1)
          loss = criterion(predicted_label, label.float())
          loss.backward()
          optimizer.step()
          epoch_loss += loss.item()
          # print statistics
          running_loss += loss.item()
          if i % 50 == 49:    # print every 50 mini-batches
              print('[%d, %5d] loss: %.3f' %
                    (epoch + 1, i + 1, running_loss / 50))
              running_loss = 0.0
      train_loss.append(epoch_loss / (i + 1))
      # Evaluation on the validation set
      if val_dataloader is not None:
        print("validating...")
        model.eval()  # disable dropout for evaluation
        epoch_loss = 0.0
        acc = 0.0
        with torch.no_grad():
            for i, (label, text, text_lengths) in enumerate(val_dataloader):
                val_h = model.init_hidden(label.shape[0])
                predicted_label = model(text, val_h, text_lengths).squeeze(1)
                loss = criterion(predicted_label, label.float())
                acc += binary_accuracy(predicted_label, label.float()).item()
                epoch_loss += loss.item()
        acc = acc / (i + 1)
        print('Accuracy of the network on the validation reviews: %.2f %%' % (acc * 100))
        metric.append(acc)
        val_loss.append(epoch_loss / (i + 1))
  return train_loss, val_loss, metric

In [None]:
vocab_size = len(vocab)
embed_dim = 100
hidden_dim = 256
num_layers = 2
batch_size = 64
pad_idx = vocab.stoi['<pad>']  # special padding token; 'pad' would be looked up as an ordinary word
#initialize RNN
model = RNN(vocab_size,embed_dim,hidden_dim,num_layers,pad_idx).to(device)
print(model)
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

In [None]:
pretrained_embeddings = vocab.vectors

print(pretrained_embeddings.shape)

In [None]:
model.embed.weight.data.copy_(pretrained_embeddings)

In [None]:
UNK_IDX = vocab.stoi['<unk>']  # special token for out-of-vocabulary words

model.embed.weight.data[UNK_IDX] = torch.zeros(embed_dim)
model.embed.weight.data[pad_idx] = torch.zeros(embed_dim)

print(model.embed.weight.data)

In [None]:
#Dataloaders
train_iter, test_iter = datasets.IMDB()
train_dataset = list(train_iter)
test_dataset = list(test_iter)
num_train = int(len(train_dataset) * 0.95)
split_train_, split_valid_ = \
    random_split(train_dataset, [num_train, len(train_dataset) - num_train])

train_dataloader = DataLoader(split_train_, batch_size=batch_size, collate_fn=collate_batch,
                              shuffle=True)
valid_dataloader = DataLoader(split_valid_, batch_size=batch_size, collate_fn=collate_batch,
                              shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, collate_fn=collate_batch,
                             shuffle=True)

In [None]:
train_loss, val_loss, metric = train(epochs=5, model=model, train_dataloader=train_dataloader, val_dataloader=valid_dataloader)

In [None]:
import spacy
nlp = spacy.load('en_core_web_sm')

def predict_sentiment(model, sentence):
    model.eval()  # disable dropout for inference
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [vocab.stoi[t] for t in tokenized]
    length = [len(indexed)]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(0)  # add a batch dimension: (1, seq_len)
    length_tensor = torch.LongTensor(length)
    with torch.no_grad():
        prediction = model(tensor, model.init_hidden(1), length_tensor)
    if torch.round(prediction) == 0:
      print('negative review')
    else:
      print('positive review')
    return prediction.item()

In [None]:
predict_sentiment(model, "Should be fired and sued for destroying justice league movie.")

In [None]:
predict_sentiment(model, "Great movie, would recommend.")

# Exercises



## 1. Create a Name Generator using an LSTM

## 2. Create a Multi-Class Text Classifier

For this task we will use the Yahoo Answers dataset, which contains questions labeled with 10 categories:

<img src="https://www.drive.google.com/uc?export=view&id=1Af1S0L4207Yfw9QPXqhcUYCJO5RXei8I">

For this problem we will use an LSTM model similar to the one above, with a few differences:

* Use `nn.CrossEntropyLoss` as the loss function.
* Set the output dimension of the final linear layer to 10.
* Remove the `squeeze()` on the predicted labels when computing the loss.
* Remove the conversion of the labels to `float()`; `CrossEntropyLoss` expects integer class indices.
* Remove the activation after the final linear layer: `nn.CrossEntropyLoss` combines `LogSoftmax` and the negative log-likelihood loss, so it expects raw logits.
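To see why the final activation must go, here is cross-entropy computed from raw logits in plain Python: a numerically stable log-softmax followed by the negative log-likelihood of the target class, which is what `nn.CrossEntropyLoss` computes for a single unweighted example. Applying a softmax or sigmoid beforehand would feed probabilities where the loss expects logits. The logits below are toy values for 3 classes (the exercise has 10).

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class under log_softmax(logits)."""
    return -log_softmax(logits)[target]

logits = [2.0, 0.5, -1.0]  # raw scores, no activation applied
print(cross_entropy(logits, target=0))  # small: class 0 has the largest logit
print(cross_entropy(logits, target=2))  # large: class 2 has the smallest logit
```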

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchtext import datasets
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
from torchtext.data.utils import get_tokenizer
from collections import Counter
from torchtext.vocab import Vocab
from torch.utils.data.dataset import random_split
from torch.nn.utils.rnn import pad_sequence

In [None]:
train_iter, test_iter = datasets.YahooAnswers()

In [None]:
max_vocab_size = 25000
tokenizer = get_tokenizer('spacy')
counter = Counter()
for (label, line) in train_iter:
    counter.update(tokenizer(line))
vocab = Vocab(counter, min_freq=1, max_size=max_vocab_size, vectors="glove.6B.100d",
              unk_init =torch.Tensor.normal_)

In [None]:
next(test_iter)

In [None]:
text_pipeline = lambda x: [vocab[token] for token in tokenizer(x)]
label_pipeline = lambda x: int(x) - 1

def collate_batch(batch):
    label_list, text_list, text_lengths = [], [], []
    for (_label, _text) in batch:
        label_list.append(label_pipeline(_label))
        processed_text = torch.tensor(text_pipeline(_text), dtype=torch.int64)
        text_list.append(processed_text)
        text_lengths.append(len(processed_text))
    label_list = torch.tensor(label_list, dtype=torch.int64)
    text_lengths = torch.tensor(text_lengths, dtype=torch.int64)
    # Pad with the vocabulary's '<pad>' index rather than the default 0
    text_list = pad_sequence(text_list, batch_first=True, padding_value=vocab.stoi['<pad>'])
    return label_list.to(device), text_list.to(device), text_lengths.to(device)

In [None]:
[vocab[token] for token in ['here', 'is', 'an', 'example']]

In [None]:
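# TODO: build your LSTM model here.
# Keep the class name RNN, the constructor RNN(vocab_size, embed_dim, hidden_dim, num_layers, pad_idx),
# forward(text, h, text_lengths) and init_hidden(batch_size), since the cells below use them.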

In [None]:
def categorical_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    top_pred = preds.argmax(1, keepdim = True)
    correct = top_pred.eq(y.view_as(top_pred)).sum()
    acc = correct.float() / y.shape[0]
    return acc

In [None]:
def train(epochs, model, train_dataloader, val_dataloader=None):
  learn_rate = 1e-3

  # Loss function and optimizer
  criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
  optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)

  # Training loop
  train_loss = []
  val_loss = []
  metric = []
  for epoch in range(epochs):  # loop over the dataset multiple times
      model.train()  # enable dropout for training
      epoch_loss = 0.0
      running_loss = 0.0
      for i, (label, text, text_lengths) in enumerate(train_dataloader):
          h = model.init_hidden(label.shape[0])
          # zero the parameter gradients
          optimizer.zero_grad()
          # forward + backward + optimize
          predicted_label = model(text, h, text_lengths)  # (batch, 10) logits
          loss = criterion(predicted_label, label)
          loss.backward()
          optimizer.step()
          epoch_loss += loss.item()
          # print statistics
          running_loss += loss.item()
          if i % 50 == 49:    # print every 50 mini-batches
              print('[%d, %5d] loss: %.3f' %
                    (epoch + 1, i + 1, running_loss / 50))
              running_loss = 0.0
      train_loss.append(epoch_loss / (i + 1))
      # Evaluation on the validation set
      if val_dataloader is not None:
        print("validating...")
        model.eval()  # disable dropout for evaluation
        epoch_loss = 0.0
        acc = 0.0
        with torch.no_grad():
            for i, (label, text, text_lengths) in enumerate(val_dataloader):
                val_h = model.init_hidden(label.shape[0])
                predicted_label = model(text, val_h, text_lengths)
                loss = criterion(predicted_label, label)
                acc += categorical_accuracy(predicted_label, label).item()
                epoch_loss += loss.item()
        acc = acc / (i + 1)
        print('Accuracy of the network on the validation questions: %.2f %%' % (acc * 100))
        metric.append(acc)
        val_loss.append(epoch_loss / (i + 1))
  return train_loss, val_loss, metric

In [None]:
vocab_size = len(vocab)
embed_dim = 100
hidden_dim = 256
num_layers = 2
batch_size = 64
pad_idx = vocab.stoi['<pad>']  # special padding token; 'pad' would be looked up as an ordinary word
#initialize RNN
model = RNN(vocab_size,embed_dim,hidden_dim,num_layers,pad_idx).to(device)
print(model)
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

In [None]:
pretrained_embeddings = vocab.vectors

print(pretrained_embeddings.shape)

In [None]:
model.embed.weight.data.copy_(pretrained_embeddings)

In [None]:
UNK_IDX = vocab.stoi['<unk>']  # special token for out-of-vocabulary words

model.embed.weight.data[UNK_IDX] = torch.zeros(embed_dim)
model.embed.weight.data[pad_idx] = torch.zeros(embed_dim)

print(model.embed.weight.data)

In [None]:
#Dataloaders
train_iter, test_iter = datasets.YahooAnswers()
train_dataset = list(train_iter)
test_dataset = list(test_iter)
num_train = int(len(train_dataset) * 0.95)
split_train_, split_valid_ = \
    random_split(train_dataset, [num_train, len(train_dataset) - num_train])

train_dataloader = DataLoader(split_train_, batch_size=batch_size, collate_fn=collate_batch,
                              shuffle=True)
valid_dataloader = DataLoader(split_valid_, batch_size=batch_size, collate_fn=collate_batch,
                              shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, collate_fn=collate_batch,
                             shuffle=True)

In [None]:
train_loss, val_loss, metric = train(epochs=1, model=model, train_dataloader=train_dataloader, val_dataloader=valid_dataloader)

In [None]:
label, text = test_dataset[np.random.randint(len(test_dataset))]

In [None]:
import spacy
nlp = spacy.load('en_core_web_sm')

def predict_category(model, sentence):
    model.eval()  # disable dropout for inference
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [vocab.stoi[t] for t in tokenized]
    length = [len(indexed)]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(0)  # add a batch dimension: (1, seq_len)
    length_tensor = torch.LongTensor(length)
    with torch.no_grad():
        prediction = model(tensor, model.init_hidden(1), length_tensor)
    # Undo label_pipeline: class indices 0-9 map back to dataset labels 1-10
    return prediction.argmax(1).item() + 1

In [None]:
print(text)
print(predict_category(model, text))  # predicted label
print(label)                          # true label