
In [1]:
!nvidia-smi

Thu Nov 19 16:19:57 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# 2 - Updated Sentiment Analysis

Sentiment analysis on the IMDb dataset using LSTMs.

We will use:
- packed padded sequences
- pre-trained word embeddings
- different RNN architecture
- bidirectional RNN
- multi-layer RNN
- regularization
- a different optimizer

To make it more interesting, we will feed the sentences to our model in reversed order and see how it behaves. 
Our target is to reach ~87% accuracy without using bidirectional LSTMs. 

## Preparing Data

As before, we'll set the seed, define the `Fields` and get the train/valid/test splits.

We'll be using *packed padded sequences*, which will make our RNN only process the non-padded elements of our sequence, and for any padded element the `output` will be a zero tensor. To use packed padded sequences, we have to tell the RNN how long the actual sequences are. We do this by setting `include_lengths = True` for our `TEXT` field. This will cause `batch.text` to now be a tuple with the first element being our sentence (a numericalized tensor that has been padded) and the second element being the actual lengths of our sentences.
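
To make the idea concrete, here is a minimal plain-Python sketch (no PyTorch needed) of the bookkeeping that packing implies. The helper name `packed_batch_sizes` is made up for this illustration; it plays the role of the `batch_sizes` field of PyTorch's `PackedSequence`, assuming the sequences are already sorted longest-first.

```python
def packed_batch_sizes(lengths):
    """Return, for each time step, how many sequences are still active.

    Assumes `lengths` is sorted in descending order, as the iterator
    arranges with sort_within_batch = True.
    """
    assert all(a >= b for a, b in zip(lengths, lengths[1:])), "sort longest-first"
    max_len = lengths[0] if lengths else 0
    return [sum(1 for n in lengths if n > t) for t in range(max_len)]


lengths = [4, 2, 1]                      # three sentences, padded to length 4
batch_sizes = packed_batch_sizes(lengths)
print(batch_sizes)                       # [3, 2, 1, 1]

# The RNN only processes the real tokens: 7 steps instead of 3 * 4 = 12.
print(sum(batch_sizes), len(lengths) * max(lengths))   # 7 12
```

The number of packed elements always equals the sum of the true lengths, which is why packing saves work on batches with a lot of padding.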

In [2]:
import torch
from torchtext import data
from torchtext import datasets

SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

TEXT = data.Field(tokenize = 'spacy', include_lengths = True)
LABEL = data.LabelField(dtype = torch.float)

We then load the IMDb dataset.

In [3]:
from torchtext import datasets

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

downloading aclImdb_v1.tar.gz


aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:08<00:00, 10.0MB/s]


## Reverse the datasets 


In [4]:
def sentence_reversal(input_dataset):
    for sentence_ in input_dataset:
        sentence_.text.reverse()
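
Because `list.reverse()` works in place, calling `sentence_reversal` a second time restores the original word order. The sketch below illustrates this on a toy dataset; `SimpleNamespace` is only a stand-in assumed here to mimic a torchtext example with a `.text` list.

```python
from types import SimpleNamespace


def sentence_reversal(input_dataset):
    # Reverse the token list of every example in place.
    for sentence_ in input_dataset:
        sentence_.text.reverse()


# Hypothetical one-example dataset, for illustration only.
toy_dataset = [SimpleNamespace(text=["A", "truly", "sweet", "romance", "."])]

sentence_reversal(toy_dataset)
reversed_once = list(toy_dataset[0].text)
print(reversed_once)   # ['.', 'romance', 'sweet', 'truly', 'A']

sentence_reversal(toy_dataset)
restored = list(toy_dataset[0].text)
print(restored)        # ['A', 'truly', 'sweet', 'romance', '.']
```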


Then create the validation set from our training set.

In [5]:
import random

train_data, valid_data = train_data.split(random_state = random.seed(SEED))

In [26]:
sentence_reversal(train_data)
#sentence_reversal(test_data)

## Verify that the training dataset is indeed reversed

In [27]:
for idx,sentence_ in enumerate(train_data):
    if(idx > 50):
        break
    print(sentence_.text[-10:])

['where', 'I', 'actually', 'felt', 'deeply', 'embarrassed', 'for', 'everyone', 'involved', '.']
['convincing', 'as', 'a', 'someone', 'that', 'played', 'lots', 'of', 'soccer', '.']
['me', 'crying', 'every', 'time', '.', 'A', 'truly', 'sweet', 'romance', '.']
['-', 'its', '"', 'of', 'the', "'", '30s', '&', '40s', '.']
['fond', 'of', 'it', '!', '*', 'from', '*', '*', '*', '*']
['for', 'addressing', 'many', 'of', 'the', 'shortcomings', 'of', 'the', 'scripting', '.']
['on', 'this', 'silly', ',', 'and', 'already', 'irrelevant', ',', 'DVD', '.']
["'s", 'ego', 'is', 'the', 'largest', 'character', 'in', 'this', 'film', '.']
['that', 'movie', 'than', 'rekindling', 'interest', 'in', 'that', 'classic', 'series', '.']
['recommend', 'this', 'to', 'any', 'fan', 'of', 'romance', 'stories', '.', '7/10']
['musical', 'of', 'the', 'forties', 'on', 'one', 'tenth', 'the', 'budget', '.']
['violent', ':', 'in', 'a', 'tasteful', 'manner', ',', 'of', 'course', '.']
['buy', 'this', 'movie', 'you', "'ll", 'be', '

## Verify that the test data has not been tampered with

In [25]:
for idx,sentence_ in enumerate(test_data):
    if(idx > 50):
        break
    print(sentence_.text[:10])

['Not', 'a', 'stunner', ',', 'but', 'a', 'good', 'movie', 'to', 'see']
['Hong', 'Kong', ',', 'the', '1920s', '.', 'A', 'young', 'man', 'from']
['Having', 'seen', 'this', 'without', 'knowing', 'all', 'the', 'hoopla', 'surrounding', 'the']
['I', "'m", 'not', 'a', 'big', 'fan', 'of', 'musicals', ',', 'but']
['When', 'I', 'was', 'younger', 'I', 'saw', 'the', 'end', 'of', 'HAIR']
['I', 'was', 'surprised', 'that', '"', 'Forgiving', 'the', 'Franklins', '"', 'did']
['The', 'first', 'time', 'I', 'saw', 'this', 'movie', ',', 'it', 'did']
['Heath', 'Ledgers', 'acting', 'in', 'this', 'film', 'really', 'bugs', 'me', ',']
['One', 'of', 'the', 'oddest', ',', 'most', 'strikingly', 'eerie', 'and', 'creepy']
['It', 'amazes', 'me', 'that', 'production', 'companies', 'will', 'sue', 'because', 'of']
['If', 'you', 'do', "n't", 'have', 'anything', 'better', 'to', 'do', ',']
['This', 'almost', 'documentary', 'look', 'at', 'an', 'enterprising', 'boy', 'who', 'lives']
['I', 'first', 'caught', 'up', 'with', 'Jen

In [9]:
MAX_VOCAB_SIZE = 25_000

TEXT.build_vocab(train_data, 
                 max_size = MAX_VOCAB_SIZE, 
                 vectors = "glove.6B.100d", 
                 unk_init = torch.Tensor.normal_)

LABEL.build_vocab(train_data)

.vector_cache/glove.6B.zip: 862MB [06:30, 2.21MB/s]                           
100%|█████████▉| 398740/400000 [00:17<00:00, 22924.96it/s]
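
The vocabulary ends up with `MAX_VOCAB_SIZE` words plus the two special tokens `<unk>` and `<pad>`, which is why the embedding matrix later has 25,002 rows. A rough plain-Python sketch of that idea follows; `build_vocab` is a hypothetical helper written for this illustration, and the tie-breaking order of the real torchtext vocabulary may differ.

```python
from collections import Counter


def build_vocab(token_lists, max_size, specials=("<unk>", "<pad>")):
    # Count every token, keep the `max_size` most frequent ones,
    # and place the special tokens at the front of the index.
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    itos = list(specials) + [tok for tok, _ in counts.most_common(max_size)]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return itos, stoi


corpus = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
itos, stoi = build_vocab(corpus, max_size=2)

print(itos)            # ['<unk>', '<pad>', 'good', 'movie']
print(stoi["<pad>"])   # 1
print(25_000 + 2)      # 25002
```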

In [10]:
BATCH_SIZE = 64

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size = BATCH_SIZE,
    sort_within_batch = True,
    device = device)

## Actual model

In [11]:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx):
        
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        
        self.rnn = nn.LSTM(embedding_dim, 
                           hidden_dim, 
                           num_layers=n_layers, 
                           bidirectional=bidirectional, 
                           dropout=dropout)
        
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text, text_lengths):
        
        #text = [sent len, batch size]
        
        embedded = self.dropout(self.embedding(text))
        
        #embedded = [sent len, batch size, emb dim]
        
        #pack sequence
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths)
        
        packed_output, (hidden, cell) = self.rnn(packed_embedded)
        
        #unpack sequence
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)

        #output = [sent len, batch size, hid dim * num directions]
        #output over padding tokens are zero tensors
        
        #hidden = [num layers * num directions, batch size, hid dim]
        #cell = [num layers * num directions, batch size, hid dim]
        
        #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers
        #and apply dropout
        
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
                
        #hidden = [batch size, hid dim * num directions]
            
        return self.fc(hidden)
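
The indexing `hidden[-2,:,:]` and `hidden[-1,:,:]` relies on how `nn.LSTM` orders the rows of `hidden`: layer by layer, with the forward direction before the backward one. The small sketch below only does the index arithmetic; `hidden_row` is a name invented for this illustration.

```python
def hidden_row(layer, direction, num_directions=2):
    """Row of `hidden` holding the final state of a given layer.

    layer is 0-based; direction 0 is forward, 1 is backward.
    """
    return layer * num_directions + direction


n_layers, num_directions = 3, 2
total_rows = n_layers * num_directions

last_forward = hidden_row(n_layers - 1, 0)
last_backward = hidden_row(n_layers - 1, 1)

print(total_rows, last_forward, last_backward)                  # 6 4 5
# Expressed as negative indices these are exactly -2 and -1.
print(last_forward - total_rows, last_backward - total_rows)    # -2 -1
```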

In [12]:
import torch.nn as nn

class RNN_List(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx):
        
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        
        self.rnn = nn.ModuleList()
        self.n_layers = n_layers
        self.bidirectional = bidirectional
        for i in range(n_layers):
            self.rnn.append(nn.LSTM(embedding_dim, 
                           hidden_dim, 
                           num_layers=1, 
                           bidirectional=bidirectional, 
                           dropout=dropout))  
        

        
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text, text_lengths):
        
        #text = [sent len, batch size]
        
        embedded = self.dropout(self.embedding(text))
        
        #embedded = [sent len, batch size, emb dim]
        
        #pack sequence
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths)
        
        #run the LSTMs one after another: every LSTM reads the same packed embeddings,
        #and only the (hidden, cell) state is handed from one LSTM to the next
        hidden_cell_state_tuple = tuple()
        for offset in range(self.n_layers):
            if offset == 0:
                #the first LSTM starts from the default zero state
                packed_output, hidden_cell_state_tuple = self.rnn[offset](packed_embedded)
            else:
                packed_output, hidden_cell_state_tuple = self.rnn[offset](packed_embedded, hidden_cell_state_tuple)
        
        #unpack sequence
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)

        #output = [sent len, batch size, hid dim * num directions]
        #output over padding tokens are zero tensors
        
        #hidden = [num layers * num directions, batch size, hid dim]
        #cell = [num layers * num directions, batch size, hid dim]
        
        #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers
        #and apply dropout
        hidden, cell = hidden_cell_state_tuple
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
                
        #hidden = [batch size, hid dim * num directions]
            
        return self.fc(hidden)

Like before, we'll create an instance of our RNN class, with the new parameters and arguments for the number of layers, bidirectionality and dropout probability.

To ensure the pre-trained vectors can be loaded into the model, the `EMBEDDING_DIM` must be equal to that of the pre-trained GloVe vectors loaded earlier.

We get our pad token index from the vocabulary, getting the actual string representing the pad token from the field's `pad_token` attribute, which is `<pad>` by default.

In [13]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 384
OUTPUT_DIM = 1
N_LAYERS = 3
BIDIRECTIONAL = True
DROPOUT = 0.2
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

# model = RNN(INPUT_DIM, 
#             EMBEDDING_DIM, 
#             HIDDEN_DIM, 
#             OUTPUT_DIM, 
#             N_LAYERS, 
#             BIDIRECTIONAL, 
#             DROPOUT, 
#             PAD_IDX)

model = RNN_List(INPUT_DIM, 
            EMBEDDING_DIM, 
            HIDDEN_DIM, 
            OUTPUT_DIM, 
            N_LAYERS, 
            BIDIRECTIONAL, 
            DROPOUT, 
            PAD_IDX)
import torch.optim as optim

optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()

model = model.to(device)
criterion = criterion.to(device)

In [14]:
#model = model.to(device)
model

RNN_List(
  (embedding): Embedding(25002, 100, padding_idx=1)
  (rnn): ModuleList(
    (0): LSTM(100, 384, dropout=0.2, bidirectional=True)
    (1): LSTM(100, 384, dropout=0.2, bidirectional=True)
    (2): LSTM(100, 384, dropout=0.2, bidirectional=True)
  )
  (fc): Linear(in_features=768, out_features=1, bias=True)
  (dropout): Dropout(p=0.2, inplace=False)
)

We'll print out the number of parameters in our model. 

Notice how we have almost twice as many parameters as before!

In [15]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 6,979,945 trainable parameters
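
The printed total can be reproduced by hand from the layer shapes shown in the model summary. The sketch below is plain arithmetic, assuming the usual LSTM layout of four gates per direction, each with input weights, recurrent weights and two bias vectors; `lstm_params` is a helper name introduced only for this check.

```python
def lstm_params(input_dim, hidden_dim, bidirectional=True):
    # 4 gates, each with an input matrix, a recurrent matrix and two biases.
    per_direction = 4 * hidden_dim * (input_dim + hidden_dim) + 2 * 4 * hidden_dim
    return per_direction * (2 if bidirectional else 1)


vocab_size, embedding_dim, hidden_dim, output_dim, n_layers = 25_002, 100, 384, 1, 3

embedding = vocab_size * embedding_dim                    # 2,500,200
lstms = n_layers * lstm_params(embedding_dim, hidden_dim) # 3 x 1,492,992
fc = (hidden_dim * 2) * output_dim + output_dim           # 769

total = embedding + lstms + fc
print(f"{total:,}")   # 6,979,945
```

Note that every LSTM is counted with an input size of 100, matching the `LSTM(100, 384, ...)` entries in the model summary.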


The final addition is copying the pre-trained word embeddings we loaded earlier into the `embedding` layer of our model.

We retrieve the embeddings from the field's vocab, and check they're the correct size, _**[vocab size, embedding dim]**_ 

In [16]:
pretrained_embeddings = TEXT.vocab.vectors

print(pretrained_embeddings.shape)

torch.Size([25002, 100])


We then replace the initial weights of the `embedding` layer with the pre-trained embeddings.

**Note**: this should always be done on the `weight.data` and not the `weight`!

In [17]:
model.embedding.weight.data.copy_(pretrained_embeddings)

tensor([[-0.1117, -0.4966,  0.1631,  ...,  1.2647, -0.2753, -0.1325],
        [-0.8555, -0.7208,  1.3755,  ...,  0.0825, -1.1314,  0.3997],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [-0.3970,  0.4024,  1.0612,  ..., -0.0136, -0.3363,  0.6442],
        [-0.5197,  1.0395,  0.2092,  ..., -0.8857, -0.2294,  0.1244],
        [ 0.0057, -0.0707, -0.0804,  ..., -0.3292, -0.0130,  0.0716]],
       device='cuda:0')

As our `<unk>` and `<pad>` tokens aren't in the pre-trained vocabulary, they have been initialized using `unk_init` (an $\mathcal{N}(0,1)$ distribution) when building our vocab. It is preferable to initialize them both to all zeros to explicitly tell our model that, initially, they are irrelevant for determining sentiment. 

We do this by manually setting their row in the embedding weights matrix to zeros. We get their row by finding the index of the tokens, which we have already done for the padding index.

**Note**: like initializing the embeddings, this should be done on the `weight.data` and not the `weight`!

In [18]:
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

print(model.embedding.weight.data)

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [-0.3970,  0.4024,  1.0612,  ..., -0.0136, -0.3363,  0.6442],
        [-0.5197,  1.0395,  0.2092,  ..., -0.8857, -0.2294,  0.1244],
        [ 0.0057, -0.0707, -0.0804,  ..., -0.3292, -0.0130,  0.0716]],
       device='cuda:0')


We can now see that the first two rows of the embedding weights matrix have been set to zeros. As we passed the index of the pad token to the `padding_idx` of the embedding layer, its row will remain zeros throughout training; the `<unk>` token embedding, however, will be learned.

In [28]:
element_ = next(iter(train_iterator))

In [29]:
elem_array = [*element_.text]

In [None]:
elem_array[0].cpu().shape, elem_array[1].cpu().shape

(torch.Size([191, 64]), torch.Size([64]))

In [30]:
text_array = elem_array[0]
text_array.shape

torch.Size([381, 64])

In [None]:
elem_array[1]

tensor([191, 191, 191, 191, 191, 191, 191, 191, 191, 191, 191, 191, 191, 191,
        191, 191, 191, 191, 191, 191, 191, 191, 191, 191, 191, 190, 190, 190,
        190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190,
        190, 190, 190, 190, 190, 190, 190, 190, 190, 189, 189, 189, 189, 189,
        189, 189, 189, 189, 189, 189, 189, 189], device='cuda:0')

In [None]:
text_array[1].shape

torch.Size([64])

In [None]:
" ".join([TEXT.vocab.itos[text_array[offset][-1]] for offset in range(text_array.shape[0]) ])


In [None]:
with torch.no_grad():
    emb_out = model.embedding(text_array)

In [None]:
emb_pack = nn.utils.rnn.pack_padded_sequence(emb_out, elem_array[1].cpu())

In [None]:
emb_pack.data[1,:]

tensor([-1.4927,  0.3494, -0.3233, -1.8189, -0.9395,  2.8164,  0.2858,  0.8032,
         0.6189, -0.7230,  0.2782, -0.9035, -1.3616,  0.4640, -0.4662,  0.5132,
         1.5805,  2.1826, -0.4103, -0.2576,  0.1045, -0.8688,  1.4957, -0.2583,
        -0.0282, -0.0297,  0.0640, -0.7224, -0.5380,  0.5293, -0.6616,  0.8416,
        -0.3224,  0.4930, -0.1878, -0.0139, -0.8935,  0.9628,  1.9609, -0.1048,
         0.1071,  1.2595, -1.0393, -0.1827, -0.9304, -0.5972, -0.8050, -0.1499,
         0.5111, -0.2358, -0.3499,  0.2673,  1.1686, -0.6818,  0.4988,  0.3094,
        -0.1351,  2.1641,  1.4065,  0.4533, -0.6638, -1.6466, -0.6420, -0.3813,
         2.0745, -0.6641, -1.5548,  0.5919,  0.1381,  0.0372,  0.4015, -0.1743,
        -0.0430, -1.6807,  0.6356, -0.4548,  0.9208,  1.0485, -0.2114, -0.4623,
        -0.7928, -0.1096,  0.0108,  0.4000,  0.8154,  2.5640, -0.8559, -0.5328,
        -0.0546, -0.1156,  0.6490,  0.0595,  1.5805, -0.9836,  1.0787, -1.3883,
         0.2567,  0.8978,  0.0594, -1.34

In [None]:
input, batch_sizes, sorted_indices, unsorted_indices = emb_pack

In [None]:
model.rnn.permute_hidden(hidden_local, sorted_indices).shape

torch.Size([6, 64, 256])

In [None]:
%load_ext autoreload
#%autoreload 2
#import torch.nn as nn
packed_output_local2, (hidden_local2, cell_local2)  =model.rnn(emb_pack, (None, None))

In [None]:
hidden_local2.shape

torch.Size([6, 64, 256])

In [None]:
emb_pack.data.shape

torch.Size([13071, 100])

In [None]:
model.rnn.permute_hidden()

In [None]:
packed_output_local.data.shape

torch.Size([13071, 512])

In [None]:
hidden_local.shape

torch.Size([6, 64, 256])

In [None]:
hidden_local[-2,:,:].shape

torch.Size([64, 256])

In [None]:
#unpack sequence
output_local, output_lengths_local = nn.utils.rnn.pad_packed_sequence(packed_output_local)

#output = [sent len, batch size, hid dim * num directions]
#output over padding tokens are zero tensors

#hidden = [num layers * num directions, batch size, hid dim]
#cell = [num layers * num directions, batch size, hid dim]

#concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers
#and apply dropout

#hidden = model.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))

In [None]:
output_local.shape

torch.Size([358, 64, 512])

## Train the Model

We implement the function to calculate accuracy...

In [19]:
def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """

    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division 
    acc = correct.sum() / len(correct)
    return acc
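
As a worked example of the same computation without tensors, the sketch below applies the sigmoid, rounds, and compares against the labels in plain Python. The name `binary_accuracy_py` and the sample logits are made up for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_accuracy_py(logits, labels):
    # Squash each raw score into (0, 1), round to 0 or 1, then compare.
    rounded = [round(sigmoid(z)) for z in logits]
    correct = [float(p == y) for p, y in zip(rounded, labels)]
    return sum(correct) / len(correct)


logits = [2.0, -1.0, 0.5, -3.0]   # raw model outputs
labels = [1.0, 0.0, 0.0, 0.0]     # 1 = positive, 0 = negative

accuracy = binary_accuracy_py(logits, labels)
print(accuracy)   # 0.75
```

Rounding the sigmoid output is equivalent to checking the sign of the logit: a positive logit is predicted as class 1, a negative one as class 0.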

We define a function for training our model. 

As we have set `include_lengths = True`, our `batch.text` is now a tuple with the first element being the numericalized tensor and the second element being the actual lengths of each sequence. We separate these into their own variables, `text` and `text_lengths`, before passing them to the model.

**Note**: as we are now using dropout, we must remember to use `model.train()` to ensure the dropout is "turned on" while training.

In [20]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        text, text_lengths = batch.text
        
        predictions = model(text, text_lengths.cpu()).squeeze(1)
        
        loss = criterion(predictions, batch.label)
        
        acc = binary_accuracy(predictions, batch.label)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
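
The `criterion` passed to `train` is `nn.BCEWithLogitsLoss`, which applies the sigmoid and the binary cross-entropy in one step and averages over the batch by default. The sketch below shows one common numerically stable way to write that loss in plain Python and checks it against the textbook formula; it is an illustration of the idea, not PyTorch's actual implementation.

```python
import math


def bce_with_logits(logit, target):
    # Stable form: max(x, 0) - x * y + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def bce_naive(logit, target):
    # Textbook form: -[y * log(p) + (1 - y) * log(1 - p)] with p = sigmoid(x).
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def mean_loss(logits, targets):
    # Average over the batch, as the default reduction does.
    return sum(bce_with_logits(x, y) for x, y in zip(logits, targets)) / len(logits)


for x, y in [(2.0, 1.0), (-1.5, 0.0), (0.3, 0.0)]:
    assert math.isclose(bce_with_logits(x, y), bce_naive(x, y))

print(round(mean_loss([2.0, -1.5, 0.3], [1.0, 0.0, 0.0]), 4))
```

The stable form never exponentiates a large positive number, so it stays finite even for extreme logits where the textbook form would fail.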

Then we define a function for testing our model, again remembering to separate `batch.text`.

**Note**: as we are now using dropout, we must remember to use `model.eval()` to ensure the dropout is "turned off" while evaluating.

In [21]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            text, text_lengths = batch.text
            
            predictions = model(text, text_lengths.cpu()).squeeze(1)
            
            loss = criterion(predictions, batch.label)
            
            acc = binary_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

We also create a small helper function that tells us how long each epoch takes.

In [22]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs
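
A quick usage check of the helper, with made-up timestamps standing in for the values returned by `time.time()`:

```python
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs


# Hypothetical timestamps in seconds.
print(epoch_time(0.0, 59.9))     # (0, 59)
print(epoch_time(10.0, 135.7))   # (2, 5)

# The same split can be written with divmod on whole seconds.
print(divmod(int(135.7 - 10.0), 60))   # (2, 5)
```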

## Epochs with training data reversed

In [23]:
N_EPOCHS = 20

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut2-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Epoch: 01 | Epoch Time: 0m 59s
	Train Loss: 0.643 | Train Acc: 62.81%
	 Val. Loss: 0.697 |  Val. Acc: 53.79%
Epoch: 02 | Epoch Time: 0m 59s
	Train Loss: 0.516 | Train Acc: 74.66%
	 Val. Loss: 0.505 |  Val. Acc: 75.36%
Epoch: 03 | Epoch Time: 0m 59s
	Train Loss: 0.415 | Train Acc: 81.17%
	 Val. Loss: 0.466 |  Val. Acc: 78.56%
Epoch: 04 | Epoch Time: 0m 59s
	Train Loss: 0.275 | Train Acc: 88.74%
	 Val. Loss: 0.345 |  Val. Acc: 85.91%
Epoch: 05 | Epoch Time: 0m 59s
	Train Loss: 0.196 | Train Acc: 92.32%
	 Val. Loss: 0.359 |  Val. Acc: 84.33%
Epoch: 06 | Epoch Time: 0m 59s
	Train Loss: 0.146 | Train Acc: 94.53%
	 Val. Loss: 0.334 |  Val. Acc: 86.94%
Epoch: 07 | Epoch Time: 0m 59s
	Train Loss: 0.109 | Train Acc: 96.18%
	 Val. Loss: 0.362 |  Val. Acc: 87.38%
Epoch: 08 | Epoch Time: 0m 59s
	Train Loss: 0.080 | Train Acc: 97.16%
	 Val. Loss: 0.359 |  Val. Acc: 86.77%
Epoch: 09 | Epoch Time: 0m 59s
	Train Loss: 0.056 | Train Acc: 98.12%
	 Val. Loss: 0.395 |  Val. Acc: 86.94%
Epoch: 10 | Epoch T

### Top accuracy reached = 87.43%
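
The training loop overwrites `tut2-model.pt` only when the validation loss improves, so the saved checkpoint is the one with the lowest validation loss, not necessarily the one with the highest accuracy. The sketch below replays that rule on the nine validation losses visible in the log above (the later epochs are cut off in this export).

```python
# Validation losses copied from epochs 1-9 of the log above.
valid_losses = [0.697, 0.505, 0.466, 0.345, 0.359, 0.334, 0.362, 0.359, 0.395]

best_valid_loss = float("inf")
saved_epochs = []

for epoch, valid_loss in enumerate(valid_losses, start=1):
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        saved_epochs.append(epoch)   # the point where torch.save would run

print(saved_epochs)      # [1, 2, 3, 4, 6]
print(best_valid_loss)   # 0.334
```

Among these nine epochs, the checkpoint on disk would come from epoch 6, even though epoch 7 reports a slightly higher validation accuracy.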

## Epochs with training data reset to regular

In [32]:
N_EPOCHS = 20

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut2-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Epoch: 01 | Epoch Time: 0m 59s
	Train Loss: 0.090 | Train Acc: 97.07%
	 Val. Loss: 0.374 |  Val. Acc: 88.38%
Epoch: 02 | Epoch Time: 0m 59s
	Train Loss: 0.032 | Train Acc: 98.93%
	 Val. Loss: 0.499 |  Val. Acc: 88.33%
Epoch: 03 | Epoch Time: 0m 59s
	Train Loss: 0.016 | Train Acc: 99.47%
	 Val. Loss: 0.522 |  Val. Acc: 88.26%
Epoch: 04 | Epoch Time: 0m 59s
	Train Loss: 0.014 | Train Acc: 99.56%
	 Val. Loss: 0.547 |  Val. Acc: 88.79%
Epoch: 05 | Epoch Time: 0m 59s
	Train Loss: 0.011 | Train Acc: 99.65%
	 Val. Loss: 0.617 |  Val. Acc: 88.21%
Epoch: 06 | Epoch Time: 0m 59s
	Train Loss: 0.006 | Train Acc: 99.83%
	 Val. Loss: 0.687 |  Val. Acc: 88.05%
Epoch: 07 | Epoch Time: 0m 59s
	Train Loss: 0.005 | Train Acc: 99.86%
	 Val. Loss: 0.728 |  Val. Acc: 88.25%
Epoch: 08 | Epoch Time: 0m 59s
	Train Loss: 0.006 | Train Acc: 99.79%
	 Val. Loss: 0.801 |  Val. Acc: 88.28%
Epoch: 09 | Epoch Time: 0m 59s
	Train Loss: 0.007 | Train Acc: 99.77%
	 Val. Loss: 0.730 |  Val. Acc: 88.42%
Epoch: 10 | Epoch T

### Max accuracy reached = 88.79%

## User Input

We can now use our model to predict the sentiment of any sentence we give it. As it has been trained on movie reviews, the sentences provided should also be movie reviews.

When using a model for inference it should always be in evaluation mode. If this notebook is followed step-by-step then it should already be in evaluation mode (from the last call to `evaluate`); however, we explicitly set it to avoid any risk.

Our `predict_sentiment` function does a few things:
- sets the model to evaluation mode
- tokenizes the sentence, i.e. splits it from a raw string into a list of tokens
- indexes the tokens by converting them into their integer representation from our vocabulary
- gets the length of our sequence
- converts the indexes, which are a Python list, into a PyTorch tensor
- adds a batch dimension by `unsqueeze`ing 
- converts the length into a tensor
- squashes the output prediction from an unbounded real number to a value between 0 and 1 with the `sigmoid` function
- converts the tensor holding a single value into a Python float with the `item()` method

We are expecting reviews with a negative sentiment to return a value close to 0 and positive reviews to return a value close to 1.
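
The data-preparation part of these steps can be sketched without PyTorch. Everything below is an assumption for illustration: the toy `stoi` dictionary stands in for `TEXT.vocab.stoi`, whitespace splitting stands in for the spaCy tokenizer, and nested lists stand in for tensors.

```python
import math

# Hypothetical toy vocabulary; the real one comes from TEXT.vocab.stoi.
stoi = {"<unk>": 0, "<pad>": 1, "This": 2, "film": 3, "is": 4, "terrible": 5}


def prepare_input(sentence):
    tokens = sentence.split()                                   # tokenize
    indexed = [stoi.get(tok, stoi["<unk>"]) for tok in tokens]  # unknown words map to <unk>
    column = [[idx] for idx in indexed]                         # shape [sent len, batch size = 1]
    length = [len(indexed)]
    return column, length


def to_probability(logit):
    # Same squashing that torch.sigmoid applies to the model output.
    return 1.0 / (1.0 + math.exp(-logit))


column, length = prepare_input("This film is dreadful")
print(column)                # [[2], [3], [4], [0]]
print(length)                # [4]
print(to_probability(0.0))   # 0.5
```

A strongly negative logit maps to a probability near 0 and a strongly positive one to a probability near 1, which is how the outputs below should be read.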

In [33]:
import spacy
nlp = spacy.load('en')

def predict_sentiment(model, sentence):
    model.eval()
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    length = [len(indexed)]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(1)
    length_tensor = torch.LongTensor(length)
    prediction = torch.sigmoid(model(tensor, length_tensor))
    return prediction.item()

An example negative review...

In [34]:
predict_sentiment(model, "This film is terrible")

0.002504574367776513

Another negative review, with less direct wording...

In [37]:
predict_sentiment(model, "Can a romantic movie be any worse than this!")

0.001538451761007309

In [38]:
predict_sentiment(model, "The movie was so bad that you would want to watch it over and over again to understand how can some one make one")

0.2725890576839447

In [39]:
predict_sentiment(model, "If you haven't watched this movie yet you are making a terrible mistake")

0.0005374264437705278