# Homework 3 Problem 1

In this homework, you'll learn how to model sentences with recurrent neural networks (RNNs). We provide basic skeleton code for preprocessing sequences and performing sentiment analysis with RNNs. However, the provided code can be improved with some simple modifications. The purpose of this homework is to implement several advanced techniques that improve the performance of vanilla RNNs.

First, we'll import the required libraries.

In [None]:
!pip install torchtext
!pip install spacy
!python -m spacy download en
import random
import time 

import torch
import torch.nn as nn
import torch.optim as optim
from torchtext import data
from torchtext import datasets

## Preprocessing

For your convenience, we provide the basic preprocessing steps for the IMDB movie review dataset. For more information, see https://pytorch.org/text/
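
To make the padding behaviour concrete, here is a minimal sketch in plain PyTorch of what a padded batch with lengths looks like. It does not use torchtext; the token ids are made up for illustration, and `PAD_IDX = 1` is an assumption chosen to match the `<pad>` index mentioned later in this notebook.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_IDX = 1  # assumption: same index the text vocabulary assigns to <pad>

# Three hypothetical tokenized reviews, already mapped to integer ids.
sequences = [
    torch.tensor([5, 8, 2, 9]),
    torch.tensor([7, 3]),
    torch.tensor([4, 6, 10]),
]

# True (unpadded) length of every sequence in the batch.
lengths = torch.tensor([len(s) for s in sequences])

# Pad to the longest sequence; the result has shape [max_seq_len, batch_size].
padded = pad_sequence(sequences, batch_first=False, padding_value=PAD_IDX)

print(padded.shape)   # torch.Size([4, 3])
print(padded[:, 1])   # tensor([7, 3, 1, 1]): two real tokens, then padding
print(lengths)        # tensor([4, 2, 3])
```

The pair `(padded, lengths)` plays the same role as the `(text, text_length)` tuple that `batch.text` yields when `include_lengths=True` is set.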

In [None]:
TEXT = data.Field(tokenize='spacy', include_lengths=True)
LABEL = data.LabelField(dtype=torch.float)
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

In [None]:
# random.seed() returns None, so seed first and pass the RNG state explicitly.
random.seed(1234)
train_data, valid_data = train_data.split(random_state=random.getstate())

print('Number of training examples: {:d}'.format(len(train_data)))
print('Number of validation examples: {:d}'.format(len(valid_data)))
print('Number of testing examples: {:d}'.format(len(test_data)))

In [None]:
TEXT.build_vocab(train_data,
                 max_size=25000)
LABEL.build_vocab(train_data)
# Tokens include <unk> and <pad>
print('Unique tokens in text vocabulary: {:d}'.format(len(TEXT.vocab)))
# Label is either positive or negative
print('Unique tokens in label vocabulary: {:d}'.format(len(LABEL.vocab)))

In [None]:
device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
batch_size = 64
train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=batch_size,
    sort_within_batch=False,
    device=device)

In [None]:
# Note that each sequence is padded with <pad> tokens (index 1) after it ends.
for batch in train_iterator:
    text, text_length = batch.text
    break

print(text[:, -1])
print(text[-10:, -1])
print(text_length[-1])

In [None]:
# Re-create the iterators, since the cell above already consumed one batch.
# This time the batches are built with sort_within_batch=True.
device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
batch_size = 64
train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=batch_size,
    sort_within_batch=True,
    device=device)

## Problems

We provide skeleton code for training RNNs below. Run it and you'll notice that the training/validation accuracy is no better than random guessing (50\~60%).
In this homework, you will raise the accuracy of this network above 80% using several techniques commonly used with RNNs. **Please provide your answers in your report and attach the notebook file containing the source code for the techniques below.**

(a) (3pt) Look at the shapes of the tensors `hidden` and `embedded`. Do you notice a problem? Explain the issue and report the test performance after you fix it. (Hint: It is related to the sequence lengths. Look at how the sequences are padded. You may use `nn.utils.rnn.pack_padded_sequence`.)

(b) (3pt) Use a different architecture, such as LSTM or GRU, and report the test performance. **Do not** change the hyperparameters from (a), such as `batch_size` and `hidden_dim`.
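
As a reference for the hint in (a), the following is a minimal sketch of how `pack_padded_sequence` behaves on a toy batch. It is not the solution to the problem: the dimensions and the random `embedded` tensor are made up for illustration, and the lengths are assumed to be sorted in descending order (the default `enforce_sorted=True` requires this).

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Hypothetical toy dimensions, unrelated to the homework hyperparameters.
seq_len, batch_size, emb_dim, hid_dim = 5, 3, 4, 6
rnn = nn.RNN(emb_dim, hid_dim)

# Stand-in for an embedded batch of shape [seq_len, batch_size, emb_dim].
embedded = torch.randn(seq_len, batch_size, emb_dim)
# True lengths, sorted in descending order; must be a CPU tensor.
lengths = torch.tensor([5, 3, 2])

# Packing tells the RNN to skip the padded time steps of each sequence.
packed = pack_padded_sequence(embedded, lengths)
packed_output, hidden = rnn(packed)

# Unpack to a regular tensor; padded positions are filled with zeros.
output, output_lengths = pad_packed_sequence(packed_output)

print(output.shape)  # torch.Size([5, 3, 6])
print(hidden.shape)  # torch.Size([1, 3, 6])

# With packing, the final hidden state of each sequence is the output
# at its last real (non-padded) time step.
for b, n in enumerate(lengths.tolist()):
    assert torch.allclose(hidden[-1, b], output[n - 1, b])
```
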

Now, try the techniques below to further improve the performance of the provided code. For each technique, compare the test performance with and without it.

(c) (1pt) The RNN currently has a single layer. Try stacking more layers, up to 3.

(d) (1pt) Use bidirectional RNNs.

(e) (1pt) Use dropout for regularization with stacked layers (recommended: 3 layers and dropout rate 0.5).

(f) (1pt) Finally, apply all of the techniques and take enough time to experiment with them (e.g., change hyperparameters, train for more epochs, try other techniques you know, ...). Report the final test performance together with your implementation and hyperparameter choices. Please note that this is not a competition assignment. We will not evaluate your assignment strictly!
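
When experimenting with (c) to (e), it helps to know how the shape of `hidden` changes with the number of layers and directions. The sketch below uses `nn.LSTM` with toy dimensions chosen only for illustration; the way the two directions are concatenated before the linear layer is one common choice, not the required one.

```python
import torch
import torch.nn as nn

# Hypothetical toy dimensions, unrelated to the homework hyperparameters.
emb_dim, hid_dim, num_layers = 4, 6, 3

# dropout is applied between stacked layers, so it needs num_layers > 1.
lstm = nn.LSTM(emb_dim, hid_dim, num_layers=num_layers,
               bidirectional=True, dropout=0.5)

x = torch.randn(5, 2, emb_dim)  # [seq_len, batch_size, emb_dim]
output, (hidden, cell) = lstm(x)

# output: [seq_len, batch_size, 2 * hid_dim]
print(output.shape)  # torch.Size([5, 2, 12])
# hidden: [num_layers * 2, batch_size, hid_dim]
print(hidden.shape)  # torch.Size([6, 2, 6])

# hidden[-2] is the last layer's forward state, hidden[-1] its backward state.
final = torch.cat((hidden[-2], hidden[-1]), dim=1)  # [batch_size, 2 * hid_dim]

# The classifier input size doubles when both directions are used.
fc = nn.Linear(2 * hid_dim, 1)
logits = fc(final)
print(logits.shape)  # torch.Size([2, 1])
```
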

In [None]:
class SimpleRNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, pad_idx):
        super(SimpleRNN, self).__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim, padding_idx=pad_idx)
        self.rnn = nn.RNN(embedding_dim,
                           hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, text, text_lengths):
        embedded = self.embedding(text)
        output, hidden = self.rnn(embedded)
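        hidden = hidden[-1]  # state of the last layer: [batch_size, hidden_dim]
        # No squeeze(0) here: it would drop the batch dimension when batch_size is 1.
        return self.fc(hidden)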

In [None]:
def binary_accuracy(preds, y):
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    acc = correct.sum() / len(correct)
    return acc

In [None]:
input_dim = len(TEXT.vocab)
embedding_dim = 100 
hidden_dim = 128
output_dim = 1
num_epochs = 10
val_iter = 1
pad_idx = TEXT.vocab.stoi[TEXT.pad_token]

model = SimpleRNN(input_dim, embedding_dim, hidden_dim, output_dim, pad_idx)

In [None]:
optimizer = optim.Adam(model.parameters())

criterion = nn.BCEWithLogitsLoss().to(device)
model = model.to(device)
model.train()

best_valid_loss = float('inf')
for epoch in range(num_epochs):
    running_loss = 0
    running_acc = 0

    start_time = time.time()
    
    for batch in train_iterator:
        text, text_lengths = batch.text
        predictions = model(text, text_lengths).squeeze(-1)
        loss = criterion(predictions, batch.label)
        acc = binary_accuracy(predictions, batch.label)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        running_acc += acc.item()
        
    running_loss /= len(train_iterator)
    running_acc /= len(train_iterator)
    
    if epoch % val_iter == 0:
        model.eval()
        valid_loss = 0
        valid_acc = 0
        
        with torch.no_grad():
            for batch in valid_iterator:
                text, text_lengths = batch.text
                eval_predictions = model(text, text_lengths).squeeze(1)
                valid_loss += criterion(eval_predictions, batch.label).item()
                valid_acc += binary_accuracy(eval_predictions, batch.label).item()
                
        model.train()
        valid_loss /= len(valid_iterator)
        valid_acc /= len(valid_iterator)
        
        if valid_loss < best_valid_loss:
            best_valid_loss = valid_loss
            torch.save(model.state_dict(), './simplernn.pth')  
        
    training_time = time.time() - start_time
    print('#####################################')
    print('Epoch {:d} | Training Time {:.1f}s'.format(epoch+1, training_time))
    print('Train Loss: {:.4f}, Train Acc: {:.2f}%'.format(running_loss, running_acc*100))
    if epoch % val_iter == 0:
        print('Valid Loss: {:.4f}, Valid Acc: {:.2f}%'.format(valid_loss, valid_acc*100))


In [None]:
## THIS IS THE TEST PERFORMANCE YOU SHOULD REPORT ##

model.load_state_dict(torch.load('./simplernn.pth'))
model.eval()
test_loss, test_acc = 0, 0
with torch.no_grad():
    for batch in test_iterator:
        text, text_lengths = batch.text
        test_preds = model(text, text_lengths).squeeze(1)
        test_loss += criterion(test_preds, batch.label).item()
        test_acc += binary_accuracy(test_preds, batch.label).item()
    test_loss /= len(test_iterator)
    test_acc /= len(test_iterator)
print('Test Loss: {:.4f}, Test Acc: {:.2f}%'.format(test_loss, test_acc*100))