<a href="https://colab.research.google.com/github/surabhitri/Sequence-Models/blob/main/LabRNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implement and train an LSTM for sentiment analysis

(General hint for Labs 1 and 2: trust whatever you see during training and report it in the PDF. IMDB is far from ideal because it behaves more like a real-world dataset.)

## Step 0: set up the environment

In [None]:
import functools
import sys
import numpy as np
import pandas as pd
import random
import re
import matplotlib.pyplot as plt
import tqdm
import nltk
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from collections import Counter
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

nltk.download('stopwords')

torch.backends.cudnn.benchmark = True

import os
os.makedirs("resources", exist_ok=True)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


### Hyperparameters. Do not edit this class directly, or you may break the default settings.

To create a new set of hyperparameters, use "new_hparams = HyperParams()" and change the corresponding fields.

In [None]:
class HyperParams:
    def __init__(self):
        # Constant hyperparameters. They have been tested and don't need to be tuned.
        self.PAD_INDEX = 0
        self.UNK_INDEX = 1
        self.PAD_TOKEN = '<pad>'
        self.UNK_TOKEN = '<unk>'
        self.STOP_WORDS = set(stopwords.words('english'))
        self.MAX_LENGTH = 256
        self.BATCH_SIZE = 96
        self.EMBEDDING_DIM = 1
        self.HIDDEN_DIM = 100
        self.OUTPUT_DIM = 2
        self.N_LAYERS = 1
        self.DROPOUT_RATE = 0.0
        self.LR = 0.01
        self.N_EPOCHS = 5
        self.WD = 0
        self.OPTIM = "sgd"
        self.BIDIRECTIONAL = False
        self.SEED = 2

## Lab 1(a) Implement your own data loader function.  
First, you need to read the data from the dataset file on the local disk. 
Then, split the dataset into three sets: train, validation and test by 7:1:2 ratio.
Finally, return x_train, x_valid, x_test, y_train, y_valid, y_test, where x represents reviews and y represents labels.
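
As a rough illustration of the 7:1:2 split, the sketch below shuffles a toy list and slices it by integer counts. It uses only the standard library rather than the `train_test_split` call used in this lab, and the name `split_7_1_2` and the toy data are assumptions made for the example.

```python
import random

def split_7_1_2(items, seed=1):
    """Shuffle a copy of items and slice it into train/valid/test by 7:1:2."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10
    n_valid = len(items) // 10
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # whatever remains, about 20%
    return train, valid, test

train, valid, test = split_7_1_2(range(50000))
print(len(train), len(valid), len(test))  # 35000 5000 10000
```

Working with integer counts avoids the rounding questions that come with a fractional test size such as 0.22222.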

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
def load_imdb(base_csv:str = '/content/drive/MyDrive/Deep Learning/Assignment 3/IMDBDataset.csv'):
    """
    Load the IMDB dataset
    :param base_csv: the path of the dataset file.
    :return: train, validation and test set.
    """
    # Add your code here. 
    data = pd.read_csv(base_csv)
    X = data["review"]
    y = data["sentiment"]

    # Hold out 10% of the data for validation.
    X_train_test, X_val, y_train_test, y_val = train_test_split(X, y, test_size=0.1, random_state=1)

    # Take 2/9 (about 0.22222) of the remaining 90% as the test set, i.e. 20% of the full data.
    X_train, X_test, y_train, y_test = train_test_split(X_train_test, y_train_test, test_size=0.22222, random_state=1)
    
    print(f'shape of train data is {X_train.shape}')
    print(f'shape of test data is {X_test.shape}')
    print(f'shape of valid data is {X_val.shape}')
    return X_train, X_val, X_test, y_train, y_val, y_test

In [None]:
X_train, X_val, X_test, y_train, y_val, y_test = load_imdb()

shape of train data is (35000,)
shape of test data is (10000,)
shape of valid data is (5000,)


## Lab 1(b): Implement your function to build a vocabulary based on the training corpus.
Implement the build_vocab function to build a vocabulary based on the training corpus.
You should first compute the frequency of all the words in the training corpus. Remove the words
that are in the STOP_WORDS. Then filter the words by their frequency (≥ min_freq) and finally
generate a corpus variable that contains a list of words.
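
The steps above (count, drop stop words, filter by frequency, assign indices) can be sketched on a toy corpus. This is a minimal illustration, not the lab solution: `build_toy_vocab`, `toy_texts`, and `toy_stop_words` are hypothetical names, and the tiny stop-word set stands in for STOP_WORDS.

```python
from collections import Counter

def build_toy_vocab(texts, stop_words, min_freq=2):
    """Map each frequent non-stop word to an index; 0 and 1 are reserved."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in stop_words
    )
    kept = [word for word, freq in counts.items() if freq >= min_freq]
    vocab = {'<pad>': 0, '<unk>': 1}
    for word in kept:
        vocab[word] = len(vocab)
    return vocab

toy_texts = ["the movie was great", "great acting and a great plot", "the plot was thin"]
toy_stop_words = {"the", "was", "and", "a"}  # hypothetical stand-in for STOP_WORDS
toy_vocab = build_toy_vocab(toy_texts, toy_stop_words)
print(toy_vocab)  # {'<pad>': 0, '<unk>': 1, 'great': 2, 'plot': 3}
```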

In [None]:
def build_vocab(x_train:list, min_freq: int=5, hparams=HyperParams()) -> dict:
    """
    build a vocabulary based on the training corpus.
    :param x_train:  List. The training corpus. Each sample in the list is a string of text.
    :param min_freq: Int. The frequency threshold for selecting words.
    :param hparams: HyperParams. Supplies STOP_WORDS and the special tokens.
    :return: dictionary {word:index}
    """
    # Add your code here. Your code should assign corpus with a list of words.
    # Split each review on whitespace and lowercase every word so the
    # vocabulary keys match the lowercased tokens produced by tokenize().
    corpus = [word.lower() for review in x_train for word in review.split()]

    # Removing stop words
    corpus_wo_stop = [word for word in corpus if word not in hparams.STOP_WORDS]

    # Counting how often each remaining word occurs
    freq_corpus = Counter(corpus_wo_stop)

    # Keeping only the words that occur at least min_freq times
    corpus_ = [word for word, freq in freq_corpus.items() if freq >= min_freq]
    # creating a dict
    vocab = {w:i+2 for i, w in enumerate(corpus_)}
    vocab[hparams.PAD_TOKEN] = hparams.PAD_INDEX
    vocab[hparams.UNK_TOKEN] = hparams.UNK_INDEX
    return vocab

In [None]:
vocab = build_vocab(list(X_train))

## Lab 1(c): Implement your tokenize function. 
For each word, find its index in the vocabulary. 
Return a list of int that represents the indices of words in the example. 
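
A minimal sketch of the lookup, assuming a toy vocabulary in which index 1 is reserved for unknown words. The helper `toy_tokenize` is hypothetical and skips the extra filtering a real implementation may want.

```python
import re

def toy_tokenize(vocab, text, unk_index=1):
    """Lowercase the text, split it into word tokens, and look up each index."""
    words = re.findall(r'\w+', text.lower())
    return [vocab.get(word, unk_index) for word in words]

toy_vocab = {'<pad>': 0, '<unk>': 1, 'great': 2, 'plot': 3}
print(toy_tokenize(toy_vocab, "Great plot, weak ending!"))  # [2, 3, 1, 1]
```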

In [None]:
def tokenize(vocab: dict, example: str)-> list:
    """
    Tokenize the given example string into a list of token indices.
    :param vocab: dict, the vocabulary.
    :param example: a string of text.
    :return: a list of token indices.
    """
    # Your code here.
    # Extract word tokens, drop tokens that contain digits and the "br"
    # left over from <br /> tags, then lowercase what remains.
    exp = re.findall(r'\w+', example)
    exp = [x for x in exp if not any(y.isdigit() for y in x)]
    exp = [i.lower() for i in exp if i != "br"]
    # Map each word to its index; out-of-vocabulary words map to UNK_INDEX (1).
    token_indices = [vocab.get(i, 1) for i in exp]

    return token_indices

In [None]:
# X_train keeps the shuffled index labels of the original DataFrame, so select by position.
token_indices = tokenize(vocab, X_train.iloc[1])

## Lab 1 (d): Implement the __getitem__ function. Given an index i, you should return the i-th review and label. 
The review is originally a string. Please tokenize it into a sequence of token indices. 
Use the max_length parameter to truncate the sequence so that it contains at most max_length tokens. 
Convert the label string ('positive'/'negative') to a binary index. 'positive' is 1 and 'negative' is 0. 
Return a dictionary containing three keys: 'ids', 'length', 'label' which represent the list of token ids, the length of the sequence, the binary label. 
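
The truncation and label conversion described above can be sketched in isolation. `encode_sample` is a hypothetical helper that takes already-tokenized ids; it is not part of the lab template.

```python
def encode_sample(token_ids, label, max_length=256):
    """Truncate the ids and map 'positive'/'negative' to 1/0."""
    ids = token_ids[:max_length]  # slicing keeps at most max_length tokens
    return {'ids': ids, 'length': len(ids), 'label': 1 if label == 'positive' else 0}

sample = encode_sample(list(range(300)), 'positive', max_length=256)
print(sample['length'], sample['label'])  # 256 1
```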

In [None]:
class IMDB(Dataset):
    def __init__(self, x, y, vocab, max_length=256) -> None:
        """
        :param x: list of reviews
        :param y: list of labels
        :param vocab: vocabulary dictionary {word:index}.
        :param max_length: the maximum sequence length.
        """
        self.x = x
        self.y = y
        self.vocab = vocab
        self.max_length = max_length

    def __getitem__(self, idx: int):
        """
        Return the tokenized review and label by the given index.
        :param idx: index of the sample.
        :return: a dictionary containing three keys: 'ids', 'length', 'label' which represent the list of token ids, the length of the sequence, the binary label.
        """
        # Add your code here.
        # Convert the label string to a binary index: 'positive' -> 1, 'negative' -> 0.
        # The stored labels are left untouched so repeated lookups stay consistent.
        label = 1 if self.y[idx] == 'positive' else 0

        # Tokenize the review and truncate it to at most max_length tokens.
        token_indices = tokenize(self.vocab, self.x[idx])[:self.max_length]

        # Build the sample dictionary.
        sample = {'ids': token_indices, 'length': len(token_indices), 'label': label}
        return sample
        
    def __len__(self) -> int:
        return len(self.x)

def collate(batch, pad_index):
    batch_ids = [torch.LongTensor(i['ids']) for i in batch]
    batch_ids = nn.utils.rnn.pad_sequence(batch_ids, padding_value=pad_index, batch_first=True)
    batch_length = torch.Tensor([i['length'] for i in batch])
    batch_label = torch.LongTensor([i['label'] for i in batch])
    batch = {'ids': batch_ids, 'length': batch_length, 'label': batch_label}
    return batch

collate_fn = collate

In [None]:

myclass = IMDB(list(X_train[20:30]), list(y_train[20:30]), vocab)
myclass[2]

## Lab 1 (e): Implement the LSTM model for sentiment analysis.
Q(a): Implement the initialization function.
Your task is to create the model by stacking several necessary layers including an embedding layer, an LSTM layer, a linear layer, and a dropout layer.
You can call functions from PyTorch's nn library. For example, nn.Embedding, nn.LSTM, nn.Linear.<br>
Q(b): Implement the forward function.
    Decide where to apply dropout. 
    The sequences in the batch have different lengths. Write/call a function to pad the sequences into the same length. 
    Apply a fully-connected (fc) layer to the output of the LSTM layer. 
    Return the output features, which are of size [batch size, output dim]. 
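
One way to handle the variable lengths is to pad the batch and then pack it before the LSTM, so the final hidden state is taken at each sequence's true last token. The sketch below is a toy forward pass under assumed sizes (vocabulary of 10, embedding size 4, hidden size 8), not the lab model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two padded sequences of true lengths 3 and 1 (0 is the padding index).
ids = torch.tensor([[4, 5, 6], [7, 0, 0]])
lengths = torch.tensor([3, 1])  # must stay on the CPU for packing

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
fc = nn.Linear(8, 2)

# Packing tells the LSTM to stop at each sequence's true length,
# so the final hidden state is not computed over padding positions.
packed = nn.utils.rnn.pack_padded_sequence(
    embedding(ids), lengths, batch_first=True, enforce_sorted=False)
_, (hidden, _) = lstm(packed)

logits = fc(hidden[-1])  # hidden[-1] is the last layer's final state
print(logits.shape)  # torch.Size([2, 2])
```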

In [None]:
def init_weights(m):
    if isinstance(m, nn.Embedding):
        nn.init.xavier_normal_(m.weight)
    elif isinstance(m, nn.Linear):
        nn.init.xavier_normal_(m.weight)
        nn.init.zeros_(m.bias)
    elif isinstance(m, nn.LSTM) or isinstance(m, nn.GRU):
        for name, param in m.named_parameters():
            if 'bias' in name:
                nn.init.zeros_(param)
            elif 'weight' in name:
                nn.init.orthogonal_(param)
                
class LSTM(nn.Module):
    def __init__(
        self, 
        vocab_size: int, 
        embedding_dim: int, 
        hidden_dim: int, 
        output_dim: int, 
        n_layers: int, 
        dropout_rate: float, 
        pad_index: int,
        bidirectional: bool,
        **kwargs):
        """
        Create an LSTM model for classification.
        :param vocab_size: size of the vocabulary
        :param embedding_dim: dimension of embeddings
        :param hidden_dim: dimension of hidden features
        :param output_dim: dimension of the output layer, which equals the number of labels.
        :param n_layers: number of layers.
        :param dropout_rate: dropout rate.
        :param pad_index: index of the padding token.
        :param bidirectional: whether to use a bidirectional LSTM.
        """
        super().__init__()
        # Add your code here. Initializing each layer by the given arguments.
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.n_layers = n_layers
        self.dropout_rate = dropout_rate
        self.pad_index = pad_index
        self.bidirectional = bidirectional
        # embedding layer
        self.embedding = nn.Embedding(num_embeddings = vocab_size, embedding_dim = embedding_dim, padding_idx = pad_index)
        # lstm layer
        self.lstm = nn.LSTM(input_size = embedding_dim, hidden_size = hidden_dim, num_layers = n_layers, bidirectional = bidirectional, batch_first=True)
        # dropout layer (an identity mapping when dropout_rate is 0.0)
        self.dropout = nn.Dropout(dropout_rate)
        # linear layer; a bidirectional LSTM yields two final hidden states per sample
        self.fc = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)

        # Weight initialization. DO NOT CHANGE!
        if "weight_init_fn" not in kwargs:
            self.apply(init_weights)
        else:
            self.apply(kwargs["weight_init_fn"])


    def forward(self, ids:torch.Tensor, length:torch.Tensor):
        """
        Feed the given token ids to the model.
        :param ids: [batch size, seq len] batch of token ids.
        :param length: [batch size] batch of length of the token ids.
        :return: prediction of size [batch size, output dim].
        """
        # Add your code here.
        a = self.embedding(ids)
        # Pack the padded batch so the LSTM skips the padding positions
        a = torch.nn.utils.rnn.pack_padded_sequence(a, batch_first=True, lengths=length, enforce_sorted=False)
        # Propagate input through LSTM
        output, (hidden, cell) = self.lstm(a)
        # Take the last layer's final hidden state; concatenate both directions if bidirectional
        if self.bidirectional:
            hidden = torch.cat([hidden[-2], hidden[-1]], dim=-1)
        else:
            hidden = hidden[-1]
        # Apply dropout before the fully-connected layer
        prediction = self.fc(self.dropout(hidden))
        return prediction

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def train(dataloader, model, criterion, optimizer, scheduler, device):
    model.train()
    epoch_losses = []
    epoch_accs = []

    for batch in tqdm.tqdm(dataloader, desc='training...', file=sys.stdout):
        ids = batch['ids'].to(device)
        length = batch['length']
        label = batch['label'].to(device)
        prediction = model(ids, length)
        loss = criterion(prediction, label)
        accuracy = get_accuracy(prediction, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_losses.append(loss.item())
        epoch_accs.append(accuracy.item())
        scheduler.step()

    return epoch_losses, epoch_accs

def evaluate(dataloader, model, criterion, device):
    model.eval()
    epoch_losses = []
    epoch_accs = []

    with torch.no_grad():
        for batch in tqdm.tqdm(dataloader, desc='evaluating...', file=sys.stdout):
            ids = batch['ids'].to(device)
            length = batch['length']
            label = batch['label'].to(device)
            prediction = model(ids, length)
            loss = criterion(prediction, label)
            accuracy = get_accuracy(prediction, label)
            epoch_losses.append(loss.item())
            epoch_accs.append(accuracy.item())

    return epoch_losses, epoch_accs

def get_accuracy(prediction, label):
    batch_size, _ = prediction.shape
    predicted_classes = prediction.argmax(dim=-1)
    correct_predictions = predicted_classes.eq(label).sum()
    accuracy = correct_predictions / batch_size
    return accuracy

def predict_sentiment(text, model, vocab, device):
    # tokenize() already returns vocabulary indices, with unknown words mapped to the UNK index.
    ids = tokenize(vocab, text)
    length = torch.LongTensor([len(ids)])
    tensor = torch.LongTensor(ids).unsqueeze(dim=0).to(device)
    prediction = model(tensor, length).squeeze(dim=0)
    probability = torch.softmax(prediction, dim=-1)
    predicted_class = prediction.argmax(dim=-1).item()
    predicted_probability = probability[predicted_class].item()
    return predicted_class, predicted_probability

### Lab 1 (g) Implement GRU.

In [None]:
class GRU(nn.Module):
    def __init__(
        self, 
        vocab_size: int, 
        embedding_dim: int, 
        hidden_dim: int, 
        output_dim: int, 
        n_layers: int, 
        dropout_rate: float, 
        pad_index: int,
        bidirectional: bool,
        **kwargs):
        """
        Create a GRU model for classification.
        :param vocab_size: size of the vocabulary
        :param embedding_dim: dimension of embeddings
        :param hidden_dim: dimension of hidden features
        :param output_dim: dimension of the output layer, which equals the number of labels.
        :param n_layers: number of layers.
        :param dropout_rate: dropout rate.
        :param pad_index: index of the padding token.
        :param bidirectional: whether to use a bidirectional GRU.
        """
        super().__init__()
        # Add your code here. Initializing each layer by the given arguments.
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.n_layers = n_layers
        self.dropout_rate = dropout_rate
        self.pad_index = pad_index
        self.bidirectional = bidirectional
        # embedding layer
        self.embedding = nn.Embedding(num_embeddings = vocab_size, embedding_dim = embedding_dim, padding_idx = pad_index)
        # gru layer
        self.gru = nn.GRU(input_size = embedding_dim, hidden_size = hidden_dim, num_layers = n_layers, bidirectional = bidirectional, batch_first=True)
        # dropout layer (an identity mapping when dropout_rate is 0.0)
        self.dropout = nn.Dropout(dropout_rate)
        # linear layer; a bidirectional GRU yields two final hidden states per sample
        self.fc = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)

        # Weight Initialization. DO NOT CHANGE!
        if "weight_init_fn" not in kwargs:
            self.apply(init_weights)
        else:
            self.apply(kwargs["weight_init_fn"])


    def forward(self, ids:torch.Tensor, length:torch.Tensor):
        """
        Feed the given token ids to the model.
        :param ids: [batch size, seq len] batch of token ids.
        :param length: [batch size] batch of length of the token ids.
        :return: prediction of size [batch size, output dim].
        """
        # Add your code here.
        a = self.embedding(ids)
        # Pack the padded batch so the GRU skips the padding positions
        a = torch.nn.utils.rnn.pack_padded_sequence(a, batch_first=True, lengths=length, enforce_sorted=False)
        output, hidden = self.gru(a)
        # Take the last layer's final hidden state; concatenate both directions if bidirectional
        if self.bidirectional:
            hidden = torch.cat([hidden[-2], hidden[-1]], dim=-1)
        else:
            hidden = hidden[-1]
        # Apply dropout before the fully-connected layer
        prediction = self.fc(self.dropout(hidden))
        return prediction

### Learning rate warmup. DO NOT TOUCH!

In [None]:
class ConstantWithWarmup(torch.optim.lr_scheduler._LRScheduler):
    def __init__(
        self,
        optimizer,
        num_warmup_steps: int,
    ):
        self.num_warmup_steps = num_warmup_steps
        super().__init__(optimizer)

    def get_lr(self):
        if self._step_count <= self.num_warmup_steps:
            # warmup
            scale = 1.0 - (self.num_warmup_steps - self._step_count) / self.num_warmup_steps
            lr = [base_lr * scale for base_lr in self.base_lrs]
            self.last_lr = lr
        else:
            lr = self.base_lrs
        return lr
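
In the warmup branch of `get_lr`, the scale `1 - (W - s) / W` simplifies to `s / W`, so the learning rate rises linearly until step `W` and then stays at the base value. The standalone function below mirrors that arithmetic for illustration only; `warmup_lr` is a hypothetical name and does not replace the scheduler class.

```python
def warmup_lr(base_lr, step, num_warmup_steps):
    """Return the learning rate for a given step count, mirroring get_lr."""
    if step <= num_warmup_steps:
        # Same expression as the scheduler; algebraically equal to step / num_warmup_steps.
        scale = 1.0 - (num_warmup_steps - step) / num_warmup_steps
        return base_lr * scale
    return base_lr

for step in (1, 50, 100, 200, 201):
    print(step, warmup_lr(0.01, step, num_warmup_steps=200))
```

With 200 warmup steps and 365 batches per epoch, as in the recorded runs, the warmup finishes a little over halfway through the first epoch.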

### Implement the training / validation iteration here.

In [None]:
def train_and_test_model_with_hparams(hparams, model_type="lstm", **kwargs):
    # Seeding. DO NOT TOUCH! DO NOT TOUCH hparams.SEED!
    # Set the random seeds.
    torch.manual_seed(hparams.SEED)
    random.seed(hparams.SEED)
    np.random.seed(hparams.SEED)

    x_train, x_valid, x_test, y_train, y_valid, y_test = load_imdb()
    vocab = build_vocab(list(x_train), hparams=hparams)
    vocab_size = len(vocab)
    print(f'Length of vocabulary is {vocab_size}')

    train_data = IMDB(list(x_train), list(y_train), vocab, hparams.MAX_LENGTH)
    #print(train_data.__getitem__(0))
    valid_data = IMDB(list(x_valid), list(y_valid), vocab, hparams.MAX_LENGTH)
    #print(valid_data.__getitem__(0))
    test_data = IMDB(list(x_test), list(y_test), vocab, hparams.MAX_LENGTH)

    collate = functools.partial(collate_fn, pad_index=hparams.PAD_INDEX)

    train_dataloader = torch.utils.data.DataLoader(
        train_data, batch_size=hparams.BATCH_SIZE, collate_fn=collate, shuffle=True)
    valid_dataloader = torch.utils.data.DataLoader(
        valid_data, batch_size=hparams.BATCH_SIZE, collate_fn=collate)
    test_dataloader = torch.utils.data.DataLoader(
        test_data, batch_size=hparams.BATCH_SIZE, collate_fn=collate)
    
    # Model
    if "override_models_with_gru" in kwargs and kwargs["override_models_with_gru"]:
        model = GRU(
            vocab_size, 
            hparams.EMBEDDING_DIM, 
            hparams.HIDDEN_DIM, 
            hparams.OUTPUT_DIM,
            hparams.N_LAYERS,
            hparams.DROPOUT_RATE, 
            hparams.PAD_INDEX,
            hparams.BIDIRECTIONAL,
            **kwargs)
    else:
        model = LSTM(
            vocab_size, 
            hparams.EMBEDDING_DIM, 
            hparams.HIDDEN_DIM, 
            hparams.OUTPUT_DIM,
            hparams.N_LAYERS,
            hparams.DROPOUT_RATE, 
            hparams.PAD_INDEX,
            hparams.BIDIRECTIONAL,
            **kwargs)
    num_params = count_parameters(model)
    print(f'The model has {num_params:,} trainable parameters')


    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)

    # Optimization. Lab 2 (a)(b) should choose one of them.
    # DO NOT TOUCH optimizer-specific hyperparameters! (e.g., eps, momentum)
    # DO NOT change optimizer implementations!
    if hparams.OPTIM == "sgd":
        optimizer = optim.SGD(
            model.parameters(), lr=hparams.LR, weight_decay=hparams.WD, momentum=.9)        
    elif hparams.OPTIM == "adagrad":
        optimizer = optim.Adagrad(
            model.parameters(), lr=hparams.LR, weight_decay=hparams.WD, eps=1e-6)
    elif hparams.OPTIM == "adam":
        optimizer = optim.Adam(
            model.parameters(), lr=hparams.LR, weight_decay=hparams.WD, eps=1e-6)
    elif hparams.OPTIM == "rmsprop":
        optimizer = optim.RMSprop(
            model.parameters(), lr=hparams.LR, weight_decay=hparams.WD, eps=1e-6, momentum=.9)
    else:
        raise NotImplementedError("Optimizer not implemented!")

    criterion = nn.CrossEntropyLoss()
    criterion = criterion.to(device)

    # Start training
    best_valid_loss = float('inf')
    train_losses = []
    train_accs = []
    valid_losses = []
    valid_accs = []
    
    # Warmup Scheduler. DO NOT TOUCH!
    WARMUP_STEPS = 200
    lr_scheduler = ConstantWithWarmup(optimizer, WARMUP_STEPS)

    for epoch in range(hparams.N_EPOCHS):
        
        # Your code: implement the training process and save the best model.
        
        train_loss, train_acc = train(train_dataloader, model, criterion, optimizer, lr_scheduler, device)
        valid_loss, valid_acc = evaluate(valid_dataloader, model, criterion, device)
        
        
        epoch_train_loss = np.mean(train_loss)
        epoch_train_acc = np.mean(train_acc)
        epoch_valid_loss = np.mean(valid_loss)
        epoch_valid_acc = np.mean(valid_acc)

        # Save the model that achieves the smallest validation loss.
        if epoch_valid_loss < best_valid_loss:
            # Your code: save the best model somewhere (no need to submit it to Sakai)
            # Only the weights (state_dict) are saved, which is more portable than pickling the whole model.
            torch.save(model.state_dict(), '/content/drive/MyDrive/Deep Learning/Assignment 3/best_model_new.pth')
            best_valid_loss = epoch_valid_loss


        print(f'epoch: {epoch+1}')
        print(f'train_loss: {epoch_train_loss:.3f}, train_acc: {epoch_train_acc:.3f}')
        print(f'valid_loss: {epoch_valid_loss:.3f}, valid_acc: {epoch_valid_acc:.3f}')


    # Your Code: Load the best model's weights.
    # The model object already lives on `device`, so the saved weights are restored in place.
    model.load_state_dict(torch.load('/content/drive/MyDrive/Deep Learning/Assignment 3/best_model_new.pth', map_location=device))

    # Your Code: evaluate test loss on testing dataset (NOT Validation)
    test_loss, test_acc = evaluate(test_dataloader, model, criterion, device)

    epoch_test_loss = np.mean(test_loss)
    epoch_test_acc = np.mean(test_acc)
    print(f'test_loss: {epoch_test_loss:.3f}, test_acc: {epoch_test_acc:.3f}')
    
    # Free memory for later usage.
    del model
    torch.cuda.empty_cache()
    return {
        'num_params': num_params,
        "test_loss": epoch_test_loss,
        "test_acc": epoch_test_acc,
    }

### Lab 1 (f): Train model with original hyperparameters, for LSTM.

Train the model with default hyperparameter settings.
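
The trainable-parameter count that the training function prints can be checked by hand from the layer shapes. The sketch below assumes a unidirectional model made of an embedding table, an `nn.LSTM`, and a linear head, and it uses the vocabulary size of 60,885 reported in the recorded runs of this notebook; `lstm_classifier_params` is a hypothetical helper.

```python
def lstm_classifier_params(vocab_size, embedding_dim, hidden_dim, output_dim, n_layers=1):
    """Count trainable parameters of embedding + unidirectional LSTM + linear head."""
    embedding = vocab_size * embedding_dim
    lstm = 0
    for layer in range(n_layers):
        input_dim = embedding_dim if layer == 0 else hidden_dim
        # Four gates, each with input weights, recurrent weights, and two bias vectors.
        lstm += 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)
    fc = hidden_dim * output_dim + output_dim
    return embedding + lstm + fc

print(lstm_classifier_params(60885, 1, 100, 2))  # 102287
```

The same function gives 263,887 for three layers, 161,681 for a hidden size of 157, and 2,798,827 for an embedding size of 45, matching the counts printed in the Lab 2 runs.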

In [None]:
org_hyperparams = HyperParams()
_ = train_and_test_model_with_hparams(org_hyperparams, "lstm_1layer_base_sgd_e32_h100")

shape of train data is (35000,)
shape of test data is (10000,)
shape of valid data is (5000,)
Length of vocabulary is 60885
The model has 102,287 trainable parameters
training...: 100%|██████████| 365/365 [00:21<00:00, 17.01it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.62it/s]
epoch: 1
train_loss: 0.693, train_acc: 0.498
valid_loss: 0.693, valid_acc: 0.511
training...: 100%|██████████| 365/365 [00:20<00:00, 17.62it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.56it/s]
epoch: 2
train_loss: 0.694, train_acc: 0.498
valid_loss: 0.693, valid_acc: 0.511
training...: 100%|██████████| 365/365 [00:20<00:00, 18.18it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.86it/s]
epoch: 3
train_loss: 0.693, train_acc: 0.503
valid_loss: 0.693, valid_acc: 0.489
training...: 100%|██████████| 365/365 [00:21<00:00, 17.09it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.80it/s]
epoch: 4
train_loss: 0.693, train_acc: 0.497
valid_loss: 0.694, valid_acc: 0.489

### Lab 1 (h) Train GRU with vanilla hyperparameters.

In [None]:
org_hyperparams = HyperParams()
_ = train_and_test_model_with_hparams(org_hyperparams, "gru_1layer_base_sgd_e32_h100", override_models_with_gru=True)

shape of train data is (35000,)
shape of test data is (10000,)
shape of valid data is (5000,)
Length of vocabulary is 60885
The model has 91,987 trainable parameters
training...: 100%|██████████| 365/365 [00:19<00:00, 18.83it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 26.09it/s]
epoch: 1
train_loss: 0.693, train_acc: 0.496
valid_loss: 0.698, valid_acc: 0.489
training...: 100%|██████████| 365/365 [00:19<00:00, 18.74it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 26.54it/s]
epoch: 2
train_loss: 0.694, train_acc: 0.500
valid_loss: 0.693, valid_acc: 0.489
training...: 100%|██████████| 365/365 [00:20<00:00, 17.68it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 26.99it/s]
epoch: 3
train_loss: 0.693, train_acc: 0.496
valid_loss: 0.694, valid_acc: 0.489
training...: 100%|██████████| 365/365 [00:19<00:00, 18.78it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 26.55it/s]
epoch: 4
train_loss: 0.693, train_acc: 0.500
valid_loss: 0.694, valid_acc: 0.511

### Lab 2 (a) Study of LSTM Optimizers. Hint: For adaptive optimizers, we recommend using a learning rate of 0.001 (instead of 0.01).
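
A small sketch of how the hint could be applied when sweeping over the four optimizers supported by the training function. `SimpleNamespace` is only a stand-in for `HyperParams` here, and `optimizer_settings` is a hypothetical helper, not part of the lab template.

```python
from types import SimpleNamespace

ADAPTIVE_OPTIMIZERS = {"adagrad", "adam", "rmsprop"}

def optimizer_settings(name):
    """Pick the learning rate suggested by the hint for a given optimizer name."""
    lr = 0.001 if name in ADAPTIVE_OPTIMIZERS else 0.01
    return SimpleNamespace(OPTIM=name, LR=lr)

for name in ("sgd", "adagrad", "adam", "rmsprop"):
    settings = optimizer_settings(name)
    print(settings.OPTIM, settings.LR)
```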

In [None]:
# setting the new LR and optimizer
new_hparams = HyperParams()
new_hparams.LR = 0.001
new_hparams.OPTIM = "adagrad"

In [None]:
_ = train_and_test_model_with_hparams(new_hparams, "lstm")

shape of train data is (35000,)
shape of test data is (10000,)
shape of valid data is (5000,)
Length of vocabulary is 60885
The model has 102,287 trainable parameters
training...: 100%|██████████| 365/365 [00:19<00:00, 19.08it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 27.36it/s]
epoch: 1
train_loss: 0.693, train_acc: 0.497
valid_loss: 0.693, valid_acc: 0.511
training...: 100%|██████████| 365/365 [00:19<00:00, 19.15it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 27.18it/s]
epoch: 2
train_loss: 0.693, train_acc: 0.530
valid_loss: 0.689, valid_acc: 0.545
training...: 100%|██████████| 365/365 [00:18<00:00, 19.46it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 27.36it/s]
epoch: 3
train_loss: 0.648, train_acc: 0.642
valid_loss: 0.643, valid_acc: 0.669
training...: 100%|██████████| 365/365 [00:19<00:00, 19.14it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 27.50it/s]
epoch: 4
train_loss: 0.532, train_acc: 0.786
valid_loss: 0.523, valid_acc: 0.801

### Lab 2 (b): Study of GRU Optimizers. Hint: For adaptive optimizers, we recommend using a learning rate of 0.001 (instead of 0.01).

In [None]:
# setting the new LR and optimizer
new_hparams = HyperParams()
new_hparams.LR = 0.001
new_hparams.OPTIM = "adagrad"

In [None]:
_ = train_and_test_model_with_hparams(new_hparams, "gru", override_models_with_gru=True)

shape of train data is (35000,)
shape of test data is (10000,)
shape of valid data is (5000,)
Length of vocabulary is 60885
The model has 91,987 trainable parameters
training...: 100%|██████████| 365/365 [00:18<00:00, 19.85it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 26.82it/s]
epoch: 1
train_loss: 0.693, train_acc: 0.502
valid_loss: 0.693, valid_acc: 0.489
training...: 100%|██████████| 365/365 [00:18<00:00, 19.59it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 27.74it/s]
epoch: 2
train_loss: 0.693, train_acc: 0.524
valid_loss: 0.692, valid_acc: 0.525
training...: 100%|██████████| 365/365 [00:18<00:00, 19.69it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 28.15it/s]
epoch: 3
train_loss: 0.662, train_acc: 0.612
valid_loss: 0.545, valid_acc: 0.786
training...: 100%|██████████| 365/365 [00:18<00:00, 19.73it/s]
evaluating...: 100%|██████████| 53/53 [00:01<00:00, 27.20it/s]
epoch: 4
train_loss: 0.514, train_acc: 0.797
valid_loss: 0.524, valid_acc: 0.792

### Lab 2 (c) Deeper LSTMs

In [None]:
new_hparams = HyperParams()
new_hparams.OPTIM = 'rmsprop'
new_hparams.LR = 0.001
new_hparams.N_LAYERS = 3

_ = train_and_test_model_with_hparams(new_hparams, "deeper_lstm")

shape of train data is (35000,)
shape of test data is (10000,)
shape of valid data is (5000,)
Length of vocabulary is 60885
The model has 263,887 trainable parameters
training...: 100%|██████████| 365/365 [00:22<00:00, 16.30it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.14it/s]
epoch: 1
train_loss: 0.695, train_acc: 0.514
valid_loss: 0.693, valid_acc: 0.511
training...: 100%|██████████| 365/365 [00:25<00:00, 14.39it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 24.35it/s]
epoch: 2
train_loss: 0.696, train_acc: 0.500
valid_loss: 0.693, valid_acc: 0.511
training...: 100%|██████████| 365/365 [00:24<00:00, 14.72it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.35it/s]
epoch: 3
train_loss: 0.695, train_acc: 0.502
valid_loss: 0.693, valid_acc: 0.511
training...: 100%|██████████| 365/365 [00:23<00:00, 15.78it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 24.64it/s]
epoch: 4
train_loss: 0.695, train_acc: 0.500
valid_loss: 0.693, valid_acc: 0.511

### Lab 2 (d) Wider LSTMs

In [None]:
new_hparams = HyperParams()
new_hparams.OPTIM = 'rmsprop'
new_hparams.LR = 0.001
new_hparams.HIDDEN_DIM = 157

_ = train_and_test_model_with_hparams(new_hparams, "wider_lstm")

shape of train data is (35000,)
shape of test data is (10000,)
shape of valid data is (5000,)
Length of vocabulary is 60885
The model has 161,681 trainable parameters
training...: 100%|██████████| 365/365 [00:20<00:00, 18.13it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.14it/s]
epoch: 1
train_loss: 0.738, train_acc: 0.547
valid_loss: 0.667, valid_acc: 0.572
training...: 100%|██████████| 365/365 [00:20<00:00, 18.06it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.37it/s]
epoch: 2
train_loss: 0.531, train_acc: 0.715
valid_loss: 0.359, valid_acc: 0.850
training...: 100%|██████████| 365/365 [00:19<00:00, 18.45it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.32it/s]
epoch: 3
train_loss: 0.304, train_acc: 0.877
valid_loss: 0.334, valid_acc: 0.862
training...: 100%|██████████| 365/365 [00:19<00:00, 18.28it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.95it/s]
epoch: 4
train_loss: 0.243, train_acc: 0.906
valid_loss: 0.305, valid_acc: 0.873

### Lab 2 (e) Larger Embedding Table

In [None]:
# rmsprop - LSTM
new_hparams = HyperParams()
new_hparams.OPTIM = 'rmsprop'
new_hparams.LR = 0.001
new_hparams.EMBEDDING_DIM= 45

_ = train_and_test_model_with_hparams(new_hparams, "embedding_lstm")

shape of train data is (35000,)
shape of test data is (10000,)
shape of valid data is (5000,)
Length of vocabulary is 60885
The model has 2,798,827 trainable parameters
training...: 100%|██████████| 365/365 [00:19<00:00, 18.36it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 26.06it/s]
epoch: 1
train_loss: 0.594, train_acc: 0.661
valid_loss: 0.352, valid_acc: 0.847
training...: 100%|██████████| 365/365 [00:19<00:00, 18.54it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 25.97it/s]
epoch: 2
train_loss: 0.288, train_acc: 0.882
valid_loss: 0.306, valid_acc: 0.868
training...: 100%|██████████| 365/365 [00:19<00:00, 18.44it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 26.08it/s]
epoch: 3
train_loss: 0.194, train_acc: 0.930
valid_loss: 0.325, valid_acc: 0.869
training...: 100%|██████████| 365/365 [00:19<00:00, 18.65it/s]
evaluating...: 100%|██████████| 53/53 [00:02<00:00, 26.07it/s]
epoch: 4
train_loss: 0.130, train_acc: 0.954
valid_loss: 0.393, valid_acc: 0.858

### Lab 2 (f) Compound scaling of embedding_dim, hidden_dim, layers

### Lab 2 (g) Bi-Directional LSTM, using best architecture from (f)