<a href="https://colab.research.google.com/github/mitkrieg/dl-assignment-2/blob/main/assignment2_practical.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CS 5787 Deep Learning Assignment 2

This notebook implements the "small" LSTM model described in "Recurrent Neural Network Regularization" by Zaremba et al. (2014), together with a GRU variant. Each is trained with and without dropout.

## Initial Setup

### Install Weights & Biases

In [1]:
!pip install wandb
!wandb login


### Imports & GPU Check

In [2]:
import torch
from torch import nn
from torch.utils.data import Dataset
from torch import optim
import torch.nn.functional as F
import math
import wandb

torch.manual_seed(123)
torch.cuda.manual_seed(123)
torch.cuda.manual_seed_all(123)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

print("------ ACCELERATION INFO -----")
print('CUDA GPU Available:',torch.cuda.is_available())
print('MPS GPU Available:', torch.backends.mps.is_available())
if torch.cuda.is_available():
  device = torch.device('cuda')
  print('GPU Name:',torch.cuda.get_device_name(0))
  print('GPU Count:',torch.cuda.device_count())
  print('GPU Memory Allocated:',torch.cuda.memory_allocated(0))
  print('GPU Memory Cached:',torch.cuda.memory_reserved(0))
# elif torch.backends.mps.is_available() and torch.backends.mps.is_built():
#   device = torch.device('mps')
#   print('Pytorch GPU Build:',torch.backends.mps.is_built())
else:
  device = torch.device('cpu')
  print('Using CPU')

------ ACCELERATION INFO -----
CUDA GPU Available: True
MPS GPU Available: False
GPU Name: Tesla T4
GPU Count: 1
GPU Memory Allocated: 0
GPU Memory Cached: 0


## Vocabulary & PTBText Dataset Classes

Parse the raw Penn Treebank text files, build the vocabulary, and wrap each split in a dataset class that serves fixed-length input/label batches.

In [4]:
class Vocab:
    def __init__(self, pre_built_dict: dict=None):
        if pre_built_dict:
            self.vocab = pre_built_dict
        else:
            self.vocab = {'<pad>': 0, '<oov>': 1, '<sos>': 2, '<eos>': 3, '<unk>': 4}
        self.idx = len(self.vocab)

    def add_word(self, word: str) -> None:
        if word not in self.vocab:
            self.vocab[word] = self.idx
            self.idx += 1

    def encode(self, tokens: list[str]) -> list[int]:
        return [self.vocab.get(word, self.vocab['<unk>']) for word in tokens]

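    def decode(self, indices: list[int]) -> list[str]:
        # Build the reverse (index -> word) mapping once instead of scanning the dict for every token
        idx_to_word = {idx: word for word, idx in self.vocab.items()}
        return [idx_to_word[idx] for idx in indices]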

    def __len__(self):
        return len(self.vocab)


class PTBText(Dataset):
    def __init__(self, path: str, vocab: Vocab=None, build_vocab=True, batch_size=20, sequence_length=20, device=torch.device('cpu')):
        self.path = path
        self.device = device
        # Create the vocabulary here rather than in the signature so instances never share one default object
        self.vocab = vocab if vocab is not None else Vocab()
        self.data = self.load_data(build_vocab)
        self.batch_size = batch_size
        self.chunk_size = len(self.data) // batch_size
        self.seq_len = sequence_length
        self.minibatches = self.create_batches()

    def load_data(self, build_vocab):
        data = []
        with open(self.path, 'r') as f:
            for line in f:
                # Append an end-of-sentence marker so sentence boundaries survive in the token stream
                tokens = line.strip().split() + ['<eos>']
                if build_vocab:
                    for token in tokens:
                        self.vocab.add_word(token)

                encoded_line = self.vocab.encode(tokens)
                data.extend(encoded_line)
        return data

    def create_batches(self):
        return [self.data[i*self.chunk_size: (i+1)*self.chunk_size] for i in range(self.batch_size)]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, j):
        inputs = torch.stack([
            torch.LongTensor(self.minibatches[i][j * self.seq_len : (j + 1) * self.seq_len])
            for i in range(self.batch_size)], dim=0)
        labels = torch.stack([
            torch.LongTensor(self.minibatches[i][j * self.seq_len + 1 : (j + 1) * self.seq_len + 1])
            for i in range(self.batch_size)], dim=0)

        return inputs.to(self.device), labels.to(self.device)

    def get_tokens(self, idx):
        return self.data[idx]

    def get_decoded_tokens(self, idx):
        return self.vocab.decode(self.data[idx])


train = PTBText('./data/ptb.train.txt', device=device)
val = PTBText('./data/ptb.valid.txt', vocab=train.vocab, build_vocab=False, device=device)
test = PTBText('./data/ptb.test.txt', vocab=train.vocab, build_vocab=False, device=device)

datasets = {
    'train': train,
    'val': val,
    'test': test
}

print("Vocab size:", len(train.vocab))
print("Train data size:", len(train))
print("Val data size:", len(val))
print("Test data size:", len(test))

Vocab size: 10003
Train data size: 929589
Val data size: 73760
Test data size: 82430
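
To make the batching in `PTBText` concrete, here is a minimal standalone sketch of the same indexing on a toy token stream. It uses plain Python lists instead of tensors, and the helper names `make_chunks` and `get_window` are illustrative only; they are not part of the notebook.

```python
def make_chunks(data, batch_size):
    """Split one long token stream into `batch_size` contiguous chunks (mirrors create_batches)."""
    chunk_size = len(data) // batch_size
    return [data[i * chunk_size:(i + 1) * chunk_size] for i in range(batch_size)]


def get_window(chunks, j, seq_len):
    """Return the j-th (inputs, labels) pair; labels are the inputs shifted by one token."""
    inputs = [c[j * seq_len:(j + 1) * seq_len] for c in chunks]
    labels = [c[j * seq_len + 1:(j + 1) * seq_len + 1] for c in chunks]
    return inputs, labels


tokens = list(range(20))  # toy stand-in for the encoded corpus
chunks = make_chunks(tokens, batch_size=2)
inputs, labels = get_window(chunks, j=0, seq_len=4)
print(inputs)  # [[0, 1, 2, 3], [10, 11, 12, 13]]
print(labels)  # [[1, 2, 3, 4], [11, 12, 13, 14]]
```

Because window `j + 1` starts exactly where window `j` ended in every row, the hidden state from one batch is a meaningful starting state for the next.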


## Define Model Architecture

The `ZamrembaRNN` module builds either an LSTM or a GRU language model, optionally with dropout applied as described in the paper.

In [5]:
class ZamrembaRNN(nn.Module):
    def __init__(self, rnn_type, vocab_size, batch_size=20, embedding_dim=200, hidden_dim=200, num_layers=2, dropout=0, rnn_dropout=0):
        super().__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn_type = rnn_type
        self.batch_size = batch_size
        if rnn_type == 'lstm':
            self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers, dropout=rnn_dropout, batch_first=True)
        elif rnn_type == 'gru':
            self.rnn = nn.GRU(embedding_dim, hidden_dim, num_layers, dropout=rnn_dropout, batch_first=True)
        else:
            raise ValueError("Invalid RNN type: must be 'lstm' or 'gru'")
        self.fc = nn.Linear(hidden_dim, vocab_size)
        if dropout > 0:
            self.dropout = nn.Dropout(dropout)
        else:
            self.dropout = None

        self.init_weights()

    def forward(self, input, hidden):
        output = self.embedding(input)
        if self.dropout is not None:
            output = self.dropout(output)

        # An LSTM carries two states (hidden and cell), whereas a GRU carries only a hidden state
        if self.rnn_type == 'lstm':
            output, hidden = self.rnn(output, hidden)
        elif self.rnn_type == 'gru':
            output, hidden = self.rnn(output, hidden[0])

        if self.dropout is not None:
            output = self.dropout(output)

        output = self.fc(output)
        return output, hidden

    def init_weights(self):
        # Uniform init in [-0.1, 0.1] for the embedding, the first RNN layer's weights, and the output layer.
        # Biases and the weights of deeper RNN layers keep PyTorch's default initialization.
        gen = torch.Generator().manual_seed(132)
        initrange = 0.1
        nn.init.uniform_(self.embedding.weight, -initrange, initrange, generator=gen)
        nn.init.uniform_(self.rnn.weight_ih_l0, -initrange, initrange, generator=gen)
        nn.init.uniform_(self.rnn.weight_hh_l0, -initrange, initrange, generator=gen)
        nn.init.uniform_(self.fc.weight, -initrange, initrange, generator=gen)
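
As a rough size check, the parameter count of the default configuration can be worked out by hand. The sketch below assumes PyTorch's standard layout for `nn.LSTM` and `nn.GRU` (4 or 3 gate blocks per layer, each layer with separate input and hidden bias vectors) and the vocabulary size of 10003 printed earlier. `count_params` is a hypothetical helper, not part of the notebook.

```python
def count_params(rnn_type, vocab_size=10003, embedding_dim=200, hidden_dim=200, num_layers=2):
    """Count trainable parameters of embedding + stacked RNN + output layer."""
    gates = {'lstm': 4, 'gru': 3}[rnn_type]
    embedding = vocab_size * embedding_dim
    rnn = 0
    for layer in range(num_layers):
        # Only the first layer reads embeddings; deeper layers read hidden states
        in_dim = embedding_dim if layer == 0 else hidden_dim
        weights = gates * hidden_dim * (in_dim + hidden_dim)
        biases = 2 * gates * hidden_dim  # one input bias and one hidden bias per gate
        rnn += weights + biases
    fc = hidden_dim * vocab_size + vocab_size  # weight matrix plus bias
    return embedding + rnn + fc


print(count_params('lstm'))  # 4654403
print(count_params('gru'))   # 4493603
```

Under these assumptions the embedding and output layers account for roughly four million of the parameters, so switching between LSTM and GRU changes the total only slightly.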

## Define Training & Evaluation Loops

In [6]:
def get_new_hidden(model):
    if model.rnn_type == 'lstm':
        return (torch.zeros(model.num_layers, model.batch_size, model.hidden_dim).to(device),
              torch.zeros(model.num_layers, model.batch_size, model.hidden_dim).to(device))
    elif model.rnn_type == 'gru':
        return torch.zeros(model.num_layers, model.batch_size, model.hidden_dim).to(device).unsqueeze(0)
    else:
        raise ValueError("Invalid RNN type: must be 'lstm' or 'gru'")

def detach_hidden(hidden):
    if isinstance(hidden, tuple):
        return tuple([h.detach() for h in hidden])
    else:
        return hidden.detach()

def train_epoch(model, dataset, loss_fn, optimizer, device, epoch, verbosity):
    """Train one epoch of a network"""
    model.train()
    batch_loss = 0

    hidden = get_new_hidden(model)

    for j in range(dataset.chunk_size // dataset.seq_len):

        inputs, labels = dataset[j]

        optimizer.zero_grad()
        hidden = detach_hidden(hidden)

        outputs, hidden = model(inputs, hidden)
        if model.rnn_type == 'gru':
            hidden = hidden.unsqueeze(0)

        loss = loss_fn(outputs.view(-1, outputs.shape[-1]), labels.view(-1))
        loss.backward()
        optimizer.step()

        batch_loss += loss.item()
        if (j + 1) % verbosity == 0:
            print(f'Batch #{j + 1} Loss: {batch_loss / verbosity}')
            batch_loss = 0

def perplexity(loss, batches):
    return math.exp(loss / batches)

def evaluate_model(title, model, dataset, loss_fn, seq_len, batch_size, epoch):
    model.eval()
    total_loss = 0
    num_batches = len(dataset) // (batch_size * seq_len)

    hidden = get_new_hidden(model)

    with torch.no_grad():
        for j in range(num_batches):

            inputs, labels = dataset[j]

            outputs, hidden = model(inputs, hidden)
            if model.rnn_type == 'gru':
                hidden = hidden.unsqueeze(0)
            loss = loss_fn(outputs.view(-1, outputs.shape[-1]), labels.view(-1))
            total_loss += loss.item()

    perp = perplexity(total_loss, num_batches)
    wandb.log({
            f'{title}-loss': total_loss / num_batches,
            f'{title}-perplexity': perp
        }, step=epoch)

    print(f'\033[92m{title} perplexity: {perp:.6f} ||| loss {total_loss / num_batches:.6f}\033[0m')

    return perp

def train_network(model, datasets, loss_fn, optimizer, schedule, device, epochs: int, verbosity: int):
    for epoch in range(epochs):
        lr = optimizer.param_groups[0]['lr']

        print(f'----------- Epoch #{epoch + 1}, LR: {lr} ------------')
        train_epoch(model, datasets['train'], loss_fn, optimizer, device, epoch, verbosity)
        train_perplexity = evaluate_model('Train', model, datasets['train'], loss_fn, datasets['train'].seq_len, datasets['train'].batch_size, epoch)
        val_perplexity = evaluate_model('Validation', model, datasets['val'], loss_fn, datasets['train'].seq_len, datasets['train'].batch_size, epoch)
        test_perplexity = evaluate_model('Test', model, datasets['test'], loss_fn, datasets['train'].seq_len, datasets['train'].batch_size, epoch)
        print('------------------------------------\n')

        schedule.step()
    print('----------- Train Complete! ------------')
    return {
        'train':train_perplexity,
        'val':val_perplexity,
        'test':test_perplexity
    }
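
Perplexity, as computed by `perplexity` above, is the exponential of the mean per-batch cross-entropy. The short sketch below re-implements that formula standalone and checks it against two reference points: a uniform guess over the 10003-word vocabulary, and the final test loss of 4.8057 logged for the first run later in this notebook.

```python
import math


def perplexity(total_loss, num_batches):
    """Exponential of the average cross-entropy (natural log) over all batches."""
    return math.exp(total_loss / num_batches)


vocab_size = 10003
uniform_loss = math.log(vocab_size)  # loss of a model that guesses uniformly at random

print(round(perplexity(uniform_loss * 5, 5)))  # 10003
print(round(perplexity(4.8057, 1), 1))         # 122.2
```

A uniform model therefore has a perplexity equal to the vocabulary size, which gives a ceiling to compare the logged values against.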

## Train Models

### LSTM No Regularization

In [8]:
decay_start = 10
learning_rate_decay = 0.5
lr = 4
dropout_rate = 0

def lr_lambda(epoch):
    if epoch < decay_start:
        return 1
    else:
        return learning_rate_decay ** (epoch - (decay_start-1))

model = ZamrembaRNN('lstm', len(train.vocab)).to(device)
sgd = optim.SGD(model.parameters(), lr=lr)
cross_entropy = nn.CrossEntropyLoss()
schedule = optim.lr_scheduler.LambdaLR(sgd, lr_lambda)


run = wandb.init(project="dl-assignment2-quad", config={
    'batch_size':datasets['train'].batch_size,
    'embedding_size':model.embedding_dim,
    'hidden_units':model.hidden_dim,
    'num_lstm_layers':model.num_layers,
    'dropout_rate':dropout_rate,
    'decay_at':decay_start,
    'learning_rate_decay':learning_rate_decay,
    'learning_rate_start':lr,
    'optimizer':'SGD',
    'seq_len':datasets['train'].seq_len,
    'rnn_type':model.rnn_type
})
final_metrics = train_network(model, datasets, cross_entropy, sgd, schedule, device, 14, 500)
run.finish()


----------- Epoch #1, LR: 4 ------------
Batch #500 Loss: 6.814496244430542
Batch #1000 Loss: 6.190024091720581
Batch #1500 Loss: 5.95184453201294
Batch #2000 Loss: 5.80175147819519
Train perplexity: 288.813738 ||| loss 5.665782
Validation perplexity: 294.080869 ||| loss 5.683855
Test perplexity: 287.957971 ||| loss 5.662815
------------------------------------

----------- Epoch #2, LR: 4 ------------
Batch #500 Loss: 5.615063290596009
Batch #1000 Loss: 5.535354831695557
Batch #1500 Loss: 5.4458003463745115
Batch #2000 Loss: 5.38455379486084
Train perplexity: 203.791267 ||| loss 5.317096
Validation perplexity: 220.461390 ||| loss 5.395723
Test perplexity: 214.807699 ||| loss 5.369743
------------------------------------

----------- Epoch #3, LR: 4 ------------
Batch #500 Loss: 5.295789780616761
Batch #1000 Loss: 5.249943510055542
Batch #1500 Loss: 5.1965356426239016
Batch #2000 Loss: 5.156316002845764
Train perplexity: 166.98


Run history:

| Metric | Per-epoch trend |
|---|---|
| Test-loss | █▆▄▃▃▂▂▂▂▂▁▁▁▁ |
| Test-perplexity | █▅▄▃▂▂▂▂▁▁▁▁▁▁ |
| Train-loss | █▆▆▅▄▄▃▃▃▂▂▁▁▁ |
| Train-perplexity | █▅▄▃▃▃▂▂▂▂▁▁▁▁ |
| Validation-loss | █▆▄▃▃▂▂▂▂▁▁▁▁▁ |
| Validation-perplexity | █▅▄▃▂▂▂▁▁▁▁▁▁▁ |

Run summary:

| Metric | Final value |
|---|---|
| Test-loss | 4.8057 |
| Test-perplexity | 122.20443 |
| Train-loss | 4.11458 |
| Train-perplexity | 61.22637 |
| Validation-loss | 4.83841 |
| Validation-perplexity | 126.26872 |
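
The learning-rate schedule for this run can be tabulated without PyTorch. `LambdaLR` multiplies the base learning rate by the value `lr_lambda` returns for the current epoch index, and the sketch below reproduces that arithmetic for the settings of this run (`lr = 4`, `decay_start = 10`, `learning_rate_decay = 0.5`, 14 epochs). `lr_at_epoch` is a hypothetical helper, not part of the notebook.

```python
def lr_at_epoch(epoch, base_lr=4, decay_start=10, learning_rate_decay=0.5):
    """Effective learning rate for a 0-indexed epoch under the notebook's lr_lambda."""
    if epoch < decay_start:
        return base_lr
    return base_lr * learning_rate_decay ** (epoch - (decay_start - 1))


schedule = [lr_at_epoch(e) for e in range(14)]
print(schedule[:10])  # [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
print(schedule[10:])  # [2.0, 1.0, 0.5, 0.25]
```

The first ten epochs run at a learning rate of 4, and the rate halves every epoch from the eleventh onward.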


#### Save and Test Example

In [9]:
#saving
torch.save(model.state_dict(), './models/lstm_noreg.pth')

In [29]:
# Reload the saved weights and predict the next word at every position of a sample sentence
sample = 'the financial outlook has become strong for a company that has had a tough'
model = ZamrembaRNN('lstm', len(train.vocab)).to(device)
model.load_state_dict(torch.load('./models/lstm_noreg.pth', map_location=device, weights_only=True))
model.eval()
with torch.no_grad():
    tokens = train.vocab.encode(sample.split())
    inputs = torch.LongTensor(tokens).unsqueeze(0).to(device)
    hidden = (torch.zeros(model.num_layers, 1, model.hidden_dim).to(device),
              torch.zeros(model.num_layers, 1, model.hidden_dim).to(device))
    outputs, hidden = model(inputs, hidden)

    prediction = torch.argmax(outputs, dim=-1)
    print(" ".join(train.vocab.decode(prediction.squeeze().tolist())))

company times for been a <eos> the year <eos> would been a N impact
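
Each output token above is the model's guess for the word that follows the corresponding input word, given the sentence up to that point. It is not a free-running continuation of the sentence. A small sketch that pairs the two sequences (both copied from the cell above) makes the alignment explicit:

```python
sample = 'the financial outlook has become strong for a company that has had a tough'
predicted = 'company times for been a <eos> the year <eos> would been a N impact'

# Position i of the output is the predicted successor of input word i
pairs = list(zip(sample.split(), predicted.split()))
for word, guess in pairs:
    print(f'{word:>10} -> {guess}')

# Only the last pair predicts a word beyond the end of the sample
print(pairs[-1])  # ('tough', 'impact')
```

Read this way, only the final prediction (`impact` after `tough`) extends the sample; the earlier ones can be compared directly with the next word of the input.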


### LSTM with Dropout

In [34]:
decay_start = 11
learning_rate_decay = 0.75
lr = 6
dropout_rate = 0.5
lstm_dropout = 0.2

def lr_lambda(epoch):
    if epoch < decay_start:
        return 1
    else:
        return learning_rate_decay ** (epoch - (decay_start-1))

model = ZamrembaRNN('lstm', len(train.vocab), dropout=dropout_rate, rnn_dropout=lstm_dropout).to(device)
sgd = optim.SGD(model.parameters(), lr=lr)
cross_entropy = nn.CrossEntropyLoss()
schedule = optim.lr_scheduler.LambdaLR(sgd, lr_lambda)


run = wandb.init(project="dl-assignment2-quad", config={
    'batch_size':datasets['train'].batch_size,
    'embedding_size':model.embedding_dim,
    'hidden_units':model.hidden_dim,
    'num_lstm_layers':model.num_layers,
    'dropout_rate':dropout_rate,
    'lstm_dropout':lstm_dropout,
    'decay_at':decay_start,
    'learning_rate_decay':learning_rate_decay,
    'learning_rate_start':lr,
    'optimizer':'SGD',
    'seq_len':datasets['train'].seq_len,
    'rnn_type':model.rnn_type
})
final_metrics = train_network(model, datasets, cross_entropy, sgd, schedule, device, 25, 500)
run.finish()


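The first history and summary below precede the epoch log and cover only 9 epochs, so they appear to belong to an earlier, interrupted run that was closed when `wandb.init` started this one.

Run history:

| Metric | Per-epoch trend |
|---|---|
| Test-loss | █▆▅▄▃▃▂▂▁ |
| Test-perplexity | █▆▄▃▂▃▂▁▁ |
| Train-loss | █▆▅▄▃▃▂▂▁ |
| Train-perplexity | █▆▄▃▃▂▂▁▁ |
| Validation-loss | █▆▅▄▃▃▂▂▁ |
| Validation-perplexity | █▆▄▃▃▂▂▁▁ |

Run summary:

| Metric | Final value |
|---|---|
| Test-loss | 5.78995 |
| Test-perplexity | 326.99583 |
| Train-loss | 5.74052 |
| Train-perplexity | 311.22588 |
| Validation-loss | 5.81553 |
| Validation-perplexity | 335.47018 |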


----------- Epoch #1, LR: 6 ------------
Batch #500 Loss: 6.715802015304566
Batch #1000 Loss: 6.174229908943176
Batch #1500 Loss: 5.916929319381714
Batch #2000 Loss: 5.751367631912231
Train perplexity: 245.219109 ||| loss 5.502152
Validation perplexity: 253.499155 ||| loss 5.535360
Test perplexity: 248.927253 ||| loss 5.517161
------------------------------------

----------- Epoch #2, LR: 6 ------------
Batch #500 Loss: 5.580333048820496
Batch #1000 Loss: 5.504081024169922
Batch #1500 Loss: 5.426043882369995
Batch #2000 Loss: 5.372735411643982
Train perplexity: 172.691535 ||| loss 5.151507
Validation perplexity: 188.944896 ||| loss 5.241455
Test perplexity: 184.678262 ||| loss 5.218615
------------------------------------

----------- Epoch #3, LR: 6 ------------
Batch #500 Loss: 5.304149119377136
Batch #1000 Loss: 5.26002091217041
Batch #1500 Loss: 5.2167480230331424
Batch #2000 Loss: 5.190609329223633
Train perplexity: 141.5


Run history:

| Metric | Per-epoch trend |
|---|---|
| Test-loss | █▆▅▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ |
| Test-perplexity | █▅▄▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| Train-loss | █▆▅▅▄▄▃▃▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ |
| Train-perplexity | █▅▄▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| Validation-loss | █▆▅▄▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ |
| Validation-perplexity | █▅▄▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |

Run summary:

| Metric | Final value |
|---|---|
| Test-loss | 4.56221 |
| Test-perplexity | 95.79448 |
| Train-loss | 4.11136 |
| Train-perplexity | 61.02938 |
| Validation-loss | 4.59685 |
| Validation-perplexity | 99.17099 |


In [35]:
torch.save(model.state_dict(), './models/lstm_drop.pth')

### GRU No Regularization

In [36]:
decay_start = 6
learning_rate_decay = 0.5
lr = 1
dropout_rate = 0

def lr_lambda(epoch):
    if epoch < decay_start:
        return 1
    else:
        return learning_rate_decay ** (epoch - (decay_start-1))

model = ZamrembaRNN('gru', len(train.vocab)).to(device)
sgd = optim.SGD(model.parameters(), lr=lr)
cross_entropy = nn.CrossEntropyLoss()
schedule = optim.lr_scheduler.LambdaLR(sgd, lr_lambda)


run = wandb.init(project="dl-assignment2-quad", config={
    'batch_size':datasets['train'].batch_size,
    'embedding_size':model.embedding_dim,
    'hidden_units':model.hidden_dim,
    'num_lstm_layers':model.num_layers,
    'dropout_rate':dropout_rate,
    'decay_at':decay_start,
    'learning_rate_decay':learning_rate_decay,
    'learning_rate_start':lr,
    'optimizer':'SGD',
    'seq_len':datasets['train'].seq_len,
    'rnn_type':model.rnn_type
})
final_metrics = train_network(model, datasets, cross_entropy, sgd, schedule, device, 14, 500)
run.finish()

----------- Epoch #1, LR: 1 ------------
Batch #500 Loss: 6.746936987876892
Batch #1000 Loss: 6.250378631591797
Batch #1500 Loss: 5.963424120903015
Batch #2000 Loss: 5.783058778762817
Train perplexity: 276.287559 ||| loss 5.621442
Validation perplexity: 283.033889 ||| loss 5.645567
Test perplexity: 276.072023 ||| loss 5.620662
------------------------------------

----------- Epoch #2, LR: 1 ------------
Batch #500 Loss: 5.577802412033081
Batch #1000 Loss: 5.488571425437927
Batch #1500 Loss: 5.389683073043823
Batch #2000 Loss: 5.3236801404953
Train perplexity: 190.024128 ||| loss 5.247151
Validation perplexity: 208.749742 ||| loss 5.341136
Test perplexity: 203.273951 ||| loss 5.314555
------------------------------------

----------- Epoch #3, LR: 1 ------------
Batch #500 Loss: 5.230614698410034
Batch #1000 Loss: 5.177843736648559
Batch #1500 Loss: 5.1162847738265995
Batch #2000 Loss: 5.0796004562377925
Train perplexity: 151.1


Run history:

| Metric | Per-epoch trend |
|---|---|
| Test-loss | █▅▄▃▃▂▂▁▁▁▁▁▁▁ |
| Test-perplexity | █▅▃▃▂▂▁▁▁▁▁▁▁▁ |
| Train-loss | █▆▅▄▃▂▂▁▁▁▁▁▁▁ |
| Train-perplexity | █▅▄▃▂▂▁▁▁▁▁▁▁▁ |
| Validation-loss | █▅▄▃▃▂▂▁▁▁▁▁▁▁ |
| Validation-perplexity | █▅▃▃▂▂▁▁▁▁▁▁▁▁ |

Run summary:

| Metric | Final value |
|---|---|
| Test-loss | 4.76812 |
| Test-perplexity | 117.6976 |
| Train-loss | 4.32936 |
| Train-perplexity | 75.89562 |
| Validation-loss | 4.80879 |
| Validation-perplexity | 122.58351 |


In [37]:
torch.save(model.state_dict(), './models/gru_noreg.pth')

### GRU with Dropout

In [38]:
decay_start = 20
learning_rate_decay = 0.75
lr = 1
dropout_rate = 0.5
gru_dropout = 0.2

def lr_lambda(epoch):
    if epoch < decay_start:
        return 1
    else:
        return learning_rate_decay ** (epoch - (decay_start-1))

model = ZamrembaRNN('gru', len(train.vocab), dropout=dropout_rate, rnn_dropout=gru_dropout).to(device)
sgd = optim.SGD(model.parameters(), lr=lr)
cross_entropy = nn.CrossEntropyLoss()
schedule = optim.lr_scheduler.LambdaLR(sgd, lr_lambda)


run = wandb.init(project="dl-assignment2-quad", config={
    'batch_size':datasets['train'].batch_size,
    'embedding_size':model.embedding_dim,
    'hidden_units':model.hidden_dim,
    'num_lstm_layers':model.num_layers,
    'dropout_rate':dropout_rate,
    'lstm_dropout':gru_dropout,
    'decay_at':decay_start,
    'learning_rate_decay':learning_rate_decay,
    'learning_rate_start':lr,
    'optimizer':'SGD',
    'seq_len':datasets['train'].seq_len,
    'rnn_type':model.rnn_type
})
final_metrics = train_network(model, datasets, cross_entropy, sgd, schedule, device, 25, 500)
run.finish()

----------- Epoch #1, LR: 1 ------------
Batch #500 Loss: 6.797517258644104
Batch #1000 Loss: 6.367005604743958
Batch #1500 Loss: 6.148962080001831
Batch #2000 Loss: 6.013281662940979
Train perplexity: 323.513344 ||| loss 5.779240
Validation perplexity: 326.051946 ||| loss 5.787057
Test perplexity: 317.177546 ||| loss 5.759462
------------------------------------

----------- Epoch #2, LR: 1 ------------
Batch #500 Loss: 5.855956418037414
Batch #1000 Loss: 5.791939136505127
Batch #1500 Loss: 5.711890823364258
Batch #2000 Loss: 5.662339807510376
Train perplexity: 232.388096 ||| loss 5.448409
Validation perplexity: 243.006195 ||| loss 5.493087
Test perplexity: 235.889802 ||| loss 5.463365
------------------------------------

----------- Epoch #3, LR: 1 ------------
Batch #500 Loss: 5.592837042808533
Batch #1000 Loss: 5.5585172700881955
Batch #1500 Loss: 5.510842499732971
Batch #2000 Loss: 5.479744264602661
Train perplexity: 195.


Run history:

| Metric | Per-epoch trend |
|---|---|
| Test-loss | █▆▅▅▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁ |
| Test-perplexity | █▅▄▄▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ |
| Train-loss | █▇▆▅▅▄▄▄▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁ |
| Train-perplexity | █▆▅▄▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁ |
| Validation-loss | █▆▅▅▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁ |
| Validation-perplexity | █▅▄▄▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ |

Run summary:

| Metric | Final value |
|---|---|
| Test-loss | 4.6496 |
| Test-perplexity | 104.54309 |
| Train-loss | 4.20002 |
| Train-perplexity | 66.68792 |
| Validation-loss | 4.6789 |
| Validation-perplexity | 107.65121 |


In [41]:
torch.save(model.state_dict(), './models/gru_drop.pth')