<a href="https://colab.research.google.com/github/omullo/NLP-Playground/blob/main/4740_FA20_p3_lkn28.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project 3: Emotion detection with Neural Networks
## CS4740/5740 Fall 2020

Names: Luke Nyalala




Netids: lkn28

### Project Submission Due: November 13th
Please submit a **pdf file** of this notebook on **Gradescope**, and the **ipynb** on **CMS**. For instructions on generating pdf and ipynb files, please refer to the project 1 instructions.









## Introduction
In this project we will consider **neural networks**: first a Feedforward Neural Network (FFNN) and second a Recurrent Neural Network (RNN), for performing a 5-class emotion detection task.

The project is divided into parts. In **Part 1**, you will be given an implementation for a FFNN and be asked to debug it in a specific way. In **Part 2**, you will then implement an RNN model for performing the same task. In **Part 3**, you will analyze these two models in two types of comparative studies and in **Part 4** you will answer questions describing what you have learned through this project. You also will be required to submit a description of libraries used, how your group divided up the work, and your feedback regarding the assignment (**Part 5**).

## Advice 🚀
As always, the report is important! The report is where you get to show that you understand not only what you are doing but also why and how you are doing it. So be clear, organized and concise; avoid vagueness and excess verbiage. Spend time doing error analysis for the models. This is how you understand the advantages and drawbacks of the systems you build. The reports should read more like the papers that we have been writing critiques for.

All throughout the report you may be asked to place images, plots, etc. Feel free to write code that will generate the plots for you and use those or generate them some other way and insert into the colab. To add images in your colab, these are a few possible ways to do it!

1. Copy and paste the image in markdown! Yes this really does work

2. Upload to google drive, get a shareable link. It will be something like:

```
https://drive.google.com/file/d/1xDrydbSbijvK2JBftUz-5ovagN2B_RWH/view?usp=sharing
```
We want just the id which is `1xDrydbSbijvK2JBftUz-5ovagN2B_RWH` and the link we will use is:

```
https://drive.google.com/uc?export=view&id=your_id
```

Then in markdown you'd write the following:

```markdown
![image](https://drive.google.com/uc?export=view&id=1xDrydbSbijvK2JBftUz-5ovagN2B_RWH)
```

3. Using IPython!
```python
from IPython.display import Image
Image(filename="drive/GPU/data/iris.PNG")
```

4. Using your connected GDrive
```markdown
![iris](drive/GPU/data/iris.PNG)
```
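
The link rewriting in option 2 can also be scripted. The following is a minimal sketch, assuming share links follow the `/file/d/<id>/view` pattern shown above; `drive_embed_url` is a hypothetical helper name, not part of any library.

```python
import re

def drive_embed_url(share_link: str) -> str:
    """Turn a Google Drive share link into a direct-view URL for markdown images."""
    # Assumption: the file id is the path segment that follows "/file/d/".
    match = re.search(r"/file/d/([^/]+)", share_link)
    if match is None:
        raise ValueError("could not find a file id in the link")
    return "https://drive.google.com/uc?export=view&id=" + match.group(1)

share = "https://drive.google.com/file/d/1xDrydbSbijvK2JBftUz-5ovagN2B_RWH/view?usp=sharing"
embed = drive_embed_url(share)
print("![image]({})".format(embed))
```

The printed line can be pasted directly into a markdown cell.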

## Dataset
You are given access to a set of tweets. Each tweet has an associated emotion $y \in Y := \{anger, fear, joy, love, sadness\}$. For this project, given the tweet text, you will need to predict the associated emotion, y. This is sometimes called fine-grained sentiment analysis in the literature; we will simply refer to it as sentiment analysis in this project.

We will minimally preprocess the tweets and handle tokenization in what we release. For this assignment, we do not anticipate any further preprocessing to be done by you. Should you choose to do so, it would be interesting to hear about in the report (along with whether or not it helped performance), but it is not a required aspect of the assignment.


In [None]:
from google.colab import drive
import os
drive.mount('/content/drive', force_remount=True)

train_path = os.path.join(os.getcwd(), "drive", "My Drive", "CS5740", "P3", "p3-cs4740-2020fa", "p3_train.txt") # replace based on your Google drive organization
val_path = os.path.join(os.getcwd(), "drive", "My Drive", "CS5740", "P3", "p3-cs4740-2020fa", "p3_val.txt") # replace based on your Google drive organization
test_path = os.path.join(os.getcwd(), "drive", "My Drive", "CS5740", "P3", "p3-cs4740-2020fa", "p3_test_no_labels.txt") # replace based on your Google drive organization

Mounted at /content/drive


# Part 1: Feedforward Neural Network

In this section, there are two main components relevant to **Part 1**.

1. `Data loader`\
As the name suggests, this section loads the data from the dataset files and handles other preprocessing and setup. You will **not** need to change this file and should **not** change this file throughout the assignment.

2. `ffnn`\
This contains the model and code that uses the model for **Part 1**

In the `ffnn` section, you will find a Feedforward Neural Net serving as the underlying model for performing emotion detection.



## Part 1: Tips

We do not assume you have **any** experience working with neural networks and/or debugging them. You may discover this process, while similar, is quite different from debugging in general software engineering and from debugging in other domains such as algorithms and systems.

We suggest you systematically step through the code and simultaneously (perhaps by physically drawing it out) describe what the computations _mean_. What you are looking for is where the code differs from what is expected.

## Part 1: Rules

For **Part 1**, you will not be able to ask any questions on Piazza and we will be unable to provide any meaningful advice in office hours. Unfortunately, this is the nature of debugging: it is unlikely anyone can give you specific advice for most problems you encounter, and we have already provided general tips in the preceding section. If you absolutely must ask a question or you believe there is some kind of issue with the assignment for this part, please submit a private Piazza post and we will respond swiftly.

As a reminder, **communication about the assignment _between_ distinct groups is not permitted and is a violation of the Academic Integrity policy**. For this assignment, we will be _extremely_ stringent about this, given that debugging is entirely pointless if someone else in a different group tells you where the error is.

## Import libraries and connect to Google Drive

In [None]:
import json
import math
import os
from pathlib import Path
import random
import time
from tqdm.notebook import tqdm, trange
from typing import Dict, List, Set, Tuple

import numpy as np
import torch
import torch.nn as nn
from torch.nn import init
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler
from torch.nn.utils.rnn import pad_packed_sequence, pad_sequence, pack_padded_sequence
import matplotlib.pyplot as plt
import torchtext

## Data loader

In [None]:
emotion_to_idx = {
    "anger": 0,
    "fear": 1,
    "joy": 2,
    "love": 3,
    "sadness": 4,
    "surprise": 5,
}
idx_to_emotion = {v: k for k, v in emotion_to_idx.items()}
UNK = "<UNK>"

In [None]:
def fetch_data(train_data_path, val_data_path, test_data_path):
    """fetch_data retrieves the data from a json/csv and outputs the validation
    and training data

    :param train_data_path:
    :type train_data_path: str
    :return: Training, validation pair where the training is a list of document, label pairs
    :rtype: Tuple[
        List[Tuple[List[str], int]],
        List[Tuple[List[str], int]],
        List[List[str]]
    ]
    """
    with open(train_data_path) as training_f:
        training = training_f.read().split("\n")[1:-1]
    with open(val_data_path) as valid_f:
        validation = valid_f.read().split("\n")[1:-1]
    with open(test_data_path) as testing_f:
        testing = testing_f.read().split("\n")[1:-1]
	
    # If needed, you can shrink the training and validation data to speed things up by setting k < 10000, but this isn't always safe to do
    # k = #fill in
    # random.shuffle(training)  # shuffles in place; random.shuffle returns None
    # random.shuffle(validation)
    # training, validation = training[:k], validation[:(k // 10)]

    tra = []
    val = []
    test = []
    for elt in training:
        if elt == '':
            continue
        txt, emotion = elt.split(",")
        tra.append((txt.split(" "), emotion_to_idx[emotion]))
    for elt in validation:
        if elt == '':
            continue
        txt, emotion = elt.split(",")
        val.append((txt.split(" "), emotion_to_idx[emotion]))
    for elt in testing:
        if elt == '':
            continue
        txt = elt
        test.append(txt.split(" "))

    return tra, val, test

In [None]:
def make_vocab(data):
    """make_vocab creates a set of vocab words that the model knows

    :param data: The list of (document, label) pairs that is used to make the vocabulary
    :type data: List[Tuple[List[str], int]]
    :returns: A set of strings corresponding to the vocabulary
    :rtype: Set[str]
    """
    vocab = set()
    for document, _ in data:
        for word in document:
            vocab.add(word)
    return vocab 


def make_indices(vocab):
	"""make_indices creates a 1-1 mapping of word and indices for a vocab.

	:param vocab: The strings corresponding to the vocabulary in train data.
	:type vocab: Set[str]
	:returns: A tuple containing the vocab, word2index, and index2word.
		vocab is a set of strings in the vocabulary including <UNK>.
		word2index is a dictionary mapping tokens to its index (0, ..., V-1)
		index2word is a dictionary inverting the mapping of word2index
	:rtype: Tuple[
		Set[str],
		Dict[str, int],
		Dict[int, str],
	]
	"""
	vocab_list = sorted(vocab)
	vocab_list.append(UNK)
	word2index = {}
	index2word = {}
	for index, word in enumerate(vocab_list):
		word2index[word] = index 
		index2word[index] = word 
	vocab.add(UNK)
	return vocab, word2index, index2word 


def convert_to_vector_representation(data, word2index, test=False):
	"""convert_to_vector_representation converts the list of strings into a vector

	:param data: The dataset to be converted into a vectorized format
	:type data: Union[
		List[Tuple[List[str], int]],
		List[str],
	]
	:param word2index: A mapping of word to index
	:type word2index: Dict[str, int]
	:returns: A list of vector representations of the input or pairs of vector
		representations with expected output
	:rtype: List[Tuple[torch.Tensor, int]] or List[torch.Tensor]

	List[Tuple[List[torch.Tensor], int]] or List[List[torch.Tensor]]
	"""
	if test:
		vectorized_data = []
		for document in data:
			vector = torch.zeros(len(word2index)) 
			for word in document:
				index = word2index.get(word, word2index[UNK])
				vector[index] += 1
			vectorized_data.append(vector)
	else:
		vectorized_data = []
		for document, y in data:
			vector = torch.zeros(len(word2index)) 
			for word in document:
				index = word2index.get(word, word2index[UNK])
				vector[index] += 1
			vectorized_data.append((vector, y))
	return vectorized_data
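
To make the bag-of-words encoding in `convert_to_vector_representation` concrete, here is a small sketch in plain Python. The toy vocabulary `toy_word2index` and the helper `count_vector` are hypothetical names used only for illustration; the notebook itself stores the counts in a `torch` tensor.

```python
# Hypothetical toy vocabulary; the real one is built by make_vocab/make_indices.
UNK = "<UNK>"
toy_word2index = {"happy": 0, "i": 1, "sad": 2, UNK: 3}

def count_vector(document, word2index):
    """Return a bag-of-words count list with one slot per vocabulary entry."""
    vector = [0] * len(word2index)
    for word in document:
        # Words missing from the vocabulary share the <UNK> slot.
        vector[word2index.get(word, word2index[UNK])] += 1
    return vector

counts = count_vector(["i", "feel", "happy", "happy"], toy_word2index)
print(counts)  # [2, 1, 0, 1]
```

Word order is discarded by this encoding: any permutation of the same tokens yields the same vector.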

In [None]:
class EmotionDataset(Dataset):
    """EmotionDataset is a torch dataset to interact with the emotion data.

    :param data: The vectorized dataset with input and expected output values
    :type data: List[Tuple[List[torch.Tensor], int]]
    """
    def __init__(self, data):
        self.X = torch.cat([X.unsqueeze(0) for X, _ in data])
        self.y = torch.LongTensor([y for _, y in data])
        self.len = len(data)
    
    def __len__(self):
        """__len__ returns the number of samples in the dataset.

        :returns: number of samples in dataset
        :rtype: int
        """
        return self.len
    
    def __getitem__(self, index):
        """__getitem__ returns the tensor, output pair for a given index

        :param index: index within dataset to return
        :type index: int
        :returns: A tuple (x, y) where x is model input and y is our label
        :rtype: Tuple[torch.Tensor, int]
        """
        return self.X[index], self.y[index]

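def get_data_loaders(train, val, batch_size=16):
    """get_data_loaders builds DataLoaders over the training and validation data.

    :param train: The vectorized training data
    :param val: The vectorized validation data
    :param batch_size: The number of samples per batch
    :returns: The training loader and the validation loader
    :rtype: Tuple[DataLoader, DataLoader]
    """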
    # First we create the dataset given our train and validation lists
    dataset = EmotionDataset(train + val)
    print('X dim ', dataset.X.shape)
    print('y dim ', dataset.y.shape)
    # Then, we create a list of indices for all samples in the dataset
    train_indices = [i for i in range(len(train))]
    val_indices = [i for i in range(len(train), len(train) + len(val))]

    # Now we define samplers and loaders for train and val
    train_sampler = SubsetRandomSampler(train_indices)
    train_loader = DataLoader(dataset, batch_size=batch_size, sampler=train_sampler)
    
    val_sampler = SubsetRandomSampler(val_indices)
    val_loader = DataLoader(dataset, batch_size=batch_size, sampler=val_sampler)

    return train_loader, val_loader

In [None]:
train, val, test = fetch_data(train_path, val_path, test_path)

In [None]:
vocab = make_vocab(train)
vocab, word2index, index2word = make_indices(vocab)
train_vectorized = convert_to_vector_representation(train, word2index)
val_vectorized = convert_to_vector_representation(val, word2index)
test_vectorized = convert_to_vector_representation(test, word2index, True)

In [None]:
train_loader, val_loader = get_data_loaders(train_vectorized, val_vectorized, batch_size=1)

X dim  torch.Size([11265, 11832])
y dim  torch.Size([11265])


In [None]:
# Note: Colab has 12 hour limits on GPUs, also potential inactivity may kill the notebook. Save often!

## 1.1 FFNN Implementation

### 1.1 Task
Assume that an omniscient oracle has told you there are **4 fundamental errors** in the **FFNN** implementation. They may be anywhere in this section unless otherwise indicated. Your objective is to _find_ and _fix_ each of these errors and to include in the report a description of the original error along with the fix. To help your efforts, the oracle has provided you with additional information about the properties of the errors as follows:

* _Correctness_ \
Each error causes the code to be strictly incorrect. There is absolutely no ambiguity that the errant code (or missing code) is incorrect. This means errors are not due to the code being inefficient (in run-time or in memory).

* _Localized_ \
Each error can be judged to be erroneous by strictly looking at the code (along with your knowledge of machine learning as taught through this course). The errors therefore are not due to the model being uncompetitive in terms of performance with state-of-the-art performance for this task nor are they due to the amount of data being insufficient for this task in general.

* _General_ \
Each error is general in nature. They will not be triggered by the model receiving a pathological input, i.e. they will not be something that is triggered specifically when NLP is referenced with negative sentiment.

* _Fundamental_ \
Each error is a fundamental failure in terms of doing what is intended. This means that errors do not hinge on nuanced understanding of specific PyTorch functionality. This also means they will not exploit properties of the dataset in a subtle way that could only be realized by someone who has comprehensively studied the data.

The bottom line: the errors should be fairly obvious. The oracle further reminds you that performance/accuracy of the (resulting) model should not be how you ensure you have debugged successfully. For example, if you correct some, but not all, of the errors, the remaining errors may mask the impact of your fixes. Further, performance is not guaranteed to improve by fixing any particular error. Consider the case where the training set is also employed as the test set; performance will be very high but there is something very wrong. And fixing the problem will reduce performance.

In fixing each error, the oracle provides some further insight about the fixes:

* _Minimal_ \
A reasonable fix for each error can be achieved by changing fewer than 5 lines of code. We do not require you to make fixes of 4 or fewer lines, but it should be a cause for concern if your fixes are far more elaborate.

* _Ill-posed_ \
While the errors are unambiguous, the method for fixing them is under-specified: You are free to implement any reasonable fix and all such fixes will equally receive full credit.

In [None]:
# Lambda to switch to GPU if available
get_device = lambda : "cuda:0" if torch.cuda.is_available() else "cpu"
print(get_device())

cuda:0


In [None]:
unk = '<UNK>'

# Consult the PyTorch documentation for information on the functions used below:
# https://pytorch.org/docs/stable/torch.html

class FFNN(nn.Module):
	def __init__(self, input_dim, h, output_dim):
		super(FFNN, self).__init__()
		self.h = h
		self.W1 = nn.Linear(input_dim, h)
		self.activation = nn.ReLU() # The rectified linear unit; one valid choice of activation function
		self.W2 = nn.Linear(h, output_dim) #second arg changed to output_dim
    # The below two lines are not a source for an error
		self.softmax = nn.LogSoftmax(dim=1) # The softmax function that converts vectors into probability distributions; computes log probabilities for computational benefits
		self.loss = nn.NLLLoss() # The cross-entropy/negative log likelihood loss taught in class

	def compute_Loss(self, predicted_vector, gold_label):
		return self.loss(predicted_vector, gold_label)

	def forward(self, input_vector):
		# The z_i are just there to record intermediary computations for your clarity
		z1 = self.activation(self.W1(input_vector)) #activation function applied
		z2 = self.W2(z1)
		predicted_vector = self.softmax(z2)
		return predicted_vector
	
	def load_model(self, save_path):
		self.load_state_dict(torch.load(save_path))
	
	def save_model(self, save_path):
		torch.save(self.state_dict(), save_path)


def train_epoch(model, train_loader, optimizer):
	model.train()
	total = 0
	current_loss = 0
	correct = 0
	for (input_batch, expected_out) in tqdm(train_loader, leave=False, desc="Training Batches"):
		output = model(input_batch.to(get_device()))
		#print(output)
		total += output.size()[0]
		_, predicted = torch.max(output, 1)
		correct += (expected_out == predicted.to("cpu")).cpu().numpy().sum()
		optimizer.zero_grad() #zero out the gradients
		loss = model.compute_Loss(output, expected_out.to(get_device()))
		current_loss += loss.item()
		loss.backward()
		optimizer.step()
	acc = correct/total
	avg_loss = current_loss/len(train_loader)
	# Print accuracy

	return acc, avg_loss


def evaluation(model, val_loader, optimizer):
	model.eval()
	loss = 0
	correct = 0
	total = 0
	with torch.no_grad(): # Gradients are not needed for evaluation, so skip building the graph
		for (input_batch, expected_out) in tqdm(val_loader, leave=False, desc="Validation Batches"):
			output = model(input_batch.to(get_device()))
			total += output.size()[0]
			_, predicted = torch.max(output, 1)
			correct += (expected_out.to("cpu") == predicted.to("cpu")).cpu().numpy().sum()

			# .item() accumulates a plain float instead of a tensor
			loss += model.compute_Loss(output, expected_out.to(get_device())).item()
	loss /= len(val_loader)
	# Print validation metrics
	acc = correct/total
	return acc, loss

def train_and_evaluate(number_of_epochs, model, train_loader, val_loader, optimizer):
	for epoch in trange(number_of_epochs, desc="Epochs"):
		train_acc, train_loss = train_epoch(model, train_loader, optimizer)  #passing train_loader as 2nd arg
		val_acc, val_loss = evaluation(model, val_loader, optimizer)
		print('epoch {0} train loss {1:.2f} train accuracy {2:.2%}'.format(epoch, train_loss, train_acc))
		print('epoch {0} val loss {1:.2f} val accuracy {2:.2%}'.format(epoch, val_loss, val_acc))
	return

In [None]:
h = 512
model = FFNN(len(vocab), h, len(emotion_to_idx)).to(get_device())
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
train_and_evaluate(2, model, train_loader, val_loader, optimizer)
model.save_model("ffnn_fixed.pth") # Save our model!

epoch 0 train loss 1.50 train accuracy 34.31%
epoch 0 val loss 1.25 val accuracy 48.38%


epoch 1 train loss 0.82 train accuracy 72.25%
epoch 1 val loss 0.64 val accuracy 80.08%



In [None]:
# Example of how to load
loaded_model = FFNN(len(vocab), h, len(emotion_to_idx))
loaded_model.load_model("ffnn_fixed.pth")

## 1.2 Part 1 Report
Please include a description of the error, a description of your fix, and a python comment indicating the fix for each of the 4 errors.

### Error 1:
The number of out_features in the second linear layer is h instead of output_dim. We fix this by changing the second argument to output_dim.

### Error 2:
In the forward pass, the activation isn't applied to the output of the first linear layer. We fix this by applying the activation to the output of `W1`.
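
Why the missing activation matters can be checked with a small example: two stacked linear layers without a nonlinearity compute the same function as a single linear layer. The sketch below uses plain Python lists; the toy weights `W1`, `W2`, the input `x`, and the helper names are hypothetical, and biases are left out for brevity.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu(vector):
    return [max(0.0, v) for v in vector]

# Hypothetical toy weights and input.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0]]
x = [2.0, 0.5]

no_activation = matvec(W2, matvec(W1, x))    # W2 (W1 x)
collapsed = matvec(matmul(W2, W1), x)        # (W2 W1) x, a single linear map
with_relu = matvec(W2, relu(matvec(W1, x)))  # W2 ReLU(W1 x)

print(no_activation, collapsed, with_relu)   # [0.0] [0.0] [1.5]
```

Without the ReLU the hidden layer adds no expressive power; with it, the network computes something a single linear layer cannot.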

### Error 3:
The train_and_evaluate function calls train_epoch passing val_loader as argument instead of train_loader. We fix this by passing train_loader as the second argument.

### Error 4:
In the train_epoch function, the gradients are accumulated across batches because zero_grad isn't called. We fix this by calling the optimizer's zero_grad method before the backward pass.
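
The accumulation behavior behind Error 4 can be reproduced in isolation. This is a minimal sketch, assuming only that PyTorch is installed; the single parameter `w` is a hypothetical stand-in for the model's weights.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3.
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without clearing: the new gradient is added to the old one.
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Resetting the stored gradient first restores the true value, which is
# what optimizer.zero_grad() takes care of for the parameters it manages.
w.grad.zero_()
(3 * w).sum().backward()
cleared = w.grad.clone()

print(first.item(), accumulated.item(), cleared.item())  # 3.0 6.0 3.0
```

In the training loop, a stale gradient like `accumulated` would make every update depend on all previous batches.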

# Part 2: Recurrent Neural Network
Recurrent neural networks have been the workhorse of NLP for a number of years. A fundamental reason for this success is they can inherently deal with _variable_ length sequences. This is axiomatically important for natural language; words are formed from a variable number of characters, sentences from a variable number of words, paragraphs from a variable number of sentences, and so forth. This differs from a field like Computer Vision where images are (generally) of a fixed size.
<br></br>
This is also a very different scenario from that of the classifiers we have studied (e.g. Naive Bayes, Perceptron Learning, Feedforward Neural Networks), which take in a fixed-length vector.
<br></br>
To clarify this, we can think of the _types_ of the mathematical functions described by a FFNN and an RNN. What is pivotal in what follows is that k need not be constant across examples.

$\textbf{FFNN.}$ \
$Input: \vec{x} \in \mathcal{R}^d$ \
$Model\text{ }Output: \vec{z} \in \mathcal{R}^{\mid \mathcal{Y}\mid}$ \
$Final\text{ }Output: \vec{y} \in \mathcal{R}^{\mid \mathcal{Y}\mid}$ \
$\vec{y}$ satisfies the constraint of being a probability distribution, i.e. $\underset{i \in \mid \mathcal{Y} \mid}{\sum} \vec{y}[i] = 1$ and $\underset{i \in \mid \mathcal{Y} \mid}{min} \text{ }\vec{y}[i] \geq 0$, which is achieved via _Softmax_ applied to $\vec{z}$.
<br></br>
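
The way Softmax turns arbitrary scores into a probability distribution can be checked numerically. Below is a minimal sketch in plain Python; the score vector `z` is a hypothetical example, and the max-subtraction is a standard numerical-stability trick rather than something the text above prescribes.

```python
import math

def softmax(scores):
    """Map real-valued scores to non-negative values that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing and leaves the result unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

z = [2.0, -1.0, 0.5, 0.0, 3.0]  # hypothetical scores, one per class
y = softmax(z)

print(sum(y), min(y) >= 0)
```

The largest score always receives the largest probability, so taking the argmax of $\vec{y}$ or of $\vec{z}$ gives the same prediction.
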
$\textbf{RNN.}$ \
$Input: \vec{x}_1,\vec{x}_2, \dots, \vec{x}_k; \vec{x}_i \in \mathcal{R}^d$ \
$Model\text{ }Output: \vec{z}_1,\vec{z}_2, \dots, \vec{z}_k; \vec{z}_i \in \mathcal{R}^{h}$ \
$Final\text{ }Output: \vec{y} \in \mathcal{R}^{\mid \mathcal{Y}\mid}$ \
$\vec{y}$ satisfies the constraint of being a probability distribution, i.e. $\underset{i \in \mid \mathcal{Y} \mid}{\sum} \vec{y}[i] = 1$ and $\underset{i \in \mid \mathcal{Y} \mid}{min} \text{ }\vec{y}[i] \geq 0$, which is achieved by the process described later in this report and as you have seen in class.

Intuitively, an RNN takes in a sequence of vectors and computes a new vector corresponding to each vector in the original sequence. It achieves this by processing the input sequence one vector at a time to (a) compute an updated representation of the entire sequence (which is then re-used when processing the next vector in the input sequence), and (b) produce an output for the current position. The vector computed in (a) therefore not only contains information about the current input vector but also about the previous input vectors. Hence, $\vec{z}_j$ is computed after having observed $\vec{x}_1, \dots, \vec{x}_j$. As such, a simple observation is that we can treat the last vector computed by the RNN, i.e. $\vec{z}_k$, as a representation of the entire sequence. Accordingly, we can use this as the input to a single-layer linear classifier to compute a vector $\vec{y}$ as we will need for classification.

$$\vec{y} = Softmax(W\vec{z}_k); W\in \mathcal{R}^{\mid \mathcal{Y}\mid \times h}$$
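
The computation described above can be sketched in a few lines of plain Python. This is an illustration only, not the implementation asked for in Part 2: the tanh update (the default nonlinearity of `nn.RNN`), the omission of biases, the toy weights, and all helper names are assumptions made for the example.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(x_t, z_prev, W_x, W_h):
    """One recurrent update: z_t = tanh(W_x x_t + W_h z_prev). Biases are omitted."""
    pre = [a + b for a, b in zip(matvec(W_x, x_t), matvec(W_h, z_prev))]
    return [math.tanh(p) for p in pre]

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(sequence, W_x, W_h, W):
    z = [0.0] * len(W_h)          # initial state
    for x_t in sequence:          # k may differ from example to example
        z = rnn_step(x_t, z, W_x, W_h)
    return softmax(matvec(W, z))  # y = Softmax(W z_k)

# Hypothetical toy weights: d = 2 input features, h = 2 hidden units, 3 classes.
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_h = [[0.2, 0.0], [0.0, 0.2]]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

short = classify([[1.0, 0.0]], W_x, W_h, W)
longer = classify([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], W_x, W_h, W)
print(len(short), len(longer))  # 3 3
```

Both calls return a distribution over the same 3 classes even though the inputs have different lengths, which is the property that separates the RNN from the FFNN.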

## Part 2: Rules
**Part 2** requires implementing a rudimentary RNN in PyTorch for text classification. Countless blog posts, internet tutorials and other implementations available publicly (and privately) do precisely this. In fact, almost every student in [Cornell NLP](https://nlp.cornell.edu/people/) likely has some code for doing this on their Github. You **cannot** use any such code (though you may use anything you find in course notes or course texts) irrespective of whether you cite it or do not.

Submissions will be passed through the MOSS system, which is a sophisticated system for detecting plagiarism in code and is robust in the sense that it tries to find alignments in the underlying semantics of the code and not just the surface level syntax. Similarly, the course staff are also quite astute with respect to programming neural models for NLP and we will strenuously look at your code. We flagged multiple groups for this last year, so we strongly suggest you resist any such temptation (if the Academic Integrity policy alone is insufficient at dissuading you).

## 2.1 RNN Implementation

Similar to **Part 1**, we have the previous `Data loader` section and the new `RNN` component. We don't envision that it will be useful to modify the `Data loader`. We have included some stubs to help give you a place to start for the RNN.

Additionally, we remind you that Part 1 furnishes a near-functional implementation of a similar neural model for the same task. If you complete Part 1 correctly, it will be wholly functional. Using it as a template for Part 2 is both prudent and suggested.

In [None]:
!python -m spacy download en_core_web_md

In [None]:
import en_core_web_md
nlp = en_core_web_md.load()

In [None]:
!ls -a

.  ..  .config	drive  ffnn_fixed.pth  sample_data


In [None]:
vectorizer = torchtext.vocab.GloVe(name='twitter.27B', dim=100)

.vector_cache/glove.twitter.27B.zip: 1.52GB [12:24, 2.04MB/s]                           
100%|█████████▉| 1191587/1193514 [00:47<00:00, 25637.30it/s]

In [None]:
def rnn_preprocessing(data, test=False):
    """rnn_preprocessing

    :param data: Dataset for which to generate embedding
    :type data: Union[
		  List[Tuple[List[str], int]],
		  List[str],
	  ]
    :param test: Whether this is test data
    :type test: Bool
    """
    if test:
      vectorized_data = []
      for token_list in data:
        vector_list = [vectorizer[t] for t in token_list]
        vectorized_data.append(torch.stack(vector_list))
    else:
      vectorized_data = []
      for token_list, lbl in data:
        vectors = torch.stack([vectorizer[t] for t in token_list])
        vectorized_data.append((vectors, lbl))
    return vectorized_data

# def rnn_preprocessing(data, test=False):
#     """rnn_preprocessing

#     :param data: Dataset for which to generate embedding
#     :type data: Union[
# 		  List[Tuple[List[str], int]],
# 		  List[str],
# 	  ]
#     :param test: Whether this is test data
#     :type test: Bool
#     """
#     # Do some preprocessing similar to convert_to_vector_representation
#     # For the RNN, remember that instead of a single vector per training
#     # example, you will have a sequence of vectors where each vector
#     # represents some information about a specific token.
#     if test:
#       vectorized_data = []
#       for word_list in data:
#         sentence = " ".join(word_list)
#         doc = nlp(sentence)
#         # assert len(doc) == len(word_list)
#         seq = []
#         for token in doc:
#           seq.append(torch.FloatTensor(token.vector))
#         vectorized_data.append(seq)
#     else:
#       vectorized_data = []
#       for word_list, lbl in data:
#         sentence = " ".join(word_list)
#         doc = nlp(sentence)
#         # assert len(doc) == len(word_list)
#         seq = []
#         for token in doc:
#           seq.append(torch.FloatTensor(token.vector))
#           vectorized_data.append((seq, lbl))
#     return vectorized_data

In [None]:
class TweetDataset(Dataset):
  def __init__(self, data):
      X, y = zip(*data)
      self.X = list(X)
      self.y = list(y)
      self.len = len(data)
    
  def __len__(self):
    return self.len
    
  def __getitem__(self, index):
    return self.X[index], self.y[index]

def pad_batch(batch):
  batch = sorted(batch, key= lambda t : len(t[0]), reverse=True)
  X, y = zip(*batch)
  seq_lens = list(map(len, X))
  y = torch.LongTensor(y)
  padded = pad_sequence(X, batch_first=True)
  packed = pack_padded_sequence(padded, seq_lens, batch_first=True)
  return packed, y

def paddedDataLoader(data, batch_size):
  ds = TweetDataset(data)
  loader = DataLoader(ds, batch_size=batch_size, shuffle=True, collate_fn=pad_batch)
  return loader
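
The padding and packing performed by `pad_batch` can be inspected on a tiny batch. The sketch below assumes PyTorch is installed and uses two hypothetical all-ones "tweets" in place of real GloVe vectors.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Two hypothetical "tweets" with 3 and 1 tokens, each token a 2-dimensional vector.
# They are already sorted by decreasing length, as pad_batch arranges.
seqs = [torch.ones(3, 2), torch.ones(1, 2)]
lengths = [len(s) for s in seqs]

# Shape (2, 3, 2): the shorter sequence is padded with zeros.
padded = pad_sequence(seqs, batch_first=True)
packed = pack_padded_sequence(padded, lengths, batch_first=True)

# Unpacking recovers the padded tensor together with the true lengths.
unpacked, unpacked_lengths = pad_packed_sequence(packed, batch_first=True)

# Select the last real time step of every sequence, skipping the padding.
last_steps = unpacked[range(unpacked.size(0)), unpacked_lengths - 1]
print(padded.shape, unpacked_lengths.tolist(), last_steps.shape)
```

Carrying the true lengths along is what lets a model pick the final non-padded output of each sequence rather than a padded position.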


In [None]:
class RNN(nn.Module):
	def __init__(self, in_features, h, out_features, layers=1, bi=False): # Add relevant parameters
		super(RNN, self).__init__()
		# Fill in relevant parameters
		self.rnn = nn.RNN(input_size=in_features, hidden_size=h, num_layers=layers, batch_first=True, bidirectional=bi)
		self.activation = nn.ReLU()
		self.U = nn.Linear(h * (2 if bi else 1), out_features) # a bidirectional RNN emits 2*h features per time step
		# Ensure parameters are initialized to small values, see PyTorch documentation for guidance
		self.softmax = nn.LogSoftmax(dim=1)
		self.loss = nn.NLLLoss()

	def compute_Loss(self, predicted_vector, gold_label):
		return self.loss(predicted_vector, gold_label)	

	def forward(self, inputs):
		# begin code
		packed_output, h_n = self.rnn(inputs)
		unpacked_output, seq_lens = pad_packed_sequence(packed_output, batch_first=True)
		inds = seq_lens-1
		z_k = unpacked_output[range(unpacked_output.size(0)), inds]
		x = self.U(z_k)
		predicted_vector = self.softmax(x) # remember to include the predicted unnormalized scores which should be normalized into a (log) probability distribution
		# end code
		return predicted_vector

	def load_model(self, save_path):
		self.load_state_dict(torch.load(save_path))
	
	def save_model(self, save_path):
		torch.save(self.state_dict(), save_path)

In [None]:
train_seq_vectors = rnn_preprocessing(train)
val_seq_vectors = rnn_preprocessing(val)
test_seq_vectors = rnn_preprocessing(test, True)

In [None]:
rnn_train_loader = paddedDataLoader(train_seq_vectors, batch_size=64)
rnn_val_loader = paddedDataLoader(val_seq_vectors, batch_size=64)

In [None]:
h = 512
rnn_model = RNN(vectorizer.dim, h, len(emotion_to_idx)).to(get_device())
optimizer = optim.SGD(rnn_model.parameters(), lr=0.0004, momentum=0.9)
train_and_evaluate(75, rnn_model, rnn_train_loader, rnn_val_loader, optimizer)
rnn_model.save_model("rnn_baseline.pth") # Save our model!

epoch 0 train loss 1.69 train accuracy 23.32%
epoch 0 val loss 1.62 val accuracy 27.35%


epoch 1 train loss 1.60 train accuracy 27.56%
epoch 1 val loss 1.59 val accuracy 28.14%


epoch 2 train loss 1.58 train accuracy 30.28%
epoch 2 val loss 1.57 val accuracy 30.91%


epoch 3 train loss 1.57 train accuracy 31.65%
epoch 3 val loss 1.57 val accuracy 31.15%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 4 train loss 1.56 train accuracy 31.37%
epoch 4 val loss 1.56 val accuracy 29.33%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 5 train loss 1.55 train accuracy 32.12%
epoch 5 val loss 1.55 val accuracy 32.41%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 6 train loss 1.54 train accuracy 32.88%
epoch 6 val loss 1.54 val accuracy 32.81%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 7 train loss 1.54 train accuracy 33.59%
epoch 7 val loss 1.54 val accuracy 33.20%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 8 train loss 1.53 train accuracy 34.24%
epoch 8 val loss 1.53 val accuracy 33.99%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 9 train loss 1.52 train accuracy 34.91%
epoch 9 val loss 1.52 val accuracy 34.47%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 10 train loss 1.51 train accuracy 35.70%
epoch 10 val loss 1.51 val accuracy 34.23%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 11 train loss 1.50 train accuracy 36.40%
epoch 11 val loss 1.49 val accuracy 35.10%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 12 train loss 1.47 train accuracy 37.10%
epoch 12 val loss 1.47 val accuracy 36.92%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 13 train loss 1.43 train accuracy 39.09%
epoch 13 val loss 1.42 val accuracy 40.95%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 14 train loss 1.40 train accuracy 40.90%
epoch 14 val loss 1.43 val accuracy 41.26%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 15 train loss 1.39 train accuracy 41.53%
epoch 15 val loss 1.36 val accuracy 43.79%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 16 train loss 1.36 train accuracy 42.39%
epoch 16 val loss 1.34 val accuracy 43.32%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 17 train loss 1.36 train accuracy 42.74%
epoch 17 val loss 1.33 val accuracy 44.82%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 18 train loss 1.38 train accuracy 42.02%
epoch 18 val loss 1.33 val accuracy 44.58%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 19 train loss 1.34 train accuracy 43.73%
epoch 19 val loss 1.32 val accuracy 45.06%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 20 train loss 1.35 train accuracy 43.14%
epoch 20 val loss 1.31 val accuracy 44.82%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 21 train loss 1.33 train accuracy 43.53%
epoch 21 val loss 1.32 val accuracy 44.19%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 22 train loss 1.32 train accuracy 44.85%
epoch 22 val loss 1.30 val accuracy 44.11%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 23 train loss 1.32 train accuracy 44.77%
epoch 23 val loss 1.29 val accuracy 45.85%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 24 train loss 1.32 train accuracy 44.88%
epoch 24 val loss 1.31 val accuracy 44.74%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 25 train loss 1.31 train accuracy 45.67%
epoch 25 val loss 1.30 val accuracy 43.08%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 26 train loss 1.29 train accuracy 46.40%
epoch 26 val loss 1.34 val accuracy 42.37%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 27 train loss 1.30 train accuracy 46.49%
epoch 27 val loss 1.27 val accuracy 45.93%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 28 train loss 1.28 train accuracy 47.63%
epoch 28 val loss 1.29 val accuracy 46.96%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 29 train loss 1.28 train accuracy 47.95%
epoch 29 val loss 1.24 val accuracy 48.77%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 30 train loss 1.26 train accuracy 48.39%
epoch 30 val loss 1.23 val accuracy 49.64%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 31 train loss 1.24 train accuracy 49.83%
epoch 31 val loss 1.24 val accuracy 48.38%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 32 train loss 1.23 train accuracy 50.43%
epoch 32 val loss 1.19 val accuracy 52.09%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 33 train loss 1.22 train accuracy 51.49%
epoch 33 val loss 1.18 val accuracy 52.89%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 34 train loss 1.21 train accuracy 51.66%
epoch 34 val loss 1.24 val accuracy 50.83%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 35 train loss 1.22 train accuracy 51.08%
epoch 35 val loss 1.19 val accuracy 52.02%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 36 train loss 1.20 train accuracy 52.16%
epoch 36 val loss 1.21 val accuracy 53.04%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 37 train loss 1.20 train accuracy 52.14%
epoch 37 val loss 1.17 val accuracy 52.81%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 38 train loss 1.19 train accuracy 52.61%
epoch 38 val loss 1.20 val accuracy 51.86%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 39 train loss 1.18 train accuracy 53.17%
epoch 39 val loss 1.16 val accuracy 53.28%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 40 train loss 1.18 train accuracy 53.37%
epoch 40 val loss 1.25 val accuracy 49.72%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 41 train loss 1.19 train accuracy 52.89%
epoch 41 val loss 1.17 val accuracy 53.52%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 42 train loss 1.17 train accuracy 54.47%
epoch 42 val loss 1.15 val accuracy 54.39%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 43 train loss 1.17 train accuracy 53.64%
epoch 43 val loss 1.18 val accuracy 52.33%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 44 train loss 1.19 train accuracy 52.74%
epoch 44 val loss 1.16 val accuracy 53.91%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 45 train loss 1.15 train accuracy 54.67%
epoch 45 val loss 1.14 val accuracy 54.70%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 46 train loss 1.15 train accuracy 54.20%
epoch 46 val loss 1.19 val accuracy 51.70%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 47 train loss 1.13 train accuracy 56.11%
epoch 47 val loss 1.16 val accuracy 53.99%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 48 train loss 1.14 train accuracy 55.42%
epoch 48 val loss 1.20 val accuracy 53.91%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 49 train loss 1.13 train accuracy 55.59%
epoch 49 val loss 1.14 val accuracy 55.73%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 50 train loss 1.11 train accuracy 56.81%
epoch 50 val loss 1.11 val accuracy 56.44%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 51 train loss 1.11 train accuracy 56.54%
epoch 51 val loss 1.11 val accuracy 55.81%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 52 train loss 1.09 train accuracy 57.17%
epoch 52 val loss 1.12 val accuracy 56.60%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 53 train loss 1.11 train accuracy 56.85%
epoch 53 val loss 1.13 val accuracy 54.70%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 54 train loss 1.10 train accuracy 57.22%
epoch 54 val loss 1.08 val accuracy 56.21%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 55 train loss 1.09 train accuracy 56.79%
epoch 55 val loss 1.09 val accuracy 58.34%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 56 train loss 1.07 train accuracy 58.78%
epoch 56 val loss 1.05 val accuracy 58.89%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 57 train loss 1.08 train accuracy 57.90%
epoch 57 val loss 1.10 val accuracy 54.86%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 58 train loss 1.08 train accuracy 58.83%
epoch 58 val loss 1.08 val accuracy 57.94%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 59 train loss 1.06 train accuracy 58.66%
epoch 59 val loss 1.14 val accuracy 55.34%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 60 train loss 1.05 train accuracy 59.50%
epoch 60 val loss 1.06 val accuracy 59.05%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 61 train loss 1.06 train accuracy 59.18%
epoch 61 val loss 1.10 val accuracy 56.36%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 62 train loss 1.06 train accuracy 58.69%
epoch 62 val loss 1.24 val accuracy 49.64%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 63 train loss 1.05 train accuracy 59.79%
epoch 63 val loss 1.10 val accuracy 57.08%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 64 train loss 1.02 train accuracy 60.90%
epoch 64 val loss 1.12 val accuracy 54.86%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 65 train loss 1.06 train accuracy 59.26%
epoch 65 val loss 1.06 val accuracy 59.13%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 66 train loss 1.02 train accuracy 61.14%
epoch 66 val loss 1.01 val accuracy 60.63%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 67 train loss 1.01 train accuracy 62.10%
epoch 67 val loss 1.03 val accuracy 59.60%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 68 train loss 1.03 train accuracy 61.09%
epoch 68 val loss 1.05 val accuracy 60.87%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 69 train loss 1.01 train accuracy 61.68%
epoch 69 val loss 1.05 val accuracy 59.60%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 70 train loss 1.01 train accuracy 62.17%
epoch 70 val loss 1.02 val accuracy 61.66%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 71 train loss 0.99 train accuracy 62.46%
epoch 71 val loss 1.05 val accuracy 59.05%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 72 train loss 1.00 train accuracy 62.15%
epoch 72 val loss 1.03 val accuracy 61.19%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 73 train loss 0.99 train accuracy 62.86%
epoch 73 val loss 0.99 val accuracy 62.06%


HBox(children=(FloatProgress(value=0.0, description='Training Batches', max=157.0, style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, description='Validation Batches', max=20.0, style=ProgressStyle(descri…

epoch 74 train loss 0.99 train accuracy 62.42%
epoch 74 val loss 0.99 val accuracy 61.98%



## 2.2 Part 2 Report
For Part 2, your report should have a description of each major step of implementing the RNN accompanied by the associated code-snippet. Each step should have an explanation for why you decided to do something (when one could reasonably do the same step in a different way); your justification will not be based on empirical results in this section but should relate to something we said in class, something mentioned in any of the course texts, or some other source (i.e. literature in NLP or official PyTorch documentation). **Unjustified, vague, and/or under-substantiated explanations will not receive credit.**

Things to include:

1. _Representation_ \
Each $\vec{x}_i$ needs to be produced in some way and should correspond to word $i$ in the text. This is different from the text classification approaches we have studied previously (BoW for example) where the entire document is represented with a single vector. Where and how is this being done for the RNN?

2. _Initialization_ \
There will be weights that you update in training the RNN. Where and how are these initialized?

3. _Training_ \
You are given the entire training set of N examples. How do you make use of this training set? How does the model modify its weights in training (this likely entails somewhere where gradients are computed and somewhere else where these gradients are used to update the model)?

4. _Model_ \
This is the core model code, i.e. where and how you apply the RNN to the $\vec{x}_i$.

5. _Linear Classifier_ \
Given the outputs of the RNN, how do you consume these to actually compute $\vec{y}$?

6. _Stopping_ \
How does your training procedure terminate?

7. _Hyperparameters_ \
To run your model, you must fix some hyperparameters, such as $h$ (the hidden dimensionality of the $\vec{z}_i$ referenced above). Be sure to exhaustively describe these hyperparameters and why you set them as you did (this almost certainly will require some brief exploration: we suggest the course text by Yoav Goldberg as well as possibly the PyTorch official documentation). Be sure to accurately cite either source.



### 2.2.1 Representation


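We use the pretrained GloVe Twitter embeddings from `torchtext.vocab.GloVe` to convert each token into a vector, as shown in the code below. Each text therefore becomes a sequence of vectors, one per token, rather than a single document vector. From lecture, embeddings represent the meaning of words, which helps the model learn semantic relationships between words and similarity in sentiment.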

`vectorizer = torchtext.vocab.GloVe(name='twitter.27B', dim=100)`

```
def rnn_preprocessing(data, test=False):
  # Map every token to its GloVe vector; each text becomes a (seq_len, 100) tensor.
  if test:
    # Test examples carry no labels.
    vectorized_data = []
    for token_list in data:
      vector_list = [vectorizer[t] for t in token_list]
      vectorized_data.append(torch.stack(vector_list))
  else:
    vectorized_data = []
    for token_list, lbl in data:
      vectors = torch.stack([vectorizer[t] for t in token_list])
      vectorized_data.append((vectors, lbl))
  return vectorized_data
```
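
To make the contrast with bag-of-words concrete, here is a small sketch in plain Python. The lookup table `toy_table`, its values, and the helper names are made up for illustration (the real pipeline uses the GloVe vectors above). Unknown tokens map to a zero vector, which, as far as we understand the torchtext documentation, mirrors its default handling of out-of-vocabulary words.

```python
# Toy 3-d "embeddings"; purely illustrative values, not real GloVe vectors.
toy_table = {
    "so": [0.1, 0.0, 0.2],
    "happy": [0.9, 0.4, 0.1],
    "today": [0.0, 0.3, 0.5],
}
DIM = 3

def embed_sequence(tokens, table, dim):
    """Return one vector per token (the RNN input); unknown tokens get zeros."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]

def bag_of_words(tokens, vocab):
    """Return a single count vector for the whole text (the BoW view)."""
    return [tokens.count(word) for word in vocab]

tokens = ["so", "happy", "today", "yay"]
sequence = embed_sequence(tokens, toy_table, DIM)
bow = bag_of_words(tokens, sorted(toy_table))

print(len(sequence), len(sequence[0]))  # one DIM-sized vector per token
print(bow)                              # one vector for the entire text
```

The sequence keeps word order and grows with the length of the text, while the bag-of-words vector has a fixed size and discards order.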




### 2.2.2 Initialization


We use the `torch.nn.RNN` layer. According to the PyTorch documentation, its weights and biases are initialized from the uniform distribution $\mathcal{U}(-\sqrt{k},\sqrt{k})$ where $k = \frac{1}{hidden\_size}$.

The linear layer has its weights and bias initialized from $\mathcal{U}(-\sqrt{k},\sqrt{k})$ where $k = \frac{1}{in\_features}$. Since the output of the RNN layer is the input to the linear layer, $in\_features = hidden\_size$ in this case. We keep these default initializations because scaling the initial range by the layer width keeps early activations and gradients from starting out too large or too small, which helps with the vanishing/exploding gradient problem.

 ```
self.rnn = nn.RNN(input_size=in_features, hidden_size=h, num_layers=layers, batch_first=True, bidirectional=bi)
self.activation = nn.ReLU()
self.U = nn.Linear(h, out_features)
 ```
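
As a rough illustration of how narrow this range is for our hidden size, the sketch below computes the bound $\sqrt{k}$ and draws a few weights from the corresponding uniform distribution. It is plain Python rather than PyTorch's own initialization code, and the function names are hypothetical.

```python
import math
import random

def uniform_init_bound(fan):
    """Half-width sqrt(1/fan) of the uniform range U(-sqrt(k), sqrt(k)), k = 1/fan."""
    return math.sqrt(1.0 / fan)

def sample_weights(n, fan, seed=0):
    """Draw n weights from U(-bound, bound); a stand-in for the layer's default init."""
    rng = random.Random(seed)
    bound = uniform_init_bound(fan)
    return [rng.uniform(-bound, bound) for _ in range(n)]

hidden_size = 512  # the h used in this report
bound = uniform_init_bound(hidden_size)
weights = sample_weights(1000, hidden_size)

print(round(bound, 4))
print(all(-bound <= w <= bound for w in weights))
```

With $h = 512$ every initial weight lies within roughly $\pm 0.044$, and the range shrinks further as the layer gets wider.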

### 2.2.3 Training


The original training set is split into a train and validation set. We preprocess the training (and validation) set into sequences of vectors each having 100 features as shown in part 2.2.1. The sequences are padded then packed for easy batching.

We then create a Dataset class and a DataLoader for conveniently loading the data into the model. We reuse the `train_and_evaluate` function from part 1.1 for training. For every batch of examples, the model is called. According to the PyTorch docs, in this forward phase a computational graph is built by the autograd package. We call `optimizer.zero_grad` before each backward pass so that gradients do not accumulate between (mini)batches. We then calculate the loss from the model's output and call `loss.backward`, which computes the gradients by applying the chain rule on the graph. Finally we call `optimizer.step`, which updates the parameters (weights and biases of the model layers) based on the computed gradients.

```
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

class TweetDataset(Dataset):
  def __init__(self, data):
    # data is a list of (sequence_tensor, label) pairs
    X, y = zip(*data)
    self.X = list(X)
    self.y = list(y)
    self.len = len(data)

  def __len__(self):
    return self.len

  def __getitem__(self, index):
    return self.X[index], self.y[index]

def pad_batch(batch):
  # pack_padded_sequence expects sequences sorted by decreasing length by default
  batch = sorted(batch, key=lambda t: len(t[0]), reverse=True)
  X, y = zip(*batch)
  seq_lens = list(map(len, X))
  y = torch.LongTensor(y)
  padded = pad_sequence(X, batch_first=True)  # zero-pad to the longest sequence in the batch
  packed = pack_padded_sequence(padded, seq_lens, batch_first=True)
  return packed, y

def paddedDataLoader(data, batch_size):
  ds = TweetDataset(data)
  loader = DataLoader(ds, batch_size=batch_size, shuffle=True, collate_fn=pad_batch)
  return loader
```

```
train_seq_vectors = rnn_preprocessing(train)
val_seq_vectors = rnn_preprocessing(val)
test_seq_vectors = rnn_preprocessing(test, True)

rnn_train_loader = paddedDataLoader(train_seq_vectors, batch_size=64)
rnn_val_loader = paddedDataLoader(val_seq_vectors, batch_size=64)

h = 512
rnn_model = RNN(vectorizer.dim, h, len(emotion_to_idx)).to(get_device())
optimizer = optim.SGD(rnn_model.parameters(), lr=0.0004, momentum=0.9)
train_and_evaluate(75, rnn_model, rnn_train_loader, rnn_val_loader, optimizer)
```

```
def train_epoch(model, train_loader, optimizer):
	model.train()
	total = 0
	current_loss = 0
	correct = 0
	for (input_batch, expected_out) in tqdm(train_loader, leave=False, desc="Training Batches"):
		output = model(input_batch.to(get_device()))
		total += output.size()[0]
		_, predicted = torch.max(output, 1)
		correct += (expected_out == predicted.to("cpu")).cpu().numpy().sum()
		optimizer.zero_grad() #zero out the gradients
		loss = model.compute_Loss(output, expected_out.to(get_device()))
		current_loss += loss.item()
		loss.backward()
		optimizer.step()
	acc = correct/total
	avg_loss = current_loss/len(train_loader)
	return acc, avg_loss
```
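
As a minimal sketch of what `optimizer.step` does for SGD with momentum, the plain-Python loop below applies the update $v \leftarrow \mu v + g$, $w \leftarrow w - \eta v$ to a one-parameter toy loss $(w-3)^2$. This is our reading of the update rule in the PyTorch `optim.SGD` documentation (with zero dampening and no weight decay). The toy loss, the learning rate, and the function names are assumptions chosen for illustration and are not part of the project code.

```python
def grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd_momentum(w, lr, momentum, steps):
    """Run `steps` updates of SGD with momentum starting from w."""
    velocity = 0.0
    for _ in range(steps):
        g = grad(w)                         # plays the role of loss.backward()
        velocity = momentum * velocity + g  # accumulate a running direction
        w = w - lr * velocity               # plays the role of optimizer.step()
    return w

w_final = sgd_momentum(w=0.0, lr=0.01, momentum=0.9, steps=500)
print(w_final)
```

The parameter ends up very close to the minimizer $w = 3$. The velocity term is what lets consecutive gradients that point the same way build on each other.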

### 2.2.4 Model


The input to the RNN layer is a packed sequence, which `nn.RNN` accepts according to the reference docs. The dataloader does the packing, so each batch goes directly to the RNN layer. Before packing, a padded batch has shape (batch_size, max_seq_len, 100). At each timestep $t$ in range(seq_len), the input $x_t$ is multiplied by the weight $W_{ih}$ and a bias is added. The previous hidden state $h_{t-1}$ is multiplied by the weight $W_{hh}$ and a bias is added. The two results are summed and a tanh nonlinearity is applied, as per the PyTorch docs and lecture notes.

```
def forward(self, inputs):
  packed_output, h_n = self.rnn(inputs)
  unpacked_output, seq_lens = pad_packed_sequence(packed_output, batch_first=True)
  inds = seq_lens-1
  z_k = unpacked_output[range(unpacked_output.size(0)), inds]
  x = self.U(z_k)
  predicted_vector = self.softmax(x)
  return predicted_vector
```
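
The per-timestep computation described in this section can be written compactly as follows. The bias names $b_{ih}$ and $b_{hh}$ follow the notation of the PyTorch `nn.RNN` documentation, and $h_0 = 0$ reflects that `forward` calls `self.rnn(inputs)` without passing an initial hidden state:

$$h_t = \tanh\left(W_{ih}\, x_t + b_{ih} + W_{hh}\, h_{t-1} + b_{hh}\right), \qquad h_0 = \vec{0}$$

Here $x_t \in \mathbb{R}^{100}$, $h_t \in \mathbb{R}^{h}$, $W_{ih} \in \mathbb{R}^{h \times 100}$ and $W_{hh} \in \mathbb{R}^{h \times h}$, with the same weights reused at every timestep.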

### 2.2.5 Linear Classifier

Because the RNN's input was packed, its output is also packed. We first unpack it and obtain an output of shape (batch_size, max_seq_len, hidden_size). However, for sentiment analysis we want a many-to-one RNN, as discussed in lecture. For every element in the batch we extract the last output in the unpadded sequence, $z_k$, which is then fed into the linear classifier, where it is multiplied by the weight $W$ and a bias $b$ is added, as described in the PyTorch docs. The result then goes through a (log)softmax layer that converts it into a (log)probability distribution.

```
def forward(self, inputs):
  packed_output, h_n = self.rnn(inputs)
  unpacked_output, seq_lens = pad_packed_sequence(packed_output, batch_first=True)
  inds = seq_lens-1
  z_k = unpacked_output[range(unpacked_output.size(0)), inds]
  x = self.U(z_k)
  predicted_vector = self.softmax(x)
  return predicted_vector
```
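
The indexing and normalization steps can be checked by hand on a tiny example. The sketch below is plain Python with made-up numbers and hypothetical helper names. It is not the project code, only an illustration of picking the output at position `len - 1` for each sequence and turning the linear scores into log-probabilities.

```python
import math

def last_valid_outputs(padded_outputs, seq_lens):
    """Pick the output at index len-1 for each sequence, skipping padded steps."""
    return [seq[length - 1] for seq, length in zip(padded_outputs, seq_lens)]

def linear(z, weight, bias):
    """Compute W z + b for one hidden vector z."""
    return [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weight, bias)]

def log_softmax(scores):
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]

# Two sequences of hidden size 2, padded to length 3 (the second has length 2).
padded_outputs = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8], [0.9, 1.0], [0.0, 0.0]],
]
seq_lens = [3, 2]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 classes here, just to keep it small
b = [0.0, 0.0, 0.0]

z_k = last_valid_outputs(padded_outputs, seq_lens)
log_probs = [log_softmax(linear(z, W, b)) for z in z_k]
print(z_k)
```

For the second sequence the padded third step is ignored, so the classifier never sees the zero padding.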

### 2.2.6 Stopping


In the code, training terminates after the fixed `number_of_epochs` passed to `train_and_evaluate`. We continue training the model while the loss decreases over time. However, we run the risk of overfitting if we train for too long, as described in lecture. To combat this, we use the validation loss and accuracy to decide the number of training epochs: we stop when the validation loss starts going up and the validation accuracy starts going down (the two move together for our model). For our base model, this is at 8 epochs. This amounts to early stopping, since the training loss would have continued going down with more epochs while the model overfit.

```
def train_and_evaluate(number_of_epochs, model, train_loader, val_loader, optimizer):
  for epoch in trange(number_of_epochs, desc="Epochs"):
    train_acc, train_loss = train_epoch(model, train_loader, optimizer)  #passing train_loader as 2nd arg
    val_acc, val_loss = evaluation(model, val_loader, optimizer)
    print('epoch {0} train loss {1:.2f} train accuracy {2:.2%}'.format(epoch, train_loss, train_acc))
    print('epoch {0} val loss {1:.2f} val accuracy {2:.2%}'.format(epoch, val_loss, val_acc))
  return
```
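
The function above runs for a fixed number of epochs and only prints the metrics. A patience-based rule is one common way to automate the stopping decision. The following is a minimal sketch of that idea, not the procedure used in this report: the function `early_stopping_epoch`, the `patience` parameter, and the loss values are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Scan validation losses in epoch order and return the index of the best
    epoch seen before `patience` consecutive epochs pass without improvement."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop scanning: no improvement for `patience` epochs
    return best_epoch


# Made-up validation losses, one per epoch (illustration only)
losses = [1.50, 1.20, 1.00, 0.90, 0.95, 0.97, 0.80]
print(early_stopping_epoch(losses, patience=2))
print(early_stopping_epoch(losses, patience=3))
```

A small `patience` stops at the first sustained rise in validation loss, while a larger one tolerates temporary rises at the cost of extra epochs.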

### 2.2.7 Hyperparameters


The first parameter is the embedding dimension, which is the feature size. GloVe provides a choice of 25, 50, 100, and 200 dimensions. Experiments in the GloVe paper showed accuracy increasing with vector size up to 300. We tried 50, 100, and 200: 50 dimensions gave lower accuracy, while 100 and 200 gave similar accuracy. We chose 100 because it makes the computation cheaper by reducing the number of parameters.


The next parameter is the learning rate, which controls the size of a single optimization step, as described in the PyTorch docs. We started with lr=0.001, a recommended starting value. With this learning rate, the loss and accuracy became unstable after just a few epochs, a sign that the optimizer was taking steps that were too large. We therefore reduced the learning rate to 0.0004 and, because this slows training down, increased the number of training epochs to 75.

The next parameter is the hidden_size. As mentioned in the course text, the hidden output is a compressed representation of the entire sequence, so we would expect this representation to become more expressive as we increase the hidden size. We observed the accuracy increasing from 30% to 40% as hidden_size was increased from 16 to 32. Increasing hidden_size further gave 43% at 64, 44% at 128, 52% at 256, and 53% at 512. Further increases did not yield much improvement in accuracy.


The next parameter is the (mini)batch_size, which is the number of samples passed through the model before each update step. According to the course text, minibatching is a good middle ground between stochastic gradient descent and batch gradient descent. Increasing the batch size makes the parameter updates more stable and takes better advantage of parallelism. Common settings include 16, 32, and 64. We compared the three choices and settled on 64 because it enables faster training and improved accuracy to 60%.

We inherited the momentum parameter, set to 0.9, from Part 1. We noticed that removing it greatly slowed down training, so we left it as is.

```
rnn_train_loader = paddedDataLoader(train_seq_vectors, batch_size=64)
rnn_val_loader = paddedDataLoader(val_seq_vectors, batch_size=64)

h = 512
rnn_model = RNN(vectorizer.dim, h, len(emotion_to_idx)).to(get_device())
optimizer = optim.SGD(rnn_model.parameters(), lr=0.0004, momentum=0.9)
train_and_evaluate(75, rnn_model, rnn_train_loader, rnn_val_loader, optimizer)
rnn_model.save_model("rnn_baseline.pth")  # save the RNN trained above, not another model
```
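
To make the roles of `lr` and `momentum` concrete, here is a minimal sketch of the momentum update as `torch.optim.SGD` documents it (with its default zero dampening and no Nesterov), written in plain Python on a toy one-dimensional quadratic. The function `sgd_momentum_steps` and the toy objective are illustrative assumptions, not part of the project code.

```python
def sgd_momentum_steps(x0, grad_fn, lr, momentum, steps):
    """Run `steps` updates of SGD with momentum (no dampening, no Nesterov):
    v <- momentum * v + grad,  x <- x - lr * v."""
    x, v = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        v = momentum * v + g   # velocity accumulates past gradients
        x = x - lr * v         # parameter moves along the velocity
    return x


def grad(x):
    # Gradient of the toy objective f(x) = 0.5 * x**2, whose minimum is at 0
    return x


x_plain = sgd_momentum_steps(1.0, grad, lr=0.01, momentum=0.0, steps=100)
x_momentum = sgd_momentum_steps(1.0, grad, lr=0.01, momentum=0.9, steps=100)
print(abs(x_plain), abs(x_momentum))
```

On this toy problem the run with momentum 0.9 ends much closer to the minimum than the run without momentum at the same learning rate, which is consistent with the slowdown we observed when removing momentum.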

# Part 3: Analysis
From **Part 1** and **Part 2**, you will have two different models in hand for performing the same emotion detection task. In **Part 3**, you will conduct a comprehensive analysis of these models, focusing on two comparative settings.

## Part 3 Note
You will be required to submit the code used in finding these results on CMSX. This code should be legible and we will consult it if we find issues in the results. It is worth noting that in **Part 1** and **Part 2**, we primarily are considering the correctness of the code-snippets in the report. If your model is flawed in a way that isn’t exposed by those snippets, this will likely surface in your results for **Part 3**. We will deduct points for correctness in this section to reflect this and we will try to localize where the error is (or think it is, if it is opaque from your code). That said, we will be lenient about absolute performance (within reason) in this section.

## 3.1: Across-Model Comparison
In this section, you will report results detailing the comparison of the two models. Specifically, we will consider the issue of _fair comparison_<sup>5</sup>, which is a fundamental notion in NLP and ML research and practice. In particular, given model $A$, it is likely the case we can make a model $B$ that is computationally more complex and, hence, more costly and achieves superior performance. However, this makes for an unfair comparison. For our purposes, we want to study how the FFNN and RNN compare when we try to control for hyperparameters and other configurable values being of similar computational cost<sup>6</sup>. That said, it is impossible to have identical configurations as these are different models, i.e. the RNN simply has hyperparameters for which there are no analogues in the FFNN.


In the report you will need to begin by describing 3 pairs of configurations, with each pair being comprised of a FFNN configuration and a RNN configuration that constitute a _fair comparison_. You will need to argue for why the two parts of each pair are a fair comparison. Across the pairs, you should try different types of configurations (e.g. trying to resolve questions of the form: _Does the FFNN perform better or worse when the hidden dimensionality is small as opposed to when it is large?_) and justify what you are trying to study by having the results across the pairs.


Next, you will report the quantitative accuracy of the 6 resulting models. You will
analyze these results and then move on to a more descriptive analysis.

The descriptive analysis can take one of two forms<sup>7</sup>:

1. _Nuanced quantitative analysis_ \
If you choose this option, you will need to further break down the quantitative statistics you reported initially. We provide some initial strategies to prime you for what you should think about in doing this. One possible starting point: if model $X$ achieves greater accuracy than model $Y$, to what extent is $X$ getting everything correct that $Y$ gets correct? Alternatively, how is model performance affected if you measure performance on a specific stratum/subset of the reviews?

2. _Nuanced qualitative analysis_ \
If you choose this option, you will need to select individual examples and try to explain or reason about why one model may be getting them right whereas the other isn’t. Are there any examples that all 6 models get right or wrong and, if so, can you hypothesize a reason why this occurs?

In [None]:
#@markdown ⠀
display(HTML('''<hr><p style="font-family:verdana; font-size:90%;">
5. This term takes on different meanings in different settings. Here we simply mean that we are trying to
compare different models while controlling for similar “complexity”/computational cost. <br></br>

6. We have not taught you how to do this rigorously and the theory for doing this is still underdeveloped. We only expect a reasonable attempt. <br></br>

7. This is the minimal requirement, if you provide other, more elaborate, analyses, we certainly welcome this.
</p>'''))

### 3.1.1 Configuration 1
Modify the code below for this configuration.

In [None]:
#hidden_size
h = 32
ffnn_config_1 = FFNN(len(vocab), h, len(emotion_to_idx)).to(get_device())
rnn_config_1 = RNN(vectorizer.dim, h, len(emotion_to_idx)).to(get_device())

In [None]:
train_loader, val_loader = get_data_loaders(train_vectorized, val_vectorized, batch_size=1)
rnn_train_loader = paddedDataLoader(train_seq_vectors, batch_size=1)
rnn_val_loader = paddedDataLoader(val_seq_vectors, batch_size=1)

X dim  torch.Size([11265, 11832])
y dim  torch.Size([11265])


In [None]:
ffnn_optimizer = optim.SGD(ffnn_config_1.parameters(), lr=0.001, momentum=0.9)
rnn_optimizer = optim.SGD(rnn_config_1.parameters(), lr=0.001, momentum=0.9)
train_and_evaluate(10, ffnn_config_1, train_loader, val_loader, ffnn_optimizer)
train_and_evaluate(10, rnn_config_1, rnn_train_loader, rnn_val_loader, rnn_optimizer)

epoch 0 train loss 1.52 train accuracy 33.44%
epoch 0 val loss 1.35 val accuracy 39.76%
epoch 1 train loss 0.87 train accuracy 70.85%
epoch 1 val loss 0.70 val accuracy 76.60%
epoch 2 train loss 0.42 train accuracy 86.82%
epoch 2 val loss 0.47 val accuracy 84.82%
epoch 3 train loss 0.28 train accuracy 90.81%
epoch 3 val loss 0.52 val accuracy 84.03%
epoch 4 train loss 0.18 train accuracy 94.00%
epoch 4 val loss 0.41 val accuracy 87.83%
epoch 5 train loss 0.15 train accuracy 94.91%
epoch 5 val loss 0.41 val accuracy 88.06%
epoch 6 train loss 0.11 train accuracy 96.40%
epoch 6 val loss 0.37 val accuracy 88.46%
epoch 7 train loss 0.07 train accuracy 97.88%
epoch 7 val loss 0.38 val accuracy 88.38%
epoch 8 train loss 0.06 train accuracy 98.06%
epoch 8 val loss 0.40 val accuracy 88.93%
epoch 9 train loss 0.05 train accuracy 98.53%
epoch 9 val loss 0.45 val accuracy 88.22%

epoch 0 train loss 1.54 train accuracy 31.50%
epoch 0 val loss 1.49 val accuracy 34.70%
epoch 1 train loss 1.47 train accuracy 36.22%
epoch 1 val loss 1.47 val accuracy 35.10%
epoch 2 train loss 1.45 train accuracy 36.70%
epoch 2 val loss 1.37 val accuracy 42.53%
epoch 3 train loss 1.46 train accuracy 36.36%
epoch 3 val loss 1.41 val accuracy 41.50%
epoch 4 train loss 1.47 train accuracy 36.14%
epoch 4 val loss 1.45 val accuracy 37.15%
epoch 5 train loss 1.48 train accuracy 35.52%
epoch 5 val loss 1.48 val accuracy 35.73%
epoch 6 train loss 1.49 train accuracy 35.18%
epoch 6 val loss 1.53 val accuracy 33.91%
epoch 7 train loss 1.50 train accuracy 33.71%
epoch 7 val loss 1.46 val accuracy 38.66%
epoch 8 train loss 1.46 train accuracy 35.67%
epoch 8 val loss 1.42 val accuracy 39.76%
epoch 9 train loss 1.52 train accuracy 32.93%
epoch 9 val loss 1.47 val accuracy 37.08%



### 3.1.1 Report
Describe configurations, report the results, and then perform a nuanced analysis

In this pair of configurations we use a small hidden size of 32, a batch size of 1, and a learning rate of 0.001, and we train for 10 epochs. Since these hyperparameters are the same for both models, we think that this makes a fair comparison. The only difference is the input dimension; there is no way to use the same input for both models, since the RNN takes a variable-length sequence as input.
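
One way to sanity-check the fairness argument is to count trainable parameters, since matching `h` does not by itself match model size. The sketch below rests on several assumptions: a one-hidden-layer FFNN, a single-layer `nn.RNN` followed by a linear classifier, a 100-dimensional embedding (the GloVe size chosen in Part 2), and the FFNN input width of 11832 shown in the `X dim` output. The helper names are hypothetical.

```python
def ffnn_param_count(input_dim, hidden_dim, num_classes):
    """Parameters of an assumed one-hidden-layer FFNN: input->hidden and
    hidden->output, each with a weight matrix and a bias vector."""
    hidden_layer = input_dim * hidden_dim + hidden_dim
    output_layer = hidden_dim * num_classes + num_classes
    return hidden_layer + output_layer


def rnn_param_count(input_dim, hidden_dim, num_classes):
    """Parameters of an assumed single-layer Elman RNN (input->hidden and
    hidden->hidden weights, two bias vectors as in torch.nn.RNN) plus a
    linear classifier on the last hidden output."""
    recurrent = input_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim
    classifier = hidden_dim * num_classes + num_classes
    return recurrent + classifier


vocab_size = 11832    # FFNN input width, taken from the "X dim" output
embedding_dim = 100   # assumed value of vectorizer.dim (GloVe size from Part 2)
h, num_classes = 32, 5

ffnn_params = ffnn_param_count(vocab_size, h, num_classes)
rnn_params = rnn_param_count(embedding_dim, h, num_classes)
print(ffnn_params, rnn_params)
```

Under these assumptions the FFNN has far more parameters than the RNN at the same hidden size, almost all of them in its first layer, so the comparison here is at equal hidden width rather than equal parameter count.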

### 3.1.2 Configuration 2
Modify the code below for this configuration.

In [None]:
#batch_size
h = 512
ffnn_config_2 = FFNN(len(vocab), h, len(emotion_to_idx)).to(get_device())
rnn_config_2 = RNN(vectorizer.dim, h, len(emotion_to_idx)).to(get_device())

In [None]:
train_loader, val_loader = get_data_loaders(train_vectorized, val_vectorized, batch_size=32)
rnn_train_loader = paddedDataLoader(train_seq_vectors, batch_size=32)
rnn_val_loader = paddedDataLoader(val_seq_vectors, batch_size=32)

In [None]:
ffnn_optimizer = optim.SGD(ffnn_config_2.parameters(), lr=0.001, momentum=0.9)
rnn_optimizer = optim.SGD(rnn_config_2.parameters(), lr=0.001, momentum=0.9)
train_and_evaluate(10, ffnn_config_2, train_loader, val_loader, ffnn_optimizer)
train_and_evaluate(10, rnn_config_2, rnn_train_loader, rnn_val_loader, rnn_optimizer)

### 3.1.2 Report
Describe configurations, report the results, and then perform a nuanced analysis

In this pair of configurations we use a larger batch size of 32, a hidden_size of 512, and a learning rate of 0.001, and we train for 10 epochs. Since these hyperparameters are the same for both models, we think that this makes a fair comparison. The only difference is the input dimension; there is no way to use the same input for both models, since the RNN takes a variable-length sequence as input.

### 3.1.3 Configuration 3
Modify the code below for this configuration.

In [None]:
#learning rate
h = 512
ffnn_config_3 = FFNN(len(vocab), h, len(emotion_to_idx)).to(get_device())
rnn_config_3 = RNN(vectorizer.dim, h, len(emotion_to_idx)).to(get_device())

In [None]:
train_loader, val_loader = get_data_loaders(train_vectorized, val_vectorized, batch_size=32)
rnn_train_loader = paddedDataLoader(train_seq_vectors, batch_size=32)
rnn_val_loader = paddedDataLoader(val_seq_vectors, batch_size=32)

In [None]:
ffnn_optimizer = optim.SGD(ffnn_config_3.parameters(), lr=0.0001, momentum=0.9)
rnn_optimizer = optim.SGD(rnn_config_3.parameters(), lr=0.0001, momentum=0.9)
train_and_evaluate(50, ffnn_config_3, train_loader, val_loader, ffnn_optimizer)
train_and_evaluate(50, rnn_config_3, rnn_train_loader, rnn_val_loader, rnn_optimizer)

### 3.1.3 Report
Describe configurations, report the results, and then perform a nuanced analysis

In this pair of configurations we use a lower learning rate of 0.0001 and train for 50 epochs, with a hidden_size of 512 and a batch_size of 32. Since these hyperparameters are the same for both models, we think that this makes a fair comparison. The only difference is the input dimension; there is no way to use the same input for both models, since the RNN takes a variable-length sequence as input.

## Part 3.2: Within-model comparison
To complement **Part 3.1: Across-Model Comparison**, in **Part 3.2: Within-Model Comparison**, you will need to study what happens when you change parameters within a model. To limit your workload, you need only do this for the RNN; and you may use at most one RNN model from the prior section.

In the prior section, we discussed _fair comparison_. Another aspect of rigorous experimentation in NLP (and other domains) is the _ablation study_. In this, we _ablate_ or remove aspects of a more complex model, making it less complex, to evaluate whether each aspect was necessary. To be concrete, for this part, you should train 4 variants of the RNN model and describe them as we do below:

1. Baseline model
2. Baseline model made more complex by modification $A$ (e.g. changing the hidden dimensionality from $h$ to $2h$).
3. Baseline model made more complex by modification $B$ (where $B$ is an entirely distinct/different update from $A$).
4. Baseline model with both modifications $A$ and $B$ applied.

Under the framing of an ablation study, you would describe this as beginning with model 4 and then ablating (i.e. removing) each of the two modifications, in turn; and then removing both to see if they were genuinely necessary for the performance you observe.
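
The four variants form a 2x2 grid over the two modifications, which can be enumerated mechanically. The following is a minimal sketch under stated assumptions: modification $A$ doubles the hidden size (the example given above), while modification $B$ (adding a second layer), the `num_layers` key, and the helper `build_variants` are hypothetical choices made only for illustration.

```python
from itertools import product

# Baseline configuration; hidden_size matches the Part 2 baseline,
# num_layers is a hypothetical second knob.
baseline = {"hidden_size": 512, "num_layers": 1}

# Hypothetical modifications: A doubles the hidden size, B adds a layer.
modifications = {
    "A": lambda cfg: {**cfg, "hidden_size": 2 * cfg["hidden_size"]},
    "B": lambda cfg: {**cfg, "num_layers": cfg["num_layers"] + 1},
}


def build_variants(base, mods):
    """Return one configuration per on/off combination of the modifications."""
    variants = {}
    for flags in product([False, True], repeat=len(mods)):
        cfg = dict(base)
        applied = []
        for (name, apply_mod), enabled in zip(mods.items(), flags):
            if enabled:
                cfg = apply_mod(cfg)
                applied.append(name)
        label = "+".join(applied) if applied else "baseline"
        variants[label] = cfg
    return variants


variants = build_variants(baseline, modifications)
for label, cfg in variants.items():
    print(label, cfg)
```

Reading the grid from `A+B` back to `baseline` gives the ablation view: each step removes exactly one modification.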

Once you describe each of the four models, report the quantitative accuracy as in the previous section. Conclude by performing the **opposite** nuanced analysis from the one you did in the previous section (i.e. if in **Part 3.1: Across-Model Comparison** you did _Nuanced quantitative analysis_, for **Part 3.2: Within-Model Comparison** perform a _Nuanced qualitative analysis_ and vice versa).

### 3.2.1 Configuration 1
Modify the code below for this configuration.

In [None]:
baseline_rnn = RNN()

### 3.2.1 Report
Describe variants in the ablation style described, report the results, and then perform a nuanced analysis of the opposite type as before.

### 3.2.2 Configuration 2
Modify the code below for this configuration.

In [None]:
mod_a_rnn = RNN()

### 3.2.2 Report
Describe variants in the ablation style described, report the results, and then perform a nuanced analysis of the opposite type as before.

### 3.2.3 Configuration 3
Modify the code below for this configuration.

In [None]:
mod_b_rnn = RNN()

### 3.2.3 Report
Describe variants in the ablation style described, report the results, and then perform a nuanced analysis of the opposite type as before.

### 3.2.4 Configuration 4
Modify the code below for this configuration.

In [None]:
both_mod_rnn = RNN()

### 3.2.4 Report
Describe variants in the ablation style described, report the results, and then perform a nuanced analysis of the opposite type as before.

# Part 4: Questions
In **Part 4**, you will need to answer the three questions below. We expect answers to be to-the-point; answers that are vague, meandering, or imprecise **will receive fewer points** than a precise but partially correct answer.

## 4.1 Q1
Earlier in the course, we studied models that make use of _Markov_ assumptions. Recurrent neural networks do not make any such assumption. That said, RNNs are known to struggle with long-distance dependencies. What is a fundamental reason for why this is the case?

## 4.2 Q2
In applying RNNs to tasks in NLP, we have discovered that (at least for tasks in English) feeding a sentence into an RNN backwards (i.e. inputting the sequence of vectors corresponding to ($course$, $great$, $a$, $is$, $NLP$) instead of ($NLP$, $is$, $a$, $great$, $course$)) tends to improve performance. Why might this be the case?

## 4.3 Q3
In using RNNs and word embeddings for NLP tasks, we are no longer required to engineer specific features that are useful for the task; the model discovers them automatically. Stated differently, it seems that neural models tend to discover better features than human researchers can directly specify. This comes at the cost of systems having to consume tremendous amounts of data to learn these kinds of patterns from the data. Beyond concerns of dataset size (and the computational resources required to process and train using this data as well as the further environmental harm that results from this process), why might we disfavor RNN models?

# Part 5: Miscellaneous
List the libraries you used and sources you referenced and cited (labelled with the section in which you referred to them). Include a description of how your group split
up the work. Include brief feedback on this assignment.

**Each section must be clearly labelled, complete, and the corresponding pages should be correctly assigned to the corresponding Gradescope rubric item.** If you follow these steps for each of the 4 components requested, you are guaranteed full credit for this section. Otherwise, you will receive no credit for this section.

# Part 6: Kaggle Submission

In [None]:
# Create Kaggle submission function
kaggle_model = None
rnn_document_preprocessor = lambda x: rnn_preprocessor(x, True) # This is for your RNN
file_name = "submission.csv"
ffnn_document_preprocessor = lambda x: convert_to_vector_representation(x, word2index, True)

In [None]:
def generate_submission(filename, model, document_preprocessor, test):
    test_vectorized = document_preprocessor(test)
    with Path(filename).open("w") as fp:  # use the function argument, not the global file_name
        fp.write("Id,Predicted\n")
        for idx, input_vector in tqdm(enumerate(test_vectorized), total=len(test_vectorized)):
            output = model(torch.Tensor(input_vector).unsqueeze(0).to(get_device())).cpu()#.squeeze(0)
            _, pred = torch.max(output, 1)
            fp.write(f"{idx},{int(pred)}\n")
    return

In [None]:
generate_submission(file_name, kaggle_model, ffnn_document_preprocessor, test)

# Live running demo

In [None]:
#@title Emotion Detection
#@markdown Enter a sentence to see the emotion
input_string = "I am so joyful!" #@param {type:"string"}
model_type = "baseline_ffnn" #@param ["baseline_ffnn", "baseline_rnn", "mod_a_rnn", "mod_b_rnn", "both_mods_rnn", "ffnn_config_1", "rnn_config_1", "ffnn_config_2", "rnn_config_2", "ffnn_config_3", "rnn_config_3"]
from IPython.display import HTML

output = ""

# BAD THING TO DO BELOW!!
model_used = globals()[model_type]

with torch.no_grad():
    if "ffnn" in model_type:
        vec_in = ffnn_document_preprocessor([[input_string]])[0]
        # Move the input to the same device as the model before the forward pass
        model_output = model_used(torch.Tensor(vec_in).unsqueeze(0).to(get_device())).cpu().squeeze(0)
    else:
        # RUN MODEL
        vec_in = rnn_document_preprocessor([[input_string]])[0]
        model_output = model_used(torch.Tensor(vec_in).unsqueeze(0).to(get_device())).cpu().squeeze(0)
    #print(torch.cat([torch.Tensor(z).unsqueeze(0) for z in model_inputs]).unsqueeze(0).shape)
    #model_output = model_used(torch.cat([torch.Tensor(z).unsqueeze(0) for z in model_inputs]).unsqueeze(0))
    #print(model_output.shape)
predicted = torch.argmax(model_output)
# MAP BACK TO EMOTION
# print(int(predicted))
emotion = idx_to_emotion[int(predicted)]

# Generate nice display
output += '<p style="font-family:verdana; font-size:110%;">'
output += " Input sequence: "+input_string+"</p>"
output += '<p style="font-family:verdana; font-size:110%;">'
output += f" Emotion detected: {emotion}</p><hr>"
output = "<h3>Results:</h3>" + output

display(HTML(output))