## Programming Assignment (20 points)

In this assignment, you will solve an irony detection task: given a tweet, your job is to classify whether it is ironic or not.

You will implement a new classifier that does not rely on feature engineering as in previous homeworks. Instead, you will use pretrained word embeddings downloaded from using the `irony.py` script as your input feature vectors. Then, you will encode your sequence of word embeddings with an (already implemented) LSTM and classify based on its final hidden state.


In [1]:
# This is so that you don't have to restart the kernel everytime you edit hmm.py
%load_ext autoreload
%autoreload 2

## Data

We will use the dataset from SemEval-2018: https://github.com/Cyvhee/SemEval2018-Task3

In [2]:
# pip install -r requirements.txt

In [3]:
from irony import load_datasets
from sklearn.model_selection import train_test_split

train_sentences, train_labels, test_sentences, test_labels, label2i = load_datasets()

# TODO: Split train into train/dev
train_sentences, dev_sentences, train_labels, dev_labels =\
train_test_split(train_sentences, train_labels, test_size = 0.2, stratify = train_labels)

## Baseline: Naive Bayes

We have provided the solution for the Naive Bayes part from HW2 in [bayes.py](bayes.py)

There are two implementations: NaiveBayesHW2 is what was expected from HW2. However, we will use a more effecient implementation of it that uses vector operations to calculate the probabilities. Please go through it if you would like to

In [4]:
from irony import run_nb_baseline

run_nb_baseline()

Vectorizing Text: 100%|█████████████████████████████████████████████████████████| 3834/3834 [00:00<00:00, 17086.67it/s]
Vectorizing Text: 100%|█████████████████████████████████████████████████████████| 3834/3834 [00:00<00:00, 25291.45it/s]
Vectorizing Text: 100%|███████████████████████████████████████████████████████████| 784/784 [00:00<00:00, 30158.80it/s]

Baseline: Naive Bayes Classifier
F1-score Ironic: 0.6402966625463535
Avg F1-score: 0.6284487265300938





### Task 1: Implement avg_f1_score() in [util.py](util.py). Then re-run the above cell  (2 Points)

So the micro F1-score for the test set of the Ironic Class using a Naive Bayes Classifier is **0.64**

## Logistic Regression with Word2Vec  (Total: 18 Points)

Unlike sentiment, Irony is very subjective, and there is no word list for ironic and non-ironic tweets. This makes hand-engineering features tedious, therefore, we will use word embeddings as input to the classifier, and make the model automatically extract features aka learn weights for the embeddings

## Tokenizer for Tweets


Tweets are very different from normal document text. They have emojis, hashtags and bunch of other special character. Therefore, we need to create a suitable tokenizer for this kind of text.

Additionally, as described in class, we also need to have a consistent input length of the text document in order for the neural networks built over it to work correctly.

### Task 2: Create a Tokenizer with Padding (5 Points)

Our Tokenizer class is meant for tokenizing and padding batches of inputs. This is done
before we encode text sequences as torch Tensors.

Update the following class by completing the todo statements.

In [5]:
from typing import Dict, List, Optional, Tuple
from collections import Counter

import torch
import numpy as np
import spacy


class Tokenizer:
    """Tokenizes and pads a batch of input sentences."""

    def __init__(self, pad_symbol: Optional[str] = "<PAD>"):
        """Initializes the tokenizer

        Args:
            pad_symbol (Optional[str], optional): The symbol for a pad. Defaults to "<PAD>".
        """
        self.pad_symbol = pad_symbol
        self.nlp = spacy.load("en_core_web_sm")
    
    def __call__(self, batch: List[str]) -> List[List[str]]:
        """Tokenizes each sentence in the batch, and pads them if necessary so
        that we have equal length sentences in the batch.

        Args:
            batch (List[str]): A List of sentence strings

        Returns:
            List[List[str]]: A List of equal-length token Lists.
        """
        batch = self.tokenize(batch)
        batch = self.pad(batch)

        return batch

    def tokenize(self, sentences: List[str]) -> List[List[str]]:
        """Tokenizes the List of string sentences into a Lists of tokens using spacy tokenizer.

        Args:
            sentences (List[str]): The input sentence.

        Returns:
            List[str]: The tokenized version of the sentence.
        """
        tokenized_sentences = []

        for sentence in sentences:
            # Tokenize the sentence using spaCy
            doc = self.nlp(sentence)

            # Extract tokens from the spaCy Doc object
            tokens = [token.text for token in doc]

            # Add <SOS> (Start of Sentence) and <EOS> (End of Sentence) tokens
            tokens = ['<SOS>'] + tokens + ['<EOS>']

            # Append the tokenized sentence to the result list
            tokenized_sentences.append(tokens)

        return tokenized_sentences

    def pad(self, batch: List[List[str]]) -> List[List[str]]:
        """Appends pad symbols to each tokenized sentence in the batch such that
        every List of tokens is the same length. This means that the max length sentence
        will not be padded.

        Args:
            batch (List[List[str]]): Batch of tokenized sentences.

        Returns:
            List[List[str]]: Batch of padded tokenized sentences. 
        """
        max_length = max(len(sentence) for sentence in batch)

        # Pad each sentence in the batch to the maximum length
        padded_batch = [sentence + ['<P>'] * (max_length - len(sentence)) for sentence in batch]

        return padded_batch

In [6]:
# create the vocabulary of the dataset: use both training and test sets here

SPECIAL_TOKENS = ['<UNK>', '<PAD>', '<SOS>', '<EOS>']

all_data = train_sentences + test_sentences
my_tokenizer = Tokenizer()

tokenized_data = my_tokenizer.tokenize(all_data)
vocab = sorted(set([w for ws in tokenized_data + [SPECIAL_TOKENS] for w in ws]))

with open('vocab.txt', 'w', encoding='utf-8') as vf:
    vf.write('\n'.join(vocab))

## Embeddings

We use GloVe embeddings https://nlp.stanford.edu/projects/glove/. But these do not necessarily have all of the tokens that will occur in tweets! Hoad the GloVe embeddings, pruning them to only those words in vocab.txt. This is to reduce the memory and runtime of your model.

Then, find the out-of-vocabulary words (oov) and add them to the encoding dictionary and the embeddings matrix.

In [7]:
# Dowload the gloVe vectors for Twitter tweets. This will download a file called glove.twitter.27B.zip

# !curl -0 https://downloads.cs.stanford.edu/nlp/data/glove.twitter.27B.zip

In [8]:
# unzip glove.twitter.27B.zip
# if there is an error, please download the zip file again

# !tar -xf glove.twitter.27B.zip

In [9]:
# Let's see what files are there:

!dir /B | findstr /R "glove.*.txt"

glove.twitter.27B.100d.txt
glove.twitter.27B.200d.txt
glove.twitter.27B.25d.txt
glove.twitter.27B.50d.txt


In [10]:
# For this assignment, we will use glove.twitter.27B.50d.txt which has 50 dimensional word vectors
# Feel free to experiment with vectors of other sizes

embedding_path = 'glove.twitter.27B.50d.txt'
vocab_path = "./vocab.txt"

## Creating a custom Embedding Layer

Now the GloVe file has vectors for about 1.2 million words. However, we only need the vectors for a very tiny fraction of words -> the unique words that are there in the classification corpus. Some of the next tasks will be to create a custom embedding layer that has the vectors for this small set of words

### Task 2: Extracting word vectors from GloVe (3 Points)

In [11]:
from typing import Dict, Tuple

import torch


def read_pretrained_embeddings(
    embeddings_path: str,
    vocab_path: str
) -> Tuple[Dict[str, int], torch.FloatTensor]:
    """Read the embeddings matrix and make a dict hashing each word.

    Note that we have provided the entire vocab for train and test, so that for practical purposes
    we can simply load those words in the vocab, rather than all 27B embeddings

    Args:
        embeddings_path (str): _description_
        vocab_path (str): _description_

    Returns:
        Tuple[Dict[str, int], torch.FloatTensor]: _description_
    """
    word2i = {}
    vectors = []
    
    with open(vocab_path, encoding='utf8') as vf:
        vocab = set([w.strip() for w in vf.readlines()]) 
    
    print(f"Reading embeddings from {embeddings_path}...")
    with open(embeddings_path, "r", encoding='utf-8') as f:
        i = 0
        for line in f:
            word, *weights = line.rstrip().split(" ")
            # TODO: Build word2i and vectors such that
            #       each word points to the index of its vector,
            #       and only words that exist in `vocab` are in our embeddings
            if word in vocab:
                word2i[word] = len(word2i)
                vector = torch.FloatTensor([float(weight) for weight in weights])
                vectors.append(vector)

    return word2i, torch.stack(vectors)

### Task 3: Get GloVe Out of Vocabulary (oov) words (0 Points)

The task is to find the words in the Irony corpus that are not in the GloVe Word list

In [12]:
def get_oovs(vocab_path: str, word2i: Dict[str, int]) -> List[str]:
    """Find the vocab items that do not exist in the glove embeddings (in word2i).
    Return the List of such (unique) words.

    Args:
        vocab_path: List of batches of sentences.
        word2i (Dict[str, int]): _description_

    Returns:
        List[str]: _description_
    """
    with open(vocab_path, encoding='utf8') as vf:
        vocab = set([w.strip() for w in vf.readlines()])
    
    glove_and_vocab = set(word2i.keys())
    vocab_and_not_glove = vocab - glove_and_vocab
    return list(vocab_and_not_glove)

### Task 4: Update the embeddings with oov words (3 Points)

In [13]:
def initialize_new_embedding_weights(num_embeddings: int, dim: int) -> torch.FloatTensor:
    """xavier initialization for the embeddings of words in train, but not in gLove.

    Args:
        num_embeddings (int): _description_
        dim (int): _description_

    Returns:
        torch.FloatTensor: _description_
    """
    # TODO: Initialize a num_embeddings x dim matrix with xiavier initiialization
    #      That is, a normal distribution with mean 0 and standard deviation of dim^-0.5
    weights = torch.empty(num_embeddings, dim)
    torch.nn.init.xavier_uniform_(weights)

    return weights


def update_embeddings(
    glove_word2i: Dict[str, int],
    glove_embeddings: torch.FloatTensor,
    oovs: List[str]
) -> Tuple[Dict[str, int], torch.FloatTensor]:
    # TODO: Add the oov words to the dict, assigning a new index to each

    # TODO: Concatenate a new row to embeddings for each oov
    #       initialize those new rows with `intialize_new_embedding_weights`

    # TODO: Return the tuple of the dictionary and the new embeddings matrix
    
    # Create a copy of the glove_word2i dictionary to avoid modifying the original
    word2i = glove_word2i.copy()

    # Add the out-of-vocabulary (OOV) words to the dictionary, assigning a new index to each
    for oov in oovs:
        word2i[oov] = len(word2i)

    # Initialize new embeddings for OOV words using Xavier initialization
    oov_embeddings = initialize_new_embedding_weights(len(oovs), glove_embeddings.size(1))

    # Concatenate the new rows to the existing embeddings matrix
    new_embeddings = torch.cat([glove_embeddings, oov_embeddings], dim=0)

    return word2i, new_embeddings

In [14]:
def make_batches(sequences: List[str], batch_size: int) -> List[List[str]]:
    """Yield batch_size chunks from sequences."""
    # TODO
    for i in range(0, len(sequences), batch_size):
        yield sequences[i:i + batch_size]


# TODO: Set your preferred batch size
batch_size = 8
tokenizer = Tokenizer()

# We make batches now and use those.
bt_train_st = []
b_train_lb = []
bt_dev_st = []
b_dev_lb = []
# Note: Labels need to be batched in the same way to ensure
# We have train sentence and label batches lining up.
for batch in make_batches(train_sentences, batch_size):
    bt_train_st.append(tokenizer(batch))
for batch in make_batches(train_labels, batch_size):
    b_train_lb.append(batch)
for batch in make_batches(dev_sentences, batch_size):
    bt_dev_st.append(tokenizer(batch))
for batch in make_batches(dev_labels, batch_size):
    b_dev_lb.append(batch)
# print(bt_train_st[:3])
# print(type(bt_train_st[0])) first batch
# print(type(bt_train_st[0][0])) first tweet in first batch
# print(type(bt_train_st[0][0][0])) first token in first tweet

glove_word2i, glove_embeddings = read_pretrained_embeddings(
    embedding_path,
    vocab_path
)

# Find the out-of-vocabularies
oovs = get_oovs(vocab_path, glove_word2i)

# Add the oovs from training data to the word2i encoding, and as new rows
# to the embeddings matrix
word2i, embeddings = update_embeddings(glove_word2i, glove_embeddings, oovs)
# print(word2i)
print(len(embeddings), len(embeddings[0]))

Reading embeddings from glove.twitter.27B.50d.txt...
15068 50


### Encoding words to integers: DO NOT EDIT

In [15]:
# Use these functions to encode your batches before you call the train loop.

def encode_sentences(batch: List[List[str]], word2i: Dict[str, int]) -> torch.LongTensor:
    """Encode the tokens in each sentence in the batch with a dictionary

    Args:
        batch (List[List[str]]): The padded and tokenized batch of sentences.
        word2i (Dict[str, int]): The encoding dictionary.

    Returns:
        torch.LongTensor: The tensor of encoded sentences.
    """
    UNK_IDX = word2i["<UNK>"]
    tensors = []
    for sent in batch:
        tensors.append(torch.LongTensor([word2i.get(w, UNK_IDX) for w in sent]))
    return torch.stack(tensors)


def encode_labels(labels: List[int]) -> torch.FloatTensor:
    """Turns the batch of labels into a tensor

    Args:
        labels (List[int]): List of all labels in the batch

    Returns:
        torch.FloatTensor: Tensor of all labels in the batch
    """
    return torch.LongTensor([int(l) for l in labels])


## Modeling   ( 7 Points)

In [16]:
import torch

# Notice there is a single TODO in the model
class IronyDetector(torch.nn.Module):
    def __init__(
        self,
        input_dim: int,
        hidden_dim: int,
        embeddings_tensor: torch.FloatTensor,
        pad_idx: int,
        output_size: int,
        dropout_val: float = 0.3,
    ):
        super().__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.pad_idx = pad_idx
        self.dropout_val = dropout_val
        self.output_size = output_size
        # TODO: Initialize the embeddings from the weights matrix.
        #       Check the documentation for how to initialize an embedding layer
        #       from a pretrained embedding matrix. 
        #       Be careful to set the `freeze` parameter!
        #       Docs are here: https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html#torch.nn.Embedding.from_pretrained
        self.embedding = torch.nn.Embedding.from_pretrained(embeddings_tensor, padding_idx=pad_idx)
        # Dropout regularization
        # https://jmlr.org/papers/v15/srivastava14a.html
        self.dropout_layer = torch.nn.Dropout(p=self.dropout_val, inplace=False)
        # Bidirectional 2-layer LSTM. Feel free to try different parameters.
        # https://colah.github.io/posts/2015-08-Understanding-LSTMs/
        self.lstm = torch.nn.LSTM(
            self.input_dim,
            self.hidden_dim,
            num_layers=2,
            dropout=dropout_val,
            batch_first=True,
            bidirectional=True,
        )
        # For classification over the final LSTM state.
        self.classifier = torch.nn.Linear(hidden_dim*2, self.output_size)
        self.log_softmax = torch.nn.LogSoftmax(dim=2)
    
    def encode_text(
        self,
        symbols: torch.Tensor
    ) -> torch.Tensor:
        """Encode the (batch of) sequence(s) of token symbols with an LSTM.
            Then, get the last (non-padded) hidden state for each symbol and return that.

        Args:
            symbols (torch.Tensor): The batch size x sequence length tensor of input tokens

        Returns:
            torch.Tensor: The final hiddens tate of the LSTM, which represents an encoding of
                the entire sentence
        """
        # First we get the embedding for each input symbol
        embedded = self.embedding(symbols)
        embedded = self.dropout_layer(embedded)
        # Packs embedded source symbols into a PackedSequence.
        # This is an optimization when using padded sequences with an LSTM
        lens = (symbols != self.pad_idx).sum(dim=1).to("cpu")
        packed = torch.nn.utils.rnn.pack_padded_sequence(
            embedded, lens, batch_first=True, enforce_sorted=False
        )
        # -> batch_size x seq_len x encoder_dim, (h0, c0).
        packed_outs, (H, C) = self.lstm(packed)
        encoded, _ = torch.nn.utils.rnn.pad_packed_sequence(
            packed_outs,
            batch_first=True,
            padding_value=self.pad_idx,
            total_length=None,
        )
        # Now we have the representation of eahc token encoded by the LSTM.
        encoded, (H, C) = self.lstm(embedded)
        
        # This part looks tricky. All we are doing is getting a tensor
        # That indexes the last non-PAD position in each tensor in the batch.
        last_enc_out_idxs = lens - 1
        # -> B x 1 x 1.
        last_enc_out_idxs = last_enc_out_idxs.view([encoded.size(0)] + [1, 1])
        # -> 1 x 1 x encoder_dim. This indexes the last non-padded dimension.
        last_enc_out_idxs = last_enc_out_idxs.expand(
            [-1, -1, encoded.size(-1)]
        )
        # Get the final hidden state in the LSTM
        last_hidden = torch.gather(encoded, 1, last_enc_out_idxs)
        return last_hidden
    
    def forward(
        self,
        symbols: torch.Tensor,
    ) -> torch.Tensor:
        encoded_sents = self.encode_text(symbols)
        output = self.classifier(encoded_sents)
        return self.log_softmax(output)

## Evaluation

In [17]:
def predict(model: torch.nn.Module, dev_sequences: List[torch.Tensor]):
    preds = []
    # TODO: Get the predictions for the dev_sequences using the model
    # Set the model to evaluation mode
    model.eval()
   
    with torch.no_grad():
        for sequence in dev_sequences:
            # Make predictions using the model
            output = model(sequence)

            predicted_labels = torch.argmax(output, dim=2)
#             print(f'predicted labels = {predicted_labels}')
            labels = predicted_labels.view(-1)
#             print(f'1d labels = {labels}')
            preds.append(labels)
            
#     print(f'preds = {preds}')
    return preds


## Training

In [18]:
from tqdm import tqdm_notebook as tqdm

import random
from util import avg_f1_score, f1_score

def training_loop(
    num_epochs,
    train_features,
    train_labels,
    dev_features,
    dev_labels,
    optimizer,
    model,
):
    print("Training...")
    loss_func = torch.nn.NLLLoss()
    batches = list(zip(train_features, train_labels))
    random.shuffle(batches)
    for i in range(num_epochs):
        losses = []
        for features, labels in tqdm(batches):
            # Empty the dynamic computation graph
            optimizer.zero_grad()
            preds = model(features).squeeze(1)
            loss = loss_func(preds, labels)
            # Backpropogate the loss through our model
            loss.backward()
            optimizer.step()
            losses.append(loss.item())
        
        print(f"epoch {i}, loss: {sum(losses)/len(losses)}")
        # Estimate the f1 score for the development set
        print("Evaluating dev...")
        preds = predict(model, dev_features)
        concatenated_tensor = torch.cat(preds)
        preds = concatenated_tensor.tolist()
        
#         print(f'preds = {preds}\ndev_labels = {dev_labels}')
        #print(f"label2i = {label2i['1']}, label2i, keys = {label2i.keys()}")
        dev_f1 = f1_score(preds, dev_labels, label2i['1'])
        dev_avg_f1 = avg_f1_score(preds, dev_labels, list(label2i.keys()))
        print(f"Dev F1 {dev_f1}")
        print(f"Avg Dev F1 {dev_avg_f1}")
        
    # Return the trained model
    return model

In [20]:
# TODO: Load the model and run the training loop 
#       on your train/dev splits. Set and tweak hyperparameters.
import torch
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score

input_dim = 50
hidden_dim = 64
output_size = 2
pad_idx = 0
model = IronyDetector(  input_dim,
                        hidden_dim,
                        embeddings,
                        pad_idx,
                        output_size,
                        0.3)
    
# Hyperparameters
epochs = 30
learning_rate = 0.001

# encode data
encoded_train_sts = []
encoded_train_lbs = []
encoded_dev_sts = []
encoded_dev_lbs = []
for sentence in bt_train_st:
    encoded_train_sts.append(encode_sentences(sentence,word2i))
for sentence in bt_dev_st:
    encoded_dev_sts.append(encode_sentences(sentence,word2i))
for label in b_train_lb:
    encoded_train_lbs.append(encode_labels(label))
for label in b_dev_lb:
    encoded_dev_lbs.append(encode_labels(label))
    
concatenated_dev_labels = torch.cat(encoded_dev_lbs)
flatten_dev_labels = concatenated_dev_labels.tolist()
# concatenated_train_labels = torch.cat(encoded_train_lbs)
# flatten_train_labels = concatenated_train_labels.tolist()
# print(bt_train_st[:2])
# print(bt_dev_st[:2])
# print(b_train_lb[:2])
# print(b_dev_lb[:2])

# Choose an optimizer and a loss function
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
model = training_loop(  epochs,
                        encoded_train_sts,
                        encoded_train_lbs,
                        encoded_dev_sts,
                        flatten_dev_labels,
                        optimizer,
                        model,
                    )



Training...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for features, labels in tqdm(batches):


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 0, loss: 0.6939605247850219
Evaluating dev...
Dev F1 0.1074766355140187
Avg Dev F1 0.3810439235436278


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 1, loss: 0.6898551850269238
Evaluating dev...
Dev F1 0.6492693110647182
Avg Dev F1 0.5329679888656924


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 2, loss: 0.6787088136188686
Evaluating dev...
Dev F1 0.44117647058823534
Avg Dev F1 0.5351218578537705


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 3, loss: 0.6573166251182556
Evaluating dev...
Dev F1 0.4902597402597403
Avg Dev F1 0.574105904988258


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 4, loss: 0.6360786396544427
Evaluating dev...
Dev F1 0.4222222222222222
Avg Dev F1 0.5541694612117147


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 5, loss: 0.6182011573109776
Evaluating dev...
Dev F1 0.4959481361426257
Avg Dev F1 0.5783993679622615


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 6, loss: 0.5952150494946787
Evaluating dev...
Dev F1 0.524031007751938
Avg Dev F1 0.5893495871155641


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 7, loss: 0.5754700987987841
Evaluating dev...
Dev F1 0.5765517241379311
Avg Dev F1 0.5985354417970249


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 8, loss: 0.5590920712177953
Evaluating dev...
Dev F1 0.5668016194331984
Avg Dev F1 0.5810048450255526


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 9, loss: 0.5346733088760326
Evaluating dev...
Dev F1 0.46204620462046203
Avg Dev F1 0.5553765505860931


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 10, loss: 0.5203057224086175
Evaluating dev...
Dev F1 0.477124183006536
Avg Dev F1 0.5650262997462181


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 11, loss: 0.4950118519676228
Evaluating dev...
Dev F1 0.5912806539509537
Avg Dev F1 0.6081403269754769


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 12, loss: 0.4703482612967491
Evaluating dev...
Dev F1 0.5665236051502147
Avg Dev F1 0.6018246768266045


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 13, loss: 0.4374037224139708
Evaluating dev...
Dev F1 0.5819793205317578
Avg Dev F1 0.6258788084572442


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 14, loss: 0.42074513968933996
Evaluating dev...
Dev F1 0.624516129032258
Avg Dev F1 0.620558459773046


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 15, loss: 0.40948804636718705
Evaluating dev...
Dev F1 0.5767045454545455
Avg Dev F1 0.6088342004381162


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 16, loss: 0.39156253115894896
Evaluating dev...
Dev F1 0.5832147937411096
Avg Dev F1 0.6153137747285572


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 17, loss: 0.35936069533151266
Evaluating dev...
Dev F1 0.5553869499241274
Avg Dev F1 0.6102649035334923


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 18, loss: 0.3315050581489534
Evaluating dev...
Dev F1 0.48067226890756304
Avg Dev F1 0.5757993932397241


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 19, loss: 0.33166139950238477
Evaluating dev...
Dev F1 0.5454545454545455
Avg Dev F1 0.5960181186137155


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 20, loss: 0.300435910714441
Evaluating dev...
Dev F1 0.5694050991501417
Avg Dev F1 0.6011276703480177


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 21, loss: 0.3098613319840903
Evaluating dev...
Dev F1 0.4933333333333334
Avg Dev F1 0.583925767309065


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 22, loss: 0.2831393424470055
Evaluating dev...
Dev F1 0.5454545454545454
Avg Dev F1 0.5908023900981647


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 23, loss: 0.3003632361505879
Evaluating dev...
Dev F1 0.592797783933518
Avg Dev F1 0.6153644092081383


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 24, loss: 0.24549261374098327
Evaluating dev...
Dev F1 0.6077348066298344
Avg Dev F1 0.6285587613396085


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 25, loss: 0.26286153794353595
Evaluating dev...
Dev F1 0.5577211394302849
Avg Dev F1 0.608733695436019


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 26, loss: 0.22223072679480538
Evaluating dev...
Dev F1 0.6068027210884354
Avg Dev F1 0.6225502967144304


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 27, loss: 0.2039347632768719
Evaluating dev...
Dev F1 0.6053333333333333
Avg Dev F1 0.613891156462585


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 28, loss: 0.21068098773927582
Evaluating dev...
Dev F1 0.6050870147255689
Avg Dev F1 0.6151229228646904


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 29, loss: 0.2212366730091162
Evaluating dev...
Dev F1 0.6084656084656085
Avg Dev F1 0.6140014417649379


## Written Assignment (30 Points)

### 1. Describe what the task is, and how it could be useful.

### 2. Describe, at the high level, that is, without mathematical rigor, how pretrained word embeddings like the ones we relied on here are computed. Your description can discuss the Word2Vec class of algorithms, GloVe, or a similar method.

### 3. What are some of the benefits of using word embeddings instead of e.g. a bag of words?

### 4. What is the difference between Binary Cross Entropy loss and the negative log likelihood loss we used here (`torch.nn.NLLLoss`)?

### 5. Show your experimental results. Indicate any changes to hyperparameters, data splits, or architectural changes you made, and how those effected results.