<a href="https://colab.research.google.com/github/dbamman/nlp20/blob/master/HW_4/HW_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Homework 4: Neural Sequence Labeling

**Due March 4, 2020 at 11:59PM**


In this homework, you will be implementing, training, and evaluating an LSTM for part-of-speech tagging using the PyTorch library.

**Before beginning, please switch your Colab session to a GPU runtime** 

Go to Runtime > Change runtime type > Hardware accelerator > GPU

### Setup

In [3]:
# import libraries
import torch
import numpy as np
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pad_packed_sequence, pack_padded_sequence

In [4]:
# if this cell prints "Running on cpu", you must switch runtime environments
# go to Runtime > Change runtime type > Hardware accelerator > GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running on {}".format(device))

Running on cuda


### Download & Load Pretrained Embeddings

In this assignment, we will be using GloVe pretrained word embeddings. You can read more about GloVe here: https://nlp.stanford.edu/projects/glove/

**Note**: this section will take *several minutes*, since the embedding files are large. Files in Colab may be cached between sessions, so you may or may not need to redownload the files each time you reconnect. 


In [5]:
# download pretrained word embeddings
!wget -nc http://nlp.stanford.edu/data/glove.6B.zip
!unzip -n glove*.zip

File ‘glove.6B.zip’ already there; not retrieving.

Archive:  glove.6B.zip


In [6]:
%load_ext nb_black


In [7]:
def read_embeddings(filename, vocab_size=10000):
    """
    Utility function, loads in the `vocab_size` most common embeddings from `filename`

    Arguments:
    - filename:     path to file
                  automatically infers correct embedding dimension from filename
    - vocab_size:   maximum number of embeddings to load

    Returns 
    - embeddings:   torch.FloatTensor matrix of size (vocab_size x word_embedding_dim)
    - vocab:        dictionary mapping word (str) to index (int) in embedding matrix
    """

    # get the embedding size from the first embedding
    with open(filename, encoding="utf-8") as file:
        word_embedding_dim = len(file.readline().split(" ")) - 1

    vocab = {}

    embeddings = np.zeros((vocab_size, word_embedding_dim))

    with open(filename, encoding="utf-8") as file:
        for idx, line in enumerate(file):

            if idx + 2 >= vocab_size:
                break

            cols = line.rstrip().split(" ")
            val = np.array(cols[1:])
            word = cols[0]
            embeddings[idx + 2] = val
            vocab[word] = idx + 2

    # a FloatTensor is a multidimensional matrix
    # that contains 32-bit floats in every entry
    # https://pytorch.org/docs/stable/tensors.html
    return torch.FloatTensor(embeddings), vocab


Running the cell below lists all the files in the current directory. 

In [8]:
!ls -lh

total 3.0G
-rw-r--r-- 1 root root 457K Feb 29 18:06 HW_4.html
-rw-r--r-- 1 root root  47K Feb 29 18:34 HW_4.ipynb
lrwxrwxrwx 1 root root    9 Feb 29 18:12 datasets -> /datasets
-rw-r--r-- 1 root root 332M Feb 29 18:07 glove.6B.100d.txt
-rw-r--r-- 1 root root 662M Feb 29 18:08 glove.6B.200d.txt
-rw-r--r-- 1 root root 990M Feb 29 18:10 glove.6B.300d.txt
-rw-r--r-- 1 root root 164M Feb 29 18:10 glove.6B.50d.txt
-rw-r--r-- 1 root root 823M Feb 29 18:12 glove.6B.zip
-rw-r--r-- 1 root root 209K Feb 29 18:12 pos.dev
-rw-r--r-- 1 root root  319 Feb 29 18:12 pos.tagset
-rw-r--r-- 1 root root 128K Feb 29 18:12 pos.test
-rw-r--r-- 1 root root 1.7M Feb 29 18:12 pos.train
-rw-r--r-- 1 root root  70K Feb 29 18:12 predictions.txt
lrwxrwxrwx 1 root root    8 Feb 29 18:12 storage -> /storage



You should see several embedding files, which are all formatted as

```
glove.6B.<emb_dim>d.txt
```

Each `txt` file contains `emb_dim` dimensional embeddings for 400,000 unique, uncased words. The script below loads the `vocab_size` most common words from the embedding file into a matrix we can give to our model. All other words will later be mapped to the `UNKNOWN` embedding.
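As a rough illustration of the file format and the reserved rows, the sketch below parses a few GloVe-style lines from an in-memory string. The helper name `parse_glove_lines` and the three toy vectors are made up for this example (they are not real GloVe values); the row layout (index 0 for padding, index 1 for unknown words) follows the convention used in this notebook.

```python
import io

PAD_INDEX, UNKNOWN_INDEX = 0, 1  # rows reserved by this notebook's convention


def parse_glove_lines(lines, vocab_size):
    """Read at most vocab_size - 2 word vectors, leaving rows 0 and 1 as zeros."""
    vocab, rows = {}, None
    for idx, line in enumerate(lines):
        if idx + 2 >= vocab_size:
            break
        cols = line.rstrip().split(" ")
        vector = [float(x) for x in cols[1:]]
        if rows is None:
            # allocate the matrix once the dimension is known from the first line
            rows = [[0.0] * len(vector) for _ in range(vocab_size)]
        rows[idx + 2] = vector
        vocab[cols[0]] = idx + 2
    return rows, vocab


# Hypothetical 3-dimensional vectors, for illustration only
toy_file = io.StringIO(
    "the 0.1 0.2 0.3\n"
    "of 0.4 0.5 0.6\n"
    "and 0.7 0.8 0.9\n"
)
rows, vocab = parse_glove_lines(toy_file, vocab_size=4)
print(vocab)                             # {'the': 2, 'of': 3}
print(vocab.get("and", UNKNOWN_INDEX))   # 1
```

Because two rows are reserved, a `vocab_size` of 4 leaves room for only two words; `"and"` falls outside the vocabulary and would be looked up as the unknown index.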

In [111]:
# load 50-dimensional embeddings for the most common words
# (rows 0 and 1 are reserved for padding and unknown words, so 9,998 words are loaded)
vocab_size = 10000
embeddings, vocab = read_embeddings("glove.6B.50d.txt", vocab_size)


## Part 1: Batching the data

Implement the `get_batches` function in the `Dataset` class below. 

**Please make sure that**

*   Your implementation is self-contained. That is, all helper functions and variables are defined within `get_batches`.
*   Your implementation can handle variable batch sizes. You may not assume that the value will always be 32.
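
To make the padding and chunking behavior concrete, here is a minimal pure-Python sketch on toy data. The helper names `pad_batch` and `chunk` and the toy vocabulary are hypothetical and only illustrate the idea; they are not the required structure of `get_batches`.

```python
PAD_INDEX = 0      # padding for word indices
UNKNOWN_INDEX = 1  # out-of-vocabulary words


def pad_batch(sentences, vocab, pad_value=PAD_INDEX):
    """Map words to indices and right-pad to the longest sentence in the batch."""
    max_len = max(len(s) for s in sentences)
    lengths = [len(s) for s in sentences]
    padded = [
        [vocab.get(w, UNKNOWN_INDEX) for w in s] + [pad_value] * (max_len - len(s))
        for s in sentences
    ]
    return padded, lengths


def chunk(items, batch_size):
    """Split items into consecutive batches; the last one may be smaller."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


toy_vocab = {"the": 2, "dog": 3, "barks": 4}  # hypothetical vocabulary
toy_sentences = [["the", "dog", "barks"], ["the", "cat"], ["dog"]]

batches = chunk(toy_sentences, batch_size=2)
padded, lengths = pad_batch(batches[0], toy_vocab)
print(padded)   # [[2, 3, 4], [2, 1, 0]]
print(lengths)  # [3, 2]
print([len(b) for b in chunk(list(range(82)), 32)])  # [32, 32, 18]
```

Tags are padded in the same way, except that the fill value is `-100` instead of `0`, so that padded positions can be skipped when the loss is computed.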



In [10]:
class Dataset:
    def __init__(self, filename, is_labeled):
        self.is_labeled = is_labeled
        # if the file is not labeled, the Dataset has no tags (see read_data)
        if is_labeled:
            self.sentences, self.tags = self.read_data(filename, is_labeled)
        else:
            self.sentences = self.read_data(filename, is_labeled)
            self.tags = None

    def read_data(self, filename, is_labeled):
        """
        Utility function, loads text file into a list of sentence and tag strings

        Arguments:
        - filename:     path to file
        - is_labeled:   whether the file contains tags for each word or not
            > if True, we assume each line is formatted as "<word>\t<tag>\n"
            > if False, we assume each line is formatted as "<word>\n"

        Returns:
        - sentences:    a list of sentences, where each sentence is a list of
                        words (strings)

        if is_labeled=True, also returns
        - tags:         a list of tags for each sentence, where tags[i] contains
                        a list of tags (strings) that correspond to the words in 
                        sentences[i]
        """
        sentences = []
        tags = []

        current_sentence = []
        current_tags = []

        with open(filename, encoding="utf8") as f:
            # iterate over the lines in the file
            for line in f:
                if len(line) == 0:
                    continue
                if line == "\n":
                    if len(current_sentence) != 0:
                        sentences.append(current_sentence)
                        tags.append(current_tags)

                    current_sentence = []
                    current_tags = []
                else:
                    if is_labeled:
                        columns = line.rstrip().split("\t")
                        word = columns[0].lower()
                        tag = columns[1]

                        current_sentence.append(word)
                        current_tags.append(tag)
                    else:
                        column = line.rstrip().split("\t")
                        word = column[0].lower()
                        current_sentence.append(word)

            if is_labeled:
                return sentences, tags
            else:
                return sentences

    def get_batches(self, batch_size, vocab, tagset):
        """

        Batches the data into mini-batches of size `batch_size`

        Arguments:
        - batch_size:       the desired output batch size
        - vocab:            a dictionary mapping word strings to indices
        - tagset:           a dictionary mapping tag strings to indices

        Outputs:

        if is_labeled=True:
        - batched_word_indices:     a list of matrices of dimension (batch_size x max_seq_len)
        - batched_tag_indices:      a list of matrices of dimension (batch_size x max_seq_len)
        - batched_lengths:          a list of arrays of length (batch_size)

        if is_labeled=False:
        - batched_word_indices:     a list of matrices of dimension (batch_size x max_seq_len)
        - batched_lengths:          a list of arrays of length (batch_size)


        Description: 

        This function partitions the data into batches of size batch_size. If the number
        of sentences in the document is not an even multiple of batch_size, the final batch
        will contain the remaining elements. For example, if there are 82 sentences in the 
        dataset and batch_size=32, we return a list containing two batches of size 32 
        and one final batch of size 18.

        batched_word_indices[b] is a (batch_size x max_seq_len) matrix of integers, 
        containing index representations for sentences in the b-th batch in the document. 
        The `vocab` dictionary provides the correct mapping from word strings to indices. 
        If a word is not in the vocabulary, it gets mapped to UNKNOWN_INDEX (1).
        `max_seq_len` is the maximum sentence length among the sentences in the current batch, 
        which will vary between different batches. All sentences shorter than max_seq_len 
        should be padded on the right with PAD_INDEX (0).

        If the document is labeled, we also batch the document's tags. Analogous to 
        batched_word_indices, batched_tag_indices[b] contains the index representation
        for the tags corresponding to the sentences in the b-th batch  in the document. 
        The `tagset` dictionary provides the correct mapping from tag strings to indices. 
        All tag lists shorter than `max_seq_len` are padded with IGNORE_TAG_INDEX (-100).

        batched_lengths[b] is a vector of length (batch_size). batched_lengths[b][i] 
        contains the original sentence length *before* padding for the i-th sentence
        in the current batch. 

        """
        PAD_INDEX = 0  # reserved for padding words
        UNKNOWN_INDEX = 1  # reserved for unknown words
        IGNORE_TAG_INDEX = -100  # reserved for padding tags

        # randomly shuffle the data
        np.random.seed(159)  # DON'T CHANGE THIS
        shuffle = np.random.permutation(range(len(self.sentences)))

        sentences = [self.sentences[i] for i in shuffle]
        if self.is_labeled:
            tags = [self.tags[i] for i in shuffle]
        else:
            tags = None

        batched_word_indices = []
        batched_tag_indices = []
        batched_lengths = []

        # prepare sentences and lengths
        for i in range(0, len(sentences), batch_size):

            sent_chunk = sentences[i : i + batch_size]
            max_seq_len = max(len(sent) for sent in sent_chunk)
            word_matrix = []

            # prepare lengths
            lengths = [len(sent) for sent in sent_chunk]
            batched_lengths.append(np.array(lengths))

            # prepare sentences
            for sent in sent_chunk:
                sent_word_indices = [PAD_INDEX] * max_seq_len

                for idx, word in enumerate(sent):
                    sent_word_indices[idx] = vocab.get(word, UNKNOWN_INDEX)

                word_matrix.append(sent_word_indices)

            batched_word_indices.append(np.array(word_matrix))

        # prepare tags
        if tags:
            for i in range(0, len(tags), batch_size):
                chunk = tags[i : i + batch_size]
                max_seq_len = max(len(x) for x in chunk)
                tag_matrix = []

                for sentence_tags in chunk:
                    tag_indices = [IGNORE_TAG_INDEX] * max_seq_len

                    for idx, tag in enumerate(sentence_tags):
                        tag_indices[idx] = tagset[tag]

                    tag_matrix.append(tag_indices)

                batched_tag_indices.append(np.array(tag_matrix))

        #############################
        #       DO NOT MODIFY       #
        #############################
        if self.is_labeled:
            return batched_word_indices, batched_tag_indices, batched_lengths
        else:
            return batched_word_indices, batched_lengths


In [11]:
def read_tagset(tag_file):
    """
    Utility function, loads tag file into a dictionary from tag string to tag index

    Arguments:
    - tag_file:   file location of the tagset

    Outputs:
    - tagset:     a dictionary mapping tag strings (e.g. "VB") to a unique index
    """
    tagset = {}
    with open(tag_file, encoding="utf8") as f:
        for line in f:
            columns = line.rstrip().split("\t")
            tag = columns[0]
            tag_id = int(columns[1])
            tagset[tag] = tag_id

    return tagset


The cells below download the data files and construct the corresponding `Dataset` objects. 

In [12]:
%%capture
!wget -nc https://raw.githubusercontent.com/dbamman/nlp20/master/HW_4/pos.train
!wget -nc https://raw.githubusercontent.com/dbamman/nlp20/master/HW_4/pos.dev
!wget -nc https://raw.githubusercontent.com/dbamman/nlp20/master/HW_4/pos.test
!wget -nc https://raw.githubusercontent.com/dbamman/nlp20/master/HW_4/pos.tagset


In [13]:
# read the files
tagset = read_tagset("pos.tagset")
train_dataset = Dataset("pos.train", is_labeled=True)
dev_dataset = Dataset("pos.dev", is_labeled=True)
test_dataset = Dataset("pos.test", is_labeled=False)

BATCH_SIZE = 32

# these should run without errors if implemented correctly
train_batch_idx, train_batch_tags, train_batch_lens = train_dataset.get_batches(
    BATCH_SIZE, vocab, tagset
)
dev_batch_idx, dev_batch_tags, dev_batch_lens = dev_dataset.get_batches(
    BATCH_SIZE, vocab, tagset
)
test_batch_idx, test_batch_lens = test_dataset.get_batches(BATCH_SIZE, vocab, tagset)


## Part 2: Evaluation

Next, we will implement utility functions that will later be used to assess our model's performance. 

**Please make sure that**

*   Your implementation is self-contained. That is, keep all helper functions or variables inside of your function.
*   Your implementation does not import any additional libraries. You will not receive credit if you do.

In [14]:
# The accuracy function has been implemented for you


def accuracy(true, pred):
    """
    Arguments:
    - true:       a list of true label values (integers)
    - pred:       a list of predicted label values (integers)

    Output:
    - accuracy:   the prediction accuracy
    """
    true = np.array(true)
    pred = np.array(pred)

    num_correct = sum(true == pred)
    num_total = len(true)
    return num_correct / num_total


In [15]:
def confusion_matrix(true, pred, num_tags):
    """
    Arguments:
    - true:       a list of true label values (integers)
    - pred:       a list of predicted label values (integers)
    - num_tags:   the number of possible tags
                true and pred will both contain integers between
                0 and num_tags - 1 (inclusive)

    Output: 
    - confusion_matrix:   a (num_tags x num_tags) matrix of integers

    confusion_matrix[i][j] = # predictions where true label
    was i and predicted label was j

    """

    confusion_matrix = np.zeros((num_tags, num_tags))
    true = np.array(true)
    pred = np.array(pred)

    for i in range(len(true)):
        confusion_matrix[true[i]][pred[i]] += 1

    return confusion_matrix


In [16]:
def precision(true, pred, num_tags):
    """
    Arguments:
    - true:       a list of true label values (integers)
    - pred:       a list of predicted label values (integers)
    - num_tags:   the number of possible tags
                true and pred will both contain integers between
                0 and num_tags - 1 (inclusive)

    Output: 
    - precision:  an array of length num_tags, where precision[i]
                gives the precision of class i

    Hints:  the confusion matrix may be useful
          be careful about zero division
    """

    precision = np.zeros(num_tags)

    cm = confusion_matrix(true, pred, num_tags)

    for i in range(num_tags):
        tp = cm[i][i]
        tp_fp = np.sum(cm[:, i])
        precision[i] = tp / tp_fp if tp_fp > 0 else 0

    return precision


In [17]:
def recall(true, pred, num_tags):
    """
    Arguments:
    - true:       a list of true label values (integers)
    - pred:       a list of predicted label values (integers)
    - num_tags:   the number of possible tags
                true and pred will both contain integers between
                0 and num_tags - 1 (inclusive)

    Output: 
    - recall:     an array of length num_tags, where recall[i]
                gives the recall of class i

    Hints:  the confusion matrix may be useful
          be careful about zero division
    """

    recall = np.zeros(num_tags)

    cm = confusion_matrix(true, pred, num_tags)

    for i in range(num_tags):
        tp = cm[i][i]
        tp_fn = np.sum(cm[i, :])
        recall[i] = tp / tp_fn if tp_fn > 0 else 0

    return recall


In [18]:
def f1_score(true, pred, num_tags):
    """
    Arguments:
    - true:       a list of true label values (integers)
    - pred:       a list of predicted label values (integers)
    - num_tags:   the number of possible tags
                true and pred will both contain integers between
                0 and num_tags - 1 (inclusive)

    Output: 
    - f1:         an array of length num_tags, where f1[i]
                gives the F1 score of class i
    """
    f1 = np.zeros(num_tags)

    p = precision(true, pred, num_tags)
    r = recall(true, pred, num_tags)

    for i in range(num_tags):
        if p[i] + r[i] > 0:
            f1[i] = 2 * (p[i] * r[i]) / (p[i] + r[i])

    return f1


In [19]:
# test_case
true = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
pred = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1]
num_tags = 2

print(confusion_matrix(true, pred, num_tags))
print(recall(true, pred, num_tags))
print(precision(true, pred, num_tags))
print(f1_score(true, pred, num_tags))

[[1. 4.]
 [0. 5.]]
[0.2 1. ]
[1.         0.55555556]
[0.33333333 0.71428571]
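
The printed values can be cross-checked against a vectorized variant of the same definitions. The sketch below is one possible formulation, not the required solution; the function name `per_class_metrics` and the variable names are chosen for this example only. It relies on `np.add.at` to accumulate counts and on `np.divide` with `where=` to avoid dividing by zero.

```python
import numpy as np


def per_class_metrics(true, pred, num_tags):
    """Return (confusion matrix, precision, recall, F1) using array operations."""
    true = np.asarray(true)
    pred = np.asarray(pred)

    cm = np.zeros((num_tags, num_tags))
    # np.add.at accumulates correctly even when (true, pred) pairs repeat
    np.add.at(cm, (true, pred), 1)

    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: TP + FP
    actual = cm.sum(axis=1)     # row sums: TP + FN

    # entries with a zero denominator keep the pre-filled value 0
    prec = np.divide(tp, predicted, out=np.zeros(num_tags), where=predicted > 0)
    rec = np.divide(tp, actual, out=np.zeros(num_tags), where=actual > 0)
    denom = prec + rec
    f1 = np.divide(2 * prec * rec, denom, out=np.zeros(num_tags), where=denom > 0)
    return cm, prec, rec, f1


toy_true = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
toy_pred = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1]
toy_cm, toy_prec, toy_rec, toy_f1 = per_class_metrics(toy_true, toy_pred, 2)
print(toy_cm)
print(toy_rec, toy_prec, toy_f1)
```

For class 1, for instance, the column sum is 9 and the diagonal entry is 5, so precision is 5/9 ≈ 0.556, matching the output above.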



## Part 3: Building the model

Fill in the blanks in `LSTMTagger`'s `__init__` function. If you get stuck, you can reference PyTorch's [torch.nn documentation](https://pytorch.org/docs/stable/nn.html) or [this official tutorial](https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html) on LSTM sequence labeling.
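
The training code in this part flattens the logits to shape `(batch_size * max_seq_len, num_tags)` and uses a cross-entropy loss with `ignore_index=-100`, so padded tag positions contribute nothing. The NumPy sketch below illustrates that idea under the assumption of mean reduction over the non-ignored positions; `masked_cross_entropy` is a hypothetical helper written for this example, not part of PyTorch or of the assignment.

```python
import numpy as np

IGNORE_TAG_INDEX = -100  # same padding value used for tags in this notebook


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_TAG_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index."""
    logits = np.asarray(logits, dtype=float)
    num_tags = logits.shape[-1]
    logits = logits.reshape(-1, num_tags)   # (batch * seq_len, num_tags)
    labels = np.asarray(labels).reshape(-1)  # (batch * seq_len,)

    keep = labels != ignore_index            # drop padded positions
    logits, labels = logits[keep], labels[keep]

    # numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# one sentence of length 2, padded to length 3; two possible tags
toy_logits = np.array([[[2.0, 0.0], [0.0, 2.0], [5.0, 5.0]]])
toy_labels = np.array([[0, 1, IGNORE_TAG_INDEX]])
print(masked_cross_entropy(toy_logits, toy_labels))
```

Changing the logits at the padded third position leaves the result unchanged, which is the property the tag padding is meant to guarantee.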

In [20]:
class LSTMTagger(nn.Module):
    """
    An LSTM model for sequence labeling

    Initialization Arguments:
    - embeddings:   a matrix of size (vocab_size, emb_dim)
                    containing pretrained embedding weights
    - hidden_dim:   the LSTM's hidden layer size
    - tagset_size:  the number of possible output tags

    """

    def __init__(self, embeddings, hidden_dim, tagset_size):
        super().__init__()

        self.hidden_dim = hidden_dim
        self.num_labels = tagset_size

        # Initialize a PyTorch embeddings layer using the pretrained embedding weights
        self.embeddings = nn.Embedding.from_pretrained(embeddings, freeze=False)

        # Initialize an LSTM layer
        embedding_dim = int(embeddings.size()[1])
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)

        # Initialize a single feedforward layer
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, indices, lengths):
        """
        Runs a batched sequence through the model and returns output logits

        Arguments:
        - indices:  a matrix of size (batch_size x max_seq_len)
                    containing the word indices of sentences in the batch
        - lengths:  a vector of size (batch_size) containing the
                    original lengths of the sequences before padding

        Output:
        - logits:   a matrix of size (batch_size x max_seq_len x num_tags)
                    gives a score to each possible tag for each word
                    in each sentence 
        """
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # cast arrays as PyTorch data types and move to GPU memory
        indices = torch.LongTensor(indices).to(device)
        lengths = torch.LongTensor(lengths).to(device)

        # convert word indices to word embeddings
        embeddings = self.embeddings(indices)

        # pack/pad handles variable length sequence batching
        # see here if you're curious: https://gist.github.com/HarshTrivedi/f4e7293e941b17d19058f6fb90ab0fec
        packed_input_embs = pack_padded_sequence(
            embeddings, lengths, batch_first=True, enforce_sorted=False
        )
        # run input through LSTM layer
        packed_output, _ = self.lstm(packed_input_embs)
        # unpack sequences into original format
        padded_output, output_lengths = pad_packed_sequence(
            packed_output, batch_first=True
        )

        logits = self.hidden2tag(padded_output)
        return logits

    def run_training(
        self,
        train_dataset,
        dev_dataset,
        batch_size,
        vocab,
        tagset,
        lr=5e-4,
        num_epochs=100,
        eval_every=5,
    ):
        """
        Trains the model on the training data with a learning rate of lr
        for num_epochs. Evaluates the model on the dev data every eval_every epochs.

        Arguments:
        - train_dataset:  Dataset object containing the training data
        - dev_dataset:    Dataset object containing the dev data
        - batch_size:     batch size for train/dev data
        - vocab:          a dictionary mapping word strings to indices
        - tagset:         a dictionary mapping tag strings to indices
        - lr:             learning rate
        - num_epochs:     number of epochs to train for
        - eval_every:     evaluation is run every eval_every epochs
        """
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        if str(device) == "cpu":
            print("Training only supported in GPU environment")
            return

        # clear unreferenced data/models from GPU memory
        torch.cuda.empty_cache()
        # move model to GPU memory
        self.to(device)

        # set the optimizer (Adam) and loss function (CrossEnt)
        optimizer = torch.optim.Adam(self.parameters(), lr=lr)
        loss_function = nn.CrossEntropyLoss(ignore_index=-100)

        # batch training and dev data
        train_batch_idx, train_batch_tags, train_batch_lens = train_dataset.get_batches(
            batch_size, vocab, tagset
        )
        dev_batch_idx, dev_batch_tags, dev_batch_lens = dev_dataset.get_batches(
            batch_size, vocab, tagset
        )

        print("**** TRAINING *****")
        for i in range(num_epochs):
            # sets the model in train mode
            self.train()

            total_loss = 0
            for b in range(len(train_batch_idx)):
                # compute the logits
                logits = self(train_batch_idx[b], train_batch_lens[b])
                # move labels to GPU memory
                labels = torch.LongTensor(train_batch_tags[b]).to(device)
                # compute the loss with respect to true labels
                loss = loss_function(logits.view(-1, len(tagset)), labels.view(-1))
                total_loss += loss.item()  # accumulate a Python float, not a graph-attached tensor
                # propagate gradients backward
                loss.backward()
                optimizer.step()
                # set model gradients to zero before performing next forward pass
                self.zero_grad()

            print("Epoch {} | Loss: {}".format(i, total_loss))

            if (i + 1) % eval_every == 0:
                print("**** EVALUATION *****")
                # sets the model in evaluation mode (affects layers such as dropout;
                # it does not by itself disable gradient tracking)
                self.eval()
                # compute dev accuracy
                acc, true, pred = self.evaluate(
                    dev_batch_idx, dev_batch_lens, dev_batch_tags, tagset
                )
                print("Dev Accuracy: {}".format(acc))
                print("**********************")

    def evaluate(self, batched_sentences, batched_lengths, batched_labels, tagset):
        """
        Evaluate the model's predictions on the provided dataset. 

        Arguments:
        - batched_sentences:  a list of matrices, each of size (batch_size x max_seq_len),
                            containing the word indices of sentences in the batch
        - batched_lengths:    a list of vectors, each of size (batch_size), containing the
                            original lengths of the sequences before padding
        - batched_labels:     a list of matrices, each of size (batch_size x max_seq_len),
                            containing the tag indices corresponding to sentences in the batch
        - tagset:             a dictionary mapping tag strings to indices

        Output:
        - accuracy:           the model's prediction accuracy
        - all_true_labels:    a flattened list of all true labels
        - all_predictions:    a flattened list of all of the model's corresponding predictions

        """

        all_true_labels = []
        all_predictions = []

        for b in range(len(batched_sentences)):
            logits = self.forward(batched_sentences[b], batched_lengths[b])
            batch_predictions = torch.argmax(logits, dim=-1).cpu().numpy()

            batch_size, _ = batched_sentences[b].shape

            for i in range(batch_size):
                tags = batched_labels[b][i]
                preds = batch_predictions[i]

                seq_len = int(batched_lengths[b][i])
                for j in range(seq_len):
                    all_predictions.append(int(preds[j]))
                    all_true_labels.append(int(tags[j]))

        acc = accuracy(all_true_labels, all_predictions)

        return acc, all_true_labels, all_predictions


In [21]:
def set_seed(seed):
    """
    Sets random seeds and sets model in deterministic
    training mode. Ensures reproducible results
    """
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)


## Training the model

Run the cells below to train your model. If all of the previous sections are implemented correctly, you should see


*   the loss decreasing consistently for every epoch
*   the dev accuracy increasing until it converges at around 0.88

The staff solution achieves an accuracy of 0.880 after 25 epochs.

In [112]:
# sets the random seed – DO NOT change this
# this ensures deterministic results that are comparable with the staff values
set_seed(159)

HIDDEN_SIZE = 64
# initialize a new LSTMTagger model
model = LSTMTagger(embeddings, HIDDEN_SIZE, len(tagset))
# train the model
model.run_training(
    train_dataset,
    dev_dataset,
    BATCH_SIZE,
    vocab,
    tagset,
    lr=5e-4,
    num_epochs=25,
    eval_every=5,
)

**** TRAINING *****
Epoch 0 | Loss: 999.9422607421875
Epoch 1 | Loss: 442.25811767578125
Epoch 2 | Loss: 275.37127685546875
Epoch 3 | Loss: 212.16624450683594
Epoch 4 | Loss: 181.4229736328125
**** EVALUATION *****
Dev Accuracy: 0.8586200039769338
**********************
Epoch 5 | Loss: 163.11807250976562
Epoch 6 | Loss: 150.56387329101562
Epoch 7 | Loss: 141.14605712890625
Epoch 8 | Loss: 133.68817138671875
Epoch 9 | Loss: 127.54783630371094
**** EVALUATION *****
Dev Accuracy: 0.8754026645456353
**********************
Epoch 10 | Loss: 122.3573989868164
Epoch 11 | Loss: 117.87614440917969
Epoch 12 | Loss: 113.92591094970703
Epoch 13 | Loss: 110.38809967041016
Epoch 14 | Loss: 107.1690444946289
**** EVALUATION *****
Dev Accuracy: 0.8791409823026447
**********************
Epoch 15 | Loss: 104.20204162597656
Epoch 16 | Loss: 101.43944549560547
Epoch 17 | Loss: 98.83808135986328
Epoch 18 | Loss: 96.3730239868164
Epoch 19 | Loss: 94.02762603759766
**** EVALUATION *****
Dev Accuracy: 0.880373


Once the model is trained, run the cells below to print the precision, recall, and $F_1$ score per class.

In [109]:
def eval_per_class(model, dataset, vocab, tagset):
    """
    Prints precision, recall, and F1 for each class in the tagset
    """
    # batch the data
    batched_idx, batched_tags, batched_lens = dataset.get_batches(
        BATCH_SIZE, vocab, tagset
    )
    # compute idx --> tag from tag --> idx
    reverse_tagset = {v: k for k, v in tagset.items()}
    # evaluate model on hold-out set
    acc, true, pred = model.evaluate(batched_idx, batched_lens, batched_tags, tagset)
    true = np.array(true)
    pred = np.array(pred)

    pr = precision(true, pred, len(tagset))
    re = recall(true, pred, len(tagset))
    f1 = f1_score(true, pred, len(tagset))

    for idx, tag in reverse_tagset.items():
        print("***********************")
        print("TAG: {}".format(tag))
        num_pred = np.sum(pred == idx)
        num_true = np.sum(true == idx)
        print("({} pred, {} true)".format(num_pred, num_true))

        print("PRECISION: \t{:.3f}".format(pr[idx]))
        print("RECALL: \t{:.3f}".format(re[idx]))
        print("F1 SCORE: \t{:.3f}".format(f1[idx]))


In [110]:
eval_per_class(model, dev_dataset, vocab, tagset)

***********************
TAG: $
(14 pred, 14 true)
PRECISION: 	1.000
RECALL: 	1.000
F1 SCORE: 	1.000
***********************
TAG: ''
(91 pred, 88 true)
PRECISION: 	0.956
RECALL: 	0.989
F1 SCORE: 	0.972
***********************
TAG: ,
(937 pred, 936 true)
PRECISION: 	0.965
RECALL: 	0.966
F1 SCORE: 	0.965
***********************
TAG: -LRB-
(122 pred, 117 true)
PRECISION: 	0.959
RECALL: 	1.000
F1 SCORE: 	0.979
***********************
TAG: -RRB-
(125 pred, 120 true)
PRECISION: 	0.952
RECALL: 	0.992
F1 SCORE: 	0.971
***********************
TAG: .
(1512 pred, 1503 true)
PRECISION: 	0.987
RECALL: 	0.993
F1 SCORE: 	0.990
***********************
TAG: :
(103 pred, 106 true)
PRECISION: 	0.942
RECALL: 	0.915
F1 SCORE: 	0.928
***********************
TAG: ADD
(99 pred, 81 true)
PRECISION: 	0.576
RECALL: 	0.704
F1 SCORE: 	0.633
***********************
TAG: AFX
(2 pred, 4 true)
PRECISION: 	0.000
RECALL: 	0.000
F1 SCORE: 	0.000
***********************
TAG: CC
(776 pred, 781 true)
PRECISION: 	0.994
RECALL


## Part 4: Model Exploration

Congratulations, you've just trained a neural network!

Now, improve on the `LSTMTagger` model by implementing the `__init__` function in the `FancyTagger` class below. 
* Feel free to replace the `forward` function inherited from `LSTMTagger` if 
you need to, but it should not be necessary to receive full credit. Credit will be awarded based on the performance on a holdout test set. 
* Do not modify any of the cells above when completing part 4. Instead, insert cells below if you need to perform any additional computations. 
* You are allowed to use any function in `torch.nn`. You are **not** allowed to import any libraries or use implementations copied from the internet. 

Before submitting, please describe your modifications below:


1. Reduced the number of unknown words by increasing the vocabulary size to the maximum (400k).
2. Increased GloVe embedding size to 300d.
3. Used 4 layers of stacked BiLSTM and increased hidden size to 512.
4. Used dropout for regularization.
5. Increased batch size to 256 for faster training.
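
To get a feel for how much larger the modified recurrent layer is, the sketch below counts LSTM parameters from the layer dimensions. It assumes the standard layout of four gates per direction, each with input weights, recurrent weights, and two bias vectors, and no output projection. The helper names are hypothetical and written only for this illustration.

```python
def lstm_param_count(input_dim, hidden_dim, num_layers=1, bidirectional=False):
    """Count LSTM parameters, assuming biases are enabled and no projection."""
    num_directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_dim
    for _ in range(num_layers):
        # 4 gates, each with input weights, recurrent weights, and two biases
        per_direction = 4 * hidden_dim * (layer_input + hidden_dim + 2)
        total += num_directions * per_direction
        # deeper layers read the concatenated outputs of all directions
        layer_input = hidden_dim * num_directions
    return total


def lstm_output_dim(hidden_dim, bidirectional=False):
    """Feature size per token that the following linear layer receives."""
    return hidden_dim * (2 if bidirectional else 1)


print(lstm_param_count(50, 64))                                      # 29696
print(lstm_param_count(300, 512, num_layers=4, bidirectional=True))  # 22233088
print(lstm_output_dim(512, bidirectional=True))                      # 1024
```

One practical consequence is that a bidirectional LSTM emits `2 * hidden_dim` features per token, so the feedforward layer that follows it needs an input size of 1024 rather than 512 in this configuration.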

In [100]:
# parameters
NUM_LAYERS = 4
BI = True
DROPOUT = 0.3
HIDDEN_SIZE = 512

VOCAB_SIZE = 400000
EMBEDDING_SIZE = 300

BATCH_SIZE = 256


In [101]:
embeddings, vocab = read_embeddings(f"glove.6B.{EMBEDDING_SIZE}d.txt", VOCAB_SIZE)


In [102]:
# read the files
tagset = read_tagset("pos.tagset")
train_dataset = Dataset("pos.train", is_labeled=True)
dev_dataset = Dataset("pos.dev", is_labeled=True)
test_dataset = Dataset("pos.test", is_labeled=False)

# these should run without errors if implemented correctly
train_batch_idx, train_batch_tags, train_batch_lens = train_dataset.get_batches(
    BATCH_SIZE, vocab, tagset
)
dev_batch_idx, dev_batch_tags, dev_batch_lens = dev_dataset.get_batches(
    BATCH_SIZE, vocab, tagset
)
test_batch_idx, test_batch_lens = test_dataset.get_batches(BATCH_SIZE, vocab, tagset)


In [103]:
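# check number of unknown words in each split (train, dev, test);
# index 1 is taken to be the unknown-word token, as in the original count
for dataset in [train_batch_idx, dev_batch_idx, test_batch_idx]:
    count = 0
    for x in dataset:
        # vectorized comparison instead of building a Python list per batch
        count += int((x == 1).sum())
    print(f"number of unknown words: {count}")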

number of unknown words: 2440
number of unknown words: 463
number of unknown words: 522



In [104]:
class FancyTagger(LSTMTagger):
    """
  An improved neural model for sequence labeling

  Starter code from LSTMTagger has already been provided, but
  feel free to change the init and forward function internals
  if your model design requires it (though this is not necessary
  to receive full credit).

  You may use any component in torch.nn. You may NOT
  import any additional libraries/modules. 

  """

    def __init__(self, embeddings, embedding_dim, hidden_dim, tagset_size):
        # initializes the parent LSTMTagger class
        # inherits forward, evaluate, and run_training methods
        super().__init__(embeddings, hidden_dim, tagset_size)

        self.hidden_dim = hidden_dim
        self.num_labels = tagset_size

        # Initialize a PyTorch embeddings layer using the pretrained embedding weights
        self.embeddings = nn.Embedding.from_pretrained(embeddings, freeze=False)

        # Initialize a stacked Bi-LSTM (NUM_LAYERS, DROPOUT, and BI are the globals set above)
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            batch_first=True,
            num_layers=NUM_LAYERS,
            dropout=DROPOUT,
            bidirectional=BI,
        )

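        # Initialize a feedforward layer; its input is 2 * hidden_dim because a
        # bidirectional LSTM (BI = True) concatenates forward and backward states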
        self.hidden2tag = nn.Linear(hidden_dim * 2, tagset_size)
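
The `hidden_dim * 2` input size of the output layer comes from bidirectionality: at every time step the LSTM concatenates the forward and backward hidden states. Below is a small shape check that illustrates this. The sizes are toy values chosen for illustration, not the hyperparameters used in this notebook, and the random tensor stands in for a batch of embedded tokens.

```python
import torch
import torch.nn as nn

# toy sizes for illustration only, not the hyperparameters used above
batch_size, seq_len, embedding_dim, hidden_dim, tagset_size = 3, 7, 8, 16, 5

lstm = nn.LSTM(
    embedding_dim,
    hidden_dim,
    batch_first=True,
    num_layers=2,
    dropout=0.3,
    bidirectional=True,
)
hidden2tag = nn.Linear(hidden_dim * 2, tagset_size)

x = torch.randn(batch_size, seq_len, embedding_dim)  # stand-in for embedded tokens
output, _ = lstm(x)
logits = hidden2tag(output)

print(output.shape)  # (batch_size, seq_len, 2 * hidden_dim)
print(logits.shape)  # (batch_size, seq_len, tagset_size)
```

If `bidirectional` were set to `False`, the last dimension of `output` would be `hidden_dim` and the linear layer's input size would have to change to match.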


Run the training script below to train the `FancyTagger` model. Again, feel free to adjust any hyperparameters if necessary.

In [None]:
model = FancyTagger(embeddings, EMBEDDING_SIZE, HIDDEN_SIZE, len(tagset))
print(model)
model.run_training(
    train_dataset,
    dev_dataset,
    BATCH_SIZE,
    vocab,
    tagset,
    lr=5e-4,
    num_epochs=40,
    eval_every=5,
)

FancyTagger(
  (embeddings): Embedding(400000, 300)
  (lstm): LSTM(300, 512, num_layers=4, batch_first=True, dropout=0.3, bidirectional=True)
  (hidden2tag): Linear(in_features=1024, out_features=50, bias=True)
)
**** TRAINING *****
Epoch 0 | Loss: 144.7250213623047
Epoch 1 | Loss: 81.71239471435547
Epoch 2 | Loss: 37.13271713256836
Epoch 3 | Loss: 21.516273498535156
Epoch 4 | Loss: 15.11073112487793
**** EVALUATION *****
Dev Accuracy: 0.9001391926824418
**********************
Epoch 5 | Loss: 11.63393497467041
Epoch 6 | Loss: 9.56954574584961
Epoch 7 | Loss: 8.27454948425293
Epoch 8 | Loss: 7.372381210327148
Epoch 9 | Loss: 6.488227844238281
**** EVALUATION *****
Dev Accuracy: 0.9179956253728375
**********************
Epoch 10 | Loss: 5.761561870574951
Epoch 11 | Loss: 5.354988098144531
Epoch 12 | Loss: 5.03249979019165
Epoch 13 | Loss: 4.656944751739502
Epoch 14 | Loss: 4.516438961029053
**** EVALUATION *****
Dev Accuracy: 0.920381785643269
**********************
Epoch 15 | Loss: 3.82


In [98]:
# free GPU memory (uncomment to run)
# del embeddings
# del model
# torch.cuda.empty_cache()


### Save Predictions

When you are satisfied with your `FancyTagger`'s performance on the dev set, run the cell below to write your predictions on the test set to a text file. 

You can download `predictions.txt` by going to 
**View > Table of Contents > Files**

Please submit this `predictions.txt` file to Gradescope. 

In [107]:
assert isinstance(
    model, FancyTagger
), "Please assign your FancyTagger to a variable named model"

BATCH_SIZE = 32
test_batch_idx, test_batch_lens = test_dataset.get_batches(BATCH_SIZE, vocab, tagset)

predictions = []

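# inference mode: disables the dropout between LSTM layers
model.eval()

for b in range(len(test_batch_idx)):
    # gradients are not needed for prediction
    with torch.no_grad():
        logits = model.forward(test_batch_idx[b], test_batch_lens[b])
    batch_predictions = torch.argmax(logits, dim=-1).cpu().numpy()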

    batch_size, _ = test_batch_idx[b].shape

    for i in range(batch_size):
        preds = batch_predictions[i]

        seq_len = int(test_batch_lens[b][i])
        for j in range(seq_len):
            predictions.append(int(preds[j]))


with open("predictions.txt", "w") as f:
    for p in predictions:
        f.write(str(p) + "\n")
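
Before uploading, it may be worth confirming that the file holds exactly one prediction per unpadded token. Below is a minimal sketch of that check using toy data. `count_lines` is a hypothetical helper, and the lengths and tag indices are made up for illustration.

```python
import os
import tempfile


def count_lines(path):
    """Number of non-empty lines in a text file."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())


# toy stand-ins: two batches of sequence lengths and one tag index per real token
toy_batch_lens = [[3, 2], [4]]
toy_predictions = [5, 1, 7, 2, 2, 0, 9, 4, 3]

# write to a temporary directory so the real predictions.txt is left untouched
path = os.path.join(tempfile.mkdtemp(), "predictions.txt")
with open(path, "w") as f:
    for p in toy_predictions:
        f.write(str(p) + "\n")

expected = sum(int(n) for lens in toy_batch_lens for n in lens)
assert count_lines(path) == expected, "expected one prediction per unpadded token"
print(f"{expected} predictions written")
```

With the real data, the same comparison would use `test_batch_lens` and the `predictions.txt` file written by the cell above.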
