# CS 584 Assignment 5 -- Dependency Parsing

#### Name: Prithvi Vadlamani
#### Stevens ID: 10476457

## In this assignment, you are required to follow the steps below:
1. Review the lecture slides.
2. Implement a neural-network-based dependency parser with the goal of maximizing performance on the UAS (Unlabeled Attachment Score) metric.

In this assignment, we will use TensorFlow.

```console
pip install -r requirements.txt
```
- It is better to train the TensorFlow model with a GPU and CUDA. If they are not available on your local machine, please consider Google Colab. You can check `CoLab.md` in this assignment.
- You are **NOT** allowed to use other packages unless otherwise specified.
- You are **ONLY** allowed to edit the code between `# Start your code here` and `# End` for each block.

In [None]:
pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting numpy
  Downloading numpy-1.22.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.9 MB)
     |████████████████████████████████| 16.9 MB 248 kB/s
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.23.5
    Uninstalling numpy-1.23.5:
      Successfully uninstalled numpy-1.23.5
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires notebook~=5.7.16, but you have notebook 6.5.2 which is incompatible.
google-colab 1.0.0 requires tornado~=6.0.4, but you have tornado 6.2 which is incompatible.
Successfully installed numpy-1.22.4


In [None]:
!pip3 install numpy --upgrade

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting numpy
  Using cached numpy-1.23.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.22.4
    Uninstalling numpy-1.22.4:
      Successfully uninstalled numpy-1.22.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
scipy 1.7.3 requires numpy<1.23.0,>=1.16.5, but you have numpy 1.23.5 which is incompatible.
google-colab 1.0.0 requires notebook~=5.7.16, but you have notebook 6.5.2 which is incompatible.
google-colab 1.0.0 requires tornado~=6.0.4, but you have tornado 6.2 which is incompatible.
Successfully installed numpy-1.23.5


In [None]:
import sys
import os


def print_line(*args):
    """ Inline print and go to the begining of line
    """
    args1 = [str(arg) for arg in args]
    str_ = ' '.join(args1)
    print('\r' + str_, end='')

In [None]:
import tensorflow as tf


# If you are going to use a GPU, make sure the GPU is in the output
tf.config.list_physical_devices('GPU')



[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

## Dependency Parsing

A dependency parser analyzes the grammatical structure of a sentence, establishing relationships between head words, and words which modify those heads. There are multiple types of dependency parsers, including transition-based parsers, graph-based parsers, and feature-based parsers. Your implementation will be a transition-based parser, which incrementally builds up a parse one step at a time. At every step it maintains a partial parse, which is represented as follows:
* A *stack* of words that are currently being processed
* A *buffer* of words yet to be processed.
* A list of *dependencies* predicted by the parser.


Initially, the stack only contains ROOT, the dependencies list is empty, and the buffer contains all words of the sentence in order. At each step, the parser applies a transition to the partial parse until its buffer is empty and the stack size is 1. The following transitions can be applied:
* **SHIFT**: removes the first word from the buffer and pushes it onto the stack.
* **LEFT-ARC**: marks the second (second most recently added) item on the stack as a dependent of the first item and removes the second item from the stack, adding a first word → second word dependency to the dependency list.
* **RIGHT-ARC**: marks the first (most recently added) item on the stack as a dependent of the second item and removes the first item from the stack, adding a second word → first word dependency to the dependency list.

On each step, your parser will decide among the three transitions using a neural network classifier.
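
To make the three transitions concrete, here is a minimal, self-contained sketch that traces the stack, buffer, and dependency list for one short sentence. The helper names `apply_transition` and `trace` are hypothetical and are not part of the assignment code; the transition sequence is chosen by hand rather than predicted by a classifier.

```python
def apply_transition(stack, buffer, dependencies, transition):
    """Apply one transition ("S", "LA" or "RA") to the partial parse in place."""
    if transition == "S":
        # SHIFT: move the first buffer word onto the top of the stack.
        stack.append(buffer.pop(0))
    elif transition == "LA":
        # LEFT-ARC: the second item becomes a dependent of the top item.
        dependent = stack.pop(-2)
        dependencies.append((stack[-1], dependent))
    elif transition == "RA":
        # RIGHT-ARC: the top item becomes a dependent of the second item.
        dependent = stack.pop()
        dependencies.append((stack[-1], dependent))
    else:
        raise ValueError(f"unknown transition: {transition}")


def trace(sentence, transitions):
    """Run a hand-written transition sequence and print every intermediate state."""
    stack, buffer, dependencies = ["ROOT"], list(sentence), []
    for transition in transitions:
        apply_transition(stack, buffer, dependencies, transition)
        print(f"{transition:>2}  stack={stack}  buffer={buffer}")
    return stack, buffer, dependencies


stack, buffer, dependencies = trace(
    ["parse", "this", "sentence"], ["S", "S", "S", "LA", "RA", "RA"]
)
print(dependencies)
```

The run ends with only ROOT on the stack and an empty buffer, which is the stopping condition described above. Each dependency is stored as a (head, dependent) tuple.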


## 1. Transition Mechanics (60 points)
In this section, you need to implement the transition mechanics your parser will use.


### 1.1 and 1.2 are written questions, please check the pdf handout for the details. (20 points)

### 1.3 Parsing (Fill in the code, 20 points)

There are two functions to implement:
1. \_\_init\_\_() (10 points)
2. parse_step() (10 points)

Please follow the comments and fill in your code.

In [None]:
class PartialParse(object):
    def __init__(self, sentence):
        """Initializes this partial parse.

        @param sentence (list of str): The sentence to be parsed as a list of words.
                                        Your code should not modify the sentence.
        """
        # The sentence being parsed is kept for bookkeeping purposes. Do NOT alter it in your code.
        self.sentence = sentence

        ### Start your code
        ### Your code should initialize the following fields:
        ###     self.stack: The current stack represented as a list with the top of the stack as the
        ###                 last element of the list.
        ###     self.buffer: The current buffer represented as a list with the first item on the
        ###                  buffer as the first item of the list
        ###     self.dependencies: The list of dependencies produced so far. Represented as a list of
        ###             tuples where each tuple is of the form (head, dependent).
        ###             Order for this list doesn't matter.
        ###
        ### Note: The root token should be represented with the string "ROOT"
        ### Note: If you need to use the sentence object to initialize anything, make sure to not directly 
        ###       reference the sentence object.  That is, remember to NOT modify the sentence object. 
        
        self.stack = ["ROOT"]
        self.buffer = []
        for sent in self.sentence:
          self.buffer.append(sent)
        self.dependencies = []
        ### End


    def parse_step(self, transition):
        """Performs a single parse step by applying the given transition to this partial parse

        @param transition (str): A string that equals "S", "LA", or "RA" representing the shift,
                                left-arc, and right-arc transitions. You can assume the provided
                                transition is a legal transition.
        """
        ### Start your code
        ### TODO:
        ###     Implement a single parsing step, i.e. the logic for the following as
        ###     described above:
        ###         1. Shift
        ###         2. Left Arc
        ###         3. Right Arc

        if transition == "S":
            t = self.buffer[0]
            self.stack.append(t)
            self.buffer.pop(0)
        elif transition == "LA":
            self.dependencies.append((self.stack[-1],self.stack[-2]))
            self.stack.pop(-2)
        else:
            self.dependencies.append((self.stack[-2],self.stack[-1]))
            self.stack.pop(-1)

        ### End
        
    def should_stop(self):
        return len(self.buffer) == 0 and len(self.stack) <= 1

    def parse(self, transitions):
        """Applies the provided transitions to this PartialParse

        @param transitions (list of str): The list of transitions in the order they should be applied

        @return dependencies (list of string tuples): The list of dependencies produced when
                                                        parsing the sentence. Represented as a list of
                                                        tuples where each tuple is of the form (head, dependent).
        """
        for transition in transitions:
            self.parse_step(transition)
        return self.dependencies

Execute the following cell to test your implementation.

In [None]:
def test_step(name, transition, stack, buf, deps,
              ex_stack, ex_buf, ex_deps):
    """Tests that a single parse step returns the expected output"""
    pp = PartialParse([])
    pp.stack, pp.buffer, pp.dependencies = stack, buf, deps

    pp.parse_step(transition)
    stack, buf, deps = (tuple(pp.stack), tuple(pp.buffer), tuple(sorted(pp.dependencies)))
    assert stack == ex_stack, \
        "{:} test resulted in stack {:}, expected {:}".format(name, stack, ex_stack)
    assert buf == ex_buf, \
        "{:} test resulted in buffer {:}, expected {:}".format(name, buf, ex_buf)
    assert deps == ex_deps, \
        "{:} test resulted in dependency list {:}, expected {:}".format(name, deps, ex_deps)
    print("{:} test passed!".format(name))

def test_parse_step():
    """Simple tests for the PartialParse.parse_step function
    Warning: these are not exhaustive
    """
    test_step("SHIFT", "S", ["ROOT", "the"], ["cat", "sat"], [],
              ("ROOT", "the", "cat"), ("sat",), ())
    test_step("LEFT-ARC", "LA", ["ROOT", "the", "cat"], ["sat"], [],
              ("ROOT", "cat",), ("sat",), (("cat", "the"),))
    test_step("RIGHT-ARC", "RA", ["ROOT", "run", "fast"], [], [],
              ("ROOT", "run",), (), (("run", "fast"),))
    
def test_parse():
    """Simple tests for the PartialParse.parse function
    Warning: these are not exhaustive
    """
    sentence = ["parse", "this", "sentence"]
    dependencies = PartialParse(sentence).parse(["S", "S", "S", "LA", "RA", "RA"])
    dependencies = tuple(sorted(dependencies))
    expected = (('ROOT', 'parse'), ('parse', 'sentence'), ('sentence', 'this'))
    assert dependencies == expected,  \
        "parse test resulted in dependencies {:}, expected {:}".format(dependencies, expected)
    assert tuple(sentence) == ("parse", "this", "sentence"), \
        "parse test failed: the input sentence should not be modified"
    print("parse test passed!")

test_parse_step()
test_parse()

SHIFT test passed!
LEFT-ARC test passed!
RIGHT-ARC test passed!
parse test passed!


### 1.4 Minibatch Dependency Parsing (Fill in the code, 20 points)

Neural networks run much more efficiently when they make predictions for batches of data at a time, so in this section you need to implement a minibatch parsing algorithm.
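
As a rough illustration of why batching helps, the sketch below counts how many prediction calls are needed when parses are advanced one batch at a time. It is a toy model under stated assumptions: `count_model_calls` is a hypothetical helper, no real parser or network is involved, and each parse is reduced to a counter of the transitions it still needs.

```python
def count_model_calls(steps_needed, batch_size):
    """Count prediction calls when unfinished parses are advanced batch by batch.

    steps_needed[i] stands in for the number of transitions parse i still needs.
    """
    remaining = list(steps_needed)
    unfinished = list(range(len(remaining)))
    calls = 0
    while unfinished:
        # Take the first batch of parses that are not finished yet.
        batch = unfinished[:batch_size]
        calls += 1  # one (hypothetical) model.predict call per batch
        for i in batch:
            remaining[i] -= 1  # every parse in the batch applies one transition
        # Drop the parses that have just finished.
        unfinished = [i for i in unfinished if remaining[i] > 0]
    return calls


# A sentence with n words needs 2n transitions: n shifts and n arcs.
steps = [2 * n for n in (3, 4, 3, 4)]
calls_one_by_one = count_model_calls(steps, batch_size=1)
calls_batched = count_model_calls(steps, batch_size=2)
print(calls_one_by_one, calls_batched)
```

With these toy sentence lengths, a batch size of 2 needs 16 calls instead of 28. The total number of transitions applied is the same in both cases; only the number of forward passes changes.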

In [None]:
def minibatch_parse(sentences, model, batch_size):
    """Parses a list of sentences in minibatches using a model.

    @param sentences (list of list of str): A list of sentences to be parsed
                                            (each sentence is a list of words and each word is of type string)
    @param model (ParserModel): The model that makes parsing decisions. It is assumed to have a function
                                model.predict(partial_parses) that takes in a list of PartialParses as input and
                                returns a list of transitions predicted for each parse. That is, after calling
                                    transitions = model.predict(partial_parses)
                                transitions[i] will be the next transition to apply to partial_parses[i].
    @param batch_size (int): The number of PartialParses to include in each minibatch


    @return dependencies (list of dependency lists): A list where each element is the dependencies
                                                    list for a parsed sentence. Ordering should be the
                                                    same as in sentences (i.e., dependencies[i] should
                                                    contain the parse for sentences[i]).
    """
    dependencies = []

    ### Start to code
    ### TODO:
    ###     Implement the minibatch parse algorithm.
    ###     1. Build a parser list for all sentences
    ###     2. Build an unfinished parser list that is initially equal to the parser list
    ###     3. Run a while loop while the unfinished parser list is not empty
    ###     4.     In the while loop, retrieve the first batch of the parser list
    ###     5.     Use model.predict to predict transition strings for this batch.
    ###     6.     For every parser in this batch, apply the predicted transition with PartialParse.parse_step
    ###     7.     Iterate over all parsers and keep only those that should not stop in the unfinished parser list
    ###     8. After all parsers are finished, retrieve the dependencies

    partial_parses = [PartialParse(sent) for sent in sentences]
    # Shallow copy: filtering the unfinished list must not change partial_parses.
    unfinished_parses = partial_parses[:]

    while unfinished_parses:
        # Take the first batch of parses that still need transitions.
        minibatch = unfinished_parses[:batch_size]
        transitions = model.predict(minibatch)
        for parse, transition in zip(minibatch, transitions):
            parse.parse_step(transition)
        # Keep only the parses that have not reached the final state.
        unfinished_parses = [parse for parse in unfinished_parses if not parse.should_stop()]

    # partial_parses keeps the original order, so dependencies[i] matches sentences[i].
    dependencies = [parse.dependencies for parse in partial_parses]

    ### End

    return dependencies

Run the following cell to test your implementation.

**Note:** You will need minibatch parse to be correctly implemented to evaluate the model you will build in the next sections.

In [None]:
class DummyModel(object):
    """Dummy model for testing the minibatch_parse function
    """
    def __init__(self, mode = "unidirectional"):
        self.mode = mode

    def predict(self, partial_parses):
        if self.mode == "unidirectional":
            return self.unidirectional_predict(partial_parses)
        elif self.mode == "interleave":
            return self.interleave_predict(partial_parses)
        else:
            raise NotImplementedError()

    def unidirectional_predict(self, partial_parses):
        """First shifts everything onto the stack and then does exclusively right arcs if the first word of
        the sentence is "right", "left" if otherwise.
        """
        return [("RA" if pp.stack[1] == "right" else "LA") if len(pp.buffer) == 0 else "S"
                for pp in partial_parses]

    def interleave_predict(self, partial_parses):
        """First shifts everything onto the stack and then interleaves "right" and "left".
        """
        return [("RA" if len(pp.stack) % 2 == 0 else "LA") if len(pp.buffer) == 0 else "S"
                for pp in partial_parses]

def test_dependencies(name, deps, ex_deps):
    """Tests the provided dependencies match the expected dependencies"""
    deps = tuple(sorted(deps))
    assert deps == ex_deps, \
        "{:} test resulted in dependency list {:}, expected {:}".format(name, deps, ex_deps)


def test_minibatch_parse():
    """Simple tests for the minibatch_parse function
    Warning: these are not exhaustive
    """

    # Unidirectional arcs test
    sentences = [["right", "arcs", "only"],
                 ["right", "arcs", "only", "again"],
                 ["left", "arcs", "only"],
                 ["left", "arcs", "only", "again"]]
    deps = minibatch_parse(sentences, DummyModel(), 2)
    
    test_dependencies("minibatch_parse", deps[0],
                      (('ROOT', 'right'), ('arcs', 'only'), ('right', 'arcs')))
    test_dependencies("minibatch_parse", deps[1],
                      (('ROOT', 'right'), ('arcs', 'only'), ('only', 'again'), ('right', 'arcs')))
    test_dependencies("minibatch_parse", deps[2],
                      (('only', 'ROOT'), ('only', 'arcs'), ('only', 'left')))
    test_dependencies("minibatch_parse", deps[3],
                      (('again', 'ROOT'), ('again', 'arcs'), ('again', 'left'), ('again', 'only')))

    # Out-of-bound test
    sentences = [["right"]]
    deps = minibatch_parse(sentences, DummyModel(), 2)
    test_dependencies("minibatch_parse", deps[0], (('ROOT', 'right'),))

    # Mixed arcs test
    sentences = [["this", "is", "interleaving", "dependency", "test"]]
    deps = minibatch_parse(sentences, DummyModel(mode="interleave"), 1)
    test_dependencies("minibatch_parse", deps[0],
                      (('ROOT', 'is'), ('dependency', 'interleaving'),
                      ('dependency', 'test'), ('is', 'dependency'), ('is', 'this')))
    print("minibatch_parse test passed!")

test_minibatch_parse()

minibatch_parse test passed!


## 2. Neural Networks for parsing (40 points)

In this section, you are going to build and train a neural network to predict, given the state of the stack, buffer, and dependencies, which transition should be applied next.

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import ReLU, Softmax
from tensorflow.keras.initializers import GlorotUniform, RandomUniform

## 2.1 Create your model (fill the code, 20 points)
The input of model is a list of integers $\mathbf{w} =[w_1, w_2, ..., w_m]$ where $m$ is the number of features. Then our network looks up an embedding for each word and concatenates them into a single input vector:

$\mathbf{x} = [\mathbf{E}_{w_1}, \mathbf{E}_{w_2}, ..., \mathbf{E}_{w_m}]$

where $\mathbf{E}\in \mathbb{R}^{|V|\times d}$ is an embedding matrix with each row $\mathbf{E}_w$ as the vector for a particular word $w$.

Then, we compute our prediction as:

> $\mathbf{h} = ReLU(xW + b_1)$

> $\mathbf{l} = hU + b_2$

> $\hat{y} = softmax(l)$

where $\mathbf{h}$ is referred to as the hidden layer, $\mathbf{l}$ is referred to as the logits, $\hat{y}$ is referred to as the predictions, and $ReLU(z)=max(z, 0)$.
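
The forward computation above can be sketched in plain NumPy to check the tensor shapes. This is only an illustration under stated assumptions: the sizes are toy values borrowed from the defaults used elsewhere in this notebook, the weights are random rather than trained, and the assignment itself must use the TensorFlow operations requested in the code cell.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_size = 100, 30           # |V| and d (toy values)
n_features, hidden_size, n_classes = 36, 200, 3
batch_size = 4

E = rng.normal(size=(vocab_size, embed_size))                       # embedding matrix
W = rng.normal(scale=0.1, size=(n_features * embed_size, hidden_size))
b1 = np.zeros(hidden_size)
U = rng.normal(scale=0.1, size=(hidden_size, n_classes))
b2 = np.zeros(n_classes)

# Each row of w holds the m feature indices for one parser configuration.
w = rng.integers(0, vocab_size, size=(batch_size, n_features))

# Embedding lookup: pick one row of E per index, then concatenate per example.
x = E[w].reshape(batch_size, n_features * embed_size)

h = np.maximum(x @ W + b1, 0.0)            # hidden layer, ReLU(xW + b1)
l = h @ U + b2                             # logits, hU + b2

# Numerically stable softmax over the three transitions.
shifted = l - l.max(axis=1, keepdims=True)
y_hat = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(x.shape, h.shape, l.shape, y_hat.shape)
```

Each row of `y_hat` sums to 1, and the first `embed_size` entries of a row of `x` are exactly the embedding of that example's first feature.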

We will train the model to minimize cross-entropy loss:

> $J(\theta)=CE(y, \hat{y})=-\sum_{i=1}^3 y_i \log \hat{y}_i$

To compute the loss for the training set, we average this J(θ) across all training examples.
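
As a small worked example of this loss, the sketch below computes the cross-entropy for two hand-picked examples with one-hot labels and then averages them. The logits and labels are made up for illustration, and the helper names `softmax` and `cross_entropy` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax for a single list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y, y_hat):
    """CE(y, y_hat) = -sum_i y_i * log(y_hat_i) for a one-hot label y."""
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, y_hat) if y_i > 0)


# Two made-up examples over the three transitions.
batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
batch_labels = [[1, 0, 0], [0, 0, 1]]

losses = [cross_entropy(y, softmax(l)) for y, l in zip(batch_labels, batch_logits)]
mean_loss = sum(losses) / len(losses)
print(losses, mean_loss)
```

The second example has uniform predictions, so its loss is log 3; the first example puts most of its probability on the correct class and therefore has a smaller loss.
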
We will use UAS as our evaluation metric. UAS (Unlabeled Attachment Score) is the ratio of the number of correctly predicted dependencies to the total number of dependencies, ignoring the relation labels (which our model does not predict).
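
A minimal sketch of the UAS computation follows. It assumes a hypothetical representation in which each sentence is a list of head indices, one per word, with 0 standing for ROOT; the evaluation code shipped with the assignment may store parses differently.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of words whose predicted head matches the gold head."""
    correct = total = 0
    for gold, pred in zip(gold_heads, predicted_heads):
        for g, p in zip(gold, pred):
            correct += int(g == p)
            total += 1
    return correct / total


# Two toy sentences; entry i is the head index of word i + 1 (0 means ROOT).
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [0, 1]]

uas = unlabeled_attachment_score(gold, pred)
print(f"UAS: {uas * 100:.2f}")
```

Here four of the five heads are predicted correctly, so the score is 80.00.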

**Note:**
This part tests your understanding of embedding lookup, so **DO NOT** use any high-level API such as **tf.keras.layers.Embedding** or **tf.keras.layers.Dense** in your code; otherwise you will receive deductions.

**Hints:**
* Each of the variables you are asked to declare (`self.w`, `self.b1`, `self.u`, `self.b2`) corresponds to one of the variables above ($W$, $b_1$, $U$, $b_2$).
* It should take about 1 hour to train the model on the entire training dataset, i.e., when debug mode is disabled.

In [None]:
class ParserModel(Model):
    """ Feedforward neural network with an embedding layer and two hidden layers.
    The ParserModel will predict which transition should be applied to a
    given partial parse configuration.
    """
    def __init__(self, embeddings, n_features=36, hidden_size=200, n_classes=3):
        """ Initialize the parser model.

        @param embeddings (ndarray): word embeddings (num_words, embedding_size)
        @param n_features (int): number of input features
        @param hidden_size (int): number of hidden units
        @param n_classes (int): number of output classes
        """
        super(ParserModel, self).__init__()
        self.n_features = n_features
        self.n_classes = n_classes
        self.embed_size = embeddings.shape[1]
        self.hidden_size = hidden_size
        self.embeddings = tf.convert_to_tensor(embeddings, dtype=tf.float32)

        ### Start your code
        ### TODO:
        ###     1) Declare `self.w` and `self.b1` using `self.add_weight`.
        ###        Initialize weight with the `GlorotUniform` function and bias with `RandomUniform`
        ###        with default parameters.
        ###     2) Declare `self.u` and `self.b2` using `self.add_weight`.
        ###        Initialize weight with the `GlorotUniform` function and bias with `RandomUniform`
        ###        with default parameters.
        ###     3) Declare `self.relu`
        ###
        ### Please see the following docs for support:
        ###     add_weight: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer#add_weight
        ###     Initialization: https://www.tensorflow.org/api_docs/python/tf/keras/initializers
        ### 

        self.w = self.add_weight(name = 'input_weight', shape=(self.embed_size*self.n_features, self.hidden_size), initializer = GlorotUniform, trainable = True)
        self.b1 = self.add_weight(name = 'input_bias',shape=(1, self.hidden_size), initializer = RandomUniform, trainable = True)
        
        self.u = self.add_weight(name = 'hidden_weight', shape=(self.hidden_size, self.n_classes), initializer=GlorotUniform, trainable=True)
        self.b2 = self.add_weight(name = 'hidden_bias', shape=(1, self.n_classes), initializer=RandomUniform, trainable= True)
        
        self.relu = ReLU()

        ### End

    def embedding_lookup(self, w):
        """ Utilize `w` to select embeddings from embedding matrix `self.embeddings`
            @param w (Tensor): input tensor of word indices (batch_size, n_features)

            @return x (Tensor): tensor of embeddings for words represented in w
                                (batch_size, n_features * embed_size)
        """

        ### Start your code
        ### TODO:
        ###     1) For each index `i` in `w`, select `i`th vector from self.embeddings
        ###     2) Reshape the tensor using `tf.reshape` to concatenates them into a single vector

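        # Gather one embedding per index, then flatten the features of each example;
        # -1 lets TensorFlow infer the batch size even when it is not statically known.
        x = tf.reshape(tf.gather(self.embeddings, w), (-1, w.shape[1] * self.embed_size))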

        ### End
        
        return x


    def call(self, w):
        """ Run the model forward.

            Note that we will not apply the softmax function here because we will use logits in the loss function

        @param w (Tensor): input tensor of tokens (batch_size, n_features)

        @return logits (Tensor): tensor of predictions (output after applying the layers of the network)
                                 without applying softmax (batch_size, n_classes)
        """
        ### Start your code
        ### TODO:
        ###     Complete the forward computation as described in write-up.

        x = self.embedding_lookup(w)
        h = self.relu(tf.linalg.matmul(x, self.w) + self.b1)
        logits = tf.linalg.matmul(h, self.u) + self.b2

        ### End
        return logits

Run the following cell to test your model.

In [None]:
import numpy as np


np.random.seed(6666)
embeddings = np.random.randn(100, 30)
model = ParserModel(embeddings)

def check_embedding():
    inds = tf.random.uniform((4, 36), 0, 100, dtype=tf.int64)
    selected = model.embedding_lookup(inds)
    assert np.allclose(selected.numpy(), embeddings[inds].reshape((4, -1)), atol=1e-5), "The result of embedding lookup: " \
                                                                                + repr(selected) \
                                                                                + " does not match the original embeddings."

def check_forward():
    inputs = tf.random.uniform((4, 36), 0, 100, dtype=tf.int64)
    out = model(inputs)
    expected_out_shape = (4, 3)
    assert out.shape == expected_out_shape, "The result shape of forward is: " + repr(out.shape) + \
                                            " which doesn't match expected " + repr(expected_out_shape)

check_embedding()
print("Embedding_lookup sanity check passes!")

check_forward()
print("Forward sanity check passes!")

Embedding_lookup sanity check passes!
Forward sanity check passes!


### 2.2 Training (Fill the code, 20 points)

In [None]:
def train(parser, train_data, dev_data, output_path, batch_size=1024, n_epochs=10, lr=0.0005):
    """ Train the neural dependency parser.

    @param parser (Parser): Neural Dependency Parser
    @param train_data ():
    @param dev_data ():
    @param output_path (str): Path to which model weights and results are written.
    @param batch_size (int): Number of examples in a single batch
    @param n_epochs (int): Number of training epochs
    @param lr (float): Learning rate
    """
    best_dev_UAS = 0


    ### Start your code
    ### TODO:
    ###      1) Construct Adam Optimizer in variable `optimizer`
    ###      2) Construct the Cross Entropy Loss Function in variable `loss_func` with `mean`
    ###         reduction (default)
    ###

    optimizer = tf.keras.optimizers.Adam(learning_rate= lr)
    loss_func = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

    ### END

    for epoch in range(n_epochs):
        print("Epoch {:} out of {:}".format(epoch + 1, n_epochs))
        dev_UAS = train_for_epoch(parser, train_data, dev_data, optimizer, loss_func, batch_size)
        if dev_UAS > best_dev_UAS:
            best_dev_UAS = dev_UAS
            print("New best dev UAS! Saving model.")
            parser.model.save_weights(output_path)
        print("")


def train_for_epoch(parser, train_data, dev_data, optimizer, loss_func, batch_size):
    """ Train the neural dependency parser for single epoch.

    @param parser (Parser): Neural Dependency Parser
    @param train_data ():
    @param dev_data ():
    @param optimizer (tf.keras.optimizers.Optimizer): Adam Optimizer
    @param loss_func (tf.keras.losses.Loss): Cross Entropy Loss Function
    @param batch_size (int): batch size

    @return dev_UAS (float): Unlabeled Attachment Score (UAS) for dev data
    """
    n_minibatches = math.ceil(len(train_data) / batch_size)
    loss_meter = AverageMeter()

    for i, (train_x, train_y) in enumerate(minibatches(train_data, batch_size)):
        if i % 10 == 0 or i == n_minibatches - 1:
            print_line(f'Step {i + 1} / {n_minibatches}')
        loss = 0. # store loss for this batch here
        train_x = tf.convert_to_tensor(train_x, dtype=tf.int64)
        train_y = tf.convert_to_tensor(train_y, dtype=tf.int64)

        ### Start your code
        ### TODO:
        ###      1) Run train_x forward through model to produce `logits`
        ###      2) Use the `loss_func` parameter to apply the CrossEntropyLoss function.
        ###         This will take `train_y` and `logits` as inputs. It will output the CrossEntropyLoss
        ###         between softmax(`logits`) and `train_y`. Remember that softmax(`logits`)
        ###         are the predictions (y^ from the PDF).
        ###      3) Backprop losses
        ###      4) Take step with the optimizer
        ### You can refer to our previous Assignment:

        # Record the forward pass so that gradients can be computed from it.
        with tf.GradientTape() as tape:
            logits = parser.model(train_x)
            loss = loss_func(train_y, logits)

        # Backpropagate and update the weights outside the tape context.
        train_vars = parser.model.trainable_variables
        gradients = tape.gradient(loss, train_vars)
        optimizer.apply_gradients(zip(gradients, train_vars))

        ### End

        loss_meter.update(loss.numpy())
    print('\n')

    print ("Average Train Loss: {}".format(loss_meter.avg))

    print("Evaluating on dev set",)
    dev_UAS, _ = parser.parse(dev_data)
    print("- dev UAS: {:.2f}".format(dev_UAS * 100.0))
    return dev_UAS

Run the following cell to train your model

In [None]:
import os
import math
import time
from datetime import datetime
from tqdm.notebook import tqdm
from utils.parser_utils import minibatches, load_and_preprocess_data, AverageMeter

debug = False ## Set to True if you want to debug your code

print(80 * "=")
print("INITIALIZING")
print(80 * "=")
parser, embeddings, train_data, dev_data, test_data = load_and_preprocess_data(minibatch_parse, debug)

INITIALIZING
Loading data...
took 4.50 seconds
Building parser...
took 2.21 seconds
Loading pretrained embeddings...
took 2.20 seconds
Vectorizing data...
took 2.07 seconds
Preprocessing training data...


100%|██████████| 39832/39832 [00:46<00:00, 854.12it/s]


took 46.65 seconds


In [None]:
import random


seed = 6669
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)

start = time.time()
model = ParserModel(embeddings)
parser.model = model
print("took {:.2f} seconds\n".format(time.time() - start))

print(80 * "=")
print("TRAINING")
print(80 * "=")
output_dir = "results/{:%Y%m%d_%H%M%S}/".format(datetime.now())
output_path = output_dir + "model.weights"

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

train(parser, train_data, dev_data, output_path, batch_size=1024, n_epochs=10, lr=0.0005)

if not debug:
    print(80 * "=")
    print("TESTING")
    print(80 * "=")
    print("Restoring the best model weights found on the dev set")
    parser.model.load_weights(output_path)
    print("Final evaluation on test set",)
    UAS, dependencies = parser.parse(test_data)
    print("- test UAS: {:.2f}".format(UAS * 100.0))
    print("Done!")

took 0.01 seconds

TRAINING
Epoch 1 out of 10
Step 1848 / 1848

Average Train Loss: 0.15447135853431956
Evaluating on dev set
- dev UAS: 83.40
New best dev UAS! Saving model.

Epoch 2 out of 10
Step 1848 / 1848

Average Train Loss: 0.09652193447220184
Evaluating on dev set
- dev UAS: 85.71
New best dev UAS! Saving model.

Epoch 3 out of 10
Step 1848 / 1848

Average Train Loss: 0.08456325242739349
Evaluating on dev set
- dev UAS: 86.33
New best dev UAS! Saving model.

Epoch 4 out of 10
Step 1848 / 1848

Average Train Loss: 0.07704085248091068
Evaluating on dev set
- dev UAS: 87.22
New best dev UAS! Saving model.

Epoch 5 out of 10
Step 1848 / 1848

Average Train Loss: 0.07144228303684043
Evaluating on dev set
- dev UAS: 87.17

Epoch 6 out of 10
Step 1848 / 1848

Average Train Loss: 0.06701429121728454
Evaluating on dev set
- dev UAS: 87.43
New best dev UAS! Saving model.

Epoch 7 out of 10
Step 1848 / 1848

Average Train Loss: 0.06344716526463112
Evaluating on dev set
- dev UAS: 87.40



If your implementation is correct, the dev UAS will be about 88 and the training loss will be about 0.06.

**Conclusion: dev UAS is 87.71, Average Loss is 0.054, test UAS is 88.24.**