### Natcha Jangphiphatnawit 63340500031

# HW3.2: Neural Transition-Based Dependency Parsing


In this exercise, you are going to build a deep learning model for neural transition-based dependency parsing. A dependency parser analyzes the grammatical structure of a sentence, establishing relationships between “head” words and the words that modify those heads. Your implementation will be a transition-based parser, which incrementally builds up a parse one step at a time.

To complete this exercise, you will need to fill in the missing code and then build and train a deep learning model for dependency parsing.

We provide the code for data preparation and a skeleton for the PartialParse class. You do not need to understand the code outside of this notebook.


In [2]:
# from google.colab import drive
# drive.mount('/content/drive')

In [3]:
# import shutil
# shutil.copy("/content/drive/MyDrive/FRA 501 IntroNLP&DL/Dataset/HW3-2.zip", "/content/HW3-2.zip")
# !unzip -q HW3-2.zip

## 1. Transition-Based Dependency Parsing

Your implementation will be a transition-based parser, which incrementally builds
up a parse one step at a time. At every step it maintains a partial parse, which is represented as follows:
- A stack of words that are currently being processed.
- A buffer of words yet to be processed.
- A list of dependencies predicted by the parser.

Initially, the stack contains only ROOT, the dependency list is empty, and the buffer contains all words
of the sentence in order. At each step, the parser applies a transition to the partial parse until its buffer is
empty and the stack has size 1. The following transitions can be applied:
- SHIFT: removes the first word from the buffer and pushes it onto the stack.
- LEFT-ARC: marks the second (second most recently added) item on the stack as a dependent of the
first item and removes the second item from the stack.
- RIGHT-ARC: marks the first (most recently added) item on the stack as a dependent of the second
item and removes the first item from the stack.

Your parser will decide among transitions at each state using a neural network classifier.

### TODO 1 (Written):
Go through the sequence of transitions needed for parsing the sentence “I parsed
this sentence correctly”. The dependency tree for the sentence is shown below. At each step, give the
configuration of the stack and buffer, as well as what transition was applied this step and what new
dependency was added (if any). The first three steps are provided below as an example.

Image --> https://drive.google.com/file/d/10jYgxDhsyolZGarcNTEdt6G2xB0l9iZU/view?usp=share_link 

Complete the following table (double click the table and fill in the rest):

| stack    |  buffer |  new dependency | transition |
| :------: |:------: | :-------------: | :--------: |
| \[ROOT\]            | \[I, parsed, this, sentence, correctly\] |                  | Initial Configuration |
| \[ROOT, I\]         | \[parsed, this, sentence, correctly\]    |                  | SHIFT |
| \[ROOT, I, parsed\] | \[this, sentence, correctly\]            |                  | SHIFT |
| \[ROOT, parsed\]    | \[this, sentence, correctly\]            | parsed→I         | LEFT-ARC |
| \[ROOT, parsed, this\]   | \[sentence, correctly\]             |                  | SHIFT |
| \[ROOT, parsed, this, sentence\]   | \[correctly\]             |                  | SHIFT |
| \[ROOT, parsed, sentence\]         | \[correctly\]             | sentence→this    | LEFT-ARC |
| \[ROOT, parsed\]         | \[correctly\]                       | parsed→sentence  | RIGHT-ARC |
| \[ROOT, parsed, correctly\]   | \[\]                           |                  | SHIFT |
| \[ROOT, parsed\]   | \[\]                                      | parsed→correctly | RIGHT-ARC |
| \[ROOT\]           | \[\]                                      | ROOT→parsed      | RIGHT-ARC |
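
The table above can be reproduced by replaying the transitions in code. The following is a minimal standalone sketch, not the graded `PartialParse` class; the helper name `replay` and the row layout are assumptions made for illustration.

```python
def replay(sentence, transitions):
    """Replay "S" (SHIFT), "LA" (LEFT-ARC) and "RA" (RIGHT-ARC) transitions.

    Returns one (stack, buffer, new_dependency, transition) row per step,
    where new_dependency is a (head, dependent) tuple or None.
    """
    stack, buffer = ["ROOT"], list(sentence)  # copy so the input is untouched
    rows = [(list(stack), list(buffer), None, "Initial Configuration")]
    for t in transitions:
        dep = None
        if t == "S":
            stack.append(buffer.pop(0))
        elif t == "LA":
            dependent = stack.pop(-2)   # second item on the stack
            dep = (stack[-1], dependent)
        elif t == "RA":
            dependent = stack.pop()     # top item on the stack
            dep = (stack[-1], dependent)
        else:
            raise ValueError("unknown transition: {}".format(t))
        rows.append((list(stack), list(buffer), dep, t))
    return rows


sentence = ["I", "parsed", "this", "sentence", "correctly"]
transitions = ["S", "S", "LA", "S", "S", "LA", "RA", "S", "RA", "RA"]
rows = replay(sentence, transitions)
for stack, buffer, dep, t in rows:
    print(stack, buffer, dep, t)
```

A sentence with n words needs n SHIFT transitions and n arc transitions, so this five-word sentence is parsed in 10 steps.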

### TODO 2 (Coding):
Implement the __\_\_init\_\___ and __parse_step__ functions in the PartialParse class. Your code must pass both of the following tests.

In [4]:
class PartialParse(object):
    def __init__(self, sentence):
        """Initializes this partial parse.

        Your code should initialize the following fields:
            self.stack: The current stack represented as a list with the top of the stack as the
                        last element of the list.
            self.buffer: The current buffer represented as a list with the first item on the
                         buffer as the first item of the list
            self.dependencies: The list of dependencies produced so far. Represented as a list of
                    tuples where each tuple is of the form (head, dependent).
                    Order for this list doesn't matter.

        The root token should be represented with the string "ROOT"

        Args:
            sentence: The sentence to be parsed as a list of words.
                      Your code should not modify the sentence.
        """
        # The sentence being parsed is kept for bookkeeping purposes. Do not use it in your code.
        self.sentence = sentence #--list

        ### YOUR CODE HERE
        #self.stack = ?  --> list
        #self.buffer = ? --> list
        #self.dependencies = ?  --> list
        self.stack = ["ROOT"]
        # Copy the words so the buffer never aliases (or modifies) the input sentence.
        self.buffer = list(sentence)
        self.dependencies = []

        ### END YOUR CODE

    def parse_step(self, transition):
        """Performs a single parse step by applying the given transition to this partial parse

        Args:
            transition: A string that equals "S", "LA", or "RA" representing the shift, left-arc,
                        and right-arc transitions. You can assume the provided transition is a legal
                        transition.
        """
        ### YOUR CODE HERE

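        if transition == 'S':
            # SHIFT: move the first buffer word onto the stack.
            self.stack.append(self.buffer[0])
            self.buffer = self.buffer[1:]

        elif transition == 'LA':
            # LEFT-ARC: the second item becomes a dependent of the top item.
            head = self.stack[-1]
            dependent = self.stack.pop(-2)
            self.dependencies.append((head, dependent))

        else:
            # RIGHT-ARC: the top item becomes a dependent of the second item.
            dependent = self.stack.pop()
            head = self.stack[-1]
            self.dependencies.append((head, dependent))
        ### END YOUR CODE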

    def parse(self, transitions):
        """Applies the provided transitions to this PartialParse

        Args:
            transitions: The list of transitions in the order they should be applied
        Returns:
            dependencies: The list of dependencies produced when parsing the sentence. Represented
                          as a list of tuples where each tuple is of the form (head, dependent)
        """
        for transition in transitions:
            self.parse_step(transition)
        return self.dependencies


In [5]:
# Do not modify this code
def test_step(name, transition, stack, buf, deps,
              ex_stack, ex_buf, ex_deps):
    """Tests that a single parse step returns the expected output"""
    pp = PartialParse([])
    pp.stack, pp.buffer, pp.dependencies = stack, buf, deps

    pp.parse_step(transition)
    stack, buf, deps = (tuple(pp.stack), tuple(pp.buffer), tuple(sorted(pp.dependencies)))
    assert stack == ex_stack, \
        "{:} test resulted in stack {:}, expected {:}".format(name, stack, ex_stack)
    assert buf == ex_buf, \
        "{:} test resulted in buffer {:}, expected {:}".format(name, buf, ex_buf)
    assert deps == ex_deps, \
        "{:} test resulted in dependency list {:}, expected {:}".format(name, deps, ex_deps)
    print("{:} test passed!".format(name))


def test_parse_step():
    """Simple tests for the PartialParse.parse_step function
    Warning: these are not exhaustive
    """
    test_step("SHIFT", "S", ["ROOT", "the"], ["cat", "sat"], [],
              ("ROOT", "the", "cat"), ("sat",), ())
    test_step("LEFT-ARC", "LA", ["ROOT", "the", "cat"], ["sat"], [],
              ("ROOT", "cat",), ("sat",), (("cat", "the"),))
    test_step("RIGHT-ARC", "RA", ["ROOT", "run", "fast"], [], [],
              ("ROOT", "run",), (), (("run", "fast"),))


def test_parse():
    """Simple tests for the PartialParse.parse function
    Warning: these are not exhaustive
    """
    sentence = ["parse", "this", "sentence"]
    dependencies = PartialParse(sentence).parse(["S", "S", "S", "LA", "RA", "RA"])
    dependencies = tuple(sorted(dependencies))
    expected = (('ROOT', 'parse'), ('parse', 'sentence'), ('sentence', 'this'))
    assert dependencies == expected,  \
        "parse test resulted in dependencies {:}, expected {:}".format(dependencies, expected)
    assert tuple(sentence) == ("parse", "this", "sentence"), \
        "parse test failed: the input sentence should not be modified"
    print("parse test passed!")

In [6]:
test_parse_step()
test_parse()

SHIFT test passed!
LEFT-ARC test passed!
RIGHT-ARC test passed!
parse test passed!


## 2. Setup and Preprocessing

In [7]:
from utils.parser_utils import minibatches, load_and_preprocess_data

Prepare the data. We will use a subset of the Penn Treebank and pretrained embeddings in this task.

We are now going to train a neural network to predict, given the state of the stack, buffer, and dependencies, which transition should be applied next. First, the model extracts a feature vector representing the current state. We will use the feature set presented in the original neural dependency parsing paper, A Fast and Accurate Dependency Parser using Neural Networks (Chen and Manning, 2014).

The function that extracts these features has been implemented for you in parser_utils. The feature vector consists of a list of tokens (e.g., the last word in the stack, the first word in the buffer, a dependent of the second-to-last word in the stack if there is one, etc.). Each token is mapped to an integer ID, so the features can be represented as a list of integers.
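
To make the idea concrete, here is a simplified, hypothetical sketch of only the first six features: the top three stack items followed by the first three buffer items, padded with a `<NULL>` token. The helper name `basic_features` and the exact ordering are assumptions; the real extractor in parser_utils also adds dependent and part-of-speech features and maps every token to its integer ID.

```python
NULL = "<NULL>"  # padding token (assumed spelling)


def basic_features(stack, buffer, n=3):
    """Top-n stack items (left-padded) followed by the first n buffer items (right-padded)."""
    top = stack[-n:]
    front = buffer[:n]
    return [NULL] * (n - len(top)) + top + front + [NULL] * (n - len(front))


features = basic_features(["<ROOT>", "shows"], ["."])
print(features)
```

Padding keeps the feature vector at a fixed length, which is what lets every parser state feed the same fixed-size network input.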

In [8]:
parser, embeddings, train_examples, dev_set, test_set = load_and_preprocess_data(True)

Loading data...
took 2.24 seconds
Building parser...
took 0.03 seconds
Loading pretrained embeddings...
took 3.11 seconds
Vectorizing data...
took 0.08 seconds
Preprocessing training data...
took 1.21 seconds


In [9]:
print(len(train_examples), len(dev_set), len(test_set))

48390 500 500


In [10]:
train_examples[10]

([5156,
  660,
  88,
  96,
  85,
  2131,
  5155,
  5155,
  5155,
  5155,
  5155,
  5155,
  91,
  5155,
  113,
  5155,
  5155,
  5155,
  84,
  39,
  40,
  61,
  41,
  39,
  83,
  83,
  83,
  83,
  83,
  83,
  40,
  83,
  41,
  83,
  83,
  83],
 [1, 1, 1],
 2)

In [11]:
embeddings

array([[ 1.5665921 ,  0.5537971 , -0.597256  , ..., -0.8123649 ,
         0.11324178,  1.052584  ],
       [ 0.18855253,  0.10662115,  0.3466703 , ..., -0.2465295 ,
         1.0177017 , -1.779257  ],
       [ 1.1268848 , -0.97388947, -1.5384685 , ..., -1.564026  ,
        -0.7775789 ,  0.30446744],
       ...,
       [-0.25723696, -0.9430154 , -1.2355871 , ...,  0.08056545,
         0.17923927,  0.8492005 ],
       [-1.7813855 , -0.66680807,  0.55915034, ...,  1.4307954 ,
         0.42832664,  1.309977  ],
       [ 0.06326196,  0.18182193,  0.29931995, ...,  0.39007783,
         1.5640349 , -0.2652124 ]], dtype=float32)

In [12]:
print(embeddings.shape)

(5157, 50)


Get the full training subset as a single batch.

In [13]:
minibatch_gen = minibatches(train_examples, len(train_examples))
x_train, y_train = next(minibatch_gen)

In [14]:
print(x_train.shape)
print(y_train.shape)

(48390, 36)
(48390, 3)


In [15]:
x_train[0]

array([5155, 5156,  861,   87, 5155, 5155,  882,  373, 5155, 1145,   85,
       2740, 5155, 5155, 5155, 5155, 5155, 5155,   83,   84,   55,   46,
         83,   83,   39,   39,   83,   47,   41,   44,   83,   83,   83,
         83,   83,   83])

You can use parser.id2tok[word_id] to look up the token for each ID.

In [16]:
for word_id in x_train[0]:
  print(parser.id2tok[word_id])

<NULL>
<ROOT>
shows
.
<NULL>
<NULL>
dispute
power
<NULL>
clearly
the
titans
<NULL>
<NULL>
<NULL>
<NULL>
<NULL>
<NULL>
<p>:<NULL>
<p>:<ROOT>
<p>:VBZ
<p>:.
<p>:<NULL>
<p>:<NULL>
<p>:NN
<p>:NN
<p>:<NULL>
<p>:RB
<p>:DT
<p>:NNS
<p>:<NULL>
<p>:<NULL>
<p>:<NULL>
<p>:<NULL>
<p>:<NULL>
<p>:<NULL>


In [17]:
y_train[0]

array([0., 0., 1.])
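
Each row of y_train is a one-hot vector over the three transitions, so the predicted or gold class is the index of its largest entry. The sketch below decodes such a vector; note that the index-to-transition order in `ID2TRANSITION` is an assumption that this notebook does not confirm, so check parser_utils before relying on it.

```python
import numpy as np

# ASSUMPTION: class index order (LEFT-ARC, RIGHT-ARC, SHIFT); not confirmed by this notebook.
ID2TRANSITION = ["LA", "RA", "S"]


def decode_label(one_hot):
    """Map a one-hot (or softmax) vector to a transition name."""
    return ID2TRANSITION[int(np.argmax(one_hot))]


label = decode_label(np.array([0., 0., 1.]))
print(label)
```

The same helper works on the softmax output of the model, since `argmax` picks the most probable class.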

## 3. Model

In [18]:
import keras
from keras.regularizers import l2
from keras.models import Sequential, Model
from keras.layers import Embedding, Reshape, Activation, Input, Dense, Dropout, Flatten
from keras.optimizers import Adam

### TODO 3 (Coding):
Build and train a TensorFlow Keras model to predict an action for each input state. This is a simple classification task.
- The input and output of the model must match the dimensions of x_train and y_train.
- The model must use the provided pretrained embeddings.
- The model can consist of just a feedforward layer and a dropout layer.
- Training loss should be around 0.1 or below, and training categorical_accuracy above 0.94.
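
Before building the model, it can help to check the shapes with plain NumPy. This sketch mimics what an embedding lookup followed by flattening does to a batch of feature vectors. The sizes match the shapes printed earlier, (5157, 50) for the embeddings and 36 features per state, but `toy_embeddings` and `toy_x` are random stand-ins, not the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, n_features = 5157, 50, 36

# Random stand-ins for the pretrained matrix and for a batch of 4 parser states.
toy_embeddings = rng.normal(size=(vocab_size, embed_dim)).astype(np.float32)
toy_x = rng.integers(0, vocab_size, size=(4, n_features))

looked_up = toy_embeddings[toy_x]          # one 50-d vector per feature ID
flat = looked_up.reshape(len(toy_x), -1)   # concatenate the 36 vectors per state

print(looked_up.shape, flat.shape)
```

The flattened width, 36 × 50 = 1800, is the input size that the first dense layer sees.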

In [19]:
dictionary_count = (embeddings.shape)[0]
vector_len = (embeddings.shape)[1]
input_length = (x_train.shape)[1]

In [37]:
# Write your code here
model = Sequential()
model.add(Embedding(dictionary_count, vector_len, input_length=input_length, mask_zero=False,
                    embeddings_initializer=keras.initializers.Constant(embeddings),
                    trainable=False))

model.add(Flatten())

model.add(Dense(256, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(3, activation='softmax'))
opt = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

model.summary()

Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_9 (Embedding)     (None, 36, 50)            257850    
                                                                 
 flatten_9 (Flatten)         (None, 1800)              0         
                                                                 
 dense_33 (Dense)            (None, 256)               461056    
                                                                 
 dropout_13 (Dropout)        (None, 256)               0         
                                                                 
 dense_34 (Dense)            (None, 32)                8224      
                                                                 
 dense_35 (Dense)            (None, 16)                528       
                                                                 
 dense_36 (Dense)            (None, 3)                
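
The parameter counts shown in the summary follow directly from the layer sizes: an embedding layer stores one vector per vocabulary entry, and a dense layer stores a weight matrix plus one bias per output unit. A small sketch that recomputes the rows whose counts are visible above (the helper names are made up for illustration):

```python
def embedding_params(vocab_size, dim):
    """One dim-sized vector per vocabulary entry."""
    return vocab_size * dim


def dense_params(n_in, n_out):
    """Weight matrix (n_in x n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


counts = {
    "embedding": embedding_params(5157, 50),
    "dense_256": dense_params(36 * 50, 256),  # input is the flattened 36 x 50 lookup
    "dense_32": dense_params(256, 32),
    "dense_16": dense_params(32, 16),
}
for name, n in counts.items():
    print(name, n)
```

Because the embedding layer is created with `trainable=False`, its parameters are not updated during training.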

In [38]:
# Write your code here
my_scheduler = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.1, patience=1, min_lr=0.00000001, verbose=1, min_delta=0.05)
model.fit(x_train, y_train, epochs=15, callbacks=[my_scheduler])

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 3: ReduceLROnPlateau reducing learning rate to 0.0009999999776482583.
Epoch 4/15
Epoch 5/15
Epoch 5: ReduceLROnPlateau reducing learning rate to 9.999999310821295e-05.
Epoch 6/15
Epoch 6: ReduceLROnPlateau reducing learning rate to 9.999999019782991e-06.
Epoch 7/15
Epoch 7: ReduceLROnPlateau reducing learning rate to 9.99999883788405e-07.
Epoch 8/15
Epoch 8: ReduceLROnPlateau reducing learning rate to 9.99999883788405e-08.
Epoch 9/15
Epoch 9: ReduceLROnPlateau reducing learning rate to 1e-08.
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x1fb0a077b20>
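
The log above shows the learning rate dropping by a factor of 10 until it reaches `min_lr` at epoch 9, after which the remaining epochs likely change the weights very little. The sketch below is a simplified pure-Python imitation of that plateau rule (no cooldown, no Keras internals), using the same `factor`, `patience`, `min_delta`, and `min_lr` as the callback. The loss values are hypothetical, invented only so that the reductions fall on the same epochs as in the log.

```python
import math


def simulate_reduce_on_plateau(losses, lr=0.01, factor=0.1, patience=1,
                               min_delta=0.05, min_lr=1e-8):
    """Return [(epoch, new_lr)] for each simulated learning-rate reduction."""
    best, wait, reductions = math.inf, 0, []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            # Improvement is large enough: remember it and reset the counter.
            best, wait = loss, 0
            continue
        wait += 1
        if wait >= patience:
            wait = 0
            # Stop at the floor (isclose guards against float round-off).
            if lr > min_lr and not math.isclose(lr, min_lr):
                lr = max(lr * factor, min_lr)
                reductions.append((epoch, lr))
    return reductions


# HYPOTHETICAL losses for illustration; the notebook log does not show the real ones.
toy_losses = [0.60, 0.40, 0.38, 0.30, 0.28] + [0.27] * 10
reductions = simulate_reduce_on_plateau(toy_losses)
for epoch, lr in reductions:
    print("epoch", epoch, "-> lr", lr)
```

With `min_delta=0.05` and `patience=1`, any epoch that improves the loss by less than 0.05 triggers a reduction, which is why the rate collapses so quickly; a smaller `min_delta` or larger `patience` would slow this down.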

In [72]:
model.evaluate(x_train, y_train)



[0.11782407015562057, 0.9573672413825989]