### Implementation of the Operations
First, let's discuss the implementation of the operations of our shift-reduce parser.  As discussed in [part 1]({{ base.url }}/2016/08/umich_nlp_depparse_intro/), the four supported operations are `left_arc()`, `right_arc()`, `shift()`, and `reduce()`.

#### Left Arc

In [1]:
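def left_arc(conf, relation):
    # Both a stack top and a buffer front are needed to draw an arc.
    if not conf.buffer or not conf.stack:
        return -1

    # The root (index 0) can never become a dependent.
    if conf.stack[-1] == 0:
        return -1

    # The stack top must not already have a head.
    for arc in conf.arcs:
        if conf.stack[-1] == arc[Transition.ARC_CHILD]:
            return -1

    b = conf.buffer[0]
    s = conf.stack.pop(-1)
    # Add the arc (b, L, s)
    conf.arcs.append((b, relation, s))
# END left_arc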

#### Right Arc

In [2]:
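def right_arc(conf, relation):
    # Both a stack top and a buffer front are needed to draw an arc.
    if not conf.buffer or not conf.stack:
        return -1

    s = conf.stack[-1]
    b = conf.buffer.pop(0)

    # Move b onto the stack and add the arc (s, L, b)
    conf.stack.append(b)
    conf.arcs.append((s, relation, b))
# END right_arc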

#### Reduce

In [3]:
def reduce(conf):
    if not conf.stack:
        return -1

    # The stack top may only be popped once it has been assigned a head.
    for arc in conf.arcs:
        if conf.stack[-1] == arc[Transition.ARC_CHILD]:
            conf.stack.pop(-1)
            return
    return -1
# END reduce

#### Shift

In [4]:
def shift(conf):
    # Nothing to shift when the buffer is empty.
    if not conf.buffer or not conf.stack:
        return -1

    # Move the front of the buffer onto the stack.
    b = conf.buffer.pop(0)
    conf.stack.append(b)
# END shift

As might be expected, the implementation of the functions is straightforward.  Each function first checks its preconditions and returns -1 if the operation is not valid for the current configuration; otherwise it updates the stack, the buffer, and the arcs in place.
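To make the behavior of the four operations concrete, here is a minimal sketch that runs them by hand on a three-word sentence.  The `Configuration` class, the value of `Transition.ARC_CHILD`, the `has_head()` helper, and the relation labels are assumptions made for illustration; the course's provided code may define them differently.  The operations are condensed versions of the ones above so that the block runs on its own.

```python
class Transition:
    # Assumed layout of an arc tuple: (head, relation, child).
    ARC_CHILD = 2


class Configuration:
    """Hypothetical stand-in for the parser configuration."""

    def __init__(self, n_words):
        self.stack = [0]                           # index 0 is the artificial root
        self.buffer = list(range(1, n_words + 1))  # word indices still to be parsed
        self.arcs = []                             # (head, relation, child) tuples


def has_head(conf, idx):
    return any(arc[Transition.ARC_CHILD] == idx for arc in conf.arcs)


def left_arc(conf, relation):
    if not conf.buffer or not conf.stack:
        return -1
    if conf.stack[-1] == 0 or has_head(conf, conf.stack[-1]):
        return -1
    conf.arcs.append((conf.buffer[0], relation, conf.stack.pop()))


def right_arc(conf, relation):
    if not conf.buffer or not conf.stack:
        return -1
    conf.arcs.append((conf.stack[-1], relation, conf.buffer[0]))
    conf.stack.append(conf.buffer.pop(0))


def reduce(conf):
    if not conf.stack or not has_head(conf, conf.stack[-1]):
        return -1
    conf.stack.pop()


def shift(conf):
    if not conf.buffer or not conf.stack:
        return -1
    conf.stack.append(conf.buffer.pop(0))


# "She reads books": 1 = She, 2 = reads, 3 = books
conf = Configuration(3)
shift(conf)                # stack [0, 1], buffer [2, 3]
left_arc(conf, "nsubj")    # adds (2, nsubj, 1); stack [0]
right_arc(conf, "root")    # adds (0, root, 2);  stack [0, 2]
right_arc(conf, "dobj")    # adds (2, dobj, 3);  stack [0, 2, 3]
failed = left_arc(conf, "dobj")  # buffer is empty, so this is rejected
print(conf.arcs, failed)
```

Note how an invalid operation leaves the configuration untouched and signals the failure with -1, which lets the caller fall back to another operation.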

### Extraction of Features
The four functions of the shift-reduce parser above define the actions that are available for a given configuration, but we need a brain to tell the program when to apply which function.  As discussed in part 1, we could try to create some hard-coded rules based on various configurations.  However, with this approach, the program would be inflexible and likely perform poorly.  Instead, we can use supervised machine learning to have the program learn for itself how to apply the shift-reduce functions.

As with any supervised learning problem, we need two things: a set of golden data to train the machine, and a method of extracting useful features from that golden data.  For this problem, we were provided golden data in the form of CoNLL datasets for English, Danish, and Swedish.  These datasets essentially provide a series of configurations accompanied by the correct operation and label (if applicable).  Our assignment was to extract "features" from each configuration: properties of the configuration that provide positive predictive value for the operation to be used.
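One common way to obtain such (configuration, operation) pairs from a gold dependency tree is a static oracle, which replays the parser and picks the operation that agrees with the gold heads.  The sketch below is an assumption about how such pairs can be derived, not a description of the provided course code; the function name `oracle_transitions` and the `heads` list format are hypothetical, and relation labels are omitted for brevity.

```python
def oracle_transitions(heads):
    """Return the operation sequence for a projective gold tree.

    heads[i] is the gold head of word i; heads[0] is a placeholder for the root.
    """
    n_words = len(heads) - 1
    stack, buffer = [0], list(range(1, n_words + 1))
    attached = set()   # words that have already been given a head
    sequence = []

    while buffer:
        b = buffer[0]
        if stack:
            s = stack[-1]
            # Gold tree says b is the head of s: draw a left arc and pop s.
            if s != 0 and heads[s] == b:
                sequence.append("left_arc")
                attached.add(stack.pop())
                continue
            # Gold tree says s is the head of b: draw a right arc and push b.
            if heads[b] == s:
                sequence.append("right_arc")
                attached.add(b)
                stack.append(buffer.pop(0))
                continue
            # s is finished and b still needs an arc with an earlier word: pop s.
            if s in attached and any(heads[k] == b or heads[b] == k for k in range(s)):
                sequence.append("reduce")
                stack.pop()
                continue
        sequence.append("shift")
        stack.append(buffer.pop(0))
    return sequence


# "She reads books quickly": She <- reads, root -> reads, reads -> books, reads -> quickly
print(oracle_transitions([None, 2, 0, 2, 2]))
# ['shift', 'left_arc', 'right_arc', 'right_arc', 'reduce', 'right_arc']
```

Each step of this replay yields one training example: the configuration before the step is the input, and the chosen operation is the target.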

I went through multiple iterations before converging on a solution; these are outlined below.

#### Data Structures
Before getting into the iterations on the feature extractor, let's first define the data structure that is used by the extractor.  As discussed previously, there are three components of a parser configuration.  $B$ is the buffer, holding the words that remain to be parsed.  $\Sigma$ is the stack, holding words that have been processed via the `right_arc()` or `shift()` operations.  $A$ is the set of arcs that have been added to the dependency graph.

Note that $B$ and $\Sigma$ both contain words, which may have a variety of properties.  Therefore it may make sense to index those words and store them in a separate data structure.  Let's call that $T$, a list of dictionaries storing the properties of each word.

Let's take a look at an example.

In [4]:
import random
from code.providedcode import dataset
data = dataset.get_english_train_corpus().parsed_sents()
smalldata = random.sample(data, 5)

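For readers without access to the provided corpus loader, the following sketch builds $T$, $B$, $\Sigma$, and $A$ by hand.  The sample rows are hand-written in a CoNLL-X-like column order and are not taken from the provided datasets; the `build_tokens()` helper and the dictionary key names are assumptions for illustration.

```python
# Hand-written rows (not from the provided corpora), tab-separated columns:
# ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL
SAMPLE = (
    "1\tThe\tthe\tDET\tDT\t_\t2\tdet\n"
    "2\tcompany\tcompany\tNOUN\tNN\t_\t3\tnsubj\n"
    "3\tgrew\tgrow\tVERB\tVBD\t_\t0\troot\n"
)


def build_tokens(conll_text):
    """Build T: one property dictionary per word, with the root at index 0."""
    tokens = [{"word": None, "ctag": "TOP", "tag": "TOP", "head": None, "rel": None}]
    for line in conll_text.strip().split("\n"):
        idx, form, lemma, ctag, tag, feats, head, rel = line.split("\t")
        tokens.append({
            "word": form,
            "lemma": lemma,
            "ctag": ctag,       # coarse-grained part of speech
            "tag": tag,         # fine-grained part of speech
            "head": int(head),  # gold head index (0 = root)
            "rel": rel,         # gold relation label
        })
    return tokens


T = build_tokens(SAMPLE)
stack = [0]                      # Sigma: starts with only the root
buffer = list(range(1, len(T)))  # B: indices into T, in sentence order
arcs = []                        # A: no arcs yet

print(T[buffer[0]]["word"], T[buffer[0]]["ctag"])  # The DET
```

Because the stack and buffer hold only indices, any property of a word can be looked up in $T$ without copying it into the configuration.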

#### Iteration #1 - Coarse POS
The first iteration of the feature extractor only made use of the coarse-grained part of speech for the top word in the stack and the first word in the buffer.  For example, $\Sigma=[...,'company']$ and $B=['was',...]$ might identify two features.  "Company" would be a "NOUN" and "was" would be a "VERB".

In code, this might look like:

In [1]:
s = stack[-1]
tok = tokens[s]
result = "STK_0_TAG_{0}".format(tok['tag'].upper())

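The fragment above depends on `stack` and `tokens` already being defined.  A self-contained sketch of this first iteration might look as follows; the function name `extract_features`, the `BUF_0_TAG_` prefix, and the sample tokens are assumptions for illustration, while the `STK_0_TAG_` format and the `'tag'` key follow the fragment above.

```python
def extract_features(tokens, stack, buffer):
    """Return POS features for the stack top and the buffer front."""
    features = []

    if stack:
        tag = tokens[stack[-1]].get("tag")
        if tag:  # skip words without a tag, such as a bare root entry
            features.append("STK_0_TAG_{0}".format(tag.upper()))

    if buffer:
        tag = tokens[buffer[0]].get("tag")
        if tag:
            features.append("BUF_0_TAG_{0}".format(tag.upper()))

    return features


# Hypothetical tokens: index 0 is the root, followed by "company was"
tokens = [
    {"word": None, "tag": None},
    {"word": "company", "tag": "noun"},
    {"word": "was", "tag": "verb"},
]

print(extract_features(tokens, stack=[0, 1], buffer=[2]))
# ['STK_0_TAG_NOUN', 'BUF_0_TAG_VERB']
```

Encoding each feature as a string keeps the extractor independent of the classifier; the strings can later be mapped to indices in a sparse feature vector.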