# Homework and bake-off: Relation extraction using distant supervision

In [1]:
__author__ = "Bill MacCartney and Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2020"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [Baselines](#Baselines)
  1. [Hand-build feature functions](#Hand-build-feature-functions)
  1. [Distributed representations](#Distributed-representations)
1. [Homework questions](#Homework-questions)
  1. [Different model factory [1 points]](#Different-model-factory-[1-points])
  1. [Directional unigram features [1.5 points]](#Directional-unigram-features-[1.5-points])
  1. [The part-of-speech tags of the "middle" words [1.5 points]](#The-part-of-speech-tags-of-the-"middle"-words-[1.5-points])
  1. [Bag of Synsets [2 points]](#Bag-of-Synsets-[2-points])
  1. [Your original system [3 points]](#Your-original-system-[3-points])
1. [Bake-off [1 point]](#Bake-off-[1-point])

## Overview

This homework and associated bake-off are devoted to developing effective relation extraction systems using distant supervision.

As with the previous assignments, this notebook first establishes a baseline system. The initial homework questions ask you to create additional baselines and suggest areas for innovation, and the final homework question asks you to develop an original system to enter into the bake-off.

## Set-up

See [the first notebook in this unit](rel_ext_01_task.ipynb#Set-up) for set-up instructions.

In [2]:
import numpy as np
import os
import rel_ext
from sklearn.linear_model import LogisticRegression
import utils

As usual, we unite our corpus and KB into a dataset, and create some splits for experimentation:

In [3]:
rel_ext_data_home = os.path.join('data', 'rel_ext_data')

In [4]:
corpus = rel_ext.Corpus(os.path.join(rel_ext_data_home, 'corpus.tsv.gz'))

In [5]:
kb = rel_ext.KB(os.path.join(rel_ext_data_home, 'kb.tsv.gz'))

In [6]:
dataset = rel_ext.Dataset(corpus, kb)

You are not wedded to this set-up for splits. The bake-off will be conducted on a previously unseen test-set, so all of the data in `dataset` is fair game:

In [7]:
splits = dataset.build_splits(
    split_names=['tiny', 'train', 'dev'],
    split_fracs=[0.01, 0.79, 0.20],
    seed=1)

In [8]:
splits

{'tiny': Corpus with 3,474 examples; KB with 445 triples,
 'train': Corpus with 263,285 examples; KB with 36,191 triples,
 'dev': Corpus with 64,937 examples; KB with 9,248 triples,
 'all': Corpus with 331,696 examples; KB with 45,884 triples}

## Baselines

### Hand-build feature functions

In [9]:
def simple_bag_of_words_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    return feature_counter

In [10]:
featurizers = [simple_bag_of_words_featurizer]

In [11]:
model_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')

In [14]:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.868      0.385      0.694        340       5716
author                    0.751      0.546      0.699        509       5885
capital                   0.690      0.211      0.474         95       5471
contains                  0.799      0.602      0.750       3904       9280
film_performance          0.767      0.559      0.714        766       6142
founders                  0.795      0.397      0.662        380       5756
genre                     0.571      0.141      0.355        170       5546
has_sibling               0.868      0.236      0.566        499       5875
has_spouse                0.891      0.332      0.666        594       5970
is_a                      0.685      0.223      0.485        497       5873
nationality               0.632      0.183      0.424        301       5677
parents     

Studying model weights might yield insights:

In [15]:
rel_ext.examine_model_weights(baseline_results)

Highest and lowest feature weights for relation adjoins:

     2.538 Córdoba
     2.457 Taluks
     2.318 Valais
     ..... .....
    -1.475 who
    -1.547 he
    -2.346 Earth

Highest and lowest feature weights for relation author:

     2.620 author
     2.335 wrote
     2.332 by
     ..... .....
    -3.005 controversial
    -3.584 1945
    -3.723 17th

Highest and lowest feature weights for relation capital:

     3.471 capital
     1.779 km
     1.777 posted
     ..... .....
    -1.221 largest
    -1.242 and
    -1.294 Westminster

Highest and lowest feature weights for relation contains:

     2.787 third-largest
     2.428 bordered
     2.148 attended
     ..... .....
    -2.461 Henley-on-Thames
    -3.885 Ceylon
    -6.027 Bronx

Highest and lowest feature weights for relation film_performance:

     3.847 starring
     3.633 co-starring
     3.586 alongside
     ..... .....
    -1.850 Malice
    -2.000 Westminster
    -2.272 spy

Highest and lowest feature weights for relation 

### Distributed representations

This simple baseline sums the GloVe vector representations for all of the words in the "middle" span and feeds those representations into the standard `LogisticRegression`-based `model_factory`. The crucial parameter that enables this is `vectorize=False`, which tells `rel_ext.experiment` that your featurizer or your model will do the work of turning examples into vectors; in that case, `rel_ext.experiment` just organizes these representations by relation type.
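
As a minimal sketch of the pooling step, assuming a toy two-dimensional lookup in place of GloVe (the names `toy_lookup` and `pool_middle` are hypothetical and used only for illustration), the following shows how out-of-vocabulary words are skipped and how swapping `np.sum` for `np.mean` changes the pooled representation:

```python
import numpy as np

# Hypothetical stand-in for a GloVe lookup: word -> vector.
toy_lookup = {
    "wrote": np.array([1.0, 0.0]),
    "the": np.array([0.0, 2.0]),
}

def pool_middle(middle, lookup, np_func=np.sum):
    """Pool the vectors of the in-vocabulary words in `middle`.
    Returns None if no word is in `lookup`; a real featurizer would
    need some fallback vector in that case."""
    reps = [lookup[w] for w in middle.split() if w in lookup]
    if not reps:
        return None
    return np_func(reps, axis=0)

# "novel" is not in the toy lookup, so it is skipped.
summed = pool_middle("wrote the novel", toy_lookup)
averaged = pool_middle("wrote the novel", toy_lookup, np_func=np.mean)
print(summed, averaged)
```

Summing makes the norm of the representation grow with the number of middle words, whereas averaging does not; which behaves better for this task is an empirical question.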

In [16]:
GLOVE_HOME = os.path.join('data', 'glove.6B')

In [17]:
glove_lookup = utils.glove2dict(
    os.path.join(GLOVE_HOME, 'glove.6B.300d.txt'))

In [18]:
def glove_middle_featurizer(kbt, corpus, np_func=np.sum):
    reps = []
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split():
            rep = glove_lookup.get(word)
            if rep is not None:
                reps.append(rep)
    # A random representation of the right dimensionality if the
    # example happens not to overlap with GloVe's vocabulary:
    if len(reps) == 0:
        dim = len(next(iter(glove_lookup.values())))                
        return utils.randvec(n=dim)
    else:
        return np_func(reps, axis=0)

In [19]:
glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_middle_featurizer],    
    vectorize=False, # Crucial for this featurizer!
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.853      0.462      0.730        340       5716
author                    0.824      0.442      0.703        509       5885
capital                   0.633      0.200      0.442         95       5471
contains                  0.664      0.416      0.593       3904       9280
film_performance          0.825      0.315      0.623        766       6142
founders                  0.733      0.232      0.512        380       5756
genre                     0.455      0.059      0.194        170       5546
has_sibling               0.826      0.246      0.562        499       5875
has_spouse                0.883      0.355      0.681        594       5970
is_a                      0.722      0.141      0.395        497       5873
nationality               0.695      0.219      0.485        301       5677
parents     

With the same basic code design, one can also use the PyTorch models included in the course repo, or write new ones that are better aligned with the task. For those models, it's likely that the featurizer will just return a list of tokens (or perhaps a list of lists of tokens), and the model will map those into vectors using an embedding.
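
A token-returning featurizer of the kind described here might look like the sketch below. The `ToyCorpus`, `Example`, and `KBTriple` definitions are hypothetical stand-ins so that the snippet runs on its own, and how a given model would consume the nested token lists is left open:

```python
from collections import namedtuple

# Hypothetical stand-ins for the corpus and KB triple objects.
Example = namedtuple("Example", ["middle"])
KBTriple = namedtuple("KBTriple", ["rel", "sbj", "obj"])

class ToyCorpus:
    """Maps ordered entity pairs to lists of examples."""
    def __init__(self, examples_by_pair):
        self.examples_by_pair = examples_by_pair

    def get_examples_for_entities(self, e1, e2):
        return self.examples_by_pair.get((e1, e2), [])

def middle_token_featurizer(kbt, corpus):
    """Return one token list per example, covering both entity orders."""
    token_lists = []
    for pair in [(kbt.sbj, kbt.obj), (kbt.obj, kbt.sbj)]:
        for ex in corpus.get_examples_for_entities(*pair):
            token_lists.append(ex.middle.split())
    return token_lists

toy_corpus = ToyCorpus({
    ("xkcd", "Randall_Munroe"): [Example("is a webcomic created by")],
})
kbt = KBTriple(rel="worked_at", sbj="Randall_Munroe", obj="xkcd")
tokens = middle_token_featurizer(kbt, toy_corpus)
print(tokens)
```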

## Homework questions

Please embed your homework responses in this notebook, and do not delete any cells from the notebook. (You are free to add as many cells as you like as part of your responses.)

### Different model factory [1 points]

The code in `rel_ext` makes it very easy to experiment with other classifier models: one need only redefine the `model_factory` argument. This question asks you to assess a [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).
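
To illustrate why the argument is a factory rather than a model instance, here is a small sketch on toy data. It assumes that a fresh classifier is built for each relation, which the per-relation indexing of `results['models']` in this notebook suggests; the helper `fit_per_relation` and the data are hypothetical:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two interchangeable factories: each call builds a fresh, unfitted model.
logreg_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')
svc_factory = lambda: SVC(kernel='linear')

# Toy, linearly separable data (illustrative only).
X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]]
y = [0, 0, 1, 1]

def fit_per_relation(model_factory, relations):
    """Fit an independent model per relation name. For brevity, every
    relation reuses the same toy data here."""
    return {rel: model_factory().fit(X, y) for rel in relations}

models = fit_per_relation(svc_factory, ['adjoins', 'author'])
print({rel: type(m).__name__ for rel, m in models.items()})
```

Because the factory is called once per relation, the fitted models do not share state, and swapping in a different classifier only requires passing a different callable.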

__To submit:__ A wrapper function `run_svm_model_factory` that does the following: 

1. Uses `rel_ext.experiment` with the model factory set to one based on an `SVC` with `kernel='linear'` and all other arguments left with default values.
1. Trains on the `train` part of `splits`.
1. Assesses on the `dev` part of `splits`.
1. Uses `featurizers` as defined above. 
1. Returns the return value of `rel_ext.experiment` for this set-up.

The function `test_run_svm_model_factory` will check that your function conforms to these general specifications.

In [20]:
def run_svm_model_factory():
    
    ##### YOUR CODE HERE
    from sklearn.svm import SVC
    svm_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=featurizers,
        model_factory=lambda: SVC(kernel="linear"),
        verbose=True
    )
    return svm_results

In [21]:
def test_run_svm_model_factory(run_svm_model_factory):
    results = run_svm_model_factory()
    assert 'featurizers' in results, \
        "The return value of `run_svm_model_factory` seems not to be correct"
    # Check one of the models to make sure it's an SVC:
    assert 'SVC' in results['models']['adjoins'].__class__.__name__, \
        "It looks like the model factory wasn't set to use an SVC."

In [22]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_run_svm_model_factory(run_svm_model_factory)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.823      0.341      0.642        340       5716
author                    0.705      0.595      0.680        509       5885
capital                   0.636      0.295      0.517         95       5471
contains                  0.778      0.608      0.736       3904       9280
film_performance          0.715      0.614      0.692        766       6142
founders                  0.735      0.461      0.657        380       5756
genre                     0.639      0.229      0.471        170       5546
has_sibling               0.819      0.244      0.557        499       5875
has_spouse                0.846      0.350      0.659        594       5970
is_a                      0.577      0.272      0.471        497       5873
nationality               0.543      0.189      0.395        301       5677
parents     

### Directional unigram features [1.5 points]

The current bag-of-words representation makes no distinction between "forward" and "reverse" examples. But, intuitively, there is a big difference between _X and his son Y_ and _Y and his son X_. This question asks you to modify `simple_bag_of_words_featurizer` to capture these differences.

__To submit:__

1. A feature function `directional_bag_of_words_featurizer` that is just like `simple_bag_of_words_featurizer` except that it distinguishes "forward" and "reverse". To do this, you just need to mark each word feature for whether it is derived from a subject–object example or from an object–subject example.  The included function `test_directional_bag_of_words_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `directional_bag_of_words_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment` as exemplified above in this notebook.)

3. `rel_ext.experiment` returns some of the core objects used in the experiment. How many feature names does the `vectorizer` have for the experiment run in the previous step? Include the code needed for getting this value. (Note: we're partly asking you to figure out how to get this value by using the sklearn documentation, so please don't ask how to do it!)
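
As a hedged sketch for the third item, assuming the vectorizer is a scikit-learn `DictVectorizer` (the toy feature dictionaries below are invented for illustration, and the key under which the experiment results store the vectorizer is not shown here), the number of feature names can be read off its fitted attributes:

```python
from sklearn.feature_extraction import DictVectorizer

# Toy feature-count dicts of the kind a featurizer produces.
feature_dicts = [
    {"is_OS": 1, "a_OS": 1, "created_OS": 1},
    {"is_SO": 2, "a_OS": 1},
]

vectorizer = DictVectorizer(sparse=True)
X = vectorizer.fit_transform(feature_dicts)

# `feature_names_` and `vocabulary_` are fitted attributes of DictVectorizer.
n_features = len(vectorizer.feature_names_)
print(n_features, X.shape)
```

Note that `is_OS` and `is_SO` count as distinct features, which is why a directional featurizer can be expected to yield a larger feature space than the undirected one.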

In [10]:
def directional_bag_of_words_featurizer(kbt, corpus, feature_counter): 
    # Append these to the end of the keys you add/access in 
    # `feature_counter` to distinguish the two orders. You'll
    # need to use exactly these strings in order to pass 
    # `test_directional_bag_of_words_featurizer`.
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    
    ##### YOUR CODE HERE
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word + object_subject_suffix] += 1

    return feature_counter


# Call to `rel_ext.experiment`:
##### YOUR CODE HERE    
directional_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_bag_of_words_featurizer],
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.865      0.397      0.700        340       5716
author                    0.855      0.589      0.784        509       5885
capital                   0.605      0.274      0.487         95       5471
contains                  0.819      0.686      0.789       3904       9280
film_performance          0.839      0.661      0.796        766       6142
founders                  0.831      0.400      0.683        380       5756
genre                     0.782      0.253      0.551        170       5546
has_sibling               0.828      0.251      0.567        499       5875
has_spouse                0.857      0.354      0.667        594       5970
is_a                      0.812      0.270      0.579        497       5873
nationality               0.673      0.219      0.476        301       5677
parents     

In [24]:
def test_directional_bag_of_words_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter['is_OS'] += 5
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter)
    expected = defaultdict(
        int, {'is_OS':6,'a_OS':1,'webcomic_OS':1,'created_OS':1,'by_OS':1})
    assert feature_counter == expected, \
        "Expected:\n{}\nGot:\n{}".format(expected, feature_counter)

In [25]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_directional_bag_of_words_featurizer(corpus)

### The part-of-speech tags of the "middle" words [1.5 points]

Our corpus distribution contains part-of-speech (POS) tagged versions of the core text spans. Let's begin to explore whether there is information in these sequences, focusing on `middle_POS`.

__To submit:__

1. A feature function `middle_bigram_pos_tag_featurizer` that is just like `simple_bag_of_words_featurizer` except that it creates a feature for bigram POS sequences. For example, given 

  `The/DT dog/N napped/V`
  
   we obtain the list of bigram POS sequences
  
   `b = ['<s> DT', 'DT N', 'N V', 'V </s>']`. 
   
   Of course, `middle_bigram_pos_tag_featurizer` should return count dictionaries defined in terms of such bigram POS lists, on the model of `simple_bag_of_words_featurizer`.  Don't forget the start and end tags, to model those environments properly! The included function `test_middle_bigram_pos_tag_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `middle_bigram_pos_tag_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment` as exemplified above in this notebook.)
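
The worked example above can be reproduced with a short standalone sketch. The function name `pos_bigrams` is hypothetical, and the input is assumed to be a space-separated string of `word/TAG` tokens:

```python
def pos_bigrams(tagged, start_symbol="<s>", end_symbol="</s>"):
    """Return the POS-tag bigrams of a space-separated word/TAG string."""
    # Split on the rightmost "/" so that words containing "/" still parse.
    tags = [tok.rsplit("/", 1)[1] for tok in tagged.split()]
    padded = [start_symbol] + tags + [end_symbol]
    return [" ".join(pair) for pair in zip(padded, padded[1:])]

print(pos_bigrams("The/DT dog/N napped/V"))
# ['<s> DT', 'DT N', 'N V', 'V </s>']
```

An empty middle still yields the single bigram `<s> </s>`, so adjacent entity mentions get their own feature.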

In [11]:
def middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter):
    
    ##### YOUR CODE HERE
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for tag_bigram in get_tag_bigrams(ex.middle_POS):
            feature_counter[tag_bigram] += 1
        
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for tag_bigram in get_tag_bigrams(ex.middle_POS):
            feature_counter[tag_bigram] += 1
            
    return feature_counter


def get_tag_bigrams(s):
    """Suggested helper method for `middle_bigram_pos_tag_featurizer`.
    This should be defined so that it returns a list of str, where each 
    element is a POS bigram."""
    # The values of `start_symbol` and `end_symbol` are defined
    # here so that you can use `test_middle_bigram_pos_tag_featurizer`.
    start_symbol = "<s>"
    end_symbol = "</s>"
    
    ##### YOUR CODE HERE
    tags = get_tags(s)
    tags.insert(0, start_symbol)
    tags.append(end_symbol)
    return [" ".join(tags[i:i+2]) for i in range(len(tags)-1)]
    

def get_tags(s): 
    """Given a sequence of word/POS elements (lemmas), this function
    returns a list containing just the POS elements, in order.    
    """
    return [parse_lem(lem)[1] for lem in s.strip().split(' ') if lem]


def parse_lem(lem):
    """Helper method for parsing word/POS elements. It just splits
    on the rightmost / and returns (word, POS) as a tuple of str."""
    return lem.strip().rsplit('/', 1)  

# Call to `rel_ext.experiment`:
##### YOUR CODE HERE
bigram_pos_tag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[middle_bigram_pos_tag_featurizer],
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.841      0.341      0.650        340       5716
author                    0.671      0.332      0.557        509       5885
capital                   0.282      0.116      0.219         95       5471
contains                  0.756      0.588      0.715       3904       9280
film_performance          0.726      0.446      0.645        766       6142
founders                  0.525      0.166      0.366        380       5756
genre                     0.574      0.182      0.402        170       5546
has_sibling               0.646      0.168      0.412        499       5875
has_spouse                0.720      0.273      0.542        594       5970
is_a                      0.573      0.165      0.384        497       5873
nationality               0.393      0.073      0.210        301       5677
parents     

In [12]:
def test_middle_bigram_pos_tag_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter['<s> VBZ'] += 5
    feature_counter = middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter)
    expected = defaultdict(
        int, {'<s> VBZ':6,'VBZ DT':1,'DT JJ':1,'JJ VBN':1,'VBN IN':1,'IN </s>':1})
    assert feature_counter == expected, \
        "Expected:\n{}\nGot:\n{}".format(expected, feature_counter)

In [28]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_middle_bigram_pos_tag_featurizer(corpus)

### Bag of Synsets [2 points]

The following uses NLTK's WordNet API to get the synsets for _dog_ used as a noun:

```
from nltk.corpus import wordnet as wn
dog = wn.synsets('dog', pos='n')
dog
[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01')]
```

This question asks you to create synset-based features from the word/tag pairs in `middle_POS`.

__To submit:__

1. A feature function `synset_featurizer` that is just like `simple_bag_of_words_featurizer` except that its features are synsets derived from `middle_POS` rather than words. Stringify these objects with `str` so that they can be `dict` keys. Use `convert_tag` (included below) to convert tags to `pos` arguments usable by `wn.synsets`. The included function `test_synset_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `synset_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment`.)

In [29]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    import nltk
    nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


In [13]:
from nltk.corpus import wordnet as wn

def synset_featurizer(kbt, corpus, feature_counter):
    
    ##### YOUR CODE HERE
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for synset in get_synsets(ex.middle_POS):
            feature_counter[synset] += 1
        
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for synset in get_synsets(ex.middle_POS):
            feature_counter[synset] += 1
            
    return feature_counter


def get_synsets(s):
    """Suggested helper method for `synset_featurizer`. This should
    be completed so that it returns a list of stringified Synsets 
    associated with elements of `s`.
    """   
    # Use `parse_lem` from the previous question to get a list of
    # (word, POS) pairs. Remember to convert the POS strings.
    wt = [parse_lem(lem) for lem in s.strip().split(' ') if lem]
    
    ##### YOUR CODE HERE
    synsets = []
    for word, tag in wt:
        synset = [str(s) for s in wn.synsets(word, pos=convert_tag(tag))]
        synsets.extend(synset)

    return synsets
    
    
def convert_tag(t):
    """Converts tags so that they can be used by WordNet:
    
    | Tag begins with | WordNet tag |
    |-----------------|-------------|
    | `N`             | `n`         |
    | `V`             | `v`         |
    | `J`             | `a`         |
    | `R`             | `r`         |
    | Otherwise       | `None`      |
    """        
    if t[0].lower() in {'n', 'v', 'r'}:
        return t[0].lower()
    elif t[0].lower() == 'j':
        return 'a'
    else:
        return None    


# Call to `rel_ext.experiment`:
##### YOUR CODE HERE    
synset_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[synset_featurizer],
    verbose=True)

LookupError: 
**********************************************************************
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet

  Searched in:
    - '/root/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


In [31]:
def test_synset_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter["Synset('be.v.01')"] += 5
    feature_counter = synset_featurizer(kbt, corpus, feature_counter)
    # The full return values for this tend to be long, so we just
    # test a few examples to avoid cluttering up this notebook.
    test_cases = {
        "Synset('be.v.01')": 6,
        "Synset('embody.v.02')": 1
    }
    for ss, expected in test_cases.items():   
        result = feature_counter[ss]
        assert result == expected, \
            "Incorrect count for {}: Expected {}; Got {}".format(ss, expected, result)

In [32]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_synset_featurizer(corpus)

### Your original system [3 points]

There are many options, and this could easily grow into a project. Here are a few ideas:

- Try out different classifier models, from `sklearn` and elsewhere.
- Add a feature that indicates the length of the middle.
- Augment the bag-of-words representation to include bigrams or trigrams (not just unigrams).
- Introduce features based on the entity mentions themselves. <!-- \[SPOILER: it helps a lot, maybe 4% in F-score. And combines nicely with the directional features.\] -->
- Experiment with features based on the context outside (rather than between) the two entity mentions — that is, the words before the first mention, or after the second.
- Try adding features which capture syntactic information, such as the dependency-path features used by Mintz et al. 2009. The [NLTK](https://www.nltk.org/) toolkit contains a variety of [parsing algorithms](http://www.nltk.org/api/nltk.parse.html) that may help.
- The bag-of-words representation does not permit generalization across word categories such as names of people, places, or companies. Can we do better using word embeddings such as [GloVe](https://nlp.stanford.edu/projects/glove/)?
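
Two of the ideas above, a middle-length feature and word bigrams, can be sketched as featurizers with the same signature as `simple_bag_of_words_featurizer`. The `ToyCorpus`, `Example`, and `KBTriple` stand-ins and the length-bucket thresholds are assumptions made for illustration, not tuned choices:

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for the corpus and KB triple objects.
Example = namedtuple("Example", ["middle"])
KBTriple = namedtuple("KBTriple", ["rel", "sbj", "obj"])

class ToyCorpus:
    def __init__(self, examples_by_pair):
        self.examples_by_pair = examples_by_pair

    def get_examples_for_entities(self, e1, e2):
        return self.examples_by_pair.get((e1, e2), [])

def examples_both_orders(kbt, corpus):
    """Yield examples for subject-object and object-subject orders."""
    for pair in [(kbt.sbj, kbt.obj), (kbt.obj, kbt.sbj)]:
        yield from corpus.get_examples_for_entities(*pair)

def middle_length_featurizer(kbt, corpus, feature_counter):
    """Count examples by bucketed middle length in tokens.
    The bucket boundaries (3 and 8) are arbitrary."""
    for ex in examples_both_orders(kbt, corpus):
        n = len(ex.middle.split())
        bucket = "short" if n <= 3 else "medium" if n <= 8 else "long"
        feature_counter["middle_len=" + bucket] += 1
    return feature_counter

def middle_word_bigram_featurizer(kbt, corpus, feature_counter):
    """Count word bigrams in the middle, padded with boundary symbols."""
    for ex in examples_both_orders(kbt, corpus):
        words = ["<s>"] + ex.middle.split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            feature_counter[w1 + " " + w2] += 1
    return feature_counter

toy_corpus = ToyCorpus({
    ("xkcd", "Randall_Munroe"): [Example("is a webcomic created by")],
})
kbt = KBTriple(rel="worked_at", sbj="Randall_Munroe", obj="xkcd")
length_feats = middle_length_featurizer(kbt, toy_corpus, Counter())
bigram_feats = middle_word_bigram_featurizer(kbt, toy_corpus, Counter())
print(length_feats, bigram_feats)
```

Since both functions update and return the counter they are given, they could in principle be listed together with other count-based featurizers so that their features are combined.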

In the cell below, please provide a brief technical description of your original system, so that the teaching team can gain an understanding of what it does. This will help us to understand your code and analyze all the submissions to identify patterns and strategies.

#### Directional Bag of Words + Other Classifiers

In [33]:
from sklearn.svm import SVC

if 'IS_GRADESCOPE_ENV' not in os.environ:
    directional_svm_linear_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer],
        model_factory=lambda: SVC(kernel="linear"),
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.811      0.391      0.668        340       5716
author                    0.803      0.672      0.773        509       5885
capital                   0.650      0.274      0.510         95       5471
contains                  0.794      0.675      0.767       3904       9280
film_performance          0.781      0.698      0.763        766       6142
founders                  0.761      0.461      0.673        380       5756
genre                     0.605      0.441      0.563        170       5546
has_sibling               0.786      0.251      0.551        499       5875
has_spouse                0.851      0.355      0.665        594       5970
is_a                      0.657      0.308      0.535        497       5873
nationality               0.571      0.266      0.465        301       5677
parents     

In [34]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    directional_svm_rbf_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer],
        model_factory=lambda: SVC(kernel="rbf"),
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.844      0.350      0.658        340       5716
author                    0.833      0.578      0.765        509       5885
capital                   0.800      0.084      0.296         95       5471
contains                  0.852      0.569      0.774       3904       9280
film_performance          0.857      0.580      0.782        766       6142
founders                  0.886      0.287      0.625        380       5756
genre                     0.727      0.094      0.310        170       5546
has_sibling               0.879      0.188      0.507        499       5875
has_spouse                0.868      0.276      0.607        594       5970
is_a                      0.830      0.187      0.492        497       5873
nationality               0.760      0.126      0.379        301       5677
parents     

#### N-grams featurizer

In [None]:
def middle_ngram_pos_tag_featurizer(kbt, corpus, feature_counter, n=2):
    
    ##### YOUR CODE HERE
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for tag_bigram in get_tag_ngrams(ex.middle_POS, n=n):
            feature_counter[tag_bigram] += 1
        
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for tag_bigram in get_tag_ngrams(ex.middle_POS, n=n):
            feature_counter[tag_bigram] += 1
            
    return feature_counter


def get_tag_ngrams(s, n=2):
    """Helper method for `middle_ngram_pos_tag_featurizer`. Returns a
    list of str, where each element is a POS n-gram. With `n=2`, this
    matches `get_tag_bigrams`."""
    # The start and end symbols match those used in `get_tag_bigrams`.
    start_symbol = "<s>"
    end_symbol = "</s>"
    
    ##### YOUR CODE HERE
    tags = get_tags(s)
    tags.insert(0, start_symbol)
    tags.append(end_symbol)
    # Note: for n > 2, the trailing windows (up to n - 2 of them) are
    # shorter than n, because the slice runs past the end of `tags`.
    return [" ".join(tags[i:i+n]) for i in range(len(tags)-1)]
    

def get_tags(s): 
    """Given a sequence of word/POS elements (lemmas), this function
    returns a list containing just the POS elements, in order.    
    """
    return [parse_lem(lem)[1] for lem in s.strip().split(' ') if lem]


def parse_lem(lem):
    """Helper method for parsing word/POS elements. It just splits
    on the rightmost / and returns (word, POS) as a tuple of str."""
    return lem.strip().rsplit('/', 1)  


if 'IS_GRADESCOPE_ENV' not in os.environ:
    from functools import partial
    middle_3gram_pos_tag_featurizer = partial(middle_ngram_pos_tag_featurizer, n=3)
    middle_4gram_pos_tag_featurizer = partial(middle_ngram_pos_tag_featurizer, n=4)

In [48]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    threegram_pos_tag_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[middle_3gram_pos_tag_featurizer],
        model_factory=lambda: SVC(kernel="linear"),
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.717      0.335      0.584        340       5716
author                    0.655      0.358      0.561        509       5885
capital                   0.387      0.126      0.274         95       5471
contains                  0.763      0.592      0.721       3904       9280
film_performance          0.677      0.507      0.634        766       6142
founders                  0.521      0.226      0.413        380       5756
genre                     0.405      0.176      0.322        170       5546
has_sibling               0.595      0.138      0.358        499       5875
has_spouse                0.665      0.254      0.503        594       5970
is_a                      0.502      0.231      0.407        497       5873
nationality               0.287      0.083      0.193        301       5677
parents     

In [49]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    fourgram_pos_tag_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[middle_4gram_pos_tag_featurizer],
        model_factory=lambda: SVC(kernel="linear"),
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.719      0.338      0.587        340       5716
author                    0.620      0.346      0.535        509       5885
capital                   0.450      0.095      0.257         95       5471
contains                  0.745      0.590      0.708       3904       9280
film_performance          0.694      0.526      0.652        766       6142
founders                  0.458      0.229      0.382        380       5756
genre                     0.403      0.182      0.324        170       5546
has_sibling               0.543      0.126      0.327        499       5875
has_spouse                0.678      0.231      0.489        594       5970
is_a                      0.515      0.310      0.455        497       5873
nationality               0.214      0.100      0.174        301       5677
parents     

In [50]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    threegram_pos_tag_rbf_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[middle_3gram_pos_tag_featurizer],
        model_factory=lambda: SVC(kernel="rbf"),
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.837      0.303      0.619        340       5716
author                    0.744      0.303      0.576        509       5885
capital                   0.542      0.137      0.340         95       5471
contains                  0.808      0.544      0.737       3904       9280
film_performance          0.822      0.427      0.693        766       6142
founders                  0.727      0.084      0.288        380       5756
genre                     1.000      0.029      0.132        170       5546
has_sibling               0.732      0.120      0.363        499       5875
has_spouse                0.801      0.190      0.488        594       5970
is_a                      0.805      0.125      0.385        497       5873
nationality               0.692      0.060      0.222        301       5677
parents     

In [51]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    fourgram_pos_tag_rbf_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[middle_4gram_pos_tag_featurizer],
        model_factory=lambda: SVC(kernel="rbf"),
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.832      0.306      0.619        340       5716
author                    0.741      0.287      0.563        509       5885
capital                   0.545      0.126      0.328         95       5471
contains                  0.812      0.524      0.731       3904       9280
film_performance          0.811      0.403      0.675        766       6142
founders                  0.667      0.105      0.323        380       5756
genre                     1.000      0.029      0.132        170       5546
has_sibling               0.711      0.108      0.336        499       5875
has_spouse                0.757      0.189      0.472        594       5970
is_a                      0.769      0.141      0.407        497       5873
nationality               0.640      0.053      0.200        301       5677
parents     

#### Multiple Featurizers
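
When several featurizers are passed in one list, the working assumption in the cells below is that each featurizer adds its features to the same counter for a given entity pair. The following is a minimal sketch of that idea, assuming the `(kbt, corpus, feature_counter)` featurizer signature; `toy_word_featurizer`, `toy_length_featurizer`, `featurize`, and the `examples` list are hypothetical stand-ins written for illustration, not `rel_ext` code.

```python
from collections import Counter


def toy_word_featurizer(kbt, corpus, feature_counter):
    # Hypothetical: count the middle words of each toy example.
    for middle in corpus:
        for word in middle.split():
            feature_counter[word] += 1
    return feature_counter


def toy_length_featurizer(kbt, corpus, feature_counter):
    # Hypothetical: one extra feature holding the total middle length.
    feature_counter['middle_len'] += sum(len(m.split()) for m in corpus)
    return feature_counter


def featurize(kbt, corpus, featurizers):
    # Each featurizer updates the same Counter in turn.
    feature_counter = Counter()
    for featurizer in featurizers:
        feature_counter = featurizer(kbt, corpus, feature_counter)
    return feature_counter


examples = ['was born in', 'born in']  # toy stand-in for corpus examples
features = featurize(
    None, examples, [toy_word_featurizer, toy_length_featurizer])
print(features)
```

Since all features land in one counter, two featurizers that emit the same feature name would have their counts summed, which is one reason to give each featurizer's features a distinct prefix.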

In [53]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    dir_synset_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer, synset_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.843      0.426      0.705        340       5716
author                    0.855      0.627      0.797        509       5885
capital                   0.580      0.305      0.492         95       5471
contains                  0.813      0.679      0.782       3904       9280
film_performance          0.797      0.672      0.769        766       6142
founders                  0.752      0.455      0.665        380       5756
genre                     0.602      0.329      0.517        170       5546
has_sibling               0.800      0.281      0.584        499       5875
has_spouse                0.833      0.354      0.655        594       5970
is_a                      0.694      0.292      0.544        497       5873
nationality               0.556      0.262      0.455        301       5677
parents     

In [54]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    dir_synset_bigram_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer, synset_featurizer, middle_bigram_pos_tag_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.798      0.418      0.675        340       5716
author                    0.856      0.652      0.805        509       5885
capital                   0.574      0.284      0.477         95       5471
contains                  0.764      0.756      0.762       3904       9280
film_performance          0.802      0.691      0.777        766       6142
founders                  0.757      0.450      0.666        380       5756
genre                     0.667      0.376      0.578        170       5546
has_sibling               0.824      0.281      0.594        499       5875
has_spouse                0.841      0.374      0.673        594       5970
is_a                      0.708      0.360      0.593        497       5873
nationality               0.568      0.319      0.491        301       5677
parents     

In [56]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    dir_synset_bigram_svm_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer, synset_featurizer, middle_bigram_pos_tag_featurizer],
        model_factory=lambda: SVC(kernel="linear"),
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.713      0.409      0.621        340       5716
author                    0.746      0.682      0.732        509       5885
capital                   0.493      0.347      0.455         95       5471
contains                  0.726      0.750      0.730       3904       9280
film_performance          0.737      0.732      0.736        766       6142
founders                  0.660      0.489      0.617        380       5756
genre                     0.553      0.518      0.546        170       5546
has_sibling               0.656      0.253      0.497        499       5875
has_spouse                0.732      0.372      0.613        594       5970
is_a                      0.572      0.390      0.523        497       5873
nationality               0.455      0.365      0.433        301       5677
parents     

In [59]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    synset_bigram_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[synset_featurizer, middle_bigram_pos_tag_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.766      0.376      0.635        340       5716
author                    0.792      0.605      0.746        509       5885
capital                   0.585      0.326      0.505         95       5471
contains                  0.817      0.633      0.772       3904       9280
film_performance          0.770      0.628      0.736        766       6142
founders                  0.743      0.403      0.635        380       5756
genre                     0.581      0.318      0.498        170       5546
has_sibling               0.826      0.277      0.591        499       5875
has_spouse                0.832      0.350      0.652        594       5970
is_a                      0.632      0.342      0.540        497       5873
nationality               0.526      0.203      0.399        301       5677
parents     

In [33]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    dir_bigram_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer, middle_bigram_pos_tag_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.862      0.403      0.702        340       5716
author                    0.832      0.635      0.784        509       5885
capital                   0.758      0.263      0.551         95       5471
contains                  0.846      0.681      0.807       3904       9280
film_performance          0.867      0.680      0.822        766       6142
founders                  0.794      0.416      0.672        380       5756
genre                     0.789      0.329      0.617        170       5546
has_sibling               0.865      0.257      0.587        499       5875
has_spouse                0.884      0.370      0.692        594       5970
is_a                      0.791      0.336      0.623        497       5873
nationality               0.598      0.223      0.447        301       5677
parents     

In [None]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    from sklearn.ensemble import RandomForestClassifier
    dir_bigram_rf_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer, middle_bigram_pos_tag_featurizer],
        model_factory=lambda: RandomForestClassifier(),
        verbose=True)

#### Gradient Boosting

In [57]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    from sklearn.ensemble import GradientBoostingClassifier
    dir_synset_bigram_gb_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer, synset_featurizer, middle_bigram_pos_tag_featurizer],
        model_factory=lambda: GradientBoostingClassifier(),
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.838      0.412      0.694        340       5716
author                    0.851      0.629      0.795        509       5885
capital                   0.600      0.221      0.447         95       5471
contains                  0.850      0.592      0.782       3904       9280
film_performance          0.842      0.572      0.770        766       6142
founders                  0.873      0.379      0.692        380       5756
genre                     0.638      0.218      0.460        170       5546
has_sibling               0.844      0.283      0.604        499       5875
has_spouse                0.865      0.379      0.688        594       5970
is_a                      0.835      0.233      0.551        497       5873
nationality               0.732      0.199      0.477        301       5677
parents     

In [23]:
# Enter your system description in this cell.

# (1) System Description
"""
My system is simple: it combines the directional bag-of-words
featurizer with the middle bigram POS tag featurizer used in the
examples above, and it uses the default LogisticRegression model
for the model_factory argument. In the hope of avoiding overfitting
and doing better on the hidden test set, I changed the train/dev
splits used here to 55% and 44% of the data, respectively, instead
of the default 79% and 20%.

I experimented above with other sklearn models, with 3-gram and 4-gram
POS tag featurizers, and with different combinations of featurizers.
I concluded that LogisticRegression performed about as well as any
other model I tried, and that my chosen pair of featurizers did better
than any single featurizer or other combination that I tested.
"""

# (2) Original System code
if 'IS_GRADESCOPE_ENV' not in os.environ:
    splits = dataset.build_splits(
        split_names=['tiny', 'train', 'dev'],
        split_fracs=[0.01, 0.55, 0.44],
        seed=1)
        
    dir_bigram_lr_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer, middle_bigram_pos_tag_featurizer],
        model_factory=lambda: LogisticRegression(fit_intercept=True, solver='liblinear'),
        verbose=True)

# (3) Score
# My peak score was: 0.642

# Please do not remove this comment.

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.881      0.456      0.742        777      12519
author                    0.856      0.613      0.793       1170      12912
capital                   0.775      0.267      0.562        232      11974
contains                  0.845      0.644      0.795       9165      20907
film_performance          0.861      0.693      0.821       1741      13483
founders                  0.792      0.419      0.672        853      12595
genre                     0.696      0.328      0.569        357      12099
has_sibling               0.903      0.280      0.624       1159      12901
has_spouse                0.898      0.391      0.713       1336      13078
is_a                      0.723      0.328      0.582       1143      12885
nationality               0.620      0.287      0.503        683      12425
parents     

## Bake-off [1 point]

For the bake-off, we will release a test set. The announcement will go out on the discussion forum. You will evaluate your custom model from the previous question on this new test set using the function `rel_ext.bake_off_experiment`. Rules:

1. Only one evaluation is permitted.
1. No additional system tuning is permitted once the bake-off has started.

The cells below this one constitute your bake-off entry.

People who enter will receive the additional homework point, and people whose systems achieve the top score will receive an additional 0.5 points. We will test the top-performing systems ourselves, and only systems for which we can reproduce the reported results will win the extra 0.5 points.

Late entries will be accepted, but they cannot earn the extra 0.5 points. Similarly, you cannot win the bake-off unless your homework is submitted on time.

The announcement will include the details on where to submit your entry.

In [15]:
# Enter your bake-off assessment code in this cell. 
# Please do not remove this comment.
if 'IS_GRADESCOPE_ENV' not in os.environ:
    # Please enter your code in the scope of the above conditional.
    ##### YOUR CODE HERE
    splits = dataset.build_splits(
        split_names=['tiny', 'train', 'dev'],
        split_fracs=[0.01, 0.55, 0.44],
        seed=1)
        
    bakeoff_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer, middle_bigram_pos_tag_featurizer],
        model_factory=lambda: LogisticRegression(fit_intercept=True, solver='liblinear'),
        verbose=False)

    rel_ext_data_home_test = os.path.join(rel_ext_data_home, 'bakeoff-rel_ext-test-data')
    rel_ext.bake_off_experiment(bakeoff_results, rel_ext_data_home_test)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.886      0.463      0.750        438       7122
author                    0.828      0.626      0.778        645       7329
capital                   0.740      0.322      0.587        115       6799
contains                  0.794      0.663      0.764       3808      10492
film_performance          0.829      0.675      0.792       1011       7695
founders                  0.779      0.412      0.661        444       7128
genre                     0.647      0.351      0.554        188       6872
has_sibling               0.894      0.234      0.572        717       7401
has_spouse                0.869      0.358      0.676        780       7464
is_a                      0.727      0.283      0.553        611       7295
nationality               0.616      0.264      0.486        383       7067
parents     

In [None]:
# On an otherwise blank line in this cell, please enter
# your macro-average f-score (an F_0.5 score) as reported 
# by the code above. Please enter only a number between 
# 0 and 1 inclusive. Please do not remove this comment.
if 'IS_GRADESCOPE_ENV' not in os.environ:
    # Please enter your score in the scope of the above conditional.
    ##### YOUR CODE HERE
    0.627
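
The f-score reported in the tables above is an F_0.5 score, which weights precision more heavily than recall. As a rough check, the sketch below applies the standard F-beta formula to the rounded precision and recall printed for the `adjoins` row of the bake-off output. The function name `f_beta` is an illustrative choice, not a `rel_ext` function, and small differences from the printed value are expected because the inputs are already rounded to three decimals.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded values from the `adjoins` row of the bake-off table above.
adjoins_f = f_beta(0.886, 0.463)
print(round(adjoins_f, 3))
```

The macro-average is then the unweighted mean of these per-relation scores, so small relations such as `capital` count as much as large ones such as `contains`.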

