# Homework and bake-off: Relation extraction using distant supervision

In [1]:
__author__ = "Bill MacCartney and Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2020"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [Baselines](#Baselines)
  1. [Hand-build feature functions](#Hand-build-feature-functions)
  1. [Distributed representations](#Distributed-representations)
1. [Homework questions](#Homework-questions)
  1. [Different model factory [1 points]](#Different-model-factory-[1-points])
  1. [Directional unigram features [1.5 points]](#Directional-unigram-features-[1.5-points])
  1. [The part-of-speech tags of the "middle" words [1.5 points]](#The-part-of-speech-tags-of-the-"middle"-words-[1.5-points])
  1. [Bag of Synsets [2 points]](#Bag-of-Synsets-[2-points])
  1. [Your original system [3 points]](#Your-original-system-[3-points])
1. [Bake-off [1 point]](#Bake-off-[1-point])

## Overview

This homework and associated bake-off are devoted to developing really effective relation extraction systems using distant supervision. 

As with the previous assignments, this notebook first establishes a baseline system. The initial homework questions ask you to create additional baselines and suggest areas for innovation, and the final homework question asks you to develop an original system for you to enter into the bake-off.

## Set-up

See [the first notebook in this unit](rel_ext_01_task.ipynb#Set-up) for set-up instructions.

In [2]:
import numpy as np
import os
import rel_ext
from sklearn.linear_model import LogisticRegression
import utils

As usual, we unite our corpus and KB into a dataset, and create some splits for experimentation:

In [3]:
rel_ext_data_home = os.path.join('data', 'rel_ext_data')

In [4]:
corpus = rel_ext.Corpus(os.path.join(rel_ext_data_home, 'corpus.tsv.gz'))

In [5]:
kb = rel_ext.KB(os.path.join(rel_ext_data_home, 'kb.tsv.gz'))

In [6]:
dataset = rel_ext.Dataset(corpus, kb)

You are not wedded to this set-up for splits. The bake-off will be conducted on a previously unseen test-set, so all of the data in `dataset` is fair game:

In [7]:
splits = dataset.build_splits(
    split_names=['tiny', 'train', 'dev'],
    split_fracs=[0.01, 0.79, 0.20],
    seed=1)

In [8]:
corpus.examples[0].__repr__()


"Example(entity_1='Marche', entity_2='Ancona', left='2011 History of Town and Charity Background History of the Town Servigliano is a comune ( municipality ) in the Province of Fermo in the Italian region', mention_1='Marche', middle=', located about 60 km south of', mention_2='Ancona', right='. As far back as 1914 , with the imminent prospect of Italy entering the war , a large holding camp was built in Servigliano for the', left_POS='2011/CD History/NN of/IN Town/NNP and/CC Charity/NNP Background/NNP History/NN of/IN the/DT Town/NNP Servigliano/NNP is/VBZ a/DT comune/NN -LRB-/-LRB- municipality/NN -RRB-/-RRB- in/IN the/DT Province/NNP of/IN Fermo/NNP in/IN the/DT Italian/JJ region/NN', mention_1_POS='Marche/NNP', middle_POS=',/, located/JJ about/RB 60/CD km/NN south/NN of/IN', mention_2_POS='Ancona/NNP', right_POS='./. As/IN far/RB back/RB as/IN 1914/CD ,/, with/IN the/DT imminent/JJ prospect/NN of/IN Italy/NNP entering/VBG the/DT war/NN ,/, a/DT large/JJ holding/VBG camp/NN was/VBD 

In [9]:
kb.get_triples_for_relation("adjoins")[0:5]

[KBTriple(rel='adjoins', sbj='France', obj='Spain'),
 KBTriple(rel='adjoins', sbj='Thailand', obj='Laos'),
 KBTriple(rel='adjoins', sbj='Alberta', obj='Northwest_Territories'),
 KBTriple(rel='adjoins', sbj='County_Kilkenny', obj='County_Laois'),
 KBTriple(rel='adjoins', sbj='Tianjin', obj='Hebei')]

## Baselines

### Hand-build feature functions

The simple_bag_of_words_featurizer below is a function that is passed to the experiment method. The experiment method calls the train models method which itself calls the featurizer method that uses this feautrizer as an inut as to how to featurize the input. Note that the elements of the featurizer are: a kb (a KB triple), the corpus and a feature counter

For a given triple (relation, subject, object), the simple featurizer looks for all the examples in the corpus (in both orders). The matched corpus example is splitted into words whicha re added the feature counter dictionary counts

In [10]:
def simple_bag_of_words_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    return feature_counter

In [11]:
featurizers = [simple_bag_of_words_featurizer]

In [12]:
model_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')

In [13]:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.861      0.365      0.677        340       5716
author                    0.799      0.540      0.729        509       5885
capital                   0.485      0.168      0.352         95       5471
contains                  0.788      0.604      0.743       3904       9280
film_performance          0.766      0.569      0.717        766       6142
founders                  0.819      0.405      0.680        380       5756
genre                     0.500      0.147      0.338        170       5546
has_sibling               0.878      0.244      0.578        499       5875
has_spouse                0.915      0.325      0.671        594       5970
is_a                      0.744      0.233      0.517        497       5873
nationality               0.593      0.179      0.406        301       5677
parents     

Studying model weights might yield insights:

In [14]:
rel_ext.examine_model_weights(baseline_results)

Highest and lowest feature weights for relation adjoins:

     2.493 Córdoba
     2.462 Valais
     2.359 Taluks
     ..... .....
    -1.455 Lancashire
    -1.559 he
    -1.621 who

Highest and lowest feature weights for relation author:

     2.823 books
     2.668 author
     2.604 book
     ..... .....
    -2.043 or
    -2.999 1818
    -4.402 17th

Highest and lowest feature weights for relation capital:

     2.380 capital
     1.742 km
     1.677 town
     ..... .....
    -1.738 ~3.9
    -1.748 pop
    -1.752 million

Highest and lowest feature weights for relation contains:

     2.800 third-largest
     2.404 district
     2.381 bordered
     ..... .....
    -2.644 Sarawak
    -2.646 Mile
    -2.657 Lancashire

Highest and lowest feature weights for relation film_performance:

     3.985 starring
     3.798 alongside
     3.526 movie
     ..... .....
    -1.798 Joker
    -1.986 Sajid
    -3.848 double

Highest and lowest feature weights for relation founders:

     4.455 founder

In [15]:
#code to check for examples of specific relations
for i in kb.get_triples_for_relation('adjoins')[:]:
    for example in corpus.get_examples_for_entities(i.sbj,i.obj)[0:2]:
        print(example.mention_1 + ' '  + example.middle + ' ' + example.mention_2)

France , Sweden , Spain
France nor Spain
Thailand , Vietnam , and Laos
Thailand , tomoi from Malaysia , muay Lao from Laos
Alberta and the Northwest Territories
Alberta , the Northwest Territories
Kilkenny , Laois
Tianjin and the provinces of Hebei
Bavaria , and Fahrzeugfabrik Eisenach in Thuringia
Bavaria , and Fahrzeugfabrik Eisenach in Thuringia
Hispaniola and Cuba
Hispaniola , Puerto Rico and Cuba
Libya , Egypt
Libya and Egypt
Jordan , Kuwait , Saudi Arabia
Jordan , and Saudi Arabia
Montana , bordering the Canadian provinces of Alberta
Montana and Alberta
East River . In 1874 , the western portion of the present Bronx County
Honduras , El Salvador , Nicaragua
Honduras , Mexico , Nicaragua
Haryana , Punjab and Rajasthan
Haryana and Rajasthan
Lambeth , Southwark
Lambeth , Southwark
Canada , United States of America
Canada , Germany , and the United States of America
Salta and Jujuy
Salta and Jujuy
France and Belgium
France , Belgium
Afghanistan , Tajikistan
Afghanistan , Tajikistan
N

West Bank . [ 29 ] Israel
New York , forcing the Americans to retreat to Pennsylvania
New York City , Long Island , New York , Pennsylvania
Puebla , Guerrero and Oaxaca
Puebla , whimsical bottle art from Oaxaca
New Zealand 's North Island . Its more immediate neighbours are Vanuatu
New Zealand , and Banks Island in modern-day Vanuatu
Sweden , Finland
Sweden , Norway , Finland
Gulf of Mexico , Mexico
Gulf of Mexico in the United States and Mexico
Tanzania , Uganda
Tanzania , Uganda
West Virginia to the southwest , and Ohio
W. Virginia ; entire service with the Baltimore & Ohio
Connecticut , Massachusetts and Rhode Island
Connecticut , Delaware , or Rhode Island
Kenedy , Willacy
Chinatown district ( in Soho
Chinatown / Soho
Puerto Rico , Hispaniola
Puerto Rico . He then went to Hispaniola
Sierra Leone and parts of neighboring Guinea
Sierra Leone on the west , Guinea
Beltrami County and added to Roseau County
New York City , via a flanking move from Staten Island across Long Island
New Yo

Benin Republic , to Togo
Benbecula , North Uist
Benbecula and then across another causeway to North Uist
Afghanistan and Pakistan
Afghanistan and Pakistan
San Benito County , Monterey County
Djibouti and Somalia
Djibouti and Somalia
Republic of Senegal in 1960 ( 13.7M , 2009 est . ) , Republic of Mali
Senegal broke from the Mali
Iowa , Minnesota
Iowa , Kansas , Minnesota
Tajikistan , Afghanistan
Tajikistan and Badakshan Province , Afghanistan
Madhya Pradesh , Gujarat
Madhya Pradesh and Gujarat
Andorra , France
Andorra - Franco War was not the first or the last time France
Italy to Slovenia
Italy , Serbia and Montenegro , and Slovenia
Lake Superior region of Minnesota , Wisconsin
Lake Superior , with Wisconsin
Mexico south to Peru , and around the Gulf of Mexico
Mexico . It represents the shortest distance between the Gulf of Mexico
Idaho , Utah
Idaho , Montana , Utah
Chicago , Florida , Indiana
Chicago , Indiana
Vancouver , Burnaby
Vancouver , Burnaby
Luanda Province , the Bengo Provin

### Distributed representations

This simple baseline sums the GloVe vector representations for all of the words in the "middle" span and feeds those representations into the standard `LogisticRegression`-based `model_factory`. The crucial parameter that enables this is `vectorize=False`. This essentially says to `rel_ext.experiment` that your featurizer or your model will do the work of turning examples into vectors; in that case, `rel_ext.experiment` just organizes these representations by relation type.

In [16]:
GLOVE_HOME = os.path.join('data', 'glove.6B')

In [17]:
glove_lookup = utils.glove2dict(
    os.path.join(GLOVE_HOME, 'glove.6B.300d.txt'))

The glove_lookup object is a dictionary that represents each word in a dictionary as a 300 dimensional vector. Below the glove middle featurizer takes a kb triple, looks for all the examples in the corpus, splits middle into words, looks them up on the glove vectorizer and appends all. At the end we will have a matrix for two entities with one row for each word. the np.sum np_func option just collapses the matrix into a vector. 

In [18]:
def glove_middle_featurizer(kbt, corpus, np_func=np.sum):
    reps = []
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split():
            rep = glove_lookup.get(word)
            if rep is not None:
                reps.append(rep)
    # A random representation of the right dimensionality if the
    # example happens not to overlap with GloVe's vocabulary:
    if len(reps) == 0:
        dim = len(next(iter(glove_lookup.values())))                
        return utils.randvec(n=dim)
    else:
        return np_func(reps, axis=0)

In [19]:
glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_middle_featurizer],    
    vectorize=False, # Crucial for this featurizer!
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.825      0.471      0.717        340       5716
author                    0.871      0.413      0.713        509       5885
capital                   0.553      0.221      0.425         95       5471
contains                  0.658      0.406      0.585       3904       9280
film_performance          0.774      0.317      0.601        766       6142
founders                  0.754      0.226      0.514        380       5756
genre                     0.458      0.065      0.207        170       5546
has_sibling               0.837      0.246      0.566        499       5875
has_spouse                0.878      0.338      0.666        594       5970
is_a                      0.658      0.151      0.393        497       5873
nationality               0.608      0.196      0.428        301       5677
parents     

With the same basic code design, one can also use the PyTorch models included in the course repo, or write new ones that are better aligned with the task. For those models, it's likely that the featurizer will just return a list of tokens (or perhaps a list of lists of tokens), and the model will map those into vectors using an embedding.

## Homework questions

Please embed your homework responses in this notebook, and do not delete any cells from the notebook. (You are free to add as many cells as you like as part of your responses.)

### Different model factory [1 points]

The code in `rel_ext` makes it very easy to experiment with other classifier models: one need only redefine the `model_factory` argument. This question asks you to assess a [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

__To submit:__ A wrapper function `run_svm_model_factory` that does the following: 

1. Uses `rel_ext.experiment` with the model factory set to one based in an `SVC` with `kernel='linear'` and all other arguments left with default values. 
1. Trains on the 'train' part of `splits`.
1. Assesses on the `dev` part of `splits`.
1. Uses `featurizers` as defined above. 
1. Returns the return value of `rel_ext.experiment` for this set-up.

The function `test_run_svm_model_factory` will check that your function conforms to these general specifications.

In [20]:
def run_svm_model_factory():
    from sklearn.svm import SVC
    
    model_factory = lambda: SVC(kernel = 'linear')
    glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_featurizer], 
    model_factory = model_factory,
    vectorize=True, # Crucial for this featurizer?
    verbose=True)
    return glove_results

In [21]:
def test_run_svm_model_factory(run_svm_model_factory):
    results = run_svm_model_factory()
    assert 'featurizers' in results, \
        "The return value of `run_svm_model_factory` seems not to be correct"
    # Check one of the models to make sure it's an SVC:
    assert 'SVC' in results['models']['adjoins'].__class__.__name__, \
        "It looks like the model factor wasn't set to use an SVC."    

In [22]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_run_svm_model_factory(run_svm_model_factory)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.779      0.353      0.628        340       5716
author                    0.756      0.603      0.720        509       5885
capital                   0.634      0.274      0.502         95       5471
contains                  0.769      0.602      0.729       3904       9280
film_performance          0.755      0.616      0.723        766       6142
founders                  0.775      0.426      0.666        380       5756
genre                     0.518      0.259      0.431        170       5546
has_sibling               0.799      0.255      0.559        499       5875
has_spouse                0.887      0.343      0.674        594       5970
is_a                      0.624      0.288      0.506        497       5873
nationality               0.586      0.193      0.416        301       5677
parents     

### Directional unigram features [1.5 points]

The current bag-of-words representation makes no distinction between "forward" and "reverse" examples. But, intuitively, there is big difference between _X and his son Y_ and _Y and his son X_. This question asks you to modify `simple_bag_of_words_featurizer` to capture these differences. 

__To submit:__

1. A feature function `directional_bag_of_words_featurizer` that is just like `simple_bag_of_words_featurizer` except that it distinguishes "forward" and "reverse". To do this, you just need to mark each word feature for whether it is derived from a subject–object example or from an object–subject example.  The included function `test_directional_bag_of_words_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `directional_bag_of_words_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment` as exemplified above in this notebook.)

3. `rel_ext.experiment` returns some of the core objects used in the experiment. How many feature names does the `vectorizer` have for the experiment run in the previous step? Include the code needed for getting this value. (Note: we're partly asking you to figure out how to get this value by using the sklearn documentation, so please don't ask how to do it!)

In [23]:
def directional_bag_of_words_featurizer(kbt, corpus, feature_counter): 
    # Append these to the end of the keys you add/access in 
    # `feature_counter` to distinguish the two orders. You'll
    # need to use exactly these strings in order to pass 
    # `test_directional_bag_of_words_featurizer`.
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word+subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word+object_subject_suffix] += 1

    return feature_counter


# Call to `rel_ext.experiment`:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.836      0.406      0.690        340       5716
author                    0.851      0.605      0.787        509       5885
capital                   0.576      0.200      0.419         95       5471
contains                  0.806      0.673      0.775       3904       9280
film_performance          0.836      0.654      0.792        766       6142
founders                  0.856      0.424      0.711        380       5756
genre                     0.694      0.294      0.546        170       5546
has_sibling               0.879      0.248      0.583        499       5875
has_spouse                0.917      0.337      0.682        594       5970
is_a                      0.782      0.260      0.557        497       5873
nationality               0.656      0.209      0.460        301       5677
parents     

In [24]:
def test_directional_bag_of_words_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter['is_OS'] += 5
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter)
    expected = defaultdict(
        int, {'is_OS':6,'a_OS':1,'webcomic_OS':1,'created_OS':1,'by_OS':1})
    assert feature_counter == expected, \
        "Expected:\n{}\nGot:\n{}".format(expected, feature_counter)

In [25]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_directional_bag_of_words_featurizer(corpus)

### The part-of-speech tags of the "middle" words [1.5 points]

Our corpus distribution contains part-of-speech (POS) tagged versions of the core text spans. Let's begin to explore whether there is information in these sequences, focusing on `middle_POS`.

__To submit:__

1. A feature function `middle_bigram_pos_tag_featurizer` that is just like `simple_bag_of_words_featurizer` except that it creates a feature for bigram POS sequences. For example, given 

  `The/DT dog/N napped/V`
  
   we obtain the list of bigram POS sequences
  
   `b = ['<s> DT', 'DT N', 'N V', 'V </s>']`. 
   
   Of course, `middle_bigram_pos_tag_featurizer` should return count dictionaries defined in terms of such bigram POS lists, on the model of `simple_bag_of_words_featurizer`.  Don't forget the start and end tags, to model those environments properly! The included function `test_middle_bigram_pos_tag_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `middle_bigram_pos_tag_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment` as exemplified above in this notebook.)

In [26]:
def middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter):
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for POS in get_tag_bigrams(ex):
            feature_counter[POS] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for POS in get_tag_bigrams(ex):
            feature_counter[POS] += 1
    return feature_counter


def get_tag_bigrams(s):
    """Suggested helper method for `middle_bigram_pos_tag_featurizer`.
    This should be defined so that it returns a list of str, where each 
    element is a POS bigram."""
    # The values of `start_symbol` and `end_symbol` are defined
    # here so that you can use `test_middle_bigram_pos_tag_featurizer`.
    start_symbol = "<s>"
    end_symbol = "</s>"
    list_w = s.middle_POS.split(' ')
    if list_w[0] == '':
        return ['<s> ', ' </s>']
    else:
        results = [(start_symbol + ' ' + list_w[0].split('/')[1])]

        for i in range(len(list_w) - 1):
            results.append(list_w[i].split('/')[1] + ' ' + list_w[i + 1].split('/')[1])

        results.append((list_w[len(list_w) - 1].split('/')[1]) + ' ' + end_symbol)
        return results

    
def get_tags(s): 
    """Given a sequence of word/POS elements (lemmas), this function
    returns a list containing just the POS elements, in order.    
    """
    return [parse_lem(lem)[1] for lem in s.strip().split(' ') if lem]


def parse_lem(lem):
    """Helper method for parsing word/POS elements. It just splits
    on the rightmost / and returns (word, POS) as a tuple of str."""
    return lem.strip().rsplit('/', 1)  

# Call to `rel_ext.experiment`:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[middle_bigram_pos_tag_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.852      0.338      0.653        340       5716
author                    0.723      0.338      0.589        509       5885
capital                   0.615      0.168      0.402         95       5471
contains                  0.745      0.610      0.713       3904       9280
film_performance          0.719      0.440      0.638        766       6142
founders                  0.624      0.166      0.402        380       5756
genre                     0.580      0.171      0.392        170       5546
has_sibling               0.675      0.166      0.419        499       5875
has_spouse                0.764      0.256      0.547        594       5970
is_a                      0.589      0.153      0.375        497       5873
nationality               0.500      0.076      0.237        301       5677
parents     

In [27]:
def test_middle_bigram_pos_tag_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter['<s> VBZ'] += 5
    feature_counter = middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter)
    expected = defaultdict(
        int, {'<s> VBZ':6,'VBZ DT':1,'DT JJ':1,'JJ VBN':1,'VBN IN':1,'IN </s>':1})
    assert feature_counter == expected, \
        "Expected:\n{}\nGot:\n{}".format(expected, feature_counter)

In [28]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_middle_bigram_pos_tag_featurizer(corpus)

### Bag of Synsets [2 points]

The following allows you to use NLTK's WordNet API to get the synsets compatible with _dog_ as used as a noun:

```
from nltk.corpus import wordnet as wn
dog = wn.synsets('dog', pos='n')
dog
[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01')]
```

This question asks you to create synset-based features from the word/tag pairs in `middle_POS`.

__To submit:__

1. A feature function `synset_featurizer` that is just like `simple_bag_of_words_featurizer` except that it returns a list of synsets derived from `middle_POS`. Stringify these objects with `str` so that they can be `dict` keys. Use `convert_tag` (included below) to convert tags to `pos` arguments usable by `wn.synsets`. The included function `test_synset_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `synset_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment`.)

In [29]:
from nltk.corpus import wordnet as wn

def synset_featurizer(kbt, corpus, feature_counter):
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for synset in get_synsets(ex.middle_POS):
            feature_counter[synset] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for synset in get_synsets(ex.middle_POS):
            feature_counter[synset] += 1

    return feature_counter


def get_synsets(s):
    """Suggested helper method for `synset_featurizer`. This should
    be completed so that it returns a list of stringified Synsets 
    associated with elements of `s`.
    """   
    # Use `parse_lem` from the previous question to get a list of
    # (word, POS) pairs. Remember to convert the POS strings.
    wt = [parse_lem(lem) for lem in s.strip().split(' ') if lem]
    new = []
    for element in wt:
        synsets = wn.synsets(element[0], pos=convert_tag(element[1]))
        for synset in synsets:
            new.append(str(synset))
    return new
    
  
    
def convert_tag(t):
    """Converts tags so that they can be used by WordNet:
    
    | Tag begins with | WordNet tag |
    |-----------------|-------------|
    | `N`             | `n`         |
    | `V`             | `v`         |
    | `J`             | `a`         |
    | `R`             | `r`         |
    | Otherwise       | `None`      |
    """        
    if t[0].lower() in {'n', 'v', 'r'}:
        return t[0].lower()
    elif t[0].lower() == 'j':
        return 'a'
    else:
        return None    


# Call to `rel_ext.experiment`:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[synset_featurizer],
    model_factory=model_factory,
    verbose=True)



relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.759      0.315      0.592        340       5716
author                    0.739      0.462      0.660        509       5885
capital                   0.485      0.168      0.352         95       5471
contains                  0.781      0.587      0.733       3904       9280
film_performance          0.754      0.548      0.701        766       6142
founders                  0.768      0.384      0.640        380       5756
genre                     0.478      0.194      0.370        170       5546
has_sibling               0.797      0.228      0.532        499       5875
has_spouse                0.893      0.308      0.647        594       5970
is_a                      0.661      0.256      0.502        497       5873
nationality               0.494      0.146      0.335        301       5677
parents     

In [30]:
def test_synset_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter["Synset('be.v.01')"] += 5
    feature_counter = synset_featurizer(kbt, corpus, feature_counter)
    # The full return values for this tend to be long, so we just
    # test a few examples to avoid cluttering up this notebook.
    test_cases = {
        "Synset('be.v.01')": 6,
        "Synset('embody.v.02')": 1
    }
    for ss, expected in test_cases.items():   
        result = feature_counter[ss]
        assert result == expected, \
            "Incorrect count for {}: Expected {}; Got {}".format(ss, expected, result)

In [31]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_synset_featurizer(corpus)

### Your original system [3 points]

There are many options, and this could easily grow into a project. Here are a few ideas:

- Try out different classifier models, from `sklearn` and elsewhere.
- Add a feature that indicates the length of the middle.
- Augment the bag-of-words representation to include bigrams or trigrams (not just unigrams).
- Introduce features based on the entity mentions themselves. <!-- \[SPOILER: it helps a lot, maybe 4% in F-score. And combines nicely with the directional features.\] -->
- Experiment with features based on the context outside (rather than between) the two entity mentions — that is, the words before the first mention, or after the second.
- Try adding features which capture syntactic information, such as the dependency-path features used by Mintz et al. 2009. The [NLTK](https://www.nltk.org/) toolkit contains a variety of [parsing algorithms](http://www.nltk.org/api/nltk.parse.html) that may help.
- The bag-of-words representation does not permit generalization across word categories such as names of people, places, or companies. Can we do better using word embeddings such as [GloVe](https://nlp.stanford.edu/projects/glove/)?

In the cell below, please provide a brief technical description of your original system, so that the teaching team can gain an understanding of what it does. This will help us to understand your code and analyze all the submissions to identify patterns and strategies.

In [35]:
# Enter your system description in this cell.


# The system is a simple one but incorporate many of the features that were tested before or suggested by the question
# prompt. Specifically it incorporates a bidirectional bag of words that also inclused others features: POS tags and
# the length fo the middle section. 

def custom_featurizer(kbt, corpus, feature_counter):
    
    from sklearn.neural_network import MLPClassifier
    
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word + subject_object_suffix] += 1
        for POS in get_tag_bigrams(ex):
            feature_counter[POS] += 1
        feature_counter['length'] = len(ex)
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word + object_subject_suffix] += 1
        for POS in get_tag_bigrams(ex):
            feature_counter[POS] += 1
        feature_counter['length'] = len(ex)
    return feature_counter


model_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')
bakeoff_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[custom_featurizer],
    model_factory=model_factory,
    verbose=True)
# Please do not remove this comment.

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.882      0.594      0.804        340       5716
author                    0.881      0.804      0.865        509       5885
capital                   0.741      0.421      0.643         95       5471
contains                  0.856      0.734      0.829       3904       9280
film_performance          0.854      0.697      0.818        766       6142
founders                  0.843      0.566      0.768        380       5756
genre                     0.707      0.341      0.582        170       5546
has_sibling               0.923      0.529      0.803        499       5875
has_spouse                0.935      0.631      0.853        594       5970
is_a                      0.839      0.535      0.754        497       5873
nationality               0.785      0.618      0.745        301       5677
parents     

## Bake-off [1 point]

For the bake-off, we will release a test set. The announcement will go out on the discussion forum. You will evaluate your custom model from the previous question on these new datasets using the function `rel_ext.bake_off_experiment`. Rules:

1. Only one evaluation is permitted.
1. No additional system tuning is permitted once the bake-off has started.

The cells below this one constitute your bake-off entry.

People who enter will receive the additional homework point, and people whose systems achieve the top score will receive an additional 0.5 points. We will test the top-performing systems ourselves, and only systems for which we can reproduce the reported results will win the extra 0.5 points.

Late entries will be accepted, but they cannot earn the extra 0.5 points. Similarly, you cannot win the bake-off unless your homework is submitted on time.

The announcement will include the details on where to submit your entry.

In [36]:
# Enter your bake-off assessment code in this cell. 
# Please do not remove this comment.
if 'IS_GRADESCOPE_ENV' not in os.environ:
    pass
    # Please enter your code in the scope of the above conditional.
    rel_ext_data_home_test = os.path.join(
    rel_ext_data_home, 'bakeoff-rel_ext-test-data')
    rel_ext.bake_off_experiment(bakeoff_results, rel_ext_data_home_test)
    

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.925      0.619      0.842        438       7122
author                    0.881      0.794      0.862        645       7329
capital                   0.734      0.409      0.633        115       6799
contains                  0.840      0.724      0.814       3808      10492
film_performance          0.862      0.705      0.825       1011       7695
founders                  0.849      0.633      0.795        444       7128
genre                     0.676      0.367      0.579        188       6872
has_sibling               0.908      0.550      0.803        717       7401
has_spouse                0.924      0.644      0.850        780       7464
is_a                      0.791      0.520      0.717        611       7295
nationality               0.849      0.527      0.757        383       7067
parents     

In [None]:
# On an otherwise blank line in this cell, please enter
# your macro-average f-score (an F_0.5 score) as reported 
# by the code above. Please enter only a number between 
# 0 and 1 inclusive. Please do not remove this comment.
if 'IS_GRADESCOPE_ENV' not in os.environ:
    pass
    0.76
    # Please enter your score in the scope of the above conditional.
    


