# Intro. to Snorkel: Extracting Spouse Relations from the News

## Part V: Evaluating the Model on the Test Set

In this final part of the tutorial, we will reload the model developed in Part IV and test it on the test `CandidateSet`.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import os

# TO USE A DATABASE OTHER THAN SQLITE, USE THIS LINE
# Note that this is necessary for parallel execution amongst other things...
# os.environ['SNORKELDB'] = 'postgres:///snorkel-intro'

import numpy as np
from snorkel import SnorkelSession
session = SnorkelSession()



We repeat our definition of the `Spouse` `Candidate` subclass. The test set is referred to as `split=2` in the cells that follow:

In [2]:
from snorkel.models import candidate_subclass

Spouse = candidate_subclass('Spouse', ['person1', 'person2'])

## Automatically Creating Features
We also generate features for the test set. Note that we call `apply_existing` here, so that the test set is featurized using the feature keys already created from the training set. For further details, see _Intro Tutorial 4_.

In [3]:
from snorkel.annotations import FeatureAnnotator

featurizer   = FeatureAnnotator()
%time F_test = featurizer.apply_existing(split=2)

Clearing existing...
Running UDF...

CPU times: user 46.8 s, sys: 720 ms, total: 47.6 s
Wall time: 53.4 s


If the features have already been computed, we can simply reload the feature matrix instead:

In [4]:
F_test = featurizer.load_matrix(session, split=2)
F_test

<279x119615 sparse matrix of type '<type 'numpy.float64'>'
	with 8079 stored elements in Compressed Sparse Row format>
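
To illustrate why the test set reuses the training feature keys, here is a toy sketch in plain Python. It is not Snorkel's implementation: the helper names `build_keyset` and `featurize_with_existing` and the feature strings are hypothetical. The idea is that the model's weights are indexed by the columns fixed at training time, so test features are mapped onto those same columns and any feature never seen in training is dropped.

```python
def build_keyset(train_features):
    """Assign a column index to every distinct feature seen in training."""
    keyset = {}
    for feats in train_features:
        for f in feats:
            if f not in keyset:
                keyset[f] = len(keyset)
    return keyset


def featurize_with_existing(features, keyset):
    """Map each candidate's features to {column: value} using a fixed keyset.

    Features absent from the keyset are skipped, so the number of columns
    never changes.
    """
    rows = []
    for feats in features:
        row = {keyset[f]: 1.0 for f in feats if f in keyset}
        rows.append(row)
    return rows


# Hypothetical feature strings, for illustration only.
train = [["WORD_wife", "BETWEEN_and"], ["WORD_husband", "BETWEEN_and"]]
test = [["WORD_wife", "WORD_brother"]]  # "WORD_brother" never appears in training

keyset = build_keyset(train)
test_rows = featurize_with_existing(test, keyset)

print(len(keyset))  # number of columns fixed by the training set
print(test_rows)    # sparse rows for the test candidates
```

Under this reading, the column count of `F_test` above would be determined by the training keys rather than by the 279 test candidates, which is consistent with the matrix being very sparse.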

## Reloading the Discriminative Model

In [5]:
from snorkel.learning import LogisticRegression

disc_model = LogisticRegression(save_file='SpouseLR')




[LR] Loaded model <SpouseLR>


## Evaluating on the Test `CandidateSet`

First, we load the gold labels for the test set that we created in Part III.

In [6]:
from snorkel.annotations import load_gold_labels
L_gold_test = load_gold_labels(session, annotator_name='gold', split=2)

In [7]:
tp, fp, tn, fn = disc_model.score(session, F_test, L_gold_test)

Scores (Un-adjusted)
Pos. class accuracy: 1.0
Neg. class accuracy: 0.0
Precision            0.0251
Recall               1.0
F1                   0.049
----------------------------------------
TP: 7 | FP: 272 | TN: 0 | FN: 0
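
The reported scores follow directly from the four counts. As a minimal sketch (plain Python, not the body of `disc_model.score`; the helper name `confusion_metrics` is an assumption made for illustration), precision, recall, and F1 can be recomputed like this:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (precision, recall, f1) from confusion-matrix counts.

    tn is accepted for completeness but does not enter these three metrics.
    float() keeps the division exact under both Python 2 and Python 3.
    """
    precision = float(tp) / (tp + fp) if (tp + fp) else 0.0
    recall = float(tp) / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Counts taken from the score output above.
precision, recall, f1 = confusion_metrics(tp=7, fp=272, tn=0, fn=0)
print(round(precision, 4), round(recall, 4), round(f1, 3))
```

Because TN and FN are both 0, every test candidate was predicted positive, so the precision is simply the fraction of true spouse pairs in the test set (7 of 279).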



## Viewing Examples
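We can also inspect the results on the test `CandidateSet`. The `score` call above returned the true positives, false positives, true negatives, and false negatives as sets of candidates (`tp`, `fp`, `tn`, `fn`), and any of them can be passed to the `Viewer`. Here we look at the false positives.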

In [14]:
from snorkel.viewer import SentenceNgramViewer

# NOTE: This if-then statement is only to avoid opening the viewer during automated testing of this notebook
# You should ignore this!
import os
if 'CI' not in os.environ:
    sv = SentenceNgramViewer(fp, session, annotator_name="Tutorial Part V User")
else:
    sv = None

<IPython.core.display.Javascript object>

In [25]:
print(fp)

set([Spouse(Span("John Stamos", sentence=29553, chars=[10,20], words=[4,5]), Span("Fox", sentence=29553, chars=[4,6], words=[2,2])), Spouse(Span("Hrehaan", sentence=28979, chars=[59,65], words=[11,11]), Span("Hrithik", sentence=28979, chars=[0,6], words=[0,0])), Spouse(Span("Yennai Arindhaal", sentence=29221, chars=[8,23], words=[2,3]), Span("Sivakarthikeyan", sentence=29221, chars=[67,81], words=[11,11])), Spouse(Span("David Ben-Gurion", sentence=29682, chars=[153,168], words=[29,30]), Span("Kennedy", sentence=29682, chars=[242,248], words=[44,44])), Spouse(Span("Gordon", sentence=29064, chars=[150,155], words=[29,29]), Span("Dale Arden", sentence=29064, chars=[161,170], words=[31,32])), Spouse(Span("Lawrence Hundersmarck", sentence=28858, chars=[384,404], words=[73,74]), Span("Alexander Twilight", sentence=28858, chars=[554,571], words=[103,104])), Spouse(Span("Simon van Kempen", sentence=29434, chars=[66,81], words=[14,16]), Span("Alex McCord", sentence=29434, chars=[50,60], words=[

You've completed the introduction to Snorkel!