# Intro. to Snorkel: Extracting Spouse Relations from the News

## Part 4: Training our End Extraction Model

In this final section of the tutorial, we'll use the noisy training labels we generated in the previous part to train our end extraction model.

For this tutorial, we will train a simple, but fairly effective, logistic regression model. More generally, however, Snorkel integrates with many ML libraries, including [TensorFlow](https://www.tensorflow.org/), making it easy to use almost any state-of-the-art model as the end extractor!

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import os

import numpy as np
from snorkel import SnorkelSession
session = SnorkelSession()

We repeat our definition of the `Spouse` `Candidate` subclass:

In [2]:
from snorkel.models import candidate_subclass
Spouse = candidate_subclass('Spouse', ['person1', 'person2'])

# 1: Training a `SparseLogReg` Discriminative Model
We use the training marginals to train a discriminative model that classifies each `Candidate` as a true or false mention. We'll use a random hyperparameter search, evaluated on the development set labels, to find the best hyperparameters for our model. To run a hyperparameter search, we need labels for a development set. If they aren't already available, we can manually create labels using the Viewer.
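
To make the idea of training on marginals concrete, here is a minimal NumPy sketch of logistic regression fit to probabilistic labels by minimizing the expected cross-entropy. It is not Snorkel's implementation; the toy data, the function names (`soft_label_loss`, `fit_soft_labels`), and the plain gradient-descent loop are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_label_loss(w, X, marginals, eps=1e-12):
    """Cross-entropy where the targets are probabilities in [0, 1]."""
    p = sigmoid(X @ w)
    return -np.mean(marginals * np.log(p + eps)
                    + (1.0 - marginals) * np.log(1.0 - p + eps))

def fit_soft_labels(X, marginals, lr=0.1, n_epochs=200):
    """Full-batch gradient descent on the soft-label cross-entropy."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        p = sigmoid(X @ w)
        grad = X.T @ (p - marginals) / len(marginals)
        w -= lr * grad
    return w

# Hypothetical toy data: 4 candidates, 2 binary features.
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
marginals = np.array([0.9, 0.6, 0.1, 0.8])  # stand-in for training marginals

initial_loss = soft_label_loss(np.zeros(2), X, marginals)
w = fit_soft_labels(X, marginals)
final_loss = soft_label_loss(w, X, marginals)
print(initial_loss, final_loss)
```

The only difference from ordinary logistic regression is that the targets are probabilities rather than hard 0/1 labels, so a candidate with a marginal near 0.5 pulls the weights much less than a confidently labeled one.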

## Feature Extraction
Instead of starting with a deep learning approach, let's look at a standard sparse logistic regression model. First, we need to extract features. This can take a while, but we only have to do it once!

In [3]:
from snorkel.annotations import FeatureAnnotator
featurizer = FeatureAnnotator()

In [4]:
F_train = featurizer.load_matrix(session, split=0)
F_dev = featurizer.load_matrix(session, split=1)
F_test = featurizer.load_matrix(session, split=2)

# Compute features only for splits that have none stored yet
if F_train.size == 0:
    %time F_train = featurizer.apply(split=0)
if F_dev.size == 0:
    %time F_dev  = featurizer.apply_existing(split=1)
if F_test.size == 0:
    %time F_test = featurizer.apply_existing(split=2)

Clearing existing...
Running UDF...

CPU times: user 10min 54s, sys: 3.19 s, total: 10min 57s
Wall time: 11min
Clearing existing...
Running UDF...

CPU times: user 22 s, sys: 253 ms, total: 22.3 s
Wall time: 22.4 s
Clearing existing...
Running UDF...

CPU times: user 36.4 s, sys: 435 ms, total: 36.9 s
Wall time: 37.1 s




First, reload the training marginals:

In [5]:
from snorkel.annotations import load_marginals
train_marginals = load_marginals(session, F_train, split=0)

In [6]:
from snorkel.learning import SparseLogisticRegression
disc_model = SparseLogisticRegression()

The following code performs model selection by tuning our learning algorithm's hyperparameters.

In [7]:
from snorkel.learning.utils import MentionScorer
from snorkel.learning import RandomSearch, ListParameter, RangeParameter

# Searching over learning rate and L1/L2 penalties
rate_param = RangeParameter('lr', 1e-6, 1e-2, step=1, log_base=10)
l1_param  = RangeParameter('l1_penalty', 1e-6, 1e-2, step=1, log_base=10)
l2_param  = RangeParameter('l2_penalty', 1e-6, 1e-2, step=1, log_base=10)

searcher = RandomSearch(session, disc_model, F_train, train_marginals, [rate_param, l1_param, l2_param], n=20)

Initialized RandomSearch search of size 20. Search space size = 125.
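
The reported search space size of 125 is consistent with each of the three parameters taking five values (the powers of ten from 1e-6 to 1e-2), since 5 × 5 × 5 = 125. The sketch below reproduces that count and draws 20 configurations at random. It is a standalone illustration, not Snorkel's `RandomSearch` code: the helper `log_range` is hypothetical, and sampling without replacement is an assumption.

```python
import itertools
import math

import numpy as np

def log_range(low, high, step=1, log_base=10):
    """Hypothetical helper: powers of log_base from low to high, inclusive."""
    lo_exp = int(round(math.log(low, log_base)))
    hi_exp = int(round(math.log(high, log_base)))
    return [float(log_base) ** e for e in range(lo_exp, hi_exp + 1, step)]

lr_values = log_range(1e-6, 1e-2)
l1_values = log_range(1e-6, 1e-2)
l2_values = log_range(1e-6, 1e-2)

# Full grid: every combination of (lr, l1_penalty, l2_penalty)
grid = list(itertools.product(lr_values, l1_values, l2_values))
print(len(grid))  # 125

# Draw 20 distinct configurations (assumed: sampling without replacement)
rng = np.random.RandomState(1701)
picked = rng.choice(len(grid), size=20, replace=False)
configs = [grid[i] for i in picked]
print(len(configs))  # 20
```

Random search evaluates only 20 of the 125 combinations, which trades exhaustive coverage for a much shorter run time.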


In [8]:
from snorkel.annotations import load_gold_labels
L_gold_dev = load_gold_labels(session, annotator_name='gold', split=1)
L_gold_dev.shape

(228, 1)

In [9]:
np.random.seed(1701)
searcher.fit(F_dev, L_gold_dev, n_epochs=50, rebalance=0.5, print_freq=25)

[1] Testing lr = 1.00e-02, l1_penalty = 1.00e-03, l2_penalty = 1.00e-04
[SparseLR] lr=0.01 l1=0.001 l2=0.0001
[SparseLR] Building model
[SparseLR] Training model
[SparseLR] #examples=392  #epochs=50  batch size=100
[SparseLR] Epoch 0 (0.31s)	Avg. loss=0.825656	NNZ=136398
[SparseLR] Epoch 25 (4.26s)	Avg. loss=0.562187	NNZ=136398
[SparseLR] Epoch 49 (8.08s)	Avg. loss=0.560432	NNZ=136398
[SparseLR] Training done (8.08s)
[SparseLR] Model saved. To load, use name
		SparseLR_0
[2] Testing lr = 1.00e-04, l1_penalty = 1.00e-06, l2_penalty = 1.00e-03
[SparseLR] lr=0.0001 l1=1e-06 l2=0.001
[SparseLR] Building model
[SparseLR] Training model
[SparseLR] #examples=392  #epochs=50  batch size=100
[SparseLR] Epoch 0 (0.31s)	Avg. loss=0.784314	NNZ=136398
[SparseLR] Epoch 25 (4.16s)	Avg. loss=0.708896	NNZ=136398
[SparseLR] Epoch 49 (7.89s)	Avg. loss=0.663600	NNZ=136398
[SparseLR] Training done (7.89s)
[SparseLR] Model saved. To load, use name
		SparseLR_1
[3] Testing lr = 1.00e-03, l1_penalty = 1.00e-0

| | lr | l1_penalty | l2_penalty | Prec. | Rec. | F1 |
|---|---|---|---|---|---|---|
| 3 | 0.001 | 1e-06 | 0.001 | 0.4 | 0.285714 | 0.333333 |
| 4 | 0.01 | 0.0001 | 1e-05 | 0.4 | 0.285714 | 0.333333 |
| 19 | 0.01 | 1e-05 | 0.0001 | 0.5 | 0.142857 | 0.222222 |
| 2 | 0.001 | 1e-05 | 1e-05 | 0.166667 | 0.285714 | 0.210526 |
| 17 | 0.01 | 1e-05 | 1e-06 | 0.25 | 0.142857 | 0.181818 |
| 7 | 0.01 | 1e-05 | 0.01 | 0.2 | 0.142857 | 0.166667 |
| 5 | 1e-06 | 0.001 | 1e-05 | 0.048276 | 1.0 | 0.092105 |
| 12 | 0.0001 | 0.0001 | 1e-05 | 0.045752 | 1.0 | 0.0875 |
| 6 | 1e-06 | 0.001 | 0.01 | 0.043796 | 0.857143 | 0.083333 |
| 16 | 1e-06 | 0.01 | 0.001 | 0.041667 | 0.714286 | 0.07874 |


## Examining Features
Extracting features allows us to inspect and interpret our learned weights.

In [10]:
w, _ = disc_model.get_weights()
largest_idxs = reversed(np.argsort(np.abs(w))[-5:])
for i in largest_idxs:
    print('Feature: {0: <70}Weight: {1:.6f}'.format(F_train.get_key(session, i).name, w[i]))

Feature: TDL_INV_LEMMA:SEQ-BETWEEN[\ nmontano replace]                         Weight: -0.398191
Feature: TDL_LEMMA:SEQ-BETWEEN[as if]                                          Weight: -0.395683
Feature: TDL_LEMMA:SEQ-BETWEEN[predict]                                        Weight: -0.392996
Feature: TDL_LEMMA:SEQ-BETWEEN[visit at the]                                   Weight: -0.388405
Feature: TDL_DEP_LABEL|LEMMA:BETWEEN-MENTION-and-MENTION[nsubjpass|Bobby]      Weight: -0.380656
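
The ranking idiom used in the cell above (sort by absolute value, take the tail, reverse) can be checked on a small standalone example. The weights and feature names below are made up for illustration and do not come from the trained model.

```python
import numpy as np

# Hypothetical weights and feature names, for illustration only
w = np.array([0.05, -0.40, 0.10, 0.38, -0.02, -0.39, 0.01])
names = ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6']

# argsort is ascending, so the last k indices hold the largest |w|;
# reversing them lists the largest magnitude first.
k = 3
top = np.argsort(np.abs(w))[-k:][::-1]

for i in top:
    print('Feature: {0: <10}Weight: {1:.6f}'.format(names[i], w[i]))
```

Ranking by magnitude rather than by signed value is what surfaces strongly negative features, which is why every weight printed above is negative.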


## Evaluate on Test Data

In [11]:
from snorkel.annotations import load_gold_labels
L_gold_test = load_gold_labels(session, annotator_name='gold', split=2)

In [12]:
_, _, _, _ = disc_model.score(session, F_test, L_gold_test)

Scores (Un-adjusted)
Pos. class accuracy: 0.0
Neg. class accuracy: 0.971
Precision            0.0
Recall               0.0
F1                   0.0
----------------------------------------
TP: 0 | FP: 8 | TN: 264 | FN: 6
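
The printed scores follow directly from the confusion counts on the last line. The sketch below recomputes them using the standard definitions; `confusion_scores` is a hypothetical helper written for this example, and returning 0 when a denominator is zero is an assumed convention.

```python
def confusion_scores(tp, fp, tn, fn):
    """Standard binary metrics from confusion counts (0.0 when undefined)."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # same as positive-class accuracy
    return {
        'pos_acc': recall,
        'neg_acc': ratio(tn, tn + fp),
        'precision': precision,
        'recall': recall,
        'f1': ratio(2 * precision * recall, precision + recall),
    }

# Counts from the test-set output above
scores = confusion_scores(tp=0, fp=8, tn=264, fn=6)
print(round(scores['neg_acc'], 3))  # 0.971
print(scores['precision'], scores['recall'], scores['f1'])  # 0.0 0.0 0.0
```

With zero true positives, precision, recall, and F1 are all 0 regardless of how many negatives are classified correctly, so the high negative-class accuracy mostly reflects the class imbalance (272 negatives versus 6 positives).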



# 2: Training an LSTM Discriminative Model
Deep learning allows us to train models without manually defining features.

In [13]:
train = session.query(Spouse).filter(Spouse.split == 0).order_by(Spouse.id).all()
dev = session.query(Spouse).filter(Spouse.split == 1).order_by(Spouse.id).all()
test = session.query(Spouse).filter(Spouse.split == 2).order_by(Spouse.id).all()

print('Training set:\t{0} candidates'.format(len(train)))
print('Dev set:\t{0} candidates'.format(len(dev)))
print('Test set:\t{0} candidates'.format(len(test)))

Training set:	4781 candidates
Dev set:	228 candidates
Test set:	278 candidates


In [14]:
# Load dev labels and convert to [0, 1] range
dev_labels = (np.ravel(L_gold_dev.todense()) + 1) / 2
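
As a quick check of the conversion above, the mapping `(y + 1) / 2` sends a label of -1 to 0 and a label of +1 to 1. The small array below is a made-up stand-in for the dense gold label matrix, assuming labels are stored as -1 and +1.

```python
import numpy as np

# Hypothetical stand-in for L_gold_dev.todense(): one column of -1/+1 labels
gold = np.array([[1], [-1], [-1], [1]])

# Flatten to 1-D, then map -1 -> 0 and +1 -> 1
labels01 = (np.ravel(gold) + 1) / 2
print(labels01)  # [1. 0. 0. 1.]
```

Note that any entry equal to 0 would map to 0.5 under this formula, so the conversion is only meaningful when every candidate in the split has a -1 or +1 label.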

In [15]:
from snorkel.contrib.rnn import reRNN

train_kwargs = {
    'lr':         0.01,
    'dim':        100,
    'n_epochs':   50,
    'dropout':    0.5,
    'rebalance':  0.25,
    'print_freq': 5
}

lstm = reRNN(seed=1701, n_threads=None)
lstm.train(train, train_marginals, dev_candidates=dev, dev_labels=dev_labels, **train_kwargs)

[reRNN] Dimension=100  LR=0.01
[reRNN] Begin preprocessing
[reRNN] Loaded 228 candidates for evaluation
[reRNN] Preprocessing done (15.75s)
[reRNN] Training model
[reRNN] #examples=784  #epochs=50  batch size=256
[reRNN] Epoch 0 (11.13s)	Average loss=0.680978	Dev F1=0.00
[reRNN] Epoch 5 (56.46s)	Average loss=0.505550	Dev F1=0.00
[reRNN] Epoch 10 (100.32s)	Average loss=0.496826	Dev F1=0.00
[reRNN] Epoch 15 (138.23s)	Average loss=0.495360	Dev F1=0.00
[reRNN] Epoch 20 (175.94s)	Average loss=0.495219	Dev F1=0.00
[reRNN] Epoch 25 (215.71s)	Average loss=0.494242	Dev F1=0.00
[reRNN] Epoch 30 (261.43s)	Average loss=0.494273	Dev F1=0.00
[reRNN] Epoch 35 (309.35s)	Average loss=0.493773	Dev F1=0.00
[reRNN] Epoch 40 (352.63s)	Average loss=0.493954	Dev F1=0.00
[reRNN] Epoch 45 (396.19s)	Average loss=0.493807	Dev F1=0.00
[reRNN] Epoch 49 (434.40s)	Average loss=0.493831	Dev F1=0.00
[reRNN] Training done (435.17s)


# 3: Evaluating on the Test Set

In this last section of the tutorial, we'll get the score we've been after: the performance of the extraction model on the blind test set (`split` 2). First, we load the test set labels and gold candidates we made in Part III.

In [16]:
from snorkel.annotations import load_gold_labels
L_gold_test = load_gold_labels(session, annotator_name='gold', split=2)

Now, we score the LSTM discriminative model on the test set:

In [17]:
_, _, _, _  = lstm.score(session, test, L_gold_test)

Scores (Un-adjusted)
Pos. class accuracy: 0.0
Neg. class accuracy: 0.963
Precision            0.0
Recall               0.0
F1                   0.0
----------------------------------------
TP: 0 | FP: 10 | TN: 262 | FN: 6

