# Testing `TFNoiseAwareModel` in Jupyter Notebook

We'll start by testing the `TextRNN` model on a categorical problem from `tutorials/crowdsourcing`. In particular, we'll test for (a) basic performance and (b) proper construction and re-construction of the TF computation graph, both (i) after repeated notebook calls and (ii) with `GridSearch`.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import os
os.environ['SNORKELDB'] = 'sqlite:///{0}{1}crowdsourcing.db'.format(os.getcwd(), os.sep)

from snorkel import SnorkelSession
session = SnorkelSession()

### Load candidates and training marginals

In [2]:
from snorkel.models import candidate_subclass
from snorkel.contrib.models.text import RawText
Tweet = candidate_subclass('Tweet', ['tweet'], cardinality=5)
train_tweets = session.query(Tweet).filter(Tweet.split == 0).order_by(Tweet.id).all()
len(train_tweets)

568

In [3]:
from snorkel.annotations import load_marginals
train_marginals = load_marginals(session, train_tweets, split=0)
train_marginals.shape

(568, 5)

### Train `LogisticRegression`

In [4]:
# Simple unigram featurizer
def get_unigram_tweet_features(c):
    for w in c.tweet.text.split():
        yield w, 1

# Construct feature matrix
from snorkel.annotations import FeatureAnnotator
featurizer = FeatureAnnotator(f=get_unigram_tweet_features)

%time F_train = featurizer.apply(split=0)
F_train

Clearing existing...
Running UDF...

CPU times: user 4.06 s, sys: 83.7 ms, total: 4.14 s
Wall time: 4.13 s


<568x3526 sparse matrix of type '<type 'numpy.int64'>'
	with 8126 stored elements in Compressed Sparse Row format>
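
To show what the featurizer emits for a single candidate, here is a minimal standalone sketch. `RawTextStub` and `TweetStub` are hypothetical stand-ins for the real Snorkel candidate and context classes, introduced only so the generator can run outside a Snorkel session; the sample text is made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for a Snorkel candidate and its context (illustration only)
RawTextStub = namedtuple('RawTextStub', ['text'])
TweetStub = namedtuple('TweetStub', ['tweet'])

def get_unigram_tweet_features(c):
    """Yield a (feature_name, value) pair for each whitespace-separated token."""
    for w in c.tweet.text.split():
        yield w, 1

# Made-up example text; a repeated token is yielded once per occurrence
candidate = TweetStub(tweet=RawTextStub(text='sunny and warm and dry'))
features = list(get_unigram_tweet_features(candidate))
print(features)
```

The generator yields one pair per token occurrence, so `and` appears twice in this example. How `FeatureAnnotator` combines repeated keys is not shown here.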

In [5]:
%time F_test = featurizer.apply_existing(split=1)
F_test

Clearing existing...
Running UDF...

CPU times: user 723 ms, sys: 48.3 ms, total: 771 ms
Wall time: 754 ms


<64x3526 sparse matrix of type '<type 'numpy.int64'>'
	with 539 stored elements in Compressed Sparse Row format>

In [6]:
from snorkel.learning import LogisticRegression

model = LogisticRegression(cardinality=Tweet.cardinality)
model.train(F_train.todense(), train_marginals)

[LogisticRegression] Training model
[LogisticRegression] n_train=568  #epochs=25  batch size=256
[LogisticRegression] Epoch 0 (0.03s)	Average loss=1.539132
[LogisticRegression] Epoch 5 (0.11s)	Average loss=0.536786
[LogisticRegression] Epoch 10 (0.19s)	Average loss=0.251106
[LogisticRegression] Epoch 15 (0.29s)	Average loss=0.182942
[LogisticRegression] Epoch 20 (0.37s)	Average loss=0.134229
[LogisticRegression] Epoch 24 (0.44s)	Average loss=0.102803
[LogisticRegression] Training done (0.44s)


### Train `SparseLogisticRegression`

Note: Testing doesn't currently work with `LogisticRegression` above, but there is no real reason to use it over `SparseLogisticRegression`.

In [7]:
from snorkel.learning import SparseLogisticRegression

model = SparseLogisticRegression(cardinality=Tweet.cardinality)
model.train(F_train, train_marginals, n_epochs=50, print_freq=10)

[SparseLogisticRegression] Training model
[SparseLogisticRegression] n_train=568  #epochs=50  batch size=256
[SparseLogisticRegression] Epoch 0 (0.02s)	Average loss=1.622139
[SparseLogisticRegression] Epoch 10 (0.14s)	Average loss=0.260563
[SparseLogisticRegression] Epoch 20 (0.27s)	Average loss=0.112988
[SparseLogisticRegression] Epoch 30 (0.39s)	Average loss=0.073128
[SparseLogisticRegression] Epoch 40 (0.52s)	Average loss=0.064021
[SparseLogisticRegression] Epoch 49 (0.63s)	Average loss=0.050803
[SparseLogisticRegression] Training done (0.63s)


In [None]:
import numpy as np
test_labels = np.load('crowdsourcing_test_labels.npy')
acc = model.score(F_test, test_labels)
print(acc)
assert acc > 0.6

0.703125


### Train basic LSTM

Here we score on a dev set during training (for simplicity, the test set doubles as the dev set).

In [None]:
from snorkel.learning import TextRNN
test_tweets = session.query(Tweet).filter(Tweet.split == 1).order_by(Tweet.id).all()

train_kwargs = {
    'dim':        100,
    'lr':         0.001,
    'n_epochs':   100,
    'dropout':    0.2,
    'print_freq': 5
}
lstm = TextRNN(seed=1701, cardinality=Tweet.cardinality)
lstm.train(train_tweets, train_marginals, X_dev=test_tweets, Y_dev=test_labels, **train_kwargs)

[TextRNN] Training model
[TextRNN] n_train=568  #epochs=100  batch size=256
[TextRNN] Epoch 0 (0.44s)	Average loss=1.602129	Dev Acc.=40.62
[TextRNN] Epoch 5 (2.19s)	Average loss=1.342293	Dev Acc.=34.38
[TextRNN] Epoch 10 (3.91s)	Average loss=0.730269	Dev Acc.=53.12
[TextRNN] Epoch 15 (5.63s)	Average loss=0.312139	Dev Acc.=68.75
[TextRNN] Epoch 20 (7.35s)	Average loss=0.139608	Dev Acc.=62.50
[TextRNN] Epoch 25 (9.09s)	Average loss=0.086493	Dev Acc.=65.62
[TextRNN] Epoch 30 (10.81s)	Average loss=0.080443	Dev Acc.=70.31
[TextRNN] Epoch 35 (12.56s)	Average loss=0.045322	Dev Acc.=67.19
[TextRNN] Epoch 40 (14.26s)	Average loss=0.050468	Dev Acc.=68.75
[TextRNN] Epoch 45 (15.98s)	Average loss=0.045916	Dev Acc.=70.31
[TextRNN] Epoch 50 (17.73s)	Average loss=0.031020	Dev Acc.=68.75
[TextRNN] Epoch 55 (19.45s)	Average loss=0.026304	Dev Acc.=67.19
[TextRNN] Epoch 60 (21.18s)	Average loss=0.030072	Dev Acc.=67.19
[TextRNN] Epoch 65 (22.91s)	Average loss=0.021733	Dev Acc.=65.62
[TextRNN] Epoch 70 (24

In [None]:
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc > 0.60

### Run `GridSearch`

In [None]:
from snorkel.learning.utils import GridSearch, RangeParameter

lstm = TextRNN(seed=1701, cardinality=Tweet.cardinality)

# Search over learning rate (`lr`) and `dim`
rate_param = RangeParameter('lr', 1e-4, 1e-3, step=1, log_base=10)
dim_param = RangeParameter('dim', 50, 100, step=50)
searcher = GridSearch(session, lstm, train_tweets, train_marginals, [rate_param, dim_param])

# Use test set here (just for testing)
train_kwargs = {
    'dim':        100,
    'n_epochs':   50,
    'dropout':    0.2,
    'print_freq': 10
}
searcher.fit(test_tweets, test_labels, **train_kwargs)
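
To make the search space concrete, here is a minimal sketch of how the two parameters above might expand into a grid. It assumes that a `RangeParameter` covers its range inclusively and that `log_base` applies the step to the exponent rather than to the value. `range_values` is a hypothetical helper written for illustration and is not part of Snorkel.

```python
import itertools
import math

def range_values(v_min, v_max, step=1, log_base=None):
    """Hypothetical helper: list the values from v_min to v_max inclusive.

    With log_base set, the step is applied to the exponent (assumed behavior).
    """
    if log_base:
        lo = int(round(math.log(v_min, log_base)))
        hi = int(round(math.log(v_max, log_base)))
        return [float(log_base) ** e for e in range(lo, hi + 1, step)]
    return list(range(v_min, v_max + 1, step))

# Same bounds and steps as rate_param and dim_param above
lr_values = range_values(1e-4, 1e-3, step=1, log_base=10)
dim_values = range_values(50, 100, step=50)

# Cartesian product of the two value lists: one entry per configuration
grid = list(itertools.product(lr_values, dim_values))
for k, (lr, dim) in enumerate(grid):
    print(k, lr, dim)
```

Under these assumptions the search covers four configurations, which would be consistent with saved-model names such as `TextRNN_0` and `TextRNN_3` loaded in the later cells. The ordering of configurations within the grid is also an assumption.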

In [None]:
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc > 0.60

### Reload saved model outside of `GridSearch`

In [None]:
lstm = TextRNN(seed=1701, cardinality=Tweet.cardinality)
lstm.load('TextRNN_3')
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc > 0.60

### Reload a model with different structure

In [None]:
lstm.load('TextRNN_0')
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc < 0.60