# Testing `TFNoiseAwareModel` in Jupyter Notebook

We'll start by testing the `TextRNN` model on a categorical problem from `tutorials/crowdsourcing`. In particular, we'll test for (a) basic performance and (b) proper construction and re-construction of the TF computation graph, both (i) after repeated notebook calls and (ii) when using `GridSearch`.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import os
os.environ['SNORKELDB'] = 'sqlite:///{0}{1}crowdsourcing.db'.format(os.getcwd(), os.sep)

from snorkel import SnorkelSession
session = SnorkelSession()

### Load candidates and training marginals

In [2]:
from snorkel.models import candidate_subclass
from snorkel.contrib.models.text import RawText
Tweet = candidate_subclass('Tweet', ['tweet'], cardinality=5)
train_tweets = session.query(Tweet).filter(Tweet.split == 0).order_by(Tweet.id).all()
len(train_tweets)

568

In [3]:
from snorkel.annotations import load_marginals
train_marginals = load_marginals(session, train_tweets, split=0)
train_marginals.shape

(568, 5)
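The shape `(568, 5)` suggests one row per training tweet and one column per class. As a quick sanity check, assuming each row is a probability distribution over the five classes (an assumption; the notebook does not verify it), one could confirm that rows are non-negative and sum to 1. The helper `check_marginals` and the toy rows below are hypothetical, and plain Python lists stand in for the actual array:

```python
def check_marginals(rows, cardinality, tol=1e-6):
    """Return True if every row is a valid distribution over `cardinality` classes."""
    for row in rows:
        if len(row) != cardinality:
            return False
        if any(p < 0 for p in row):
            return False
        if abs(sum(row) - 1.0) > tol:
            return False
    return True

# Toy stand-in for train_marginals (hypothetical values, not from the dataset)
toy_marginals = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.20, 0.20, 0.20, 0.20, 0.20],
]
print(check_marginals(toy_marginals, cardinality=5))
```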

### Train `LogisticRegression`

In [4]:
# Simple unigram featurizer
def get_unigram_tweet_features(c):
    for w in c.tweet.text.split():
        yield w, 1

# Construct feature matrix
from snorkel.annotations import FeatureAnnotator
featurizer = FeatureAnnotator(f=get_unigram_tweet_features)

%time F_train = featurizer.apply(split=0)
F_train

Clearing existing...
Running UDF...

CPU times: user 3.29 s, sys: 64.8 ms, total: 3.35 s
Wall time: 3.33 s


<568x3526 sparse matrix of type '<type 'numpy.int64'>'
	with 8126 stored elements in Compressed Sparse Row format>
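To see what the unigram featurizer yields for a single candidate, here is a standalone sketch. The `Context` and `Candidate` namedtuples are hypothetical stand-ins for Snorkel's candidate objects (only the `.tweet.text` attribute path is taken from the cell above), and the example text is made up:

```python
from collections import namedtuple

# Hypothetical stand-ins for a Tweet candidate and its text context
Context = namedtuple('Context', ['text'])
Candidate = namedtuple('Candidate', ['tweet'])

def get_unigram_tweet_features(c):
    # Same logic as the featurizer above: one (token, 1) pair per whitespace token
    for w in c.tweet.text.split():
        yield w, 1

example = Candidate(tweet=Context(text='storm warning for the coast'))
features = list(get_unigram_tweet_features(example))
print(features)
```

Each distinct token presumably becomes one column of the feature matrix, which would be consistent with the 3526 columns reported for the training split.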

In [5]:
%time F_test = featurizer.apply_existing(split=1)
F_test

Clearing existing...
Running UDF...

CPU times: user 613 ms, sys: 28.2 ms, total: 641 ms
Wall time: 631 ms


<64x3526 sparse matrix of type '<type 'numpy.int64'>'
	with 539 stored elements in Compressed Sparse Row format>

In [6]:
from snorkel.learning import LogisticRegression

model = LogisticRegression(cardinality=Tweet.cardinality)
model.train(F_train.todense(), train_marginals)

[LogisticRegression] Training model
[LogisticRegression] n_train=568  #epochs=25  batch size=256
[LogisticRegression] Epoch 0 (0.03s)	Average loss=311.697540
[LogisticRegression] Epoch 5 (0.11s)	Average loss=104.337334
[LogisticRegression] Epoch 10 (0.19s)	Average loss=48.822495
[LogisticRegression] Epoch 15 (0.27s)	Average loss=29.923157
[LogisticRegression] Epoch 20 (0.35s)	Average loss=21.392838
[LogisticRegression] Epoch 24 (0.42s)	Average loss=17.463308
[LogisticRegression] Training done (0.42s)


### Train `SparseLogisticRegression`

Note: Testing does not currently work with the `LogisticRegression` model above, but there is no real reason to prefer it over `SparseLogisticRegression`.

In [7]:
from snorkel.learning import SparseLogisticRegression

model = SparseLogisticRegression(cardinality=Tweet.cardinality)
model.train(F_train, train_marginals, n_epochs=50, print_freq=10)

[SparseLogisticRegression] Training model
[SparseLogisticRegression] n_train=568  #epochs=50  batch size=256
[SparseLogisticRegression] Epoch 0 (0.02s)	Average loss=312.012482
[SparseLogisticRegression] Epoch 10 (0.14s)	Average loss=48.253460
[SparseLogisticRegression] Epoch 20 (0.26s)	Average loss=21.264622
[SparseLogisticRegression] Epoch 30 (0.38s)	Average loss=13.666287
[SparseLogisticRegression] Epoch 40 (0.49s)	Average loss=10.200731
[SparseLogisticRegression] Epoch 49 (0.60s)	Average loss=8.400674
[SparseLogisticRegression] Training done (0.60s)


In [8]:
import numpy as np
test_labels = np.load('crowdsourcing_test_labels.npy')
acc = model.score(F_test, test_labels)
assert acc > 0.6
acc

0.734375
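If `score` reports plain accuracy (an assumption here), a score of 0.734375 on the 64 test tweets corresponds to 47 correct predictions. A minimal sketch of that computation, using a hypothetical `accuracy` helper and toy labels rather than the crowdsourcing data:

```python
def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError('predicted and gold must have the same length')
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / float(len(gold))

# Toy labels in 1..5 (hypothetical, not the crowdsourcing data)
gold      = [1, 2, 3, 4, 5, 1, 2, 3]
predicted = [1, 2, 3, 4, 1, 1, 2, 5]
print(accuracy(predicted, gold))  # 6 of 8 positions match

# The reported score is consistent with 47 of 64 correct
print(47 / 64.0)
```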

### Train basic LSTM

Train with dev set scoring during execution. Note that we use the test set as the dev set here for simplicity.

In [9]:
from snorkel.learning import TextRNN
test_tweets = session.query(Tweet).filter(Tweet.split == 1).order_by(Tweet.id).all()

train_kwargs = {
    'dim':        50,
    'lr':         0.001,
    'n_epochs':   100,
    'dropout':    0.2,
    'print_freq': 5
}
lstm = TextRNN(seed=1701, cardinality=Tweet.cardinality)
lstm.train(train_tweets, train_marginals, X_dev=test_tweets, Y_dev=test_labels, dev_ckpt_delay=0.0, **train_kwargs)

[TextRNN] Training model
[TextRNN] n_train=568  #epochs=100  batch size=256
[TextRNN] Epoch 0 (0.24s)	Average loss=304.533478	Dev Acc.=31.25
[TextRNN] Epoch 5 (1.11s)	Average loss=288.815521	Dev Acc.=40.62
[TextRNN] Model saved as <TextRNN>
[TextRNN] Epoch 10 (3.01s)	Average loss=246.427124	Dev Acc.=40.62
[TextRNN] Epoch 15 (3.88s)	Average loss=162.491180	Dev Acc.=42.19
[TextRNN] Model saved as <TextRNN>
[TextRNN] Epoch 20 (5.88s)	Average loss=79.957619	Dev Acc.=50.00
[TextRNN] Model saved as <TextRNN>
[TextRNN] Epoch 25 (8.04s)	Average loss=35.223850	Dev Acc.=59.38
[TextRNN] Model saved as <TextRNN>
[TextRNN] Epoch 30 (10.40s)	Average loss=27.104734	Dev Acc.=60.94
[TextRNN] Model saved as <TextRNN>
[TextRNN] Epoch 35 (12.81s)	Average loss=16.695681	Dev Acc.=60.94
[TextRNN] Epoch 40 (13.67s)	Average loss=12.820331	Dev Acc.=59.38
[TextRNN] Epoch 45 (14.52s)	Average loss=10.488051	Dev Acc.=62.50
[TextRNN] Model saved as <TextRNN>
[TextRNN] Epoch 50 (16.65s)	Average loss=8.907477	Dev Acc.
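The `Model saved as <TextRNN>` lines in the log appear whenever the dev accuracy improves on the best value seen so far. The sketch below replays the complete log lines under an assumed rule (inferred from the log, not taken from the library source): checkpoint when the epoch is past `dev_ckpt_delay * n_epochs` and the dev accuracy strictly improves. The function `checkpoint_epochs` is hypothetical:

```python
def checkpoint_epochs(dev_log, n_epochs, dev_ckpt_delay=0.0):
    """Return the epochs at which a checkpoint would be written.

    Assumed rule (inferred from the log, not from the library source):
    save when the epoch is past the delay and dev accuracy strictly improves.
    """
    best = float('-inf')
    saved = []
    for epoch, dev_acc in dev_log:
        if epoch > dev_ckpt_delay * n_epochs and dev_acc > best:
            best = dev_acc
            saved.append(epoch)
    return saved

# (epoch, dev accuracy) pairs copied from the complete lines of the log above
dev_log = [(0, 31.25), (5, 40.62), (10, 40.62), (15, 42.19), (20, 50.00),
           (25, 59.38), (30, 60.94), (35, 60.94), (40, 59.38), (45, 62.50)]
print(checkpoint_epochs(dev_log, n_epochs=100))
```

Under this rule the saves land at epochs 5, 15, 20, 25, 30, and 45, matching the positions of the save messages in the log; ties (epochs 10 and 35) do not trigger a new checkpoint.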

In [10]:
acc = lstm.score(test_tweets, test_labels)
assert acc > 0.60
acc

0.671875

### Run `GridSearch`

In [11]:
from snorkel.learning.utils import GridSearch, RangeParameter

lstm = TextRNN(seed=1701, cardinality=Tweet.cardinality)

# Search over learning rate (lr) and dimension (dim)
rate_param = RangeParameter('lr', 1e-4, 1e-3, step=1, log_base=10)
dim_param = RangeParameter('dim', 50, 100, step=50)
searcher = GridSearch(session, lstm, train_tweets, train_marginals, [rate_param, dim_param])

# Use test set here (just for testing)
train_kwargs = {
    'dim':        100,
    'n_epochs':   50,
    'dropout':    0.2,
    'print_freq': 10
}
searcher.fit(test_tweets, test_labels, **train_kwargs)

[1] Testing lr = 1.00e-04, dim = 5.00e+01
[TextRNN] Training model
[TextRNN] n_train=568  #epochs=50  batch size=256
[TextRNN] Epoch 0 (0.25s)	Average loss=304.922089
[TextRNN] Epoch 10 (2.01s)	Average loss=302.557037
[TextRNN] Epoch 20 (3.71s)	Average loss=299.895447
[TextRNN] Epoch 30 (5.44s)	Average loss=296.205902
[TextRNN] Epoch 40 (7.16s)	Average loss=289.982208
[TextRNN] Epoch 49 (8.69s)	Average loss=278.808136
[TextRNN] Training done (8.69s)
[TextRNN] Accuracy: 0.359375
[TextRNN] Model saved as <TextRNN_0>
[2] Testing lr = 1.00e-04, dim = 1.00e+02
[TextRNN] Training model
[TextRNN] n_train=568  #epochs=50  batch size=256
[TextRNN] Epoch 0 (0.38s)	Average loss=304.893097
[TextRNN] Epoch 10 (3.69s)	Average loss=298.973969
[TextRNN] Epoch 20 (6.98s)	Average loss=291.799713
[TextRNN] Epoch 30 (10.29s)	Average loss=279.220978
[TextRNN] Epoch 40 (13.60s)	Average loss=246.922668
[TextRNN] Epoch 49 (16.58s)	Average loss=195.988663
[TextRNN] Training done (16.58s)
[TextRNN] Accuracy: 0.

|   | lr     | dim | Acc.     |
|---|--------|-----|----------|
| 3 | 0.001  | 100 | 0.703125 |
| 2 | 0.001  | 50  | 0.609375 |
| 1 | 0.0001 | 100 | 0.421875 |
| 0 | 0.0001 | 50  | 0.359375 |
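The four runs correspond to the Cartesian product of two learning rates and two dimensions. As a rough sketch of how such a grid can be enumerated (this is not `GridSearch`'s actual implementation; the value lists are read off the log and results, and the variable names are hypothetical):

```python
from itertools import product

# Values implied by the log and results above
learning_rates = [1e-4, 1e-3]
dims = [50, 100]

# Fixed settings passed to fit(); the searched 'dim' appears to override this one
train_kwargs = {'dim': 100, 'n_epochs': 50, 'dropout': 0.2, 'print_freq': 10}

configs = []
for index, (lr, dim) in enumerate(product(learning_rates, dims)):
    params = dict(train_kwargs)            # copy so the base settings stay intact
    params.update({'lr': lr, 'dim': dim})  # searched values take precedence
    configs.append((index, params))

for index, params in configs:
    print('{0}: lr={1}, dim={2}'.format(index, params['lr'], params['dim']))
```

The enumeration order (all `dim` values for the first `lr`, then the next `lr`) matches the order of the runs in the log and the run indices in the results.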


In [12]:
acc = lstm.score(test_tweets, test_labels)
assert acc > 0.60

### Reload saved model outside of `GridSearch`

In [13]:
lstm = TextRNN(seed=1701, cardinality=Tweet.cardinality)
lstm.load('TextRNN_3')
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc > 0.60

[TextRNN] Loaded model <TextRNN_3>
0.703125
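The name `TextRNN_3` matches the index of the best-scoring run in the `GridSearch` results. A small sketch of picking that name programmatically, assuming saved models are named `TextRNN_<run index>` (a pattern inferred from the `Model saved as <TextRNN_0>` log line, not from the library documentation):

```python
# Rows copied from the GridSearch results: (run index, lr, dim, accuracy)
results = [
    (3, 0.001, 100, 0.703125),
    (2, 0.001, 50, 0.609375),
    (1, 0.0001, 100, 0.421875),
    (0, 0.0001, 50, 0.359375),
]

# Pick the row with the highest accuracy
best = max(results, key=lambda row: row[3])

# Naming pattern inferred from the log, not confirmed by the library docs
best_name = 'TextRNN_{0}'.format(best[0])
print(best_name)
```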


### Reload a model with different structure

In [14]:
lstm.load('TextRNN_0')
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc < 0.60

[TextRNN] Loaded model <TextRNN_0>
0.359375
