# Testing `TFNoiseAwareModel`

We'll start by testing the `TextRNN` model on a categorical problem from `tutorials/crowdsourcing`. In particular, we'll test for (a) basic performance and (b) proper construction and re-construction of the TF computation graph, both (i) after repeated notebook calls and (ii) when models are trained through `GridSearch`.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import os
os.environ['SNORKELDB'] = 'sqlite:///{0}{1}crowdsourcing.db'.format(os.getcwd(), os.sep)

from snorkel import SnorkelSession
session = SnorkelSession()

### Load candidates and training marginals

In [2]:
from snorkel.models import candidate_subclass
from snorkel.contrib.models.text import RawText
Tweet = candidate_subclass('Tweet', ['tweet'], cardinality=5)
train_tweets = session.query(Tweet).filter(Tweet.split == 0).order_by(Tweet.id).all()
len(train_tweets)

568

In [3]:
from snorkel.annotations import load_marginals
train_marginals = load_marginals(session, train_tweets, split=0)
train_marginals.shape

(568, 5)
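`train_marginals` holds one row per training candidate and one column per class. Assuming each row is a probability distribution over the five classes (which the `cardinality=5` candidate subclass suggests), a quick sanity check might look like the sketch below. `check_marginals` is a hypothetical helper, not part of Snorkel; in this notebook it would be called as `check_marginals(train_marginals, Tweet.cardinality)`.

```python
import numpy as np

def check_marginals(marginals, cardinality, atol=1e-6):
    """Hypothetical sanity check: each row should be a distribution over `cardinality` classes."""
    marginals = np.asarray(marginals, dtype=float)
    assert marginals.ndim == 2 and marginals.shape[1] == cardinality
    assert np.all(marginals >= -atol) and np.all(marginals <= 1 + atol)
    # Each candidate's class probabilities should sum to 1
    assert np.allclose(marginals.sum(axis=1), 1.0, atol=atol)
    # Return the most likely class per candidate (0-based column index)
    return marginals.argmax(axis=1)

# Toy example with 3 candidates and 5 classes (made-up numbers)
toy = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.20, 0.20, 0.20, 0.20, 0.20],
    [0.00, 0.00, 0.90, 0.10, 0.00],
])
print(check_marginals(toy, cardinality=5))  # [0 0 2]
```

For a perfectly uniform row, `argmax` breaks the tie in favor of the first class.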

### Train `LogisticRegression`

In [4]:
# Simple unigram featurizer
def get_unigram_tweet_features(c):
    for w in c.tweet.text.split():
        yield w, 1

# Construct feature matrix
from snorkel.annotations import FeatureAnnotator
featurizer = FeatureAnnotator(f=get_unigram_tweet_features)

%time F_train = featurizer.apply(split=0)
F_train

Clearing existing...
Running UDF...

CPU times: user 6.1 s, sys: 114 ms, total: 6.22 s
Wall time: 11.6 s


<568x3526 sparse matrix of type '<type 'numpy.int64'>'
	with 8126 stored elements in Compressed Sparse Row format>
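`FeatureAnnotator` turns the `(key, value)` pairs yielded by `get_unigram_tweet_features` into a candidates × feature-keys sparse matrix. The sketch below is a plain-Python approximation of that bookkeeping, not Snorkel's implementation: it assigns each new word a column index and builds the three arrays (`indptr`, `indices`, `data`) behind the "Compressed Sparse Row format" shown above. It assumes a word repeated within one tweet is stored only once.

```python
def unigram_features(text):
    # Same logic as get_unigram_tweet_features, applied to a raw string
    for w in text.split():
        yield w, 1

def build_csr(texts):
    """Build CSR arrays for binary unigram features (illustrative sketch)."""
    vocab = {}                     # feature key -> column index
    indptr, indices, data = [0], [], []
    for text in texts:
        row = {}
        for key, value in unigram_features(text):
            col = vocab.setdefault(key, len(vocab))
            row[col] = value       # assumption: duplicates within a row are stored once
        for col in sorted(row):
            indices.append(col)
            data.append(row[col])
        indptr.append(len(indices))  # row i spans indices[indptr[i]:indptr[i + 1]]
    return vocab, indptr, indices, data

vocab, indptr, indices, data = build_csr(["good morning", "morning morning rain"])
print(len(vocab))  # 3
print(indptr)      # [0, 2, 4]
print(indices)     # [0, 1, 1, 2]
```

The number of "stored elements" in the repr corresponds to `len(data)` here.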

In [5]:
%time F_test = featurizer.apply_existing(split=1)
F_test

Clearing existing...
Running UDF...

CPU times: user 1.02 s, sys: 36.6 ms, total: 1.06 s
Wall time: 1.7 s


<64x3526 sparse matrix of type '<type 'numpy.int64'>'
	with 539 stored elements in Compressed Sparse Row format>
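`F_test` has the same 3526 columns as `F_train`, which is consistent with `apply_existing` reusing the feature keys created on the training split instead of adding new ones. A minimal sketch of that behavior, assuming words unseen in training are simply dropped (`featurize_with_vocab` is hypothetical, and it returns a dense array for readability):

```python
import numpy as np

def featurize_with_vocab(texts, vocab):
    """Map texts onto a fixed vocabulary; unseen words get no column (sketch)."""
    X = np.zeros((len(texts), len(vocab)), dtype=np.int64)
    for i, text in enumerate(texts):
        for w in text.split():
            col = vocab.get(w)
            if col is not None:   # only words seen on the training split count
                X[i, col] = 1
    return X

train_vocab = {'good': 0, 'morning': 1, 'rain': 2}
X_test = featurize_with_vocab(["rain again this morning"], train_vocab)
print(X_test.shape)  # (1, 3); column count matches the training vocabulary
print(X_test)        # [[0 1 1]]
```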

In [6]:
from snorkel.learning import LogisticRegression

model = LogisticRegression(cardinality=Tweet.cardinality)
model.train(F_train.todense(), train_marginals)

[LogisticRegression] Training model
[LogisticRegression] n_train=568  #epochs=25  batch size=256
[LogisticRegression] Epoch 0 (0.06s)	Average loss=1.609304
[LogisticRegression] Epoch 5 (0.16s)	Average loss=0.533771
[LogisticRegression] Epoch 10 (0.25s)	Average loss=0.269326
[LogisticRegression] Epoch 15 (0.36s)	Average loss=0.169991
[LogisticRegression] Epoch 20 (0.50s)	Average loss=0.119820
[LogisticRegression] Epoch 24 (0.62s)	Average loss=0.099092
[LogisticRegression] Training done (0.62s)
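The epoch-0 loss of 1.609304 is essentially ln 5 ≈ 1.6094, the cross-entropy of a near-uniform prediction over the five classes, as expected from a freshly initialized model. The sketch below is a small NumPy softmax regression trained against soft (marginal) labels with a cross-entropy loss. It only illustrates the idea and is not Snorkel's TensorFlow implementation; the learning rate, zero initialization, and full-batch updates are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_cross_entropy(P, Y):
    # Mean over rows of -sum_k Y[i, k] * log P[i, k]
    return float(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))

def train_softmax_regression(X, Y, lr=0.2, n_epochs=25):
    """Full-batch gradient descent on soft labels Y (each row sums to 1)."""
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))
    b = np.zeros(k)
    losses = []
    for _ in range(n_epochs):
        P = softmax(X.dot(W) + b)
        losses.append(soft_cross_entropy(P, Y))
        G = (P - Y) / n                   # gradient of the mean loss w.r.t. the logits
        W -= lr * X.T.dot(G)
        b -= lr * G.sum(axis=0)
    return W, b, losses

rng = np.random.RandomState(0)
X = rng.binomial(1, 0.3, size=(40, 10)).astype(float)
Y = softmax(rng.randn(40, 5) * 3)         # made-up soft labels over 5 classes
W, b, losses = train_softmax_regression(X, Y)
print("{:.4f} {:.4f}".format(losses[0], np.log(5)))  # 1.6094 1.6094
print("final loss: {:.4f}".format(losses[-1]))
```

With all-zero weights every class gets probability 1/5, so the first loss equals ln 5 regardless of the soft labels.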


### Train `SparseLogisticRegression`

Note: testing does not currently work with the `LogisticRegression` model above, but there is no real reason to use it over `SparseLogisticRegression`.
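One practical reason to prefer `SparseLogisticRegression`: `F_train.todense()` materializes every zero. With the shapes printed above (568 × 3526 with 8126 stored elements), the matrix is about 0.41% dense, and a dense `int64` copy needs roughly 16 MB. The helper below just does that arithmetic; the 4-byte index size is an assumption reflecting SciPy's usual `int32` CSR indices.

```python
def sparsity_report(n_rows, n_cols, nnz, value_bytes=8, index_bytes=4):
    """Compare dense vs. CSR storage for a matrix (approximate byte counts)."""
    density = float(nnz) / (n_rows * n_cols)
    dense_bytes = n_rows * n_cols * value_bytes
    # CSR stores one value and one column index per nonzero, plus n_rows + 1 row pointers
    csr_bytes = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    return density, dense_bytes, csr_bytes

density, dense_bytes, csr_bytes = sparsity_report(568, 3526, 8126)
print("density = {:.2%}".format(density))          # density = 0.41%
print("dense   = {:,} bytes".format(dense_bytes))  # dense   = 16,022,144 bytes
print("csr     = {:,} bytes".format(csr_bytes))    # csr     = 99,788 bytes
```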

In [7]:
from snorkel.learning import SparseLogisticRegression

model = SparseLogisticRegression(cardinality=Tweet.cardinality)
model.train(F_train, train_marginals, n_epochs=50, print_freq=10)

[SparseLogisticRegression] Training model
[SparseLogisticRegression] n_train=568  #epochs=50  batch size=256
[SparseLogisticRegression] Epoch 0 (0.06s)	Average loss=1.656555
[SparseLogisticRegression] Epoch 10 (0.40s)	Average loss=0.278991
[SparseLogisticRegression] Epoch 20 (0.79s)	Average loss=0.130961
[SparseLogisticRegression] Epoch 30 (1.41s)	Average loss=0.080214
[SparseLogisticRegression] Epoch 40 (1.75s)	Average loss=0.058399
[SparseLogisticRegression] Epoch 49 (2.07s)	Average loss=0.048295
[SparseLogisticRegression] Training done (2.07s)


In [8]:
import numpy as np
test_labels = np.load('crowdsourcing_test_labels.npy')
acc = model.score(F_test, test_labels)
print(acc)
assert acc > 0.6

0.6875
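`score` reports the fraction of test candidates classified correctly; with 64 test tweets, 0.6875 corresponds to 44 correct predictions. A minimal sketch of that accuracy computation, assuming the prediction is the argmax of the predicted marginals and that gold labels use the same class indexing (both assumptions; Snorkel's `score` may differ in details such as a label offset):

```python
import numpy as np

def accuracy_from_marginals(marginals, gold):
    """Fraction of rows whose argmax class matches the gold label."""
    preds = np.asarray(marginals).argmax(axis=1)
    return float(np.mean(preds == np.asarray(gold)))

marginals = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
gold = np.array([1, 0, 0, 0])
print(accuracy_from_marginals(marginals, gold))  # 0.75
print(44 / 64.0)                                 # 0.6875
```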


### Train basic LSTM

With dev-set scoring during training (for simplicity, the test set stands in as the dev set here).
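The training logs in this notebook print epoch 0, every `print_freq`-th epoch, and the final epoch (for example epochs 0, 10, 20, 30, 40, 49 for `n_epochs=50, print_freq=10`); when `X_dev`/`Y_dev` are passed, the `Dev Acc.` column appears on those lines. The helper below reproduces that schedule as inferred from the logs; it is a hypothetical sketch, not Snorkel code.

```python
def logged_epochs(n_epochs, print_freq):
    """Epochs at which a progress line (and dev score, if any) would be printed."""
    return [e for e in range(n_epochs)
            if e % print_freq == 0 or e == n_epochs - 1]

print(logged_epochs(50, 10))  # [0, 10, 20, 30, 40, 49]
print(logged_epochs(25, 5))   # [0, 5, 10, 15, 20, 24]
```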

In [9]:
from snorkel.learning import TextRNN
test_tweets = session.query(Tweet).filter(Tweet.split == 1).order_by(Tweet.id).all()

train_kwargs = {
    'dim':        100,
    'lr':         0.001,
    'n_epochs':   100,
    'dropout':    0.2,
    'print_freq': 5
}
lstm = TextRNN(seed=1701, cardinality=Tweet.cardinality)
lstm.train(train_tweets, train_marginals, X_dev=test_tweets, Y_dev=test_labels, **train_kwargs)

[TextRNN] Training model
[TextRNN] n_train=568  #epochs=100  batch size=256
[TextRNN] Epoch 0 (0.67s)	Average loss=1.602129	Dev Acc.=40.62
[TextRNN] Epoch 5 (4.24s)	Average loss=1.342293	Dev Acc.=34.38
[TextRNN] Epoch 10 (7.79s)	Average loss=0.730269	Dev Acc.=53.12
[TextRNN] Epoch 15 (11.03s)	Average loss=0.312139	Dev Acc.=68.75
[TextRNN] Epoch 20 (14.36s)	Average loss=0.139608	Dev Acc.=62.50
[TextRNN] Epoch 25 (17.71s)	Average loss=0.086493	Dev Acc.=65.62
[TextRNN] Epoch 30 (20.95s)	Average loss=0.080443	Dev Acc.=70.31
[TextRNN] Epoch 35 (24.26s)	Average loss=0.045322	Dev Acc.=67.19
[TextRNN] Epoch 40 (27.59s)	Average loss=0.050468	Dev Acc.=68.75
[TextRNN] Epoch 45 (30.87s)	Average loss=0.045916	Dev Acc.=70.31
[TextRNN] Epoch 50 (34.17s)	Average loss=0.031020	Dev Acc.=68.75
[TextRNN] Epoch 55 (37.50s)	Average loss=0.026304	Dev Acc.=67.19
[TextRNN] Epoch 60 (41.24s)	Average loss=0.030072	Dev Acc.=67.19
[TextRNN] Epoch 65 (45.06s)	Average loss=0.021733	Dev Acc.=65.62
[TextRNN] Epoch 70 

In [10]:
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc > 0.60

0.65625


### Run `GridSearch`
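`GridSearch` trains one model per point in the Cartesian product of the parameter ranges. Judging from the log, `RangeParameter('lr', 1e-4, 1e-3, step=1, log_base=10)` yields 1e-4 and 1e-3 (stepping the exponent), `RangeParameter('dim', 50, 100, step=50)` yields 50 and 100, and runs are numbered in product order starting from `TextRNN_0`. The sketch below reproduces that enumeration under those assumptions (`range_values` is hypothetical). Under the same assumptions, `TextRNN_3`, which is reloaded further down, would be the `lr = 1e-3, dim = 100` run.

```python
import itertools
import math

def range_values(low, high, step, log_base=None):
    """Hypothetical RangeParameter expansion: linear steps, or steps in the exponent."""
    if log_base is None:
        n = int(round((high - low) / float(step))) + 1
        return [low + i * step for i in range(n)]
    lo_exp = math.log(low, log_base)
    hi_exp = math.log(high, log_base)
    n = int(round((hi_exp - lo_exp) / float(step))) + 1
    return [log_base ** (lo_exp + i * step) for i in range(n)]

lrs = range_values(1e-4, 1e-3, step=1, log_base=10)
dims = range_values(50, 100, step=50)

# Cartesian product, first parameter varying slowest (matches the log order)
for k, (lr, dim) in enumerate(itertools.product(lrs, dims)):
    print("TextRNN_{0}: lr = {1:.2e}, dim = {2}".format(k, lr, dim))
# TextRNN_0: lr = 1.00e-04, dim = 50
# TextRNN_1: lr = 1.00e-04, dim = 100
# TextRNN_2: lr = 1.00e-03, dim = 50
# TextRNN_3: lr = 1.00e-03, dim = 100
```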

In [11]:
from snorkel.learning.utils import GridSearch, RangeParameter

# Search over learning rate and dim
rate_param = RangeParameter('lr', 1e-4, 1e-3, step=1, log_base=10)
dim_param = RangeParameter('dim', 50, 100, step=50)

searcher = GridSearch(TextRNN, [rate_param, dim_param], train_tweets, train_marginals,
                     seed=1701, cardinality=Tweet.cardinality)

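# Base training kwargs; the grid values for `lr` and `dim` replace these on each run
train_kwargs = {
    'dim':        100,
    'n_epochs':   50,
    'dropout':    0.2,
    'print_freq': 10
}

# Score each run on the test set here (just for testing)
lstm, run_stats = searcher.fit(test_tweets, test_labels, **train_kwargs)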

[1] Testing lr = 1.00e-04, dim = 50
[TextRNN] Training model
[TextRNN] n_train=568  #epochs=50  batch size=256
[TextRNN] Epoch 0 (0.42s)	Average loss=1.610982
[TextRNN] Epoch 10 (3.61s)	Average loss=1.598022
[TextRNN] Epoch 20 (6.78s)	Average loss=1.584193
[TextRNN] Epoch 30 (9.94s)	Average loss=1.564177
[TextRNN] Epoch 40 (13.11s)	Average loss=1.532403
[TextRNN] Epoch 49 (15.96s)	Average loss=1.472197
[TextRNN] Training done (15.96s)
[TextRNN] Accuracy: 0.40625
[TextRNN] Model saved as <TextRNN_0>
[2] Testing lr = 1.00e-04, dim = 100
[TextRNN] Training model
[TextRNN] n_train=568  #epochs=50  batch size=256
[TextRNN] Epoch 0 (0.67s)	Average loss=1.610511
[TextRNN] Epoch 10 (7.84s)	Average loss=1.579410
[TextRNN] Epoch 20 (15.12s)	Average loss=1.541513
[TextRNN] Epoch 30 (22.31s)	Average loss=1.473694
[TextRNN] Epoch 40 (29.79s)	Average loss=1.301941
[TextRNN] Epoch 49 (36.58s)	Average loss=1.073563
[TextRNN] Training done (36.58s)
[TextRNN] Accuracy: 0.421875
[TextRNN] Model saved as 

In [12]:
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc > 0.60

0.6875


### Reload saved model outside of `GridSearch`

In [13]:
lstm = TextRNN(seed=1701, cardinality=Tweet.cardinality)
lstm.load('TextRNN_3')
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc > 0.60

[TextRNN] Loaded model <TextRNN_3>
0.6875


### Reload a model with different structure

In [14]:
lstm.load('TextRNN_0')
acc = lstm.score(test_tweets, test_labels)
print(acc)
assert acc < 0.60

[TextRNN] Loaded model <TextRNN_0>
0.40625
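This check relies on `load` rebuilding the model to match the saved checkpoint, whose structure (here `dim = 50`, per the `GridSearch` log for `TextRNN_0`) may differ from the one currently loaded, rather than pouring weights into the existing graph. The sketch below illustrates that idea with a hypothetical NumPy "model" whose checkpoint records its hyperparameters, so loading first re-creates parameters of the right shape and then restores the values. It is not how Snorkel or TensorFlow store checkpoints.

```python
import json
import os
import tempfile

import numpy as np

class TinyModel(object):
    """Hypothetical stand-in for a model whose parameter shapes depend on `dim`."""

    def __init__(self, dim=None):
        self.dim = None
        self.W = None
        if dim is not None:
            self._build(dim)

    def _build(self, dim):
        # (Re)create parameters from scratch for the requested structure
        self.dim = dim
        self.W = np.zeros((dim, 5))

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"dim": self.dim, "W": self.W.tolist()}, f)

    def load(self, path):
        with open(path) as f:
            ckpt = json.load(f)
        self._build(ckpt["dim"])           # rebuild to match the checkpoint first
        self.W[...] = np.array(ckpt["W"])  # then restore the saved weights

tmp = tempfile.mkdtemp()
small, big = TinyModel(dim=50), TinyModel(dim=100)
small.W += 1.0
small.save(os.path.join(tmp, "TinyModel_0.json"))
big.save(os.path.join(tmp, "TinyModel_3.json"))

m = TinyModel()
m.load(os.path.join(tmp, "TinyModel_3.json"))
print(m.W.shape)  # (100, 5)
m.load(os.path.join(tmp, "TinyModel_0.json"))
print(m.W.shape)  # (50, 5)
```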


# Testing `GenerativeModel`

### Testing `GridSearch` on crowdsourcing data

In [15]:
from snorkel.annotations import load_label_matrix
import numpy as np

L_train = load_label_matrix(session, split=0)
train_labels = np.load('crowdsourcing_train_labels.npy')

In [16]:
from snorkel.learning import GenerativeModel
from snorkel.learning.utils import ListParameter

# Search over the number of training epochs
n_epochs = ListParameter('epochs', [0, 10, 30])
searcher = GridSearch(GenerativeModel, [n_epochs], L_train)

# Use training set labels here (just for testing)
gen_model, run_stats = searcher.fit(L_train, train_labels)

[1] Testing epochs = 0
Inferred cardinality: 5
[GenerativeModel] Accuracy: 0.984154929577
[GenerativeModel] Model saved as <GenerativeModel_0>.
[2] Testing epochs = 10
Inferred cardinality: 5
[GenerativeModel] Accuracy: 0.991197183099
[GenerativeModel] Model saved as <GenerativeModel_1>.
[3] Testing epochs = 30
Inferred cardinality: 5
[GenerativeModel] Accuracy: 0.996478873239
[GenerativeModel] Model saved as <GenerativeModel_2>.
[GenerativeModel] Model <GenerativeModel_2> loaded.
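After trying every configuration, `GridSearch` keeps the best-scoring run and reloads it, which is what the final `Model <GenerativeModel_2> loaded.` line shows. The selection step amounts to an argmax over the per-run scores. The sketch below uses the accuracies printed above, with a hypothetical record format that need not match the `run_stats` object `fit` returns.

```python
def select_best(runs):
    """Return the run with the highest accuracy; ties keep the earliest run."""
    best = None
    for run in runs:
        if best is None or run["accuracy"] > best["accuracy"]:
            best = run
    return best

# Accuracies copied from the GridSearch log above
runs = [
    {"name": "GenerativeModel_0", "epochs": 0,  "accuracy": 0.984154929577},
    {"name": "GenerativeModel_1", "epochs": 10, "accuracy": 0.991197183099},
    {"name": "GenerativeModel_2", "epochs": 30, "accuracy": 0.996478873239},
]
best = select_best(runs)
print(best["name"])                        # GenerativeModel_2
print(int(round(best["accuracy"] * 568)))  # 566 of the 568 training candidates
```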


In [17]:
acc = gen_model.score(L_train, train_labels)
print(acc)
assert acc > 0.97

0.996478873239
