# Searching for Model Hyperparameters

In this notebook, we'll demonstrate how to select hyperparameters automatically with a random search, scoring each candidate configuration on a labeled development set.

### Generating Some Data

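We'll generate some data from a known model of noisy labels: ten labeling functions share the same accuracy weight, and two pairs of them, (0, 1) and (2, 3), are correlated. We draw 10,000 training examples and 1,000 development examples from the same model.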

In [1]:
from snorkel.learning import GenerativeModelWeights
from snorkel.learning.structure import generate_label_matrix

weights = GenerativeModelWeights(10)
for i in range(10):
    weights.lf_accuracy[i] = 2.5
weights.dep_similar[0, 1] = 0.25
weights.dep_similar[2, 3] = 0.25

L_gold_train, L_train = generate_label_matrix(weights, 10000)
L_gold_dev, L_dev = generate_label_matrix(weights, 1000)
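
To make the idea of correlated labeling functions concrete, here is a minimal NumPy sketch. It is a simplified stand-in, not Snorkel's sampler: the helper `toy_label_matrix`, the 0.9 accuracy, and the 0.3 copy probability are all hypothetical, and correlation is modeled crudely by letting the second labeling function of each pair copy the first some of the time.

```python
import numpy as np

def toy_label_matrix(n, n_lfs=10, accuracy=0.9,
                     copy_pairs=((0, 1), (2, 3)), copy_prob=0.3, seed=0):
    """Hypothetical toy generator: gold labels in {-1, +1} plus noisy LF votes."""
    rng = np.random.default_rng(seed)
    gold = rng.choice([-1, 1], size=n)
    # Each labeling function independently votes correctly with probability `accuracy`
    correct = rng.random((n, n_lfs)) < accuracy
    L = np.where(correct, gold[:, None], -gold[:, None])
    # Crude correlation: the second LF of each pair copies the first on some rows
    for a, b in copy_pairs:
        copy = rng.random(n) < copy_prob
        L[copy, b] = L[copy, a]
    return gold, L

def agreement(L, i, j):
    """Fraction of rows on which labeling functions i and j cast the same vote."""
    return float(np.mean(L[:, i] == L[:, j]))

toy_gold, toy_L = toy_label_matrix(10000)
print(agreement(toy_L, 0, 1), agreement(toy_L, 4, 5))
```

In this toy setup the correlated pair (0, 1) agrees more often than the independent pair (4, 5), which is the kind of dependency the generated data above is meant to contain.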

### Generative Model Hyperparameters

We define a search space over the step size, decay, and number of epochs, then ask `RandomSearch` to try 20 configurations drawn from it. The generative model trains on the label matrix alone, so the session and training-marginals arguments are `None`.

In [4]:
from snorkel.learning import GenerativeModel
from snorkel.learning import RandomSearch, ListParameter, RangeParameter

gen_model = GenerativeModel()

step_size_param = RangeParameter('step_size', 1e-6, 1e-2, step=1, log_base=10)
decay_param     = ListParameter('decay', [1.0, 0.95, 0.9])
epochs_param    = ListParameter('epochs', [20, 50, 100])

searcher = RandomSearch(None, gen_model, L_train, None, [step_size_param, decay_param, epochs_param], n=20)

Initialized RandomSearch search of size 20. Search space size = 45.
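
The reported search space size of 45 is consistent with five step sizes (powers of 10 from 1e-6 to 1e-2) times three decay values times three epoch counts. The sketch below reproduces that count with the standard library only. It illustrates the general idea of random search over a grid and is not Snorkel's implementation; in particular, sampling without replacement is an assumption made here.

```python
import itertools
import random

# Candidate values mirroring the three parameters defined above
step_sizes = [10.0 ** e for e in range(-6, -1)]  # 1e-6 ... 1e-2, five values
decays = [1.0, 0.95, 0.9]
epochs = [20, 50, 100]

# Full grid: every combination of the three parameters
grid = list(itertools.product(step_sizes, decays, epochs))
print(len(grid))  # 45

# Random search evaluates only a subset of the grid; here, 20 distinct configurations
rng = random.Random(1701)
trials = rng.sample(grid, 20)
for step_size, decay, n_epochs in trials[:3]:
    print(dict(step_size=step_size, decay=decay, epochs=n_epochs))
```

With `n=20`, fewer than half of the 45 configurations are trained, which is the cost saving random search offers over an exhaustive grid search.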


In [5]:
searcher.fit(L_dev, L_gold_dev)

[1] Testing step_size = 1.00e-04, decay = 9.00e-01, epochs = 5.00e+01
Inferred cardinality: 2
[GenerativeModel] F1 Score: 1.0


IOError: [Errno 2] No such file or directory: 'checkpoints/GenerativeModel_0.weights.pkl'

Now we can apply the best learned model to the training label matrix to obtain its marginals, the probabilistic training labels.

In [None]:
train_marginals = gen_model.marginals(L_train)
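
The search scores each trial by F1 on the development set. As a rough illustration of what such a score measures, the sketch below thresholds marginals at 0.5 and compares them with gold labels. It assumes binary gold labels encoded as -1 and +1; `f1_from_marginals` and the toy arrays are hypothetical and are not part of Snorkel.

```python
import numpy as np

def f1_from_marginals(marginals, gold, threshold=0.5):
    """F1 of the positive class after thresholding marginals (hypothetical helper)."""
    marginals = np.asarray(marginals, dtype=float)
    gold = np.asarray(gold)
    # Predict +1 when the marginal exceeds the threshold, otherwise -1
    pred = np.where(marginals > threshold, 1, -1)
    tp = np.sum((pred == 1) & (gold == 1))
    fp = np.sum((pred == 1) & (gold == -1))
    fn = np.sum((pred == -1) & (gold == 1))
    if tp == 0:
        return 0.0  # no true positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

# Toy data: one true positive, one false positive, one false negative
toy_marginals = [0.9, 0.2, 0.7, 0.4]
toy_gold = [1, -1, -1, 1]
toy_f1 = f1_from_marginals(toy_marginals, toy_gold)
print(toy_f1)  # 0.5
```

The same function could be applied to `train_marginals` and `L_gold_train`, provided both are first converted to flat arrays of matching length.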

### Discriminative Model Hyperparameters

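Hyperparameter search can also be used for the discriminative model. The example below is not executed in this notebook; it assumes that `session`, `train_candidates`, and `dev_candidates` already exist: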

```
import numpy as np

from snorkel.learning import RandomSearch, ListParameter, RangeParameter
from snorkel.learning.disc_models.rnn import reRNN

lstm = reRNN(seed=1701, n_threads=None)

# Search space: a range of learning rates on a log scale and two dropout rates
rate_param    = RangeParameter('lr', 1e-6, 1e-2, step=1, log_base=10)
dropout_param = ListParameter('dropout', [0.0, 0.5])

# Unlike the generative search, we now pass a session and the probabilistic labels,
# and we pass the candidates instead of the label matrix
searcher = RandomSearch(session, lstm, train_candidates, train_marginals, [rate_param, dropout_param], n=20)

np.random.seed(1701)
# We now pass in the development candidates and the gold development labels.
# Any arguments that should be passed to the training method on every trial can also be specified.
searcher.fit(dev_candidates, L_gold_dev, n_epochs=50, rebalance=0.5, print_freq=25)
```