# Intro. to Snorkel: Extracting Spouse Relations from the News

## Part III: Creating or Loading Evaluation Labels

In [1]:
%load_ext autoreload
%autoreload 2
import os

# To use a database other than SQLite, uncomment the line below.
# Note that a non-SQLite backend is necessary for parallel execution, among other things.
# os.environ['SNORKELDB'] = 'postgres:///snorkel-intro'

from snorkel import SnorkelSession
session = SnorkelSession()

## Part III(a): Creating Evaluation Labels in the `Viewer`

We repeat our definition of the `Spouse` `Candidate` subclass from Part II.

In [2]:
from snorkel.models import candidate_subclass
Spouse = candidate_subclass('Spouse', ['person1', 'person2'])

In [3]:
dev_cands = session.query(Spouse).filter(Spouse.split == 1).all()
len(dev_cands)

223

In [4]:
test_cands = session.query(Spouse).filter(Spouse.split == 2).all()
len(test_cands)

279

## Labeling by hand in the `Viewer`

In [5]:
from snorkel.viewer import SentenceNgramViewer

# NOTE: This if-then statement is only to avoid opening the viewer during automated testing of this notebook
# You should ignore this!
import os
if 'CI' not in os.environ:
    sv = SentenceNgramViewer(dev_cands, session)
else:
    sv = None


We now open the Viewer. You can mark each `Candidate` as true or false. Try it! These labels are automatically saved in the database backend and can be retrieved later using the annotator's name as the `AnnotationKey`.
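The `Viewer` handles storage for you, but it can help to picture the bookkeeping. The minimal sketch below uses a plain Python dictionary in place of the database. The names `save_label` and `labels_for`, the candidate ids, and the +1/-1 encoding for true/false are all hypothetical. The sketch only illustrates how labels stay separated by annotator name.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the database backend (NOT Snorkel's API):
# annotator name -> {candidate id -> label value}
labels_by_annotator = defaultdict(dict)


def save_label(annotator_name, candidate_id, is_true):
    """Record one annotator's judgment; +1/-1 for true/false is an assumed encoding."""
    labels_by_annotator[annotator_name][candidate_id] = 1 if is_true else -1


def labels_for(annotator_name):
    """Return a copy of every label saved under this annotator's name."""
    return dict(labels_by_annotator.get(annotator_name, {}))


# Two annotators can label the same candidate without overwriting each other
save_label("alice", 17, True)
save_label("alice", 18, False)
save_label("bob", 17, False)
print(labels_for("alice"))  # {17: 1, 18: -1}
print(labels_for("bob"))    # {17: -1}
```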

In [6]:
sv

## Part III(b): Loading External Evaluation Labels

We have already annotated the dev and test sets for this tutorial, and we use them here to walk through the basic procedure for loading _externally annotated_ labels.

Snorkel stores all manually annotated labels in a **stable** format (called `StableLabels`). This format is somewhat independent of the rest of Snorkel's data model. Stable labels are not deleted when you delete the candidates, the corpus, or any other objects, and they can be recovered even if the rest of the data changes or is deleted.
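To see why stable labels can survive deletions, consider a toy relational layout. The sketch below uses an in-memory SQLite database with a hypothetical, simplified schema (not Snorkel's actual tables). The `stable_label` table stores the candidate's stable-id string rather than a foreign key to a candidate row. As a result, deleting candidates leaves the labels intact, and a candidate re-created with the same stable id can be matched to its label again. The `doc::span:start:end` id format is illustrative only.

```python
import sqlite3

# Hypothetical, simplified schema: NOT Snorkel's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidate (
    id INTEGER PRIMARY KEY,
    stable_id TEXT NOT NULL UNIQUE
);
-- No foreign key to candidate: a label is keyed by the stable-id string alone
CREATE TABLE stable_label (
    context_stable_ids TEXT NOT NULL,
    annotator_name TEXT NOT NULL,
    value INTEGER NOT NULL,
    PRIMARY KEY (context_stable_ids, annotator_name)
);
""")

key = "doc1::span:0:5~~doc1::span:20:31"  # illustrative stable-id format
conn.execute("INSERT INTO candidate (stable_id) VALUES (?)", (key,))
conn.execute("INSERT INTO stable_label VALUES (?, ?, ?)", (key, "gold", 1))

# Deleting every candidate leaves the stable labels untouched
conn.execute("DELETE FROM candidate")
surviving = conn.execute("SELECT COUNT(*) FROM stable_label").fetchone()[0]
print(surviving)  # 1

# Re-creating a candidate with the same stable id lets us recover its label
conn.execute("INSERT INTO candidate (stable_id) VALUES (?)", (key,))
recovered = conn.execute(
    """SELECT s.value FROM candidate c
       JOIN stable_label s ON s.context_stable_ids = c.stable_id
       WHERE s.annotator_name = ?""",
    ("gold",),
).fetchall()
print(recovered)  # [(1,)]
```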

If we have external labels from another source, we can also load them into the `stable_label` table:

In [7]:
import pandas as pd
from snorkel.models import StableLabel

gold_labels = pd.read_csv('data/gold_labels.tsv', sep="\t")
name = 'gold'
for _, row in gold_labels.iterrows():
    # Because it's a symmetric relation, load the label for both argument orders...
    for p1, p2 in [(row['person1'], row['person2']), (row['person2'], row['person1'])]:
        context_stable_ids = "~~".join([p1, p2])
        # We check if the label already exists, in case this cell was already executed
        query = session.query(StableLabel).filter(StableLabel.context_stable_ids == context_stable_ids)
        query = query.filter(StableLabel.annotator_name == name)
        if query.count() == 0:
            session.add(StableLabel(context_stable_ids=context_stable_ids, annotator_name=name, value=row['label']))

session.commit()
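The key-building logic in the cell above can be checked without a database. This sketch assumes made-up rows and stable-id strings, and the helper `symmetric_label_keys` is hypothetical. It uses only the standard library to parse a TSV with the same `person1`, `person2`, and `label` columns. It builds one `(context_stable_ids, annotator_name)` key per argument order and skips keys that already exist, mirroring the `query.count() == 0` check.

```python
import csv
import io

# Stand-in for data/gold_labels.tsv: the column names match the cell above,
# but these rows and stable-id strings are made up for illustration.
TSV = (
    "person1\tperson2\tlabel\n"
    "docA::span:0:4\tdocA::span:15:20\t1\n"
    "docB::span:3:9\tdocB::span:30:38\t-1\n"
)


def symmetric_label_keys(tsv_text, annotator_name="gold"):
    """Map (context_stable_ids, annotator_name) -> label for both argument orders."""
    labels = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        value = int(row["label"])
        for first, second in ((row["person1"], row["person2"]),
                              (row["person2"], row["person1"])):
            key = ("~~".join([first, second]), annotator_name)
            # Like the query.count() == 0 check: keep the first label stored for a key
            labels.setdefault(key, value)
    return labels


labels = symmetric_label_keys(TSV)
print(len(labels))  # 4: two rows, each stored in both directions
```

Storing both orders means a lookup succeeds regardless of which person a candidate lists first.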

Then, we use the helper function `reload_annotator_labels` to restore `Labels` from the `StableLabels` we just loaded.

_Note that we "miss" a few labels because of parsing discrepancies between the current candidates and the candidates that were originally labeled; specifically, you should be able to reload 220/223 labels on the dev set and 273/279 on the test set._

In [8]:
from snorkel.db_helpers import reload_annotator_labels
reload_annotator_labels(session, Spouse, 'gold', split=1, filter_label_split=False)
reload_annotator_labels(session, Spouse, 'gold', split=2, filter_label_split=False)

AnnotatorLabels created: 220
AnnotatorLabels created: 273
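Conceptually, reloading matches each current candidate's `context_stable_ids` against the stored stable labels. This is why parsing discrepancies cause misses: if re-parsing shifts a span by even one character, the stable ids no longer match exactly. The sketch below is a simplified illustration of that matching step, not the implementation of `reload_annotator_labels`. The function `reload_labels` and all data are hypothetical.

```python
def reload_labels(candidate_keys, stable_labels, annotator_name):
    """Attach stored labels to current candidates by exact stable-id match.

    candidate_keys: context_stable_ids strings of the current candidates
    stable_labels: dict mapping (context_stable_ids, annotator_name) -> value
    Returns (matched, missed): matched maps key -> value, missed lists
    candidates for which no stored label has an identical key.
    """
    matched, missed = {}, []
    for key in candidate_keys:
        value = stable_labels.get((key, annotator_name))
        if value is None:
            missed.append(key)
        else:
            matched[key] = value
    return matched, missed


# Made-up example: the third candidate's span offsets differ by one character
# from the labeled version (as could happen after re-parsing), so it is missed.
stored = {
    ("docA::span:0:4~~docA::span:15:20", "gold"): 1,
    ("docB::span:3:9~~docB::span:30:38", "gold"): -1,
    ("docC::span:5:11~~docC::span:40:47", "gold"): 1,
}
current = [
    "docA::span:0:4~~docA::span:15:20",
    "docB::span:3:9~~docB::span:30:38",
    "docC::span:5:12~~docC::span:40:47",
]
matched, missed = reload_labels(current, stored, "gold")
print("Labels reloaded: %d/%d" % (len(matched), len(current)))  # Labels reloaded: 2/3
```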


If you want to confirm that these labels are loaded, you can reload the `SentenceNgramViewer` with `annotator_name='gold'` to see them! Next, in Part IV, we will build a model to predict these labels using data programming.