<img align="left" src="imgs/logo.jpg" width="50px" style="margin-right:10px">

# Snorkel Workshop: Extracting Spouse Relations <br> from the News
## Part 2: Writing Labeling Functions

__TODO: change explanation to be in terms of spouse pairs and sentence instead of candidates__

In Snorkel, the primary way we provide training signal to the end extraction model is by writing **labeling functions (LFs)**, as opposed to hand-labeling massive training sets. We'll go through some examples for our spouse extraction task below.

A labeling function isn't anything special. It's just a Python function that accepts a `Candidate` as its input argument and returns `1` if it says the `Candidate` should be marked as true, `-1` if it says the `Candidate` should be marked as false, and `0` if it doesn't know how to vote and abstains. In practice, many labeling functions are unipolar: they label only `1`s and `0`s, or only `-1`s and `0`s.
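
As a minimal sketch of this voting convention, the plain-Python function below mimics a unipolar labeling function. The names `POSITIVE`, `NEGATIVE`, `ABSTAIN`, and `lf_contains_married` are hypothetical, and a raw sentence string stands in for a `Candidate` purely for illustration.

```python
# Label values used by labeling functions in this tutorial.
POSITIVE = 1   # the candidate should be marked as true
NEGATIVE = -1  # the candidate should be marked as false
ABSTAIN = 0    # the labeling function does not vote


def lf_contains_married(sentence: str) -> int:
    """Hypothetical unipolar LF: votes POSITIVE or abstains, never NEGATIVE."""
    return POSITIVE if "married" in sentence.lower().split() else ABSTAIN


votes = [
    lf_contains_married("Alice married Bob in 2010"),
    lf_contains_married("Alice met Bob in 2010"),
]
print(votes)  # [1, 0]
```

A real labeling function would inspect fields of the candidate (for example, the tokens between the two person mentions) rather than the whole sentence.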

Recall that our goal is to ultimately train a high-performance classification model that predicts which of our `Candidate`s are true mentions of spouse relations.  It turns out that we can do this by writing potentially low-quality labeling functions!

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import re
import sys
import numpy as np
import pandas as pd

# I. Background

## A. Preprocessing the Database

In a real application, there is a lot of data preparation, parsing, and database loading that needs to be completed before we dive into writing labeling functions. Here we've pre-generated a database instance for you. All _candidates_ and _gold labels_ (i.e., human-generated labels) are queried from this database for use in the tutorial.

## B. Using a _Development Set_ of Human-labeled Data

In our setting, we will use the phrase _development set_ to refer to a set of examples (here, a subset of our training set) which we label by hand and use to help us develop and refine labeling functions. Unlike the _test set_, which we do not look at and use only for final evaluation, we can inspect the development set while writing labeling functions. The development set labels are a list of values in `{-1,1}`.

In [2]:
#TODO: make it the same pickle.
dev_data = pd.read_pickle("dev_data.pkl")
dev_labels = np.load("dev_labels.npy")

## C. Labeling Function Metrics

### 1. Coverage
One simple metric we can compute quickly is our _coverage_: the fraction of candidates that our LF labels (i.e., does not abstain on), computed on our training set (or any other set).
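
A minimal sketch of this metric, assuming an LF's output is available as a plain sequence of votes in `{-1, 0, 1}`. The helper name `coverage` and the toy data are hypothetical and are not part of Snorkel.

```python
from typing import Sequence

ABSTAIN = 0


def coverage(votes: Sequence[int]) -> float:
    """Fraction of candidates on which the LF did not abstain."""
    if not votes:
        return 0.0
    labeled = sum(1 for v in votes if v != ABSTAIN)
    return labeled / len(votes)


toy_votes = [1, 0, 0, -1, 0, 0, 0, 1]
print(coverage(toy_votes))  # 0.375
```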

### 2. Precision / Recall / F1
If we have gold labeled data, we can also compute standard precision, recall, and F1 metrics for the output of a single labeling function. These metrics are computed over 4 _error buckets_: _True Positives_ (tp), _False Positives_ (fp), _True Negatives_ (tn), and _False Negatives_ (fn).

\begin{equation*}
precision = \frac{tp}{(tp + fp)}
\end{equation*}

\begin{equation*}
recall = \frac{tp}{(tp + fn)}
\end{equation*}

\begin{equation*}
F1 = 2 \cdot \frac{ (precision \cdot recall)}{(precision + recall)}
\end{equation*}
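
The formulas above can be sketched in plain Python as follows. The function name `precision_recall_f1` and the toy data are hypothetical. Counting a positive example on which the LF abstains as a false negative is an assumption made here, not necessarily how Snorkel's own metrics treat abstentions.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[int], votes: Sequence[int]) -> Tuple[float, float, float]:
    """Score one LF's votes (1, -1, or 0) against gold labels (1 or -1)."""
    pairs = list(zip(gold, votes))
    tp = sum(1 for g, v in pairs if g == 1 and v == 1)
    fp = sum(1 for g, v in pairs if g == -1 and v == 1)
    # Assumption: a positive example the LF abstained on counts as a miss.
    fn = sum(1 for g, v in pairs if g == 1 and v != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


toy_gold = [1, 1, 1, 1, -1, -1]
toy_votes = [1, 0, 0, 1, 1, -1]
p, r, f1 = precision_recall_f1(toy_gold, toy_votes)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.667 recall=0.500 f1=0.571
```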

# II. Labeling Functions

## A. Pattern Matching Labeling Functions

One powerful form of labeling function design is defining sets of keywords or regular expressions that, as a human labeler, you know are correlated with the true label. In the terminology of [Bayesian inference](https://en.wikipedia.org/wiki/Statistical_inference#Bayesian_inference), this can be thought of as defining a [_prior_](https://en.wikipedia.org/wiki/Prior_probability) over your word features. 

For example, we could define a dictionary of terms that occur between person names in a candidate. One simple dictionary of terms indicating a true relation could be:
    
    spouses = {'husband', 'wife'}
 
We can then write a labeling function that checks for a match with these terms in the text that occurs between person names.

    @labeling_function(resources=dict(spouses=['husband','wife']))
    def LF_husband_wife(x: DataPoint, spouses: List[str]) -> int:
        # Vote 1 as soon as any spouse term appears between the two names.
        for word in spouses:
            if word in x.text_between.split(' '):
                return 1
        return 0
        
The idea is that we can easily create dictionaries that encode themes or categories describing all kinds of relationships between 2 people and then use these objects to _weakly supervise_ our classification task.

    other_relationship = {'boyfriend', 'girlfriend'}
    
**IMPORTANT** Good labeling functions manage a trade-off between high coverage and high precision. When constructing your dictionaries, think about building larger, noisier sets of terms instead of relying on 1 or 2 keywords. Sometimes a single word can be very predictive (e.g., `ex-wife`), but it's almost always better to define something more general, such as a regular expression pattern capturing _any_ string with the `ex-` prefix.
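
A minimal sketch of such a regular expression, written as a plain function over a list of tokens. The names `EX_PREFIX` and `lf_exes_between` are hypothetical, and voting `-1` on a match is an assumption made for illustration; which label an `ex-` term should vote for is a design decision.

```python
import re
from typing import List

# Matches any token that starts with "ex-", e.g. "ex-wife" or "ex-boyfriend".
EX_PREFIX = re.compile(r"^ex-\w+", re.IGNORECASE)


def lf_exes_between(between_tokens: List[str]) -> int:
    """Hypothetical LF: vote -1 if an "ex-" term occurs between the two names."""
    return -1 if any(EX_PREFIX.match(tok) for tok in between_tokens) else 0


print(lf_exes_between(["and", "his", "ex-wife"]))  # -1
print(lf_exes_between(["and", "his", "wife"]))     # 0
```

Compared with listing `ex-wife` and `ex-husband` explicitly, the pattern also covers terms such as `ex-girlfriend` without further edits.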

In [22]:
#TODO: WHY DO I NEED THIS?!
sys.path.append('../..')

In [41]:
dict(spouses)

ValueError: dictionary update sequence element #0 has length 7; 2 is required

In [84]:
from typing import List

from snorkel.labeling.apply import PandasLFApplier
from snorkel.labeling.lf import labeling_function
from snorkel.types import DataPoint

spouses = ['spouse', 'wife', 'husband', 'ex-wife', 'ex-husband']
@labeling_function(resources=dict(spouses=spouses))
def LF_husband_wife(x: DataPoint, spouses: List[str]) -> int:
    return 1 if len(set(spouses).intersection(set(x.between_tokens))) > 0 else 0

@labeling_function(resources=dict(spouses=spouses))
def LF_husband_wife_left_window(x: DataPoint, spouses: List[str]) -> int:
    if len(set(spouses).intersection(set(x.person1_left_tokens))) > 0:
        return 1
    elif len(set(spouses).intersection(set(x.person2_left_tokens))) > 0:
        return 1
    else:
        return 0

@labeling_function()
def LF_same_last_name(x: DataPoint) -> int:
    p1 = x.sentence.split(' ')[x.person1_word_range[0]:x.person1_word_range[1]+1]
    p2 = x.sentence.split(' ')[x.person2_word_range[0]:x.person2_word_range[1]+1]
    p1n = p1[-1] if len(p1) > 0 else None
    p2n = p2[-1] if len(p2) > 0 else None
    
    if p1n and p2n and p1n == p2n:
        if ' '.join(p1) != ' '.join(p2):
            return 1
    return 0

@labeling_function()
def LF_and_married(x: DataPoint) -> int:
    return 1 if 'and' in x.between_tokens and 'married' in x.person2_right_tokens else 0    


family = ['father', 'mother', 'sister', 'brother', 'son', 'daughter',
              'grandfather', 'grandmother', 'uncle', 'aunt', 'cousin']
family = family+[f + '-in-law' for f in family]

@labeling_function(resources=dict(family=family))
def LF_familial_relationship(x: DataPoint, family: List[str]) -> int:
    # A familial term between the two names suggests a non-spouse relation, so vote -1.
    return -1 if len(set(family).intersection(set(x.between_tokens))) > 0 else 0


@labeling_function(resources=dict(family=family))
def LF_family_left_window(x: DataPoint, family: List[str]) -> int:
    if len(set(family).intersection(set(x.person1_left_tokens))) > 0:
        return -1
    elif len(set(family).intersection(set(x.person2_left_tokens))) > 0:
        return -1
    else:
        return 0

other = {'boyfriend', 'girlfriend', 'boss', 'employee', 'secretary', 'co-worker'}
@labeling_function(resources=dict(other=other))
def LF_other_relationship(x: DataPoint, other: List[str]) -> int:
    # Vote -1 as soon as any non-spouse relationship term appears between the names.
    for word in other:
        if word in x.text_between.split(' '):
            return -1
    return 0

In [85]:
applier = PandasLFApplier([LF_husband_wife,
                           LF_husband_wife_left_window,
                           LF_same_last_name,
                           LF_and_married, 
                           LF_familial_relationship,
                           LF_family_left_window,
                           LF_other_relationship])
L = applier.apply(dev_data)

100%|██████████| 2811/2811 [00:00<00:00, 4263.11it/s]

#### Viewing Error Buckets
If we have gold labeled data, we can evaluate formal metrics. It's useful to view specific errors for a given LF input in the `SentenceNgramViewer`.

Below, we compute empirical scores on the human-labeled development set for two of our LFs, `LF_husband_wife` and `LF_and_married`. The scores show that `LF_husband_wife` isn't very accurate (an F1 score of only about 0.42) and that `LF_and_married` never fires on this set (its coverage is 0).

In [56]:
#TODO: pretty printer for LF stats

from snorkel.model.metrics import coverage_score, f1_score

# Column indices follow the order of the LFs passed to PandasLFApplier above.
print("LF_husband_wife coverage: ", coverage_score(dev_labels, L[:, 0]))
print("LF_husband_wife F1 score: ", f1_score(dev_labels, L[:, 0]))
print('')
print("LF_and_married coverage: ", coverage_score(dev_labels, L[:, 3]))
print("LF_and_married F1 score: ", f1_score(dev_labels, L[:, 3]))

LF_husband_wife coverage:  0.09178228388473852
LF_husband_wife F1 score:  0.4196428571428571

LF_and_married coverage:  0.0
LF_and_married F1 score:  0


## B. Distant Supervision Labeling Functions

In addition to writing labeling functions that encode pattern matching heuristics, we can also write labeling functions that _distantly supervise_ examples. Here, we'll load in a list of known spouse pairs and check to see if the candidate pair matches one of these.

**DBpedia**
http://wiki.dbpedia.org/
Our database of known spouses comes from DBpedia, which is a community-driven resource similar to Wikipedia but for curating structured data. We'll use a preprocessed snapshot as our knowledge base for all labeling function development.

We can look at some of the example entries from DBpedia and use them in a simple distant supervision labeling function.

In [59]:
import pickle 

with open('dbpedia.pkl', 'rb') as f:
     known_spouses = pickle.load(f)
        
list(known_spouses)[0:5]

[('Lady Anne Somerset', 'Thomas Percy'),
 ('Lau Lauritzen Jr.', 'Lisbeth Movin'),
 ('John Alexander', 'Robinson Thwaites'),
 ('Callie Khouri', 'T-Bone Burnett'),
 ('Anna Maria of Hesse-Kassel', 'Louis II Count of Nassau-Weilburg')]

In [76]:
@labeling_function(resources=dict(known_spouses=known_spouses))
def LF_distant_supervision(x: DataPoint, known_spouses: List[str]) -> int:
    p1 = x.sentence.split(' ')[x.person1_word_range[0]:x.person1_word_range[1]+1]
    p2 = x.sentence.split(' ')[x.person2_word_range[0]:x.person2_word_range[1]+1]
    p1, p2 = ' '.join(p1), ' '.join(p2)

    return 1 if (p1, p2) in known_spouses or (p2, p1) in known_spouses else 0


# Helper function to get last name
def last_name(s):
    name_parts = s.split(' ')
    return name_parts[-1] if len(name_parts) > 1 else None 

# Last name pairs for known spouses
last_names = set([(last_name(x), last_name(y)) for x, y in known_spouses if last_name(x) and last_name(y)])

@labeling_function(resources=dict(last_names=last_names))
def LF_distant_supervision_last_names(x: DataPoint, last_names: List[str]) -> int:
    p1 = x.sentence.split(' ')[x.person1_word_range[0]:x.person1_word_range[1]+1]
    p2 = x.sentence.split(' ')[x.person2_word_range[0]:x.person2_word_range[1]+1]
    p1n = p1[-1] if len(p1) > 0 else None
    p2n = p2[-1] if len(p2) > 0 else None
    
    return 1 if (p1 != p2) and ((p1n, p2n) in last_names or (p2n, p1n) in last_names) else 0 

In [79]:
applier = PandasLFApplier([LF_husband_wife,
                           LF_husband_wife_left_window,
                           LF_same_last_name,
                           LF_and_married, 
                           LF_familial_relationship,
                           LF_family_left_window,
                           LF_other_relationship,
                           LF_distant_supervision,
                           LF_distant_supervision_last_names])
L = applier.apply(dev_data)

100%|██████████| 2811/2811 [00

## C. Writing Custom Labeling Functions

The strength of LFs is that you can write any arbitrary function and use it to supervise a classification task. This approach can combine many of the same strategies discussed above or encode other information. 

For example, we observe that when mentions of person names occur far apart in a sentence, this is a good indicator that the candidate's label is False.
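
The observation above could be turned into a labeling function along these lines. This is only a sketch: the threshold `MAX_TOKENS_BETWEEN` is an arbitrary, hypothetical value, and `SimpleNamespace` objects stand in for rows of `dev_data` that expose the `between_tokens` field used earlier. In the notebook, the function would be decorated with `@labeling_function()` and take `x: DataPoint`.

```python
from types import SimpleNamespace

# Hypothetical threshold; tune it by inspecting the development set.
MAX_TOKENS_BETWEEN = 10


def lf_too_far_apart(x) -> int:
    """Vote -1 when many tokens separate the two person mentions, else abstain."""
    return -1 if len(x.between_tokens) > MAX_TOKENS_BETWEEN else 0


near = SimpleNamespace(between_tokens=["and", "his", "wife"])
far = SimpleNamespace(between_tokens=["token"] * 25)
print(lf_too_far_apart(near), lf_too_far_apart(far))  # 0 -1
```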

In [78]:
@labeling_function()
def LF_new(x: DataPoint) -> int:
    return 0

applier = PandasLFApplier([LF_husband_wife, LF_and_married,LF_new])
L = applier.apply(dev_data)

100%|██████████| 2811/2811 [00:00<00:00, 17261.07it/s]

In [86]:
np.save('dev_L.npy', L)

__TODO: make lf_stats/scorer function to print nice statistics__

### 3. Label Matrix Empirical Accuracies

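If we have a small set of human-labeled data, we can compute the empirical accuracy of each labeling function against it, alongside its coverage, overlaps, and conflicts.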

In [17]:
L_dev.lf_stats(session, labels=L_gold_dev.toarray().ravel())

| Labeling function | j | Coverage | Overlaps | Conflicts | TP | FP | FN | TN | Empirical Acc. |
|---|---|---|---|---|---|---|---|---|---|
| `LF_TERMS_marriage_[between\|words]_TRUE` | 0 | 0.063323 | 0.063323 | 0.01174 | 61 | 109 | 0 | 0 | 0.358824 |
| `LF_TERMS_other_relationship_[left\|words\|window=1]_FALSE` | 1 | 0.003913 | 0.001067 | 0.001067 | 0 | 0 | 0 | 11 | 1.0 |
| `LF_REGEX_exes_[between\|words]_FALSE` | 2 | 0.004269 | 0.0 | 0.0 | 0 | 0 | 5 | 7 | 0.583333 |
| `LF_DIST_SUPERVISION_dbpedia_TRUE` | 3 | 0.001067 | 0.000356 | 0.0 | 2 | 1 | 0 | 0 | 0.666667 |
| `LF_too_far_apart` | 4 | 0.068659 | 0.011028 | 0.011028 | 0 | 0 | 9 | 177 | 0.951613 |
| `LF_marriage_and_too_far_apart` | 5 | 0.052295 | 0.052295 | 0.000711 | 58 | 82 | 0 | 0 | 0.414286 |
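
The `Empirical Acc.` column appears to be the fraction of an LF's non-abstaining votes that agree with the gold label. This reading is inferred from the counts in the table, not taken from Snorkel's documentation, and the helper name `empirical_accuracy` is hypothetical. A short check against two rows of the table:

```python
def empirical_accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Correct votes (TP + TN) divided by all non-abstaining votes."""
    labeled = tp + fp + fn + tn
    return (tp + tn) / labeled if labeled else 0.0


# Counts copied from the first and fifth rows of the table.
print(round(empirical_accuracy(tp=61, fp=109, fn=0, tn=0), 6))  # 0.358824
print(round(empirical_accuracy(tp=0, fp=0, fn=9, tn=177), 6))   # 0.951613
```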


## D. Iterating on Labeling Function Design

When writing labeling functions, you will want to iterate on the process outlined above several times. You should focus on tuning individual LFs, based on empirical accuracy metrics, and adding new LFs to improve coverage.