# Detecting spouse mentions in sentences

In this tutorial, we will see how Snorkel can be used for information extraction. We will walk through an example text classification task, using labeling functions based on keywords and distant supervision.
### Classification Task
<img src="imgs/sentence.jpg" width="700px;">

We want to classify each __candidate__, that is, each pair of people mentioned in a sentence, as having been married at some point or not.

In the above example, our candidate represents the possible relation `(Barack Obama, Michelle Obama)`. As readers, we know this mention is true due to external knowledge and the keyword `wedding` occurring later in the sentence.
We begin with some basic setup and data downloading.


In [1]:
%matplotlib inline

import os
import pickle
import numpy as np

if os.path.basename(os.getcwd()) == "snorkel-tutorials":
    os.chdir("spouse")

from utils import load_data

((df_dev, Y_dev), df_train, (df_test, Y_test)) = load_data()

**Input Data:** `df_dev`, `df_train`, and `df_test` are Pandas DataFrame objects, where each row represents a particular __candidate__. For our problem, a candidate consists of a sentence, and two people mentioned in the sentence. The DataFrames contain the fields `sentence`, which refers to the sentence of the candidate, `tokens`, the tokenized form of the sentence, and `person1_word_idx` and `person2_word_idx`, which represent `[start, end]` indices in the tokens at which the first and second person's name appear, respectively.
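To make the index fields concrete, here is a minimal sketch that recovers a person mention from `tokens` using a `(start, end)` pair. It assumes the end index is inclusive, which is consistent with the sample rows of the development set. The helper name `mention_text` is hypothetical and is not part of `utils` or `preprocessors`.

```python
def mention_text(tokens, word_idx):
    """Join the tokens covered by an inclusive (start, end) index pair."""
    start, end = word_idx
    return " ".join(tokens[start : end + 1])


# Tokens abbreviated from a development-set sentence.
tokens = ["The", "Richards", "are", "half", "-", "sisters", "to", "Kathy", "Hilton", ","]

person = mention_text(tokens, (7, 8))
print(person)
```

A single-token mention such as `(1, 1)` works the same way, because the slice `tokens[1:2]` still contains one token.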

We also have certain **preprocessed fields**, which we discuss a few cells below.

In [2]:
import pandas as pd

# Don't truncate text fields in the display
pd.set_option("display.max_colwidth", 0)

df_dev.head()

Unnamed: 0,person1_word_idx,person2_word_idx,sentence,tokens,person1_right_tokens,person2_right_tokens,between_tokens
0,"(1, 1)","(22, 24)","The Richards are half-sisters to Kathy Hilton, the mother of socialite Paris Hilton and spouse of luxury hotel magnate Richard Howard Hilton.","[The, Richards, are, half, -, sisters, to, Kathy, Hilton, ,, the, mother, of, socialite, Paris, Hilton, and, spouse, of, luxury, hotel, magnate, Richard, Howard, Hilton, ., ]","[are, half, -, sisters]","[., ]","[are, half, -, sisters, to, Kathy, Hilton, ,, the, mother, of, socialite, Paris, Hilton, and, spouse, of, luxury, hotel, magnate]"
1,"(1, 1)","(7, 8)","The Richards are half-sisters to Kathy Hilton, the mother of socialite Paris Hilton and spouse of luxury hotel magnate Richard Howard Hilton.","[The, Richards, are, half, -, sisters, to, Kathy, Hilton, ,, the, mother, of, socialite, Paris, Hilton, and, spouse, of, luxury, hotel, magnate, Richard, Howard, Hilton, ., ]","[are, half, -, sisters]","[,, the, mother, of]","[are, half, -, sisters, to]"
2,"(7, 8)","(22, 24)","The Richards are half-sisters to Kathy Hilton, the mother of socialite Paris Hilton and spouse of luxury hotel magnate Richard Howard Hilton.","[The, Richards, are, half, -, sisters, to, Kathy, Hilton, ,, the, mother, of, socialite, Paris, Hilton, and, spouse, of, luxury, hotel, magnate, Richard, Howard, Hilton, ., ]","[,, the, mother, of]","[., ]","[,, the, mother, of, socialite, Paris, Hilton, and, spouse, of, luxury, hotel, magnate]"
3,"(6, 6)","(20, 21)","Prior to both his guests, Colbert's monologue - parts of which he did sitting down - ripped into Donald Trump and his oft-mocked policy of building a wall at the US-Mexico border and not eating Oreos anymore.","[Prior, to, both, his, guests, ,, Colbert, s, monologue, -, parts, of, which, he, did, sitting, down, -, ripped, into, Donald, Trump, and, his, oft, -, mocked, policy, of, building, a, wall, at, the, US, -, Mexico, border, and, not, eating, Oreos, anymore, ., ]","[s, monologue, -, parts]","[and, his, oft, -]","[s, monologue, -, parts, of, which, he, did, sitting, down, -, ripped, into]"
4,"(2, 2)","(4, 5)","People reported Williams and Ven Veen tied the knot Saturday at Brush Creek Ranch in Saratoga, Wyoming, in front of about 200 guests.","[People, reported, Williams, and, Ven, Veen, tied, the, knot, Saturday, at, Brush, Creek, Ranch, in, Saratoga, ,, Wyoming, ,, in, front, of, about, 200, guests, .]","[and, Ven, Veen, tied]","[tied, the, knot, Saturday]",[and]


Let's look at a candidate in the development set:

In [3]:
from preprocessors import get_person_text

candidate = df_dev.loc[2]
person_names = get_person_text(candidate).person_names

print("Sentence: ", candidate["sentence"])
print("Person 1: ", person_names[0])
print("Person 2: ", person_names[1])

Sentence:  The Richards are half-sisters to Kathy Hilton, the mother of socialite Paris Hilton and spouse of luxury hotel magnate Richard Howard Hilton.   
Person 1:  Kathy Hilton
Person 2:  Richard Howard Hilton


### Preprocessing the Data

In a real application, a lot of data preparation, parsing, and database loading needs to be completed before we generate candidates and dive into writing labeling functions. Here we've pre-generated candidates in a pandas DataFrame object per split (train, dev, test).

### Labeling Function Helpers

When writing labeling functions, there are several functions you will use over and over again. For text relation extraction, as in this task, common functions include those for fetching the text between mentions of the two people in a candidate, examining word windows around person mentions, and so on. We will wrap these functions as `preprocessors`.

In [4]:
from snorkel.preprocess import preprocessor


@preprocessor()
def get_text_between(cand):
    """
    Returns the text between the two person mentions in the sentence for a candidate
    """
    start = cand.person1_word_idx[1] + 1
    end = cand.person2_word_idx[0]
    cand.text_between = " ".join(cand.tokens[start:end])
    return cand
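The slicing logic in `get_text_between` can be checked without Snorkel. The sketch below applies the same start/end arithmetic to a hand-built candidate. `SimpleNamespace` stands in for a DataFrame row, and the sketch assumes that person 1 appears before person 2 in the sentence.

```python
from types import SimpleNamespace

# Hand-built stand-in for one candidate row (tokens abbreviated).
cand = SimpleNamespace(
    tokens=["People", "reported", "Williams", "and", "Ven", "Veen", "tied", "the", "knot"],
    person1_word_idx=(2, 2),
    person2_word_idx=(4, 5),
)

# First token after person 1, up to (but not including) the first token of person 2.
start = cand.person1_word_idx[1] + 1
end = cand.person2_word_idx[0]
text_between = " ".join(cand.tokens[start:end])
print(text_between)
```

If person 2 came first, `start` would exceed `end` and the slice would be empty, so a production version would probably need to order the two mentions first.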

### Candidate Preprocessors

For the purposes of the tutorial, we have three fields (`between_tokens`, `person1_right_tokens`, `person2_right_tokens`) preprocessed in the data, which can be used when creating labeling functions. We also provide the following set of `preprocessor`s for this task in `preprocessors.py`, along with the fields these populate.
* `get_person_text(cand)`: `person_names`
* `get_person_last_names(cand)`: `person_lastnames`
* `get_left_tokens(cand)`: `person1_left_tokens`, `person2_left_tokens`
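As an illustration of what a left-window preprocessor might compute, the sketch below collects the few tokens immediately before a mention. The function `left_window` and the window size of 3 are assumptions made for this example. They are not taken from `preprocessors.py`, whose actual implementation may differ.

```python
def left_window(tokens, word_idx, window=3):
    """Return up to `window` tokens immediately to the left of a mention."""
    start = word_idx[0]
    # max(0, ...) keeps the slice from wrapping around for mentions near the start.
    return tokens[max(0, start - window) : start]


tokens = ["The", "Richards", "are", "half", "-", "sisters", "to", "Kathy", "Hilton", ","]

print(left_window(tokens, (7, 8)))
print(left_window(tokens, (1, 1)))
```

Clamping the start of the slice at 0 matters for mentions near the beginning of a sentence, where fewer than `window` tokens are available.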

In [5]:
from preprocessors import get_left_tokens, get_person_last_names

POSITIVE = 1
NEGATIVE = 0
ABSTAIN = -1

In [6]:
from snorkel.labeling.lf import labeling_function

# Check for the `spouse` words appearing between the person mentions
spouses = {"spouse", "wife", "husband", "ex-wife", "ex-husband"}


@labeling_function(resources=dict(spouses=spouses))
def lf_husband_wife(x, spouses):
    return POSITIVE if len(spouses.intersection(set(x.between_tokens))) > 0 else ABSTAIN
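The core of a keyword labeling function is a set intersection between a keyword list and the tokens of interest. The following sketch shows that logic as a plain Python function, without the Snorkel decorator, on two hand-written token lists. The name `vote_spouse_keyword` is hypothetical.

```python
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1
SPOUSE_WORDS = {"spouse", "wife", "husband", "ex-wife", "ex-husband"}


def vote_spouse_keyword(between_tokens):
    """Vote POSITIVE if any spouse keyword appears among the tokens, else abstain."""
    return POSITIVE if SPOUSE_WORDS.intersection(between_tokens) else ABSTAIN


with_keyword = [",", "the", "mother", "of", "socialite", "Paris", "Hilton", "and", "spouse", "of"]
without_keyword = ["and"]

print(vote_spouse_keyword(with_keyword))     # 1 (POSITIVE)
print(vote_spouse_keyword(without_keyword))  # -1 (ABSTAIN)
```

The match is an exact, case-sensitive comparison of whole tokens, so `Spouse` or `wives` would not trigger a vote in this sketch.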

In [7]:
# Check for the `spouse` words appearing to the left of the person mentions
@labeling_function(resources=dict(spouses=spouses), pre=[get_left_tokens])
def lf_husband_wife_left_window(x, spouses):
    if len(set(spouses).intersection(set(x.person1_left_tokens))) > 0:
        return POSITIVE
    elif len(set(spouses).intersection(set(x.person2_left_tokens))) > 0:
        return POSITIVE
    else:
        return ABSTAIN

In [8]:
# Check for the person mentions having the same last name
@labeling_function(pre=[get_person_last_names])
def lf_same_last_name(x):
    p1_ln, p2_ln = x.person_lastnames

    if p1_ln and p2_ln and p1_ln == p2_ln:
        return POSITIVE
    return ABSTAIN

In [9]:
# Check for the word `married` between person mentions
@labeling_function()
def lf_married(x):
    return POSITIVE if "married" in x.between_tokens else ABSTAIN

In [10]:
# Check for words that refer to `family` relationships between and to the left of the person mentions
family = {
    "father",
    "mother",
    "sister",
    "brother",
    "son",
    "daughter",
    "grandfather",
    "grandmother",
    "uncle",
    "aunt",
    "cousin",
}
family = family.union({f + "-in-law" for f in family})


@labeling_function(resources=dict(family=family))
def lf_familial_relationship(x, family):
    return NEGATIVE if len(family.intersection(set(x.between_tokens))) > 0 else ABSTAIN


@labeling_function(resources=dict(family=family), pre=[get_left_tokens])
def lf_family_left_window(x, family):
    if len(set(family).intersection(set(x.person1_left_tokens))) > 0:
        return NEGATIVE
    elif len(set(family).intersection(set(x.person2_left_tokens))) > 0:
        return NEGATIVE
    else:
        return ABSTAIN

In [11]:
# Check for `other` relationship words between person mentions
other = {"boyfriend", "girlfriend", "boss", "employee", "secretary", "co-worker"}


@labeling_function(resources=dict(other=other))
def lf_other_relationship(x, other):
    return NEGATIVE if len(other.intersection(set(x.between_tokens))) > 0 else ABSTAIN

### Distant Supervision Labeling Functions

In addition to labeling functions that encode pattern-matching heuristics, we can also write labeling functions that _distantly supervise_ examples. Here, we'll load a list of known spouse pairs and check whether the pair of persons in a candidate matches one of them.

**DBpedia**
http://wiki.dbpedia.org/
Our database of known spouses comes from DBpedia, which is a community-driven resource similar to Wikipedia but for curating structured data. We'll use a preprocessed snapshot as our knowledge base for all labeling function development.

We can look at some example entries from DBpedia and use them in a simple distant supervision labeling function.

Make sure `dbpedia.pkl` is in the `spouse/data` directory.

In [12]:
with open("data/dbpedia.pkl", "rb") as f:
    known_spouses = pickle.load(f)

list(known_spouses)[0:5]

[('Cynthia Lennon', 'John Lennon'),
 ('Carmel Cryan', 'Roy Kinnear'),
 ('Catherine of Valois', 'Charles the Bold'),
 ('Gigi Rice', 'Ted McGinley'),
 ('Martha Washington', 'Presidency of George Washington')]

In [13]:
@labeling_function(resources=dict(known_spouses=known_spouses), pre=[get_person_text])
def lf_distant_supervision(x, known_spouses):
    p1, p2 = x.person_names
    if (p1, p2) in known_spouses or (p2, p1) in known_spouses:
        return POSITIVE
    else:
        return ABSTAIN
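The lookup is insensitive to order because both `(p1, p2)` and `(p2, p1)` are tested. The sketch below reproduces that check in plain Python with a two-entry stand-in for `known_spouses`, using pairs from the sample output above. The function name `vote_known_spouses` is hypothetical.

```python
POSITIVE, ABSTAIN = 1, -1

# Small stand-in for the DBpedia-derived set of known spouse pairs.
known = {("Cynthia Lennon", "John Lennon"), ("Gigi Rice", "Ted McGinley")}


def vote_known_spouses(p1, p2, known_pairs):
    """Vote POSITIVE if the pair appears in the knowledge base in either order."""
    if (p1, p2) in known_pairs or (p2, p1) in known_pairs:
        return POSITIVE
    return ABSTAIN


print(vote_known_spouses("John Lennon", "Cynthia Lennon", known))  # 1 (POSITIVE)
print(vote_known_spouses("John Lennon", "Ted McGinley", known))    # -1 (ABSTAIN)
```

Because the comparison is on exact strings, a mention such as `Lennon` alone would not match `John Lennon`, which is one reason a last-name variant can be useful.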

In [14]:
from preprocessors import last_name

# Last name pairs for known spouses
last_names = set(
    [
        (last_name(x), last_name(y))
        for x, y in known_spouses
        if last_name(x) and last_name(y)
    ]
)


@labeling_function(resources=dict(last_names=last_names), pre=[get_person_last_names])
def lf_distant_supervision_last_names(x, last_names):
    p1_ln, p2_ln = x.person_lastnames

    return (
        POSITIVE
        if (p1_ln != p2_ln)
        and ((p1_ln, p2_ln) in last_names or (p2_ln, p1_ln) in last_names)
        else ABSTAIN
    )
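The last-name variant can be sketched the same way. The helper `last_name_sketch` below is an assumption about how a function like `last_name` might behave (last whitespace-separated token, or `None` for single-token names). It is not the implementation from `preprocessors.py`.

```python
POSITIVE, ABSTAIN = 1, -1


def last_name_sketch(full_name):
    """Hypothetical helper: last token of a multi-token name, else None."""
    parts = full_name.split()
    return parts[-1] if len(parts) > 1 else None


known = {
    ("Carmel Cryan", "Roy Kinnear"),
    ("Cynthia Lennon", "John Lennon"),
    ("Catherine of Valois", "Charles the Bold"),
}
pairs = {
    (last_name_sketch(x), last_name_sketch(y))
    for x, y in known
    if last_name_sketch(x) and last_name_sketch(y)
}


def vote_last_names(p1_ln, p2_ln, last_name_pairs):
    """Vote POSITIVE for differing last names that form a known pair in either order."""
    if p1_ln != p2_ln and (
        (p1_ln, p2_ln) in last_name_pairs or (p2_ln, p1_ln) in last_name_pairs
    ):
        return POSITIVE
    return ABSTAIN


print(vote_last_names("Kinnear", "Cryan", pairs))   # 1 (POSITIVE)
print(vote_last_names("Lennon", "Lennon", pairs))   # -1 (ABSTAIN)
```

Identical last names abstain here by design, since that case is already covered by `lf_same_last_name`. An entry such as `Charles the Bold` shows the limits of the heuristic: the sketch treats `Bold` as a last name.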

#### Apply Labeling Functions to the Data
We create a list of labeling functions and apply them to the data.

In [15]:
from snorkel.labeling.apply import PandasLFApplier

lfs = [
    lf_husband_wife,
    lf_husband_wife_left_window,
    lf_same_last_name,
    lf_married,
    lf_familial_relationship,
    lf_family_left_window,
    lf_other_relationship,
    lf_distant_supervision,
    lf_distant_supervision_last_names,
]
applier = PandasLFApplier(lfs)
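Conceptually, applying a list of labeling functions yields a label matrix with one row per candidate and one column per labeling function. The plain-Python sketch below builds such a matrix for three hand-made rows and computes each function's coverage (the fraction of rows on which it does not abstain). The two `*_sketch` functions and the rows are illustrative assumptions, not Snorkel's implementation.

```python
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_married_sketch(row):
    return POSITIVE if "married" in row["between_tokens"] else ABSTAIN


def lf_family_sketch(row):
    return NEGATIVE if {"mother", "father"} & set(row["between_tokens"]) else ABSTAIN


# Hand-made rows standing in for DataFrame rows.
rows = [
    {"between_tokens": ["married"]},
    {"between_tokens": [",", "the", "mother", "of"]},
    {"between_tokens": ["and"]},
]
sketch_lfs = [lf_married_sketch, lf_family_sketch]

# One row per candidate, one column per labeling function.
label_matrix = [[lf(row) for lf in sketch_lfs] for row in rows]

# Coverage: fraction of candidates on which each function casts a vote.
coverage = [
    sum(labels[j] != ABSTAIN for labels in label_matrix) / len(label_matrix)
    for j in range(len(sketch_lfs))
]

print(label_matrix)
print(coverage)
```

The third row receives no votes at all, which is common in practice: most labeling functions abstain on most candidates.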

In [16]:
from snorkel.labeling.analysis import LFAnalysis

dev_L = applier.apply(df_dev)
train_L = applier.apply(df_train)

LFAnalysis(dev_L, lfs).lf_summary(Y_dev)


 57%|█████▋    | 12751/22254 [00:33<00:24, 381.33it/s]

 57%|█████▋    | 12790/22254 [00:33<00:24, 380.06it/s]

 58%|█████▊    | 12829/22254 [00:33<00:24, 380.85it/s]

 58%|█████▊    | 12868/22254 [00:34<00:24, 380.31it/s]

 58%|█████▊    | 12907/22254 [00:34<00:24, 380.75it/s]

 58%|█████▊    | 12946/22254 [00:34<00:24, 380.78it/s]

 58%|█████▊    | 12985/22254 [00:34<00:24, 381.31it/s]

 59%|█████▊    | 13024/22254 [00:34<00:24, 380.83it/s]

 59%|█████▊    | 13063/22254 [00:34<00:24, 380.71it/s]

 59%|█████▉    | 13102/22254 [00:34<00:24, 380.23it/s]

 59%|█████▉    | 13141/22254 [00:34<00:24, 379.33it/s]

 59%|█████▉    | 13179/22254 [00:34<00:24, 377.41it/s]

 59%|█████▉    | 13217/22254 [00:34<00:24, 376.07it/s]

 60%|█████▉    | 13255/22254 [00:35<00:23, 375.26it/s]

 60%|█████▉    | 13293/22254 [00:35<00:23, 374.09it/s]

 60%|█████▉    | 13331/22254 [00:35<00:23, 374.73it/s]

 60%|██████    | 13369/22254 [00:35<00:24, 369.34it/s]

 60%|██████    | 13407/22254 [00:35<00:23, 371.79it/s]

 60%|██████    | 13446/22254 [00:35<00:23, 374.93it/s]

 61%|██████    | 13485/22254 [00:35<00:23, 376.93it/s]

 61%|██████    | 13524/22254 [00:35<00:23, 378.58it/s]

 61%|██████    | 13562/22254 [00:35<00:22, 378.55it/s]

 61%|██████    | 13600/22254 [00:35<00:22, 377.92it/s]

 61%|██████▏   | 13638/22254 [00:36<00:23, 372.57it/s]

 61%|██████▏   | 13677/22254 [00:36<00:22, 375.59it/s]

 62%|██████▏   | 13716/22254 [00:36<00:22, 377.65it/s]

 62%|██████▏   | 13754/22254 [00:36<00:22, 378.06it/s]

 62%|██████▏   | 13792/22254 [00:36<00:22, 375.51it/s]

 62%|██████▏   | 13831/22254 [00:36<00:22, 377.56it/s]

 62%|██████▏   | 13870/22254 [00:36<00:22, 379.69it/s]

 63%|██████▎   | 13909/22254 [00:36<00:21, 380.50it/s]

 63%|██████▎   | 13948/22254 [00:36<00:21, 380.90it/s]

 63%|██████▎   | 13987/22254 [00:36<00:21, 380.71it/s]

 63%|██████▎   | 14026/22254 [00:37<00:21, 379.62it/s]

 63%|██████▎   | 14064/22254 [00:37<00:21, 378.09it/s]

 63%|██████▎   | 14102/22254 [00:37<00:21, 377.73it/s]

 64%|██████▎   | 14140/22254 [00:37<00:21, 376.67it/s]

 64%|██████▎   | 14179/22254 [00:37<00:21, 378.88it/s]

 64%|██████▍   | 14218/22254 [00:37<00:21, 379.66it/s]

 64%|██████▍   | 14256/22254 [00:37<00:21, 379.72it/s]

 64%|██████▍   | 14294/22254 [00:37<00:21, 378.56it/s]

 64%|██████▍   | 14333/22254 [00:37<00:20, 379.06it/s]

 65%|██████▍   | 14371/22254 [00:38<00:20, 376.33it/s]

 65%|██████▍   | 14409/22254 [00:38<00:20, 376.27it/s]

 65%|██████▍   | 14447/22254 [00:38<00:20, 375.91it/s]

 65%|██████▌   | 14485/22254 [00:38<00:20, 375.77it/s]

 65%|██████▌   | 14523/22254 [00:38<00:20, 376.82it/s]

 65%|██████▌   | 14562/22254 [00:38<00:20, 378.83it/s]

 66%|██████▌   | 14600/22254 [00:38<00:20, 377.29it/s]

 66%|██████▌   | 14639/22254 [00:38<00:20, 379.13it/s]

 66%|██████▌   | 14677/22254 [00:38<00:19, 379.28it/s]

 66%|██████▌   | 14716/22254 [00:38<00:19, 380.67it/s]

 66%|██████▋   | 14755/22254 [00:39<00:19, 380.82it/s]

 66%|██████▋   | 14794/22254 [00:39<00:19, 381.62it/s]

 67%|██████▋   | 14833/22254 [00:39<00:20, 370.42it/s]

 67%|██████▋   | 14871/22254 [00:39<00:19, 369.68it/s]

 67%|██████▋   | 14909/22254 [00:39<00:19, 371.51it/s]

 67%|██████▋   | 14948/22254 [00:39<00:19, 374.11it/s]

 67%|██████▋   | 14987/22254 [00:39<00:19, 375.95it/s]

 68%|██████▊   | 15026/22254 [00:39<00:19, 377.69it/s]

 68%|██████▊   | 15065/22254 [00:39<00:18, 379.11it/s]

 68%|██████▊   | 15103/22254 [00:39<00:18, 378.98it/s]

 68%|██████▊   | 15141/22254 [00:40<00:18, 377.88it/s]

 68%|██████▊   | 15180/22254 [00:40<00:18, 378.88it/s]

 68%|██████▊   | 15218/22254 [00:40<00:18, 376.94it/s]

 69%|██████▊   | 15256/22254 [00:40<00:18, 375.81it/s]

 69%|██████▊   | 15294/22254 [00:40<00:18, 376.57it/s]

 69%|██████▉   | 15332/22254 [00:40<00:18, 377.08it/s]

 69%|██████▉   | 15371/22254 [00:40<00:18, 378.03it/s]

 69%|██████▉   | 15409/22254 [00:40<00:18, 378.14it/s]

 69%|██████▉   | 15448/22254 [00:40<00:17, 379.01it/s]

 70%|██████▉   | 15486/22254 [00:40<00:17, 379.05it/s]

 70%|██████▉   | 15524/22254 [00:41<00:17, 379.19it/s]

 70%|██████▉   | 15562/22254 [00:41<00:17, 377.82it/s]

 70%|███████   | 15600/22254 [00:41<00:17, 377.83it/s]

 70%|███████   | 15639/22254 [00:41<00:17, 378.96it/s]

 70%|███████   | 15677/22254 [00:41<00:17, 379.00it/s]

 71%|███████   | 15715/22254 [00:41<00:17, 374.97it/s]

 71%|███████   | 15753/22254 [00:41<00:17, 375.88it/s]

 71%|███████   | 15791/22254 [00:41<00:17, 374.09it/s]

 71%|███████   | 15830/22254 [00:41<00:17, 376.35it/s]

 71%|███████▏  | 15868/22254 [00:41<00:16, 377.35it/s]

 71%|███████▏  | 15907/22254 [00:42<00:16, 378.79it/s]

 72%|███████▏  | 15946/22254 [00:42<00:16, 380.18it/s]

 72%|███████▏  | 15985/22254 [00:42<00:16, 380.59it/s]

 72%|███████▏  | 16024/22254 [00:42<00:16, 378.22it/s]

 72%|███████▏  | 16062/22254 [00:42<00:16, 377.48it/s]

 72%|███████▏  | 16101/22254 [00:42<00:16, 378.39it/s]

 73%|███████▎  | 16139/22254 [00:42<00:16, 373.97it/s]

 73%|███████▎  | 16177/22254 [00:42<00:16, 374.62it/s]

 73%|███████▎  | 16215/22254 [00:42<00:16, 375.29it/s]

 73%|███████▎  | 16253/22254 [00:42<00:15, 375.42it/s]

 73%|███████▎  | 16292/22254 [00:43<00:15, 377.72it/s]

 73%|███████▎  | 16330/22254 [00:43<00:15, 378.36it/s]

 74%|███████▎  | 16369/22254 [00:43<00:15, 379.40it/s]

 74%|███████▎  | 16407/22254 [00:43<00:15, 378.95it/s]

 74%|███████▍  | 16446/22254 [00:43<00:15, 380.19it/s]

 74%|███████▍  | 16485/22254 [00:43<00:15, 379.29it/s]

 74%|███████▍  | 16524/22254 [00:43<00:15, 380.59it/s]

 74%|███████▍  | 16563/22254 [00:43<00:14, 380.17it/s]

 75%|███████▍  | 16602/22254 [00:43<00:14, 378.19it/s]

 75%|███████▍  | 16641/22254 [00:44<00:14, 380.27it/s]

 75%|███████▍  | 16680/22254 [00:44<00:14, 380.93it/s]

 75%|███████▌  | 16719/22254 [00:44<00:14, 379.81it/s]

 75%|███████▌  | 16758/22254 [00:44<00:14, 379.96it/s]

 75%|███████▌  | 16796/22254 [00:44<00:14, 378.97it/s]

 76%|███████▌  | 16835/22254 [00:44<00:14, 380.27it/s]

 76%|███████▌  | 16874/22254 [00:44<00:14, 378.99it/s]

 76%|███████▌  | 16912/22254 [00:44<00:14, 374.67it/s]

 76%|███████▌  | 16950/22254 [00:44<00:14, 375.96it/s]

 76%|███████▋  | 16989/22254 [00:44<00:13, 377.42it/s]

 77%|███████▋  | 17027/22254 [00:45<00:13, 377.42it/s]

 77%|███████▋  | 17065/22254 [00:45<00:13, 377.22it/s]

 77%|███████▋  | 17104/22254 [00:45<00:13, 379.26it/s]

 77%|███████▋  | 17142/22254 [00:45<00:13, 377.91it/s]

 77%|███████▋  | 17180/22254 [00:45<00:13, 377.63it/s]

 77%|███████▋  | 17218/22254 [00:45<00:13, 373.70it/s]

 78%|███████▊  | 17256/22254 [00:45<00:13, 374.35it/s]

 78%|███████▊  | 17295/22254 [00:45<00:13, 376.64it/s]

 78%|███████▊  | 17334/22254 [00:45<00:13, 378.01it/s]

 78%|███████▊  | 17372/22254 [00:45<00:12, 378.02it/s]

 78%|███████▊  | 17410/22254 [00:46<00:12, 375.80it/s]

 78%|███████▊  | 17448/22254 [00:46<00:12, 375.32it/s]

 79%|███████▊  | 17486/22254 [00:46<00:12, 374.76it/s]

 79%|███████▊  | 17524/22254 [00:46<00:12, 374.45it/s]

 79%|███████▉  | 17563/22254 [00:46<00:12, 376.46it/s]

 79%|███████▉  | 17601/22254 [00:46<00:12, 377.22it/s]

 79%|███████▉  | 17639/22254 [00:46<00:12, 376.95it/s]

 79%|███████▉  | 17678/22254 [00:46<00:12, 378.68it/s]

 80%|███████▉  | 17717/22254 [00:46<00:11, 380.15it/s]

 80%|███████▉  | 17756/22254 [00:46<00:11, 380.59it/s]

 80%|███████▉  | 17795/22254 [00:47<00:11, 379.58it/s]

 80%|████████  | 17834/22254 [00:47<00:11, 380.96it/s]

 80%|████████  | 17873/22254 [00:47<00:11, 381.42it/s]

 80%|████████  | 17912/22254 [00:47<00:11, 379.70it/s]

 81%|████████  | 17950/22254 [00:47<00:11, 376.27it/s]

 81%|████████  | 17988/22254 [00:47<00:11, 375.34it/s]

 81%|████████  | 18026/22254 [00:47<00:11, 376.69it/s]

 81%|████████  | 18065/22254 [00:47<00:11, 378.46it/s]

 81%|████████▏ | 18104/22254 [00:47<00:10, 380.10it/s]

 82%|████████▏ | 18143/22254 [00:47<00:10, 380.58it/s]

 82%|████████▏ | 18182/22254 [00:48<00:10, 379.38it/s]

 82%|████████▏ | 18221/22254 [00:48<00:10, 379.52it/s]

 82%|████████▏ | 18259/22254 [00:48<00:10, 379.56it/s]

 82%|████████▏ | 18298/22254 [00:48<00:10, 380.71it/s]

 82%|████████▏ | 18337/22254 [00:48<00:10, 381.32it/s]

 83%|████████▎ | 18376/22254 [00:48<00:10, 381.01it/s]

 83%|████████▎ | 18415/22254 [00:48<00:10, 381.60it/s]

 83%|████████▎ | 18454/22254 [00:48<00:09, 382.44it/s]

 83%|████████▎ | 18493/22254 [00:48<00:09, 380.93it/s]

 83%|████████▎ | 18532/22254 [00:49<00:09, 381.09it/s]

 83%|████████▎ | 18571/22254 [00:49<00:09, 380.19it/s]

 84%|████████▎ | 18610/22254 [00:49<00:09, 380.23it/s]

 84%|████████▍ | 18649/22254 [00:49<00:09, 379.34it/s]

 84%|████████▍ | 18687/22254 [00:49<00:09, 378.61it/s]

 84%|████████▍ | 18725/22254 [00:49<00:09, 377.82it/s]

 84%|████████▍ | 18764/22254 [00:49<00:09, 378.55it/s]

 84%|████████▍ | 18803/22254 [00:49<00:09, 379.28it/s]

 85%|████████▍ | 18841/22254 [00:49<00:09, 378.23it/s]

 85%|████████▍ | 18879/22254 [00:49<00:08, 378.44it/s]

 85%|████████▌ | 18917/22254 [00:50<00:08, 378.61it/s]

 85%|████████▌ | 18956/22254 [00:50<00:08, 379.63it/s]

 85%|████████▌ | 18994/22254 [00:50<00:08, 378.65it/s]

 86%|████████▌ | 19033/22254 [00:50<00:08, 379.72it/s]

 86%|████████▌ | 19072/22254 [00:50<00:08, 380.63it/s]

 86%|████████▌ | 19111/22254 [00:50<00:08, 379.26it/s]

 86%|████████▌ | 19150/22254 [00:50<00:08, 379.45it/s]

 86%|████████▌ | 19188/22254 [00:50<00:08, 379.54it/s]

 86%|████████▋ | 19227/22254 [00:50<00:07, 380.09it/s]

 87%|████████▋ | 19266/22254 [00:50<00:07, 380.49it/s]

 87%|████████▋ | 19305/22254 [00:51<00:07, 379.82it/s]

 87%|████████▋ | 19343/22254 [00:51<00:07, 379.17it/s]

 87%|████████▋ | 19382/22254 [00:51<00:07, 379.83it/s]

 87%|████████▋ | 19420/22254 [00:51<00:07, 378.83it/s]

 87%|████████▋ | 19459/22254 [00:51<00:07, 380.06it/s]

 88%|████████▊ | 19498/22254 [00:51<00:07, 378.62it/s]

 88%|████████▊ | 19536/22254 [00:51<00:07, 378.68it/s]

 88%|████████▊ | 19574/22254 [00:51<00:07, 377.88it/s]

 88%|████████▊ | 19613/22254 [00:51<00:06, 379.05it/s]

 88%|████████▊ | 19651/22254 [00:51<00:06, 378.82it/s]

 88%|████████▊ | 19689/22254 [00:52<00:06, 377.69it/s]

 89%|████████▊ | 19728/22254 [00:52<00:06, 380.19it/s]

 89%|████████▉ | 19767/22254 [00:52<00:06, 380.14it/s]

 89%|████████▉ | 19806/22254 [00:52<00:06, 379.97it/s]

 89%|████████▉ | 19844/22254 [00:52<00:06, 379.96it/s]

 89%|████████▉ | 19882/22254 [00:52<00:06, 377.42it/s]

 90%|████████▉ | 19921/22254 [00:52<00:06, 378.26it/s]

 90%|████████▉ | 19960/22254 [00:52<00:06, 379.86it/s]

 90%|████████▉ | 19998/22254 [00:52<00:05, 377.53it/s]

 90%|█████████ | 20037/22254 [00:52<00:05, 378.48it/s]

 90%|█████████ | 20076/22254 [00:53<00:05, 379.72it/s]

 90%|█████████ | 20114/22254 [00:53<00:05, 378.98it/s]

 91%|█████████ | 20152/22254 [00:53<00:05, 379.14it/s]

 91%|█████████ | 20190/22254 [00:53<00:05, 379.19it/s]

 91%|█████████ | 20228/22254 [00:53<00:05, 378.22it/s]

 91%|█████████ | 20266/22254 [00:53<00:05, 378.27it/s]

 91%|█████████ | 20304/22254 [00:53<00:05, 377.27it/s]

 91%|█████████▏| 20343/22254 [00:53<00:05, 378.05it/s]

 92%|█████████▏| 20381/22254 [00:53<00:04, 377.59it/s]

 92%|█████████▏| 20419/22254 [00:53<00:04, 378.22it/s]

 92%|█████████▏| 20458/22254 [00:54<00:04, 378.87it/s]

 92%|█████████▏| 20497/22254 [00:54<00:04, 379.30it/s]

 92%|█████████▏| 20536/22254 [00:54<00:04, 380.14it/s]

 92%|█████████▏| 20575/22254 [00:54<00:04, 376.36it/s]

 93%|█████████▎| 20613/22254 [00:54<00:04, 372.38it/s]

 93%|█████████▎| 20651/22254 [00:54<00:04, 374.39it/s]

 93%|█████████▎| 20689/22254 [00:54<00:04, 375.86it/s]

 93%|█████████▎| 20727/22254 [00:54<00:04, 375.98it/s]

 93%|█████████▎| 20766/22254 [00:54<00:03, 378.94it/s]

 93%|█████████▎| 20804/22254 [00:55<00:03, 377.84it/s]

 94%|█████████▎| 20842/22254 [00:55<00:03, 376.50it/s]

 94%|█████████▍| 20880/22254 [00:55<00:03, 377.14it/s]

 94%|█████████▍| 20919/22254 [00:55<00:03, 378.56it/s]

 94%|█████████▍| 20958/22254 [00:55<00:03, 379.79it/s]

 94%|█████████▍| 20996/22254 [00:55<00:04, 292.17it/s]

 95%|█████████▍| 21035/22254 [00:55<00:03, 314.31it/s]

 95%|█████████▍| 21073/22254 [00:55<00:03, 331.18it/s]

 95%|█████████▍| 21111/22254 [00:55<00:03, 344.15it/s]

 95%|█████████▌| 21149/22254 [00:56<00:03, 353.66it/s]

 95%|█████████▌| 21187/22254 [00:56<00:02, 360.14it/s]

 95%|█████████▌| 21225/22254 [00:56<00:02, 365.72it/s]

 96%|█████████▌| 21263/22254 [00:56<00:02, 369.60it/s]

 96%|█████████▌| 21302/22254 [00:56<00:02, 373.55it/s]

 96%|█████████▌| 21340/22254 [00:56<00:02, 375.03it/s]

 96%|█████████▌| 21379/22254 [00:56<00:02, 376.93it/s]

 96%|█████████▌| 21418/22254 [00:56<00:02, 378.47it/s]

 96%|█████████▋| 21456/22254 [00:56<00:02, 377.86it/s]

 97%|█████████▋| 21494/22254 [00:56<00:02, 375.16it/s]

 97%|█████████▋| 21532/22254 [00:57<00:01, 375.83it/s]

 97%|█████████▋| 21570/22254 [00:57<00:01, 375.73it/s]

 97%|█████████▋| 21609/22254 [00:57<00:01, 377.27it/s]

 97%|█████████▋| 21647/22254 [00:57<00:01, 377.70it/s]

 97%|█████████▋| 21685/22254 [00:57<00:01, 377.31it/s]

 98%|█████████▊| 21724/22254 [00:57<00:01, 378.53it/s]

 98%|█████████▊| 21763/22254 [00:57<00:01, 379.87it/s]

 98%|█████████▊| 21801/22254 [00:57<00:01, 378.50it/s]

 98%|█████████▊| 21839/22254 [00:57<00:01, 378.95it/s]

 98%|█████████▊| 21878/22254 [00:57<00:00, 379.55it/s]

 98%|█████████▊| 21916/22254 [00:58<00:00, 379.35it/s]

 99%|█████████▊| 21954/22254 [00:58<00:00, 379.23it/s]

 99%|█████████▉| 21993/22254 [00:58<00:00, 379.65it/s]

 99%|█████████▉| 22032/22254 [00:58<00:00, 380.15it/s]

 99%|█████████▉| 22071/22254 [00:58<00:00, 380.58it/s]

 99%|█████████▉| 22110/22254 [00:58<00:00, 380.06it/s]

100%|█████████▉| 22149/22254 [00:58<00:00, 379.86it/s]

100%|█████████▉| 22187/22254 [00:58<00:00, 377.98it/s]

100%|█████████▉| 22226/22254 [00:58<00:00, 379.91it/s]

100%|██████████| 22254/22254 [00:58<00:00, 377.57it/s]




| LF | j | Polarity | Coverage | Overlaps | Conflicts | Correct | Incorrect | Emp. Acc. |
|---|---|---|---|---|---|---|---|---|
| lf_husband_wife | 0 | [1] | 0.089648 | 0.036642 | 0.017432 | 93 | 159 | 0.369048 |
| lf_husband_wife_left_window | 1 | [1] | 0.025258 | 0.021345 | 0.003557 | 30 | 41 | 0.422535 |
| lf_same_last_name | 2 | [1] | 0.040555 | 0.016009 | 0.008538 | 19 | 95 | 0.166667 |
| lf_married | 3 | [1] | 0.01921 | 0.006759 | 0.00249 | 22 | 32 | 0.407407 |
| lf_familial_relationship | 4 | [0] | 0.115617 | 0.051939 | 0.026325 | 310 | 15 | 0.953846 |
| lf_family_left_window | 5 | [0] | 0.041266 | 0.03344 | 0.007826 | 114 | 2 | 0.982759 |
| lf_other_relationship | 6 | [0] | 0.013874 | 0.002846 | 0.002846 | 33 | 6 | 0.846154 |
| lf_distant_supervision | 7 | [1] | 0.001067 | 0.001067 | 0.0 | 2 | 1 | 0.666667 |
| lf_distant_supervision_last_names | 8 | [1] | 0.001067 | 0.000711 | 0.000356 | 0 | 3 | 0.0 |
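
The columns in this summary can be recomputed by hand from a label matrix. For instance, `lf_husband_wife` has 93 correct and 159 incorrect votes, so its empirical accuracy is 93 / (93 + 159) ≈ 0.369. The following is a minimal NumPy sketch of these statistics, not Snorkel's own implementation. It assumes that `-1` means "abstain"; `lf_stats`, `L_toy`, and `Y_toy` are hypothetical names and toy data introduced only for illustration.

```python
import numpy as np

ABSTAIN = -1  # assumed convention: -1 means the LF did not vote


def lf_stats(L, Y):
    """Recompute per-LF summary statistics from a label matrix.

    L: (n_examples, n_lfs) array of LF votes; Y: (n_examples,) gold labels.
    """
    L = np.asarray(L)
    Y = np.asarray(Y)
    voted = L != ABSTAIN  # True where an LF cast a vote
    votes_per_row = voted.sum(axis=1, keepdims=True)

    coverage = voted.mean(axis=0)
    # Overlap: the LF voted and at least one other LF voted on the same row.
    overlaps = (voted & (votes_per_row > 1)).mean(axis=0)

    # Conflict: the LF voted and another LF cast a different non-abstain vote.
    conflicts = np.zeros(L.shape[1])
    for j in range(L.shape[1]):
        other_disagrees = (voted & (L != L[:, [j]])).any(axis=1)
        conflicts[j] = (voted[:, j] & other_disagrees).mean()

    correct = (voted & (L == Y[:, None])).sum(axis=0)
    incorrect = voted.sum(axis=0) - correct
    # Empirical accuracy only counts rows where the LF voted (0 if it never did).
    emp_acc = correct / np.maximum(correct + incorrect, 1)
    return {
        "coverage": coverage,
        "overlaps": overlaps,
        "conflicts": conflicts,
        "correct": correct,
        "incorrect": incorrect,
        "emp_acc": emp_acc,
    }


# Toy label matrix: 4 examples, 3 LFs (made-up data).
L_toy = np.array([[1, -1, 0], [1, 1, -1], [-1, -1, 0], [-1, -1, -1]])
Y_toy = np.array([1, 0, 0, 1])
stats = lf_stats(L_toy, Y_toy)
for name, values in stats.items():
    print(name, values)
```

Reading the real table this way, the two LFs with polarity `[0]` and the highest coverage are also the most accurate, while most positive LFs are right less than half the time on the dev set.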


### Training the Label Model

Now we'll train a `LabelModel` over the LFs to estimate their weights. Once the model is trained, we can combine the LF outputs into a single, noise-aware set of training labels for our extractor.

In [17]:
from snorkel.labeling.model.label_model import LabelModel

label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(train_L, Y_dev, n_epochs=5000, log_freq=500, seed=12345)
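
To build intuition for what "estimating weights and combining outputs" means, here is a simplified sketch. It is not the `LabelModel` algorithm: the real model has to learn LF accuracies, whereas this sketch assumes the accuracies are already known and that LFs vote independently given the true label. The function `combine_votes`, the toy matrix, and the accuracy values are all hypothetical.

```python
import numpy as np


def combine_votes(L, accuracies, prior_pos=0.5):
    """Return P(y = 1 | LF votes) per row under a naive-Bayes assumption.

    L uses 1 for positive, 0 for negative, and -1 for abstain (assumed).
    """
    L = np.asarray(L)
    acc = np.asarray(accuracies, dtype=float)
    # Each LF's weight is the log-odds that its vote is correct.
    weights = np.log(acc / (1.0 - acc))
    # Map votes to +1 (positive), -1 (negative), 0 (abstain).
    signed = np.where(L == 1, 1.0, np.where(L == 0, -1.0, 0.0))
    logits = signed @ weights + np.log(prior_pos / (1.0 - prior_pos))
    return 1.0 / (1.0 + np.exp(-logits))


# Toy example: 3 rows, 3 LFs with assumed accuracies 0.9, 0.6, and 0.8.
L_demo = np.array([[1, 1, -1], [1, 0, -1], [-1, -1, -1]])
p_pos = combine_votes(L_demo, accuracies=[0.9, 0.6, 0.8])
print(p_pos)
```

Two agreeing votes push the probability toward 1, a disagreement between a strong and a weak LF still favors the strong one, and a row where every LF abstains falls back to the prior.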

### Label Model Metrics
Since our dataset is highly imbalanced (91% of the labels are negative), even a trivial baseline that always outputs negative achieves high accuracy. So we evaluate the label model using the F1 score and ROC-AUC rather than accuracy.

In [18]:
from snorkel.analysis.metrics import metric_score
from snorkel.analysis.utils import probs_to_preds

Y_probs_dev = label_model.predict_proba(dev_L)
Y_preds_dev = probs_to_preds(Y_probs_dev)
print(
    f"Label model f1 score: {metric_score(Y_dev, Y_preds_dev, probs=Y_probs_dev, metric='f1')}"
)
print(
    f"Label model roc-auc: {metric_score(Y_dev, Y_preds_dev, probs=Y_probs_dev, metric='roc_auc')}"
)

Label model f1 score: 0.4199134199134199
Label model roc-auc: 0.7421454246069199
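
To see why accuracy is misleading at this class balance, consider a classifier that always predicts negative. The sketch below uses 100 made-up labels with 9 positives to mirror the roughly 91% negative rate; `accuracy` and `f1` are small hand-written helpers for illustration, not the Snorkel metric functions used above.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return float(np.mean(y_true == y_pred))


def f1(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# 9 positives and 91 negatives (toy data).
y_true = np.array([1] * 9 + [0] * 91)
always_negative = np.zeros_like(y_true)

baseline_acc = accuracy(y_true, always_negative)
baseline_f1 = f1(y_true, always_negative)
print(f"accuracy: {baseline_acc:.2f}")  # 0.91
print(f"f1: {baseline_f1:.2f}")         # 0.00
```

The baseline never finds a single spouse pair, and F1 reflects that, while accuracy rewards it for the easy negatives.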


### Part 4: Training our End Extraction Model

In this final section of the tutorial, we'll use our noisy training labels alongside the development set labels to train our end machine learning model. We start by filtering out training examples that did not receive a label from any LF, as these examples contain no signal. Then we concatenate the remaining training examples with the dev set examples.


In [19]:
from snorkel.analysis.utils import preds_to_probs
from snorkel.labeling.utils import filter_unlabeled_dataframe

# Convert the 1D array of dev labels into a 2D array of probabilities, as required for training the end model.
Y_probs_dev = preds_to_probs(Y_dev, 2)

Y_probs_train = label_model.predict_proba(train_L)
df_train_filtered, Y_probs_train_filtered = filter_unlabeled_dataframe(
    X=df_train, y=Y_probs_train, L=train_L
)

df_combined = pd.concat([df_dev, df_train_filtered])
Y_probs_combined = np.concatenate([Y_probs_dev, Y_probs_train_filtered], 0)
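
The two data-preparation steps in this cell are simple array operations. The sketch below shows roughly what they amount to on toy data. It assumes `-1` marks an abstain, uses a plain NumPy array in place of the DataFrame, and defines hypothetical helpers `one_hot_probs` and `drop_unlabeled` rather than reproducing the Snorkel utilities.

```python
import numpy as np

ABSTAIN = -1  # assumed convention: -1 means the LF did not vote


def one_hot_probs(preds, num_classes):
    """Turn hard labels such as [0, 1] into rows such as [1, 0] and [0, 1]."""
    preds = np.asarray(preds)
    probs = np.zeros((len(preds), num_classes))
    probs[np.arange(len(preds)), preds] = 1.0
    return probs


def drop_unlabeled(X, y, L):
    """Keep only the rows where at least one LF cast a vote."""
    keep = (np.asarray(L) != ABSTAIN).any(axis=1)
    return X[keep], y[keep]


# Toy data: the first example received no LF votes and is dropped.
X_toy = np.array(["sent a", "sent b", "sent c"])
y_toy = np.array([[0.5, 0.5], [0.1, 0.9], [0.8, 0.2]])
L_toy2 = np.array([[-1, -1], [1, -1], [-1, 0]])

X_kept, y_kept = drop_unlabeled(X_toy, y_toy, L_toy2)
dev_probs = one_hot_probs([0, 1], num_classes=2)
print(X_kept)
print(dev_probs)
```

After both steps, gold dev labels and probabilistic training labels share the same two-column shape, which is what allows them to be concatenated into one training target.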

Next, we train a simple [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory) network for classifying candidates. `tf_model` contains functions for processing features and building the keras model for training and evaluation.

In [20]:
from tf_model import get_model, get_feature_arrays

model = get_model()
tokens, idx1, idx2 = get_feature_arrays(df_combined)

batch_size = 64
num_epochs = 20  # TODO: Change this to ~50. Warning: Training takes several minutes!
model.fit(
    (tokens, idx1, idx2), Y_probs_combined, batch_size=batch_size, epochs=num_epochs
)
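
Because the targets passed to `model.fit` are probabilities rather than hard labels, the loss has to accept soft targets. The internals of `tf_model` are not shown in this section, so the following is only an assumption about the kind of loss involved: a NumPy sketch of cross-entropy against probabilistic labels, with a hypothetical helper name `soft_cross_entropy` and made-up values.

```python
import numpy as np


def soft_cross_entropy(probs_true, probs_pred, eps=1e-12):
    """Mean cross-entropy between target distributions and predictions."""
    probs_true = np.asarray(probs_true, dtype=float)
    probs_pred = np.clip(np.asarray(probs_pred, dtype=float), eps, 1.0)
    return float(-np.mean(np.sum(probs_true * np.log(probs_pred), axis=1)))


# One confident target (a dev-style label) and one uncertain target.
targets = np.array([[1.0, 0.0], [0.7, 0.3]])

# An untrained model that predicts 50/50 pays ln(2) per example.
uniform_loss = soft_cross_entropy(targets, [[0.5, 0.5], [0.5, 0.5]])

# Predictions that match the targets are penalized less.
matched_loss = soft_cross_entropy(targets, targets)
print(uniform_loss, matched_loss)
```

With soft targets the loss cannot reach zero on uncertain examples, so the model is pushed less strongly on examples where the label model itself was unsure.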

Epoch 1/20


  64/8545 [..............................] - ETA: 1:29 - loss: 0.6955

 192/8545 [..............................] - ETA: 32s - loss: 0.6625 

 320/8545 [>.............................] - ETA: 21s - loss: 0.6448

 448/8545 [>.............................] - ETA: 15s - loss: 0.6443

 576/8545 [=>............................] - ETA: 13s - loss: 0.6380

 704/8545 [=>............................] - ETA: 11s - loss: 0.6314

 832/8545 [=>............................] - ETA: 10s - loss: 0.6234

 960/8545 [==>...........................] - ETA: 9s - loss: 0.6233 

1088/8545 [==>...........................] - ETA: 8s - loss: 0.6230

1216/8545 [===>..........................] - ETA: 7s - loss: 0.6154

1344/8545 [===>..........................] - ETA: 7s - loss: 0.6079

1472/8545 [====>.........................] - ETA: 6s - loss: 0.6076

1600/8545 [====>.........................] - ETA: 6s - loss: 0.6054

1728/8545 [=====>........................] - ETA: 6s - loss: 0.6047

1856/8545 [=====>........................] - ETA: 5s - loss: 0.6022

1984/8545 [=====>........................] - ETA: 5s - loss: 0.6006









































































































Epoch 2/20


  64/8545 [..............................] - ETA: 4s - loss: 0.5859

 192/8545 [..............................] - ETA: 4s - loss: 0.6195

 320/8545 [>.............................] - ETA: 4s - loss: 0.6006

 448/8545 [>.............................] - ETA: 4s - loss: 0.5864

 576/8545 [=>............................] - ETA: 4s - loss: 0.5883

 704/8545 [=>............................] - ETA: 4s - loss: 0.5876

 832/8545 [=>............................] - ETA: 4s - loss: 0.5892

 960/8545 [==>...........................] - ETA: 4s - loss: 0.5873

1088/8545 [==>...........................] - ETA: 3s - loss: 0.5878

1216/8545 [===>..........................] - ETA: 3s - loss: 0.5907

1344/8545 [===>..........................] - ETA: 3s - loss: 0.5889

1472/8545 [====>.........................] - ETA: 3s - loss: 0.5882

1600/8545 [====>.........................] - ETA: 3s - loss: 0.5889

1728/8545 [=====>........................] - ETA: 3s - loss: 0.5863

1856/8545 [=====>........................] - ETA: 3s - loss: 0.5845

1984/8545 [=====>........................] - ETA: 3s - loss: 0.5867









































































































Epoch 3/20


  64/8545 [..............................] - ETA: 4s - loss: 0.5779

 192/8545 [..............................] - ETA: 4s - loss: 0.5437

 320/8545 [>.............................] - ETA: 4s - loss: 0.5703

 448/8545 [>.............................] - ETA: 4s - loss: 0.5699

 576/8545 [=>............................] - ETA: 4s - loss: 0.5872

 704/8545 [=>............................] - ETA: 4s - loss: 0.5820

 832/8545 [=>............................] - ETA: 4s - loss: 0.5866

 960/8545 [==>...........................] - ETA: 4s - loss: 0.5818

1088/8545 [==>...........................] - ETA: 4s - loss: 0.5820

1216/8545 [===>..........................] - ETA: 3s - loss: 0.5822

1344/8545 [===>..........................] - ETA: 3s - loss: 0.5865

1472/8545 [====>.........................] - ETA: 3s - loss: 0.5851

1600/8545 [====>.........................] - ETA: 3s - loss: 0.5856

1728/8545 [=====>........................] - ETA: 3s - loss: 0.5856

1856/8545 [=====>........................] - ETA: 3s - loss: 0.5853

1984/8545 [=====>........................] - ETA: 3s - loss: 0.5847









































































































Epoch 4/20


  64/8545 [..............................] - ETA: 4s - loss: 0.5634

 192/8545 [..............................] - ETA: 4s - loss: 0.5549

 320/8545 [>.............................] - ETA: 4s - loss: 0.5769

 448/8545 [>.............................] - ETA: 4s - loss: 0.5672

 576/8545 [=>............................] - ETA: 4s - loss: 0.5631

 704/8545 [=>............................] - ETA: 4s - loss: 0.5749

 832/8545 [=>............................] - ETA: 4s - loss: 0.5768

 960/8545 [==>...........................] - ETA: 4s - loss: 0.5805

1088/8545 [==>...........................] - ETA: 3s - loss: 0.5767

1216/8545 [===>..........................] - ETA: 3s - loss: 0.5781

1344/8545 [===>..........................] - ETA: 3s - loss: 0.5756

1472/8545 [====>.........................] - ETA: 3s - loss: 0.5792

1600/8545 [====>.........................] - ETA: 3s - loss: 0.5797

1728/8545 [=====>........................] - ETA: 3s - loss: 0.5777

1856/8545 [=====>........................] - ETA: 3s - loss: 0.5787

1984/8545 [=====>........................] - ETA: 3s - loss: 0.5788









































































































Epoch 5/20


  64/8545 [..............................] - ETA: 4s - loss: 0.5388

 192/8545 [..............................] - ETA: 4s - loss: 0.5717

 320/8545 [>.............................] - ETA: 4s - loss: 0.5741

 448/8545 [>.............................] - ETA: 4s - loss: 0.5754

 576/8545 [=>............................] - ETA: 4s - loss: 0.5647

 704/8545 [=>............................] - ETA: 4s - loss: 0.5671

 832/8545 [=>............................] - ETA: 4s - loss: 0.5715

 960/8545 [==>...........................] - ETA: 4s - loss: 0.5706

1088/8545 [==>...........................] - ETA: 4s - loss: 0.5727

1216/8545 [===>..........................] - ETA: 3s - loss: 0.5766

1344/8545 [===>..........................] - ETA: 3s - loss: 0.5766

1472/8545 [====>.........................] - ETA: 3s - loss: 0.5772

1600/8545 [====>.........................] - ETA: 3s - loss: 0.5814

1728/8545 [=====>........................] - ETA: 3s - loss: 0.5793

1856/8545 [=====>........................] - ETA: 3s - loss: 0.5808

1984/8545 [=====>........................] - ETA: 3s - loss: 0.5823









































































































Epoch 6/20


  64/8545 [..............................] - ETA: 4s - loss: 0.6507

 192/8545 [..............................] - ETA: 4s - loss: 0.6178

 320/8545 [>.............................] - ETA: 4s - loss: 0.6009

 448/8545 [>.............................] - ETA: 4s - loss: 0.5834

 576/8545 [=>............................] - ETA: 4s - loss: 0.5937

 704/8545 [=>............................] - ETA: 4s - loss: 0.5856

 832/8545 [=>............................] - ETA: 4s - loss: 0.5919

 960/8545 [==>...........................] - ETA: 4s - loss: 0.5866

1088/8545 [==>...........................] - ETA: 4s - loss: 0.5867

1216/8545 [===>..........................] - ETA: 3s - loss: 0.5866

1344/8545 [===>..........................] - ETA: 3s - loss: 0.5867

1472/8545 [====>.........................] - ETA: 3s - loss: 0.5852

1600/8545 [====>.........................] - ETA: 3s - loss: 0.5867

1728/8545 [=====>........................] - ETA: 3s - loss: 0.5876

1856/8545 [=====>........................] - ETA: 3s - loss: 0.5884

1984/8545 [=====>........................] - ETA: 3s - loss: 0.5861









































































































Epoch 7/20


  64/8545 [..............................] - ETA: 4s - loss: 0.5525

 192/8545 [..............................] - ETA: 4s - loss: 0.5672

 320/8545 [>.............................] - ETA: 4s - loss: 0.5879

 448/8545 [>.............................] - ETA: 4s - loss: 0.5872

 576/8545 [=>............................] - ETA: 4s - loss: 0.5825

 704/8545 [=>............................] - ETA: 4s - loss: 0.5867

 832/8545 [=>............................] - ETA: 4s - loss: 0.5834

 960/8545 [==>...........................] - ETA: 4s - loss: 0.5850

1088/8545 [==>...........................] - ETA: 4s - loss: 0.5868

1216/8545 [===>..........................] - ETA: 3s - loss: 0.5816

1344/8545 [===>..........................] - ETA: 3s - loss: 0.5841

1472/8545 [====>.........................] - ETA: 3s - loss: 0.5841

1600/8545 [====>.........................] - ETA: 3s - loss: 0.5896

1728/8545 [=====>........................] - ETA: 3s - loss: 0.5897

1856/8545 [=====>........................] - ETA: 3s - loss: 0.5922

1984/8545 [=====>........................] - ETA: 3s - loss: 0.5897









































































































Epoch 8/20


  64/8545 [..............................] - ETA: 4s - loss: 0.5179

 192/8545 [..............................] - ETA: 4s - loss: 0.5580

 320/8545 [>.............................] - ETA: 4s - loss: 0.5696

 448/8545 [>.............................] - ETA: 4s - loss: 0.5729

 576/8545 [=>............................] - ETA: 4s - loss: 0.5694

 704/8545 [=>............................] - ETA: 4s - loss: 0.5717

 832/8545 [=>............................] - ETA: 4s - loss: 0.5756

 960/8545 [==>...........................] - ETA: 4s - loss: 0.5827

1088/8545 [==>...........................] - ETA: 4s - loss: 0.5815

1216/8545 [===>..........................] - ETA: 3s - loss: 0.5803

1344/8545 [===>..........................] - ETA: 3s - loss: 0.5839

1472/8545 [====>.........................] - ETA: 3s - loss: 0.5808

1600/8545 [====>.........................] - ETA: 3s - loss: 0.5811

1728/8545 [=====>........................] - ETA: 3s - loss: 0.5818

1856/8545 [=====>........................] - ETA: 3s - loss: 0.5842

1984/8545 [=====>........................] - ETA: 3s - loss: 0.5854









































































































Epoch 9/20


  64/8545 [..............................] - ETA: 4s - loss: 0.6432

 192/8545 [..............................] - ETA: 4s - loss: 0.5965

 320/8545 [>.............................] - ETA: 4s - loss: 0.5631

 448/8545 [>.............................] - ETA: 4s - loss: 0.5542

 576/8545 [=>............................] - ETA: 4s - loss: 0.5700

 704/8545 [=>............................] - ETA: 4s - loss: 0.5758

 832/8545 [=>............................] - ETA: 4s - loss: 0.5771

 960/8545 [==>...........................] - ETA: 4s - loss: 0.5765

1088/8545 [==>...........................] - ETA: 3s - loss: 0.5746

1216/8545 [===>..........................] - ETA: 3s - loss: 0.5772

1344/8545 [===>..........................] - ETA: 3s - loss: 0.5728

1472/8545 [====>.........................] - ETA: 3s - loss: 0.5702

1600/8545 [====>.........................] - ETA: 3s - loss: 0.5736

1728/8545 [=====>........................] - ETA: 3s - loss: 0.5718

1856/8545 [=====>........................] - ETA: 3s - loss: 0.5696

1984/8545 [=====>........................] - ETA: 3s - loss: 0.5704









































































































Epoch 10/20


  64/8545 [..............................] - ETA: 4s - loss: 0.4909

 192/8545 [..............................] - ETA: 4s - loss: 0.5158

 320/8545 [>.............................] - ETA: 4s - loss: 0.5341

 448/8545 [>.............................] - ETA: 4s - loss: 0.5565

 576/8545 [=>............................] - ETA: 4s - loss: 0.5558

 704/8545 [=>............................] - ETA: 4s - loss: 0.5409

 832/8545 [=>............................] - ETA: 4s - loss: 0.5397

 960/8545 [==>...........................] - ETA: 4s - loss: 0.5529

1088/8545 [==>...........................] - ETA: 4s - loss: 0.5534

1216/8545 [===>..........................] - ETA: 3s - loss: 0.5492

1344/8545 [===>..........................] - ETA: 3s - loss: 0.5485

1472/8545 [====>.........................] - ETA: 3s - loss: 0.5467

1600/8545 [====>.........................] - ETA: 3s - loss: 0.5457

1728/8545 [=====>........................] - ETA: 3s - loss: 0.5477

1856/8545 [=====>........................] - ETA: 3s - loss: 0.5462

1984/8545 [=====>........................] - ETA: 3s - loss: 0.5437









































































































Epoch 11/20


  64/8545 [..............................] - ETA: 4s - loss: 0.4469

 192/8545 [..............................] - ETA: 4s - loss: 0.4365

 320/8545 [>.............................] - ETA: 4s - loss: 0.4587

 448/8545 [>.............................] - ETA: 4s - loss: 0.4455

 576/8545 [=>............................] - ETA: 4s - loss: 0.4442

 704/8545 [=>............................] - ETA: 4s - loss: 0.4615

 832/8545 [=>............................] - ETA: 4s - loss: 0.4623

 960/8545 [==>...........................] - ETA: 4s - loss: 0.4691

1088/8545 [==>...........................] - ETA: 3s - loss: 0.4684

1216/8545 [===>..........................] - ETA: 3s - loss: 0.4630

1344/8545 [===>..........................] - ETA: 3s - loss: 0.4631

1472/8545 [====>.........................] - ETA: 3s - loss: 0.4616

1600/8545 [====>.........................] - ETA: 3s - loss: 0.4624

1728/8545 [=====>........................] - ETA: 3s - loss: 0.4606

1856/8545 [=====>........................] - ETA: 3s - loss: 0.4624

1984/8545 [=====>........................] - ETA: 3s - loss: 0.4584









































































































Epoch 12/20


  64/8545 [..............................] - ETA: 4s - loss: 0.4925

 192/8545 [..............................] - ETA: 4s - loss: 0.4497

 320/8545 [>.............................] - ETA: 4s - loss: 0.4507

 448/8545 [>.............................] - ETA: 4s - loss: 0.4505

 576/8545 [=>............................] - ETA: 4s - loss: 0.4400

 704/8545 [=>............................] - ETA: 4s - loss: 0.4414

 832/8545 [=>............................] - ETA: 4s - loss: 0.4396

 960/8545 [==>...........................] - ETA: 4s - loss: 0.4410

1088/8545 [==>...........................] - ETA: 3s - loss: 0.4362

1216/8545 [===>..........................] - ETA: 3s - loss: 0.4335

1344/8545 [===>..........................] - ETA: 3s - loss: 0.4340

1472/8545 [====>.........................] - ETA: 3s - loss: 0.4311

1600/8545 [====>.........................] - ETA: 3s - loss: 0.4319

1728/8545 [=====>........................] - ETA: 3s - loss: 0.4304

1856/8545 [=====>........................] - ETA: 3s - loss: 0.4311

1984/8545 [=====>........................] - ETA: 3s - loss: 0.4315









































































































Epoch 13/20


  64/8545 [..............................] - ETA: 4s - loss: 0.4879

 192/8545 [..............................] - ETA: 4s - loss: 0.4137

 320/8545 [>.............................] - ETA: 4s - loss: 0.4067

 448/8545 [>.............................] - ETA: 4s - loss: 0.4287

 576/8545 [=>............................] - ETA: 4s - loss: 0.4280

 704/8545 [=>............................] - ETA: 4s - loss: 0.4341

 832/8545 [=>............................] - ETA: 4s - loss: 0.4345

 960/8545 [==>...........................] - ETA: 4s - loss: 0.4354

1088/8545 [==>...........................] - ETA: 3s - loss: 0.4413

1216/8545 [===>..........................] - ETA: 3s - loss: 0.4406

1344/8545 [===>..........................] - ETA: 3s - loss: 0.4401

1472/8545 [====>.........................] - ETA: 3s - loss: 0.4365

1600/8545 [====>.........................] - ETA: 3s - loss: 0.4351

1728/8545 [=====>........................] - ETA: 3s - loss: 0.4313

1856/8545 [=====>........................] - ETA: 3s - loss: 0.4335

1984/8545 [=====>........................] - ETA: 3s - loss: 0.4438









































































































Epoch 14/20


  64/8545 [..............................] - ETA: 4s - loss: 0.4521

 192/8545 [..............................] - ETA: 4s - loss: 0.4167

 320/8545 [>.............................] - ETA: 4s - loss: 0.4218

 448/8545 [>.............................] - ETA: 4s - loss: 0.4165

 576/8545 [=>............................] - ETA: 4s - loss: 0.4221

 704/8545 [=>............................] - ETA: 4s - loss: 0.4195

 832/8545 [=>............................] - ETA: 4s - loss: 0.4160

 960/8545 [==>...........................] - ETA: 4s - loss: 0.4148

1088/8545 [==>...........................] - ETA: 3s - loss: 0.4171

1216/8545 [===>..........................] - ETA: 3s - loss: 0.4183

1344/8545 [===>..........................] - ETA: 3s - loss: 0.4157

1472/8545 [====>.........................] - ETA: 3s - loss: 0.4124

1600/8545 [====>.........................] - ETA: 3s - loss: 0.4110

1728/8545 [=====>........................] - ETA: 3s - loss: 0.4124

1856/8545 [=====>........................] - ETA: 3s - loss: 0.4122

1984/8545 [=====>........................] - ETA: 3s - loss: 0.4130









































































































Epoch 15/20


  64/8545 [..............................] - ETA: 4s - loss: 0.3824

 192/8545 [..............................] - ETA: 4s - loss: 0.3963

 320/8545 [>.............................] - ETA: 4s - loss: 0.4032

 448/8545 [>.............................] - ETA: 4s - loss: 0.3996

 576/8545 [=>............................] - ETA: 4s - loss: 0.4016

 704/8545 [=>............................] - ETA: 4s - loss: 0.4048

 832/8545 [=>............................] - ETA: 4s - loss: 0.4077

 960/8545 [==>...........................] - ETA: 4s - loss: 0.4072

1088/8545 [==>...........................] - ETA: 3s - loss: 0.4133

1216/8545 [===>..........................] - ETA: 3s - loss: 0.4136

1344/8545 [===>..........................] - ETA: 3s - loss: 0.4107

1472/8545 [====>.........................] - ETA: 3s - loss: 0.4087

1600/8545 [====>.........................] - ETA: 3s - loss: 0.4033

1728/8545 [=====>........................] - ETA: 3s - loss: 0.4038

1856/8545 [=====>........................] - ETA: 3s - loss: 0.4042

1984/8545 [=====>........................] - ETA: 3s - loss: 0.4048









































































































Epoch 16/20


  64/8545 [..............................] - ETA: 4s - loss: 0.4057

 192/8545 [..............................] - ETA: 4s - loss: 0.4230

 320/8545 [>.............................] - ETA: 4s - loss: 0.3987

 448/8545 [>.............................] - ETA: 4s - loss: 0.3977

 576/8545 [=>............................] - ETA: 4s - loss: 0.3942

 704/8545 [=>............................] - ETA: 4s - loss: 0.3973

 832/8545 [=>............................] - ETA: 4s - loss: 0.3922

 960/8545 [==>...........................] - ETA: 4s - loss: 0.3943

1088/8545 [==>...........................] - ETA: 4s - loss: 0.3928

1216/8545 [===>..........................] - ETA: 3s - loss: 0.4002

1344/8545 [===>..........................] - ETA: 3s - loss: 0.3995

1472/8545 [====>.........................] - ETA: 3s - loss: 0.4035

1600/8545 [====>.........................] - ETA: 3s - loss: 0.4047

1728/8545 [=====>........................] - ETA: 3s - loss: 0.4076

1856/8545 [=====>........................] - ETA: 3s - loss: 0.4098

1984/8545 [=====>........................] - ETA: 3s - loss: 0.4096









































































































Epoch 17/20


  64/8545 [..............................] - ETA: 4s - loss: 0.4238

 192/8545 [..............................] - ETA: 4s - loss: 0.4339

 320/8545 [>.............................] - ETA: 4s - loss: 0.4172

 448/8545 [>.............................] - ETA: 4s - loss: 0.3975

 576/8545 [=>............................] - ETA: 4s - loss: 0.3867

 704/8545 [=>............................] - ETA: 4s - loss: 0.3970

 832/8545 [=>............................] - ETA: 4s - loss: 0.3992

 960/8545 [==>...........................] - ETA: 4s - loss: 0.4006

1088/8545 [==>...........................] - ETA: 4s - loss: 0.3984

1216/8545 [===>..........................] - ETA: 3s - loss: 0.3970

1344/8545 [===>..........................] - ETA: 3s - loss: 0.3954

1472/8545 [====>.........................] - ETA: 3s - loss: 0.3985

1600/8545 [====>.........................] - ETA: 3s - loss: 0.3987

1728/8545 [=====>........................] - ETA: 3s - loss: 0.3973

1856/8545 [=====>........................] - ETA: 3s - loss: 0.3990

1984/8545 [=====>........................] - ETA: 3s - loss: 0.3987









































































































Epoch 18/20


  64/8545 [..............................] - ETA: 4s - loss: 0.3906

 192/8545 [..............................] - ETA: 4s - loss: 0.4041

 320/8545 [>.............................] - ETA: 4s - loss: 0.4057

 448/8545 [>.............................] - ETA: 4s - loss: 0.3982

 576/8545 [=>............................] - ETA: 4s - loss: 0.3985

 704/8545 [=>............................] - ETA: 4s - loss: 0.3958

 832/8545 [=>............................] - ETA: 4s - loss: 0.3951

 960/8545 [==>...........................] - ETA: 4s - loss: 0.3893

1088/8545 [==>...........................] - ETA: 4s - loss: 0.3890

1216/8545 [===>..........................] - ETA: 3s - loss: 0.3856

1344/8545 [===>..........................] - ETA: 3s - loss: 0.3854

1472/8545 [====>.........................] - ETA: 3s - loss: 0.3865

1600/8545 [====>.........................] - ETA: 3s - loss: 0.3867

1728/8545 [=====>........................] - ETA: 3s - loss: 0.3880

1856/8545 [=====>........................] - ETA: 3s - loss: 0.3877

1984/8545 [=====>........................] - ETA: 3s - loss: 0.3895









































































































Epoch 19/20
  64/8545 [..............................] - ETA: 4s - loss: 0.3594
 192/8545 [..............................] - ETA: 4s - loss: 0.3823
 320/8545 [>.............................] - ETA: 4s - loss: 0.3933
 448/8545 [>.............................] - ETA: 4s - loss: 0.3875
 576/8545 [=>............................] - ETA: 4s - loss: 0.3878
 704/8545 [=>............................] - ETA: 4s - loss: 0.3830
 832/8545 [=>............................] - ETA: 4s - loss: 0.3843
 960/8545 [==>...........................] - ETA: 4s - loss: 0.3811
1088/8545 [==>...........................] - ETA: 3s - loss: 0.3797
1216/8545 [===>..........................] - ETA: 3s - loss: 0.3851
1344/8545 [===>..........................] - ETA: 3s - loss: 0.3874
1472/8545 [====>.........................] - ETA: 3s - loss: 0.3864
1600/8545 [====>.........................] - ETA: 3s - loss: 0.3842
1728/8545 [=====>........................] - ETA: 3s - loss: 0.3846
1856/8545 [=====>........................] - ETA: 3s - loss: 0.3839
1984/8545 [=====>........................] - ETA: 3s - loss: 0.3832
Epoch 20/20
  64/8545 [..............................] - ETA: 4s - loss: 0.4432
 192/8545 [..............................] - ETA: 4s - loss: 0.4275
 320/8545 [>.............................] - ETA: 4s - loss: 0.3919
 448/8545 [>.............................] - ETA: 4s - loss: 0.3956
 576/8545 [=>............................] - ETA: 4s - loss: 0.3950
 704/8545 [=>............................] - ETA: 4s - loss: 0.3841
 832/8545 [=>............................] - ETA: 4s - loss: 0.3867
 960/8545 [==>...........................] - ETA: 4s - loss: 0.3911
1088/8545 [==>...........................] - ETA: 3s - loss: 0.3892
1216/8545 [===>..........................] - ETA: 3s - loss: 0.3866
1344/8545 [===>..........................] - ETA: 3s - loss: 0.3835
1472/8545 [====>.........................] - ETA: 3s - loss: 0.3867
1600/8545 [====>.........................] - ETA: 3s - loss: 0.3877
1728/8545 [=====>........................] - ETA: 3s - loss: 0.3902
1856/8545 [=====>........................] - ETA: 3s - loss: 0.3879
1984/8545 [=====>........................] - ETA: 3s - loss: 0.3886

<tensorflow.python.keras.callbacks.History at 0x7ff9cc2b0f98>

Finally, we evaluate the trained model on the test set by measuring its F1 score and ROC-AUC.

In [21]:
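# Build the model inputs for the test set.
test_tokens, test_idx1, test_idx2 = get_feature_arrays(df_test)
# Predicted class probabilities for each test candidate.
probs = model.predict((test_tokens, test_idx1, test_idx2))
# Convert the probabilities into hard label predictions.
preds = probs_to_preds(probs)
# F1 is computed from hard predictions, ROC-AUC from probabilities.
print(
    f"Test F1 when trained with soft labels: {metric_score(Y_test, preds=preds, metric='f1')}"
)
print(
    f"Test ROC-AUC when trained with soft labels: {metric_score(Y_test, probs=probs, metric='roc_auc')}"
)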

Test F1 when trained with soft labels: 0.4031936127744511
Test ROC-AUC when trained with soft labels: 0.7879045398618865
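
To make the two metrics concrete, here is a minimal NumPy sketch of their standard definitions. It is not Snorkel's implementation: the helper names (`hard_predictions`, `f1_binary`, `roc_auc_binary`) and the toy arrays are invented for illustration, and the sketch assumes `probs` is an `(n, 2)` array whose second column is the probability of the positive class.

```python
import numpy as np


def hard_predictions(probs: np.ndarray) -> np.ndarray:
    """Pick the most probable class for each row of an (n, 2) array."""
    return probs.argmax(axis=1)


def f1_binary(golds: np.ndarray, preds: np.ndarray) -> float:
    """F1 of the positive class: harmonic mean of precision and recall."""
    tp = int(np.sum((preds == 1) & (golds == 1)))
    fp = int(np.sum((preds == 1) & (golds == 0)))
    fn = int(np.sum((preds == 0) & (golds == 1)))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc_binary(golds: np.ndarray, pos_scores: np.ndarray) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = pos_scores[golds == 1]
    neg = pos_scores[golds == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


# Toy data, invented for illustration only (not from the spouse dataset).
toy_golds = np.array([1, 0, 1, 0, 0])
toy_probs = np.array(
    [
        [0.2, 0.8],
        [0.6, 0.4],
        [0.7, 0.3],
        [0.9, 0.1],
        [0.4, 0.6],
    ]
)

toy_preds = hard_predictions(toy_probs)
toy_f1 = f1_binary(toy_golds, toy_preds)
toy_auc = roc_auc_binary(toy_golds, toy_probs[:, 1])
print(f"toy F1: {toy_f1:.3f}, toy ROC-AUC: {toy_auc:.3f}")
```

F1 depends only on the hard predictions, so it is sensitive to where the decision threshold falls. ROC-AUC depends only on how the positive-class scores rank the examples. This is one reason a model can show a moderate F1 alongside a noticeably higher ROC-AUC, as in the results above.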


## Summary
In this tutorial, we showed how Snorkel can be used for information extraction. We demonstrated how to create labeling functions (LFs) that leverage keywords and external knowledge bases (distant supervision). Finally, we showed how a model trained on the probabilistic outputs of the Label Model can achieve performance comparable to the Label Model itself while also generalizing to all examples, including those on which every LF abstains.