# Babble Tutorial

### You will work with texts from a news dataset that discuss either gun politics or computer electronics.


In [1]:
import sys
sys.path.append("..")
from data.preparer import load_news_dataset
from babble import Explanation
from babble import BabbleStream
from babble.Candidate import Candidate 

from metal.analysis import lf_summary
from metal.analysis import label_coverage
from metal import LabelModel
from metal.tuners import RandomSearchTuner
from babble.utils import ExplanationIO

import pandas as pd
from datetime import datetime
from snorkel.labeling import filter_unlabeled_dataframe

stat_history = pd.DataFrame()
import nltk
nltk.download("punkt")

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/nofarcarmeli/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## The Data

These texts discuss either gun politics (1) or computer electronics (0).

If you're not sure about the correct label, that's fine -- either make your best guess or just skip the example.

In [2]:
# Unzip the data. (Don't worry about this, it should be already unzipped.)
# Replace PASSWORD with the password to unzip the data, or download it directly from Kaggle.

#!unzip -P PASSWORD data/data.zip

Load the dataset into training, development, validation, and test sets.

In [3]:
df_train, df_dev, df_valid, df_test = load_news_dataset()
print("{} training examples".format(len(df_train)))
print("{} development examples".format(len(df_dev)))

577 training examples
500 development examples



Define the labels for this task.

In [4]:
ABSTAIN = -1
ELECTRONICS = 0
GUNS = 1

In [5]:
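# start timer
# note: DataFrame.append was removed in pandas 2.0, so this cell needs an older pandas
stat_history = stat_history.append({"time": datetime.now(), "num_lfs": 0}, ignore_index=True)
stat_history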

Unnamed: 0,num_lfs,time
0,0.0,2020-01-31 18:40:13.458286


Transform the data into a format compatible with Babble Labble:

In [6]:
# Candidate is a helper class that wraps each row in a format Babble can parse

dfs = [df_train, df_dev, df_test]

for df in dfs:
    df["id"] = range(len(df))

Cs = [df.apply(lambda x: Candidate(x), axis=1) for df in dfs]

# babble labble uses 1 and 2 for labels, while our data uses 0 and 1
# add 1 to convert
Ys = [df.label.values + 1 for df in dfs]
Ys[0] -= 1 # no label (training set) should be set to -1
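
The label shift above can be checked on toy lists. This is a minimal sketch with made-up values; it assumes the raw training labels are stored as -1 (unlabeled), which the comment in the cell suggests but which is not confirmed by the loader output shown here.

```python
# Toy stand-ins for df.label.values (hypothetical values, not the real data).
raw_train = [-1, -1, -1]   # assumed: unlabeled training rows are stored as -1
raw_dev = [0, 1, 1, 0]     # 0 = electronics, 1 = guns

def shift(labels, offset):
    """Add a constant offset to every label."""
    return [y + offset for y in labels]

# Step 1: add 1 to every split, so 0/1 become 1/2.
toy_Ys = [shift(raw_train, 1), shift(raw_dev, 1)]
# Step 2: subtract 1 from the training split only, restoring -1 for "no label".
toy_Ys[0] = shift(toy_Ys[0], -1)

print(toy_Ys[0])  # [-1, -1, -1]
print(toy_Ys[1])  # [1, 2, 2, 1]
```

The two-step form mirrors the cell: the first step is applied uniformly to all splits, and the second undoes it for the training split only.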

Load the data into a *BabbleStream*: an object that iteratively displays candidates and collects and parses explanations.

In [13]:
# aliases are a way to refer to a set of words in a rule.
aliases = {"units": ["joules", "volts", "ohms", "MHz"]}
babbler = BabbleStream(Cs, Ys, balanced=True, shuffled=True, seed=456, aliases=aliases)

Grammar construction complete.
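
To make the idea of an alias concrete, here is a hand-written stand-in for a rule such as "because units occur". It is only a sketch of the intent, not Babble's parser: the `tokenize` helper, the lowercasing, and the exact-match behavior are all assumptions made for illustration.

```python
import re

ABSTAIN = -1
ELECTRONICS = 0

# Same alias dictionary as in the cell above.
aliases = {"units": ["joules", "volts", "ohms", "MHz"]}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def lf_units(text):
    """Vote ELECTRONICS if any word from the 'units' alias occurs, else abstain."""
    alias_words = {w.lower() for w in aliases["units"]}
    return ELECTRONICS if alias_words & set(tokenize(text)) else ABSTAIN

print(lf_units("a 25 MHz 68030 with a separate FPU clock"))         # 0
print(lf_units("court challenges of this kind can get expensive"))  # -1
```

Under this exact-match assumption, a singular form such as "Ohm" would not match the alias entry "ohms"; adding both forms to the alias list is one way to cover it.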


In [14]:
def prettyprint(candidate):
    # just a helper function to print the candidate nicely
    print(candidate.text)
    

Let's look at an example candidate!

In [12]:
candidate = babbler.next()
prettyprint(candidate)

I just installed a Motorola XC68882RC50 FPU in an Amiga A2630 board (25 MHz
68030 + 68882 with capability to clock the FPU separately).  Previously
a MC68882RC25 was installed and everything was working perfectly.  Now the
systems displays a yellow screen (indicating a exception) when it check for
the presence/type of FPU.  When I reinstall an MC68882RC25 the system works
fine, but with the XC68882 even at 25 MHz it does not work.  The designer
of the board mentioned that putting a pullup resistor on data_strobe (470 Ohm)
might help, but that didn't change anything.  Does anybody have some
suggestions what I could do?  Does this look like a CPU-FPU communications
problem or is the particular chip dead (it is a pull, not new)?
Moreover, the place I bought it from is sending me an XC68882RC33.  I thought
that the 68882RC33 were labeled MC not XC (for not finalized mask design). 
Are there any MC68882RC33?


Next, we'll learn how to write a labeling function from a natural language explanation of why you chose a label for a given candidate.

## Create Explanations

Creating explanations generally happens in five steps:
1. View candidates
2. Write explanations
3. Get feedback
4. Update explanations 
5. Apply label aggregator

Steps 3-5 are optional; explanations may be submitted without any feedback on their quality. However, in our experience, checking how well explanations are parsed and measuring their accuracy and coverage on a dev set (if available) quickly leads to simple improvements that yield significantly more useful labeling functions. Once a few labeling functions have been collected, you can use the label aggregator to identify candidates that are being mislabeled and write additional explanations targeting those failure modes.

__Your task for this tutorial is to write 5 labeling functions.__

Feel free to consult the internet or ask your experiment leader.

*(For the real task, you will be asked to write 10 labeling functions, as quickly and accurately as possible. You will still be allowed to use the internet in this phase, but not ask your experiment leader.)*

### Collection

Use `babbler` to show candidates

In [76]:
candidate = babbler.next()
prettyprint(candidate)


Those rules/regulations/laws would be subject to the same attack:  that
they are attempting to preempt federal authority to regulate (or not)
radio communications.  Of course, as the original poster noted, court
challenges of this kind can get expensive.


Is it about guns or electronics? What makes you think that? (If you don't know, it's okay to make your best guess or skip an example.)

Run the three examples given below, then parse and analyze them.
Then, you can try editing them and writing your own functions!

In [28]:
e0 = Explanation(
    # name of this rule, for your reference
    name='power units', 
    # label to assign
    label=ELECTRONICS, 
    # natural language description of why you label the candidate this way
    condition='because units occur', 
)

In [29]:
e1 = Explanation(
    name = 'firearm', 
    label = GUNS, 
    condition = 'because "doesn\'t" occurs between "system" and "work"', 
)

The explanation `e0` above uses an alias, "units", which stands for the list of words defined in `aliases`.

You can define more aliases where the BabbleStream is initialized.

In [30]:
e2 = Explanation(
    name = 'defense', 
    label = GUNS, 
    condition = 'because "defense" occurs between "self" and "gun"', 
)
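
For intuition, a condition of the form "X occurs between A and B" could be written by hand roughly as follows. This is a sketch under assumptions: the tokenization and the reading of "between" (A somewhere before X, B somewhere after) are illustrative choices, and the parse that Babble produces for the same sentence may differ.

```python
import re

ABSTAIN = -1
GUNS = 1

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def occurs_between(tokens, target, left, right):
    """True if `target` appears after some `left` token and before some `right` token."""
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if left in tokens[:i] and right in tokens[i + 1:]:
            return True
    return False

def lf_defense(text):
    """Hand-written stand-in for: because "defense" occurs between "self" and "gun"."""
    tokens = tokenize(text)
    return GUNS if occurs_between(tokens, "defense", "self", "gun") else ABSTAIN

print(lf_defense("I carry for self defense and keep the gun locked"))  # 1
print(lf_defense("the gun was bought for defense"))                    # -1
```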

In [31]:
e3 = Explanation(
    name = "e3", 
    label = ABSTAIN, 
    condition = "", 
    # candidate is an optional argument, it should be the id of an example labeled by this rule.
    # if the rule doesn't apply to the candidate you provide, it will be filtered!
    #candidate = candidate.mention_id 
)

In [32]:
e4 = Explanation(
    name = "e4", 
    label = ABSTAIN, 
    condition = "", 
    #candidate = candidate.mention_id 
)

In [33]:
explanations = [e0, e1, e2, e3, e4]

Babble will parse your explanations into functions, then filter out functions that are duplicates, incorrectly label their given candidate, or assign the same label to all examples.

In [34]:
parses, filtered = babbler.apply(explanations)

Building list of target candidate ids...
Collected 0 unique target candidate ids from 5 explanations.
No candidate hashes were provided. Skipping linking.
2 explanation(s) out of 5 were parseable.
4 parse(s) generated from 5 explanation(s).
2 parse(s) remain (2 parse(s) removed by DuplicateSemanticsFilter).
Note: 2 LFs did not have candidates and therefore could not be filtered.
2 parse(s) remain (0 parse(s) removed by ConsistencyFilter).
Applying labeling functions to investigate labeling signature.

0 parse(s) remain (2 parse(s) removed by UniformSignatureFilter: (2 None, 0 All)).
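
The signature-based filters can be pictured with a small sketch. Here a "signature" is the tuple of labels a function assigns across a set of texts; a function is dropped if it labels nothing, labels everything identically, or repeats the signature of a function already kept. The function names and the reading of "None" and "All" in the log are assumptions for illustration, not Babble's implementation.

```python
ABSTAIN = -1

def signature(lf, texts):
    """Labels that `lf` assigns to each text, in order."""
    return tuple(lf(t) for t in texts)

def filter_lfs(lfs, texts):
    """Split labeling functions into kept ones and (name, reason) pairs for removed ones."""
    kept, removed, seen = [], [], set()
    for lf in lfs:
        sig = signature(lf, texts)
        if all(label == ABSTAIN for label in sig):
            removed.append((lf.__name__, "uniform signature (labels none)"))
        elif ABSTAIN not in sig and len(set(sig)) == 1:
            removed.append((lf.__name__, "uniform signature (labels all)"))
        elif sig in seen:
            removed.append((lf.__name__, "duplicate signature"))
        else:
            seen.add(sig)
            kept.append(lf)
    return kept, removed

# Hypothetical labeling functions for the demo.
def lf_gun(text):
    return 1 if "gun" in text else ABSTAIN

def lf_gun_copy(text):
    return 1 if "gun" in text else ABSTAIN

def lf_never(text):
    return ABSTAIN

def lf_always(text):
    return 0

texts = ["my gun safe", "a 25 MHz board"]
kept, removed = filter_lfs([lf_gun, lf_gun_copy, lf_never, lf_always], texts)
print([lf.__name__ for lf in kept])  # ['lf_gun']
print(removed)
```

A function that never fires, like `lf_never`, carries no signal, which is one plausible reading of why the two parses in the log above were removed.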


### Analysis
See how your parsed explanations performed

In [24]:
babbler.analyze(parses)

ValueError: zero-size array to reduction operation

See which explanations were filtered and why

In [25]:
babbler.filtered_analysis(filtered)

SUMMARY
7 TOTAL:
3 Unparseable Explanation
2 Duplicate Semantics
0 Inconsistency with Example
2 Uniform Signature
0 Duplicate Signature
0 Lowest Coverage

[#1]: Unparseable Explanation

Explanation: because units occur in the text

Reason: This explanation couldn't be parsed.

Semantics: None


[#2]: Unparseable Explanation

Explanation: 

Reason: This explanation couldn't be parsed.

Semantics: None


[#3]: Unparseable Explanation

Explanation: 

Reason: This explanation couldn't be parsed.

Semantics: None


[#4]: Duplicate Semantics

Parse: return 1 if 'not'.(.eq(z) for all z in ['do','work']) else 0

Reason: This parse is identical to one produced by the following explanation:
	"because "not" occurs between "do" and "work""

Semantics: ('.root', ('.label', ('.int', 1), ('.call', ('.composite_and', ('.eq',), ('.list', ('.string', 'do'), ('.string', 'work'))), ('.string', 'not'))))


[#5]: Duplicate Semantics

Parse: return 1 if 'defense'.(.eq(z) for all z in ['self','gun']) else 0



In [26]:
babbler.commit()

### Evaluation
Get feedback on the performance of your explanations

In [None]:
Ls = [babbler.get_label_matrix(split) for split in [0,1,2]]
lf_names = [lf.__name__ for lf in babbler.get_lfs()]
lf_summary(Ls[1], Ys[1], lf_names=lf_names)
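
Two quantities one typically reads off such a summary are coverage (how often a function votes) and empirical accuracy (how often its votes match the dev labels). The sketch below computes both by hand on a toy label matrix. The `lf_stats` helper is hypothetical, and the abstain value is a parameter (defaulting to 0 as an assumption) because the encoding of the label matrix is not shown in this notebook.

```python
def lf_stats(L, Y, abstain=0):
    """Return a (coverage, accuracy) pair for each labeling function (column of L).

    L is a list of rows, one per example; Y holds the gold label per example.
    Accuracy is None for a function that never votes.
    """
    n_rows, n_lfs = len(L), len(L[0])
    stats = []
    for j in range(n_lfs):
        # Keep only the rows where function j cast a vote.
        votes = [(L[i][j], Y[i]) for i in range(n_rows) if L[i][j] != abstain]
        coverage = len(votes) / n_rows
        accuracy = sum(v == y for v, y in votes) / len(votes) if votes else None
        stats.append((coverage, accuracy))
    return stats

# Toy label matrix: 4 examples, 2 labeling functions (made-up values).
L_toy = [[1, 0], [1, 2], [0, 2], [2, 0]]
Y_toy = [1, 2, 2, 2]
toy_stats = lf_stats(L_toy, Y_toy)
print(toy_stats)
```

In this toy example the first function votes on three of four rows and is right on two of them, while the second votes on half the rows and is always right.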

In [None]:
search_space = {
    'n_epochs': [50, 100, 500],
    'lr': {'range': [0.01, 0.001], 'scale': 'log'},
    'show_plots': False,
}

tuner = RandomSearchTuner(LabelModel, seed=123)

label_aggregator = tuner.search(
    search_space, 
    train_args=[Ls[0]], 
    X_dev=Ls[1], Y_dev=Ys[1], 
    max_search=20, verbose=False, metric='f1')

# record statistics over time
pr, re, f1 = label_aggregator.score(Ls[1], Ys[1], metric=['precision', 'recall', 'f1'])
stats = {
    "precision": pr,
    "recall": re,
    "f1": f1,
    "time": datetime.now(),
    "training_label_coverage": label_coverage(Ls[0]),
    "training_label_size": label_coverage(Ls[0])*len(dfs[0])
}
stat_history = stat_history.append(stats, ignore_index=True)
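
The statistics recorded above can be illustrated on toy data. This sketch defines hypothetical helpers for label coverage (the fraction of examples with at least one vote) and for precision, recall, and F1 with respect to one class. The abstain value and the choice of positive class are parameters because neither is shown explicitly in this notebook.

```python
def label_coverage_toy(L, abstain=0):
    """Fraction of rows in which at least one labeling function voted."""
    return sum(any(v != abstain for v in row) for row in L) / len(L)

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up label matrix and predictions.
L_toy = [[1, 0], [0, 0], [0, 2], [2, 2]]
coverage = label_coverage_toy(L_toy)
print(coverage)               # 0.75
print(coverage * len(L_toy))  # 3.0, the number of examples that received a label

pr, re_, f1 = precision_recall_f1([1, 1, 2, 2, 1], [1, 2, 2, 1, 1])
```

Multiplying coverage by the number of training examples, as the cell above does, gives the size of the labeled training set.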

If you'd like to save the explanations you've generated, you can use the `ExplanationIO` object to write them to, or read them from, a file.

In [None]:
stat_history.to_csv("babbler_tutorial_statistics_history.csv")
FILE = "babbler_tutorial_explanations.tsv"
exp_io = ExplanationIO()
exp_io.write(explanations, FILE)
explanations = exp_io.read(FILE)

## Train Model
We can train a simple bag-of-words model on these labels and check its test accuracy.

(This step may take a while).
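
For readers unfamiliar with the term, a bag-of-words representation turns each text into a vector of word counts over a fixed vocabulary. The sketch below shows that featurization step only. The helper names and whitespace tokenization are assumptions, and this is not a description of what `train_model` does internally.

```python
from collections import Counter

def build_vocab(texts):
    """Map each distinct lowercase word to a column index, in sorted order."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}

def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in `text`."""
    counts = Counter(text.lower().split())
    return [counts.get(w, 0) for w in vocab]

toy_texts = ["gun laws and gun rights", "board and chip"]
vocab = build_vocab(toy_texts)
vectors = [bag_of_words(t, vocab) for t in toy_texts]

print(list(vocab))  # ['and', 'board', 'chip', 'gun', 'laws', 'rights']
print(vectors)      # [[1, 0, 0, 2, 1, 1], [1, 1, 1, 0, 0, 0]]
```

Word order is discarded, so a classifier trained on these vectors sees only which words occur and how often.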

In [44]:
from analyzer import train_model

# training label matrix built in the Evaluation section
L_train = Ls[0]
train_model(label_aggregator, df_train, df_valid, df_test, L_train)

NameError: name 'label_aggregator' is not defined