# Babble Tutorial

### You will work with Wikipedia plot descriptions of films that are either comedy or drama.


In [None]:
from data.preparer import load_film_dataset
from babble import Explanation
from babble import BabbleStream
from babble.Candidate import Candidate 

from metal.analysis import lf_summary
from metal.analysis import label_coverage
from metal import LabelModel
from metal.tuners import RandomSearchTuner
from babble.utils import ExplanationIO
from snorkel.labeling import filter_unlabeled_dataframe

import nltk
nltk.download("punkt")

import pandas as pd
from datetime import datetime
stat_history = pd.DataFrame()

## The Data

These movie plot descriptions are from [Kaggle](https://www.kaggle.com/jrobischon/wikipedia-movie-plots).
You will be labeling films as either "comedy" or "drama" based on their plot descriptions.

If you're not sure about the correct label, that's fine -- either make your best guess or just skip the example.

In [None]:
# Unzip the data. (Don't worry about this, it should be already unzipped.)
# Replace PASSWORD with the password to unzip the data, or download it directly from Kaggle.

#!unzip -P PASSWORD data/data.zip

Load the dataset into training, development, validation, and test sets.

In [None]:
df_train, df_dev, df_valid, df_test = load_film_dataset()
print("{} training examples".format(len(df_train)))
print("{} development examples".format(len(df_dev)))
print("{} validation examples".format(len(df_valid)))
print("{} test examples".format(len(df_test)))

Define the labels for this task.

In [None]:
ABSTAIN = 0
DRAMA = 1
COMEDY = 2

Transform the data into a format compatible with Babble Labble:

In [None]:
# Candidate is a helper class that wraps each row in a format Babble can parse

dfs = [df_train, df_dev, df_test]

for df in dfs:
    df["id"] = range(len(df))

Cs = [df.apply(lambda x: Candidate(x), axis=1) for df in dfs]

# babble labble uses 1 and 2 for labels, while our data uses 0 and 1
# add 1 to convert
Ys = [df.label.values + 1 for df in dfs]
Ys[0] -= 1 # no label (training set) should be set to -1
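
The label shift above can be checked on a toy list. The sketch below assumes, as the cell's comments state, that the raw data uses 0 and 1 for the two genres; the `raw_dev_labels` values and the `to_babble_label` helper are made up for illustration and are not part of Babble.

```python
# Label constants as defined earlier in this tutorial.
ABSTAIN = 0
DRAMA = 1
COMEDY = 2


def to_babble_label(raw_label):
    """Shift a raw 0/1 label to the 1/2 convention (hypothetical helper)."""
    return raw_label + 1


# Made-up raw labels, for illustration only.
raw_dev_labels = [0, 1, 1, 0]
babble_dev_labels = [to_babble_label(y) for y in raw_dev_labels]
print(babble_dev_labels)  # [1, 2, 2, 1]
```

Under this shift, a raw 0 becomes `DRAMA` and a raw 1 becomes `COMEDY`, which leaves 0 free to mean `ABSTAIN`.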

Load the data into a *BabbleStream*: an object that iteratively displays candidates, collects and parses explanations.

In [None]:
# aliases are a way to refer to a set of words in a rule.
aliases = {"couples": ["girlfriend", "boyfriend", "wife", "husband"]}
babbler = BabbleStream(Cs, Ys, balanced=True, shuffled=True, seed=456, aliases=aliases)
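
To get a feel for what an alias stands for, here is a rough plain-Python sketch. It is not how Babble parses rules; `mentions_alias` is a hypothetical helper that uses simple whitespace tokenization, and the two sentences are invented.

```python
# Same alias dictionary as in the cell above.
aliases = {"couples": ["girlfriend", "boyfriend", "wife", "husband"]}


def mentions_alias(text, alias_name, aliases):
    """Return True if any word in the alias list appears in the text.

    Hypothetical helper: plain token matching, not Babble's parser.
    """
    tokens = text.lower().split()
    return any(word in tokens for word in aliases[alias_name])


print(mentions_alias("His wife plans a surprise party", "couples", aliases))  # True
print(mentions_alias("A detective hunts a killer", "couples", aliases))  # False
```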

In [None]:
def prettyprint(candidate):
    # just a helper function to print the candidate nicely
    print(candidate.text)
    

Let's look at an example candidate!

In [None]:
candidate = babbler.next()
prettyprint(candidate)

Next, we'll learn how to write a labeling function from a natural language explanation of why you chose a label for a given candidate.

## Create Explanations

Creating explanations generally happens in five steps:
1. View candidates
2. Write explanations
3. Get feedback
4. Update explanations 
5. Apply label aggregator

Steps 3-5 are optional; explanations may be submitted without any feedback on their quality. However, in our experience, checking how well explanations are parsed and, if a dev set is available, what accuracy and coverage they achieve on it can quickly lead to simple improvements that yield significantly more useful labeling functions. Once a few labeling functions have been collected, you can use the label aggregator to identify candidates that are being mislabeled and write additional explanations targeting those failure modes.
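
As a mental model for step 5, the simplest possible aggregator is a majority vote over the labeling functions' outputs. The sketch below is a stand-in for illustration only; the aggregator used later in this tutorial is a learned model, not a majority vote. The vote rows are invented, and 0 is treated as an abstain, matching the `ABSTAIN` constant defined earlier.

```python
from collections import Counter

ABSTAIN = 0


def majority_vote(votes):
    """Return the most common non-abstain label, or ABSTAIN on no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Each row holds the votes of three hypothetical labeling functions.
label_rows = [
    [1, 1, 0],
    [2, 0, 2],
    [1, 2, 0],
    [0, 0, 0],
]
print([majority_vote(row) for row in label_rows])  # [1, 2, 0, 0]
```

Rows where the aggregated label disagrees with your own judgment are good places to look for a missing or faulty explanation.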

__Your task for this tutorial is to write 5 labeling functions.__

Feel free to consult the internet or ask your experiment leader.

*(For the real task, you will be asked to write 10 labeling functions, as quickly and accurately as possible. You will still be allowed to use the internet in this phase, but not ask your experiment leader.)*

### Collection

Use `babbler` to show candidates.

In [None]:
candidate = babbler.next()
prettyprint(candidate)

Is it a comedy or a drama? What makes you think that? (If you don't know, it's okay to make your best guess or skip an example.)

Run the three examples given below, then parse them, and analyze them.
Then, you can try editing them and writing your own functions!

In [None]:
e0 = Explanation(
    # name of this rule, for your reference
    name='bounty_hunter', 
    # label to assign
    label=DRAMA, 
    # natural language description of why you label the candidate this way
    condition='The phrase "bounty hunter" is in the text', 
)

In [None]:
e1 = Explanation(
    name = 'feelings', 
    label = DRAMA, 
    condition = 'because "have" or "had" or "has" occurs between "they" and "feelings"', 
)
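
For comparison, the two explanations above could be written by hand as ordinary Python functions. This is only a sketch of the intended behavior, not the code Babble generates: `lf_bounty_hunter` and `lf_feelings` are hypothetical names, matching uses lowercase whitespace tokens, and the "between" check only handles "they" appearing before "feelings".

```python
ABSTAIN = 0
DRAMA = 1


def lf_bounty_hunter(text):
    """Vote DRAMA when the phrase "bounty hunter" appears, else abstain."""
    return DRAMA if "bounty hunter" in text.lower() else ABSTAIN


def lf_feelings(text):
    """Vote DRAMA when have/had/has occurs between "they" and "feelings"."""
    tokens = text.lower().split()
    if "they" not in tokens or "feelings" not in tokens:
        return ABSTAIN
    start = tokens.index("they")  # first "they"
    end = len(tokens) - 1 - tokens[::-1].index("feelings")  # last "feelings"
    between = tokens[start + 1:end]
    return DRAMA if any(t in ("have", "had", "has") for t in between) else ABSTAIN


print(lf_bounty_hunter("A bounty hunter tracks an outlaw"))  # 1
print(lf_feelings("They realize they have feelings for each other"))  # 1
print(lf_feelings("They share a meal"))  # 0
```

Writing the condition in natural language saves you from handling these tokenization details yourself.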

Below is an example of an explanation that uses an alias: "couples"

You can define more aliases where the BabbleStream is initialized.

In [None]:
e2 = Explanation(
    name = 'couple', 
    label = COMEDY, 
    condition = 'couples occur in the text', 
)

In [None]:
# placeholder for your own rule: edit the name, label, and condition below
e3 = Explanation(
    name = "e3", 
    label = ABSTAIN, 
    condition = "", 
    # candidate is an optional argument, it should be the id of an example labeled by this rule.
    # if the rule doesn't apply to the candidate you provide, it will be filtered!
    candidate = candidate.mention_id 
)

In [None]:
e4 = Explanation(
    name = "e4", 
    label = ABSTAIN, 
    condition = "", 
    candidate = candidate.mention_id 
)

In [None]:
explanations = [e0, e1, e2, e3, e4]

Babble will parse your explanations into functions, then filter out functions that are duplicates, incorrectly label their given candidate, or assign the same label to all examples.
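
The three filters named above can be sketched in plain Python. This is an illustration under assumptions, not Babble's implementation: `filter_lfs` is a hypothetical helper, "duplicate" is taken to mean "produces the same labels as an earlier function on the given examples", and the texts and rules are invented.

```python
ABSTAIN = 0
DRAMA = 1
COMEDY = 2


def filter_lfs(lfs, examples):
    """Keep labeling functions that pass three simple checks.

    lfs: list of (name, function, label, own_example_or_None) tuples.
    examples: list of texts used to compare the functions' outputs.
    """
    kept, dropped = [], []
    seen_signatures = set()
    for name, fn, label, own_example in lfs:
        signature = tuple(fn(text) for text in examples)
        if own_example is not None and fn(own_example) != label:
            dropped.append((name, "wrong label on its own candidate"))
        elif len(set(signature)) == 1:
            dropped.append((name, "same label for every example"))
        elif signature in seen_signatures:
            dropped.append((name, "duplicate of an earlier function"))
        else:
            seen_signatures.add(signature)
            kept.append(name)
    return kept, dropped


texts = ["a bounty hunter rides west", "his wife plans a party", "a quiet town"]
lfs = [
    ("bounty", lambda t: DRAMA if "bounty hunter" in t else ABSTAIN, DRAMA, texts[0]),
    ("bounty_copy", lambda t: DRAMA if "bounty hunter" in t else ABSTAIN, DRAMA, None),
    ("always_comedy", lambda t: COMEDY, COMEDY, None),
    ("wife", lambda t: COMEDY if "wife" in t else ABSTAIN, COMEDY, texts[0]),
]
kept, dropped = filter_lfs(lfs, texts)
print(kept)  # ['bounty']
for name, reason in dropped:
    print(name, "->", reason)
```

Here only `bounty` survives: `bounty_copy` repeats it, `always_comedy` never varies, and `wife` does not fire on the candidate it was attached to.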

In [None]:
parses, filtered = babbler.apply(explanations)

### Analysis
See how your parsed explanations performed.

In [None]:
babbler.analyze(parses)

See which explanations were filtered and why.

In [None]:
babbler.filtered_analysis(filtered)

In [None]:
babbler.commit()

### Evaluation
Get feedback on the performance of your explanations.

In [None]:
Ls = [babbler.get_label_matrix(split) for split in [0,1,2]]
lf_names = [lf.__name__ for lf in babbler.get_lfs()]
lf_summary(Ls[1], Ys[1], lf_names=lf_names)
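
Two of the numbers worth watching in such a summary are coverage and empirical accuracy. The sketch below shows how they could be computed for a single labeling function; `summarize_lf` is a hypothetical helper, the votes and gold labels are made up, and 0 is treated as an abstain.

```python
ABSTAIN = 0


def summarize_lf(votes, gold):
    """Return (coverage, accuracy) for one labeling function.

    coverage: fraction of examples where the function does not abstain.
    accuracy: fraction of non-abstain votes that match the gold label
              (None when the function abstains everywhere).
    """
    labeled = [(v, y) for v, y in zip(votes, gold) if v != ABSTAIN]
    coverage = len(labeled) / len(votes)
    if not labeled:
        return coverage, None
    correct = sum(1 for v, y in labeled if v == y)
    return coverage, correct / len(labeled)


# Made-up dev-set votes and gold labels (1 = drama, 2 = comedy).
gold = [1, 2, 2, 1]
votes = [1, 0, 0, 2]
print(summarize_lf(votes, gold))  # (0.5, 0.5)
```

A function with high accuracy but low coverage is precise but rarely fires; one with high coverage but low accuracy probably needs a tighter condition.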

In [None]:
search_space = {
    'n_epochs': [50, 100, 500],
    'lr': {'range': [0.01, 0.001], 'scale': 'log'},
    'show_plots': False,
}

tuner = RandomSearchTuner(LabelModel, seed=123)

label_aggregator = tuner.search(
    search_space, 
    train_args=[Ls[0]], 
    X_dev=Ls[1], Y_dev=Ys[1], 
    max_search=20, verbose=False, metric='f1')

# record statistics over time
pr, re, f1 = label_aggregator.score(Ls[1], Ys[1], metric=['precision', 'recall', 'f1'])
stats = {
    "precision": pr,
    "recall": re,
    "f1": f1,
    "time": datetime.now(),
    "training_label_coverage": label_coverage(Ls[0]),
    "training_label_size": label_coverage(Ls[0])*len(dfs[0])
}
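# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
stat_history = pd.concat([stat_history, pd.DataFrame([stats])], ignore_index=True)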
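
The search space above mixes a list of choices (`n_epochs`) with a log-scaled range (`lr`). The sketch below shows one way such a space can be sampled. It is an assumption about what log-scale sampling means here, not the tuner's actual code; `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 logarithm is uniform between log10(low) and log10(high)."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10 ** exponent


rng = random.Random(123)
configs = [
    {"n_epochs": rng.choice([50, 100, 500]), "lr": sample_log_uniform(0.001, 0.01, rng)}
    for _ in range(5)
]
for config in configs:
    print(config)
```

Sampling on a log scale spreads the trials evenly across orders of magnitude, which suits learning rates better than uniform sampling.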

If you'd like to save the explanations you've generated, you can use the `ExplanationIO` object to write them to a file and read them back.

In [None]:
stat_history.to_csv("babbler_tutorial_statistics_history.csv")
FILE = "babbler_tutorial_explanations.tsv"
exp_io = ExplanationIO()
exp_io.write(explanations, FILE)
explanations = exp_io.read(FILE)