![Machine Learning Workshop: Content Insights 2020](assets/mlci_banner.jpg)

# Machine Learning Workshop: Content Insights 2020

Welcome to the workshop notebooks! These notebooks are designed to walk you through the steps of creating a model, refining it with user labels, and testing it on content. For additional help, you can access the main [workshop forum page](https://INFO_SITE/forums/html/forum?id=241a0b77-7aa6-4fef-9f25-5ea351825725&ps=25), the [workshop files repo](https://INFO_SITE/communities/service/html/communityview?communityUuid=fb400868-b17c-44d8-8b63-b445d26a0be4#fullpageWidgetId=W403a0d6f86de_45aa_8b67_c52cf90fca16&folder=d8138bef-9182-4bdc-8b12-3c88158a219c), or the [symposium home page](https://software.web.DOMAIN).

The notebooks are divided into five core components: (A) setup & data, (B) model exploration, (C) labeling, (D) active labeling, and (E) deployment. You are currently viewing the *active labeling* workbook.

In [3]:
# constants for running the workshop; we'll repeat these in the top line of each workbook.
#   why repeat them? the backup routine only serializes .ipynb files, so others will need 
#   to be downloaded again if your compute instance restarts (a small price to pay, right?)

AGG_AVFEAT = "agg_avfeature.pkl.gz"             # custom file for merged audio and video features
CLASS_LABELS_FLAT = "assets/labels_final.json"  # provided file for label info


# Notebook D: Active Labeling Analysis

Now that we've reviewed **how** you can solicit labels and update the order of that solicitation, let's analyze the implications of solicitation reordering. The last notebook focused on interacting with the labeling interface, so here we'll just use offline labels and simulate labeler entries. Additionally, this notebook will focus on training a custom classifier instead of reusing other tags.

In this notebook, we evaluate a few critical questions.

1. Does reranking unlabeled instances (e.g. online learning) help to improve efficiency?
1. What strategies for ordering results can improve labeling efficiency?
1. What consensus measures should be taken for multiple labels?
1. Are there trends in performance curves that can point to a moment of model stability?

If you're really curious about the space, this overview paper, [A Wholistic View of Continual Learning with Deep Neural Networks:  Forgotten Lessons and the Bridge to Active and Open World Learning](https://arxiv.org/abs/2009.01797), gives a great (and dizzying) review of active learning topics.

![Machine Learning Workshop: Content Insights 2020](assets/active_overview.jpg)


## Modeling Basics
The cell below provides the basic training functions used in this notebook. It is derived from the av-feature training method (classifier 3) evaluated in notebook B.

In [6]:
from sklearn import metrics
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path

# define scoring functions
def classifier_score(df_prediction, df_labels, class_name):
    """Function to provide metric outputs for the evaluation of a prediction dataframe.
    
    Parameters:
        df_prediction (DataFrame): dataframe containing 'asset' and 'score' as columns
        df_labels (DataFrame): dataframe containing 'asset' and 'label' for labels
        class_name (str): class name for evaluation against labels

    Returns:
        dict of metrics (AP, AUC, Accuracy, Recall, F1) ({"class":X, "AP":Y, ...}) and joined dataframe
    """
    metrics_obj = {"class":class_name}
    
    # clean up input labels, prune to relevant class
    df_labels = df_labels[df_labels["class"] == class_name].drop(columns=["etag", "url"]) 
    # join labels and scores by asset, normalize score to float
    df_join = df_prediction.set_index('asset').join(df_labels.set_index('asset'), how="left").fillna(0)  # join at asset level, 0 for nonscoring
    df_join["class"] = df_join["class"].apply(lambda x: 1 if x != 0 else 0).astype(int)
    df_join = df_join.reset_index().sort_values("score", ascending=False)

    # print(f"{class_name}: Found {len(df_join)} samples from {len(df_labels)} labels and {len(df_prediction)} scores.")

    def thresh(x):
        return 1 if x >= 0.5 else 0
    
    metrics_obj["AP"] = metrics.average_precision_score(df_join['class'], df_join['score'])
    fpr, tpr, thresholds = metrics.roc_curve(df_join['class'], df_join['score'])
    metrics_obj["AUC"] = metrics.auc(fpr, tpr)
    metrics_obj["Accuracy"] = metrics.accuracy_score(df_join['class'], df_join['score'].apply(thresh))
    metrics_obj["Recall"] = metrics.recall_score(df_join['class'], df_join['score'].apply(thresh))
    metrics_obj["F1"] = metrics.f1_score(df_join['class'], df_join['score'].apply(thresh))
    # print(f"{class_name}: {metrics_obj}")
        
    # return our computation!
    return metrics_obj, df_join

def classifier_plot(metrics_obj, df_scored):
    fpr, tpr, thresholds = metrics.roc_curve(df_scored['class'], df_scored['score'])
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))

    lw = 2
    ax1.plot(fpr, tpr, color='darkorange',
             lw=lw, label=f"ROC curve (AUC={metrics_obj['AUC']:0.2})")
    ax1.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    ax1.set_xlim([0.0, 1.0])
    ax1.set_ylim([0.0, 1.05])
    ax1.set_xlabel('False Positive Rate')
    ax1.set_ylabel('True Positive Rate')
    ax1.legend(loc="lower right")

    precision, recall, thresholds = metrics.precision_recall_curve(df_scored['class'], df_scored['score'])
    ax2.plot(recall, precision, color='red',
             lw=lw, label=f"PR Curve (AP={metrics_obj['AP']:0.2}, F1={metrics_obj['F1']:0.2})")
    ax2.plot([1, 0], [0, 1], color='navy', lw=lw, linestyle='--')
    ax2.set_xlim([0.0, 1.0])
    ax2.set_ylim([0.0, 1.05])
    ax2.set_ylabel('Precision')
    ax2.set_xlabel('Recall')
    ax2.legend(loc="upper right")
    plt.show()
    
# read label data for use later!
df_labels = pd.read_json(CLASS_LABELS_FLAT).explode('labels').fillna('none of the above')
df_labels.rename(columns={"data":"url", "labels":"class"}, inplace=True)
df_labels["asset"] = df_labels['url'].replace(regex={r'^' + WORKSHOP_BASE + '/': ''})
print(f"Loaded a total of {len(df_labels)} labels across {len(df_labels['asset'].unique())} samples and {len(df_labels['class'].unique())} classes.")

# clear out other performance stores
df_performance = None

# load features
path_features = Path(AGG_AVFEAT)
if not path_features.exists():
    raise Exception(f"""
       Sorry, the set of aggregate features was not found.  
       Please return to notebook B to create file '{str(path_features)}'...
    """)
df_avfeature = pd.read_pickle(str(path_features))
print(f"Loaded a total of {len(df_avfeature)} samples.")


Loaded a total of 1229 labels across 779 samples and 6 classes.
Loaded a total of 1033 samples.


In [7]:
df_labels

| | etag | class | url | asset |
|---|---|---|---|---|
| 0 | a039cb37f290fe4a4127bbd2 | holiday | https://vmlr-workshop.STORAGE/gifts/v... | gifts/vid_gift_give_take_8-4-of-12.mp4 |
| 0 | a039cb37f290fe4a4127bbd2 | gift giving | https://vmlr-workshop.STORAGE/gifts/v... | gifts/vid_gift_give_take_8-4-of-12.mp4 |
| 0 | a039cb37f290fe4a4127bbd2 | family moments | https://vmlr-workshop.STORAGE/gifts/v... | gifts/vid_gift_give_take_8-4-of-12.mp4 |
| 1 | b6dbe02fdf00f08a61a44b8f | holiday | https://vmlr-workshop.STORAGE/gifts/v... | gifts/vid_gift_give_take_28-7-of-16.mp4 |
| 1 | b6dbe02fdf00f08a61a44b8f | gift giving | https://vmlr-workshop.STORAGE/gifts/v... | gifts/vid_gift_give_take_28-7-of-16.mp4 |
| ... | ... | ... | ... | ... |
| 775 | 8355de7a556d5068a4e0bb67 | family moments | https://vmlr-workshop.STORAGE/xmas/vi... | xmas/vid_xmas_3-40-of-79.mp4 |
| 776 | 628fe25f2e8c656689d70026 | family moments | https://vmlr-workshop.STORAGE/gifts/v... | gifts/vid_gift_give_take_43-2-of-4.mp4 |
| 777 | 28a0b165ed9b04d82dd2139c | holiday | https://vmlr-workshop.STORAGE/xmas/vi... | xmas/vid_xmas_9-23-of-67.mp4 |
| 777 | 28a0b165ed9b04d82dd2139c | shopping scenes | https://vmlr-workshop.STORAGE/xmas/vi... | xmas/vid_xmas_9-23-of-67.mp4 |
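
The repeated index values in the output above come from `explode`, which emits one row per label; that is why there are more labels (1229) than samples (779). Here is a minimal sketch of the same loading steps on made-up records. It assumes the label file stores a URL under `data` and a list under `labels`, as the rename in the loading cell suggests, and that unlabeled samples carry an empty list. The `BASE` URL and file names are hypothetical stand-ins, not workshop values.

```python
import re
import pandas as pd

# hypothetical stand-in for WORKSHOP_BASE and for the contents of the label file
BASE = "https://example.invalid/workshop"
records = [
    {"data": f"{BASE}/gifts/clip_a.mp4", "labels": ["holiday", "gift giving"]},
    {"data": f"{BASE}/xmas/clip_b.mp4", "labels": []},  # sample with no labels
]

# one row per (sample, label); an empty list becomes NaN, then the filler class
df_demo = pd.DataFrame(records).explode("labels").fillna("none of the above")
df_demo = df_demo.rename(columns={"data": "url", "labels": "class"})
# strip the base URL; re.escape keeps the dots in the URL from acting as wildcards
df_demo["asset"] = df_demo["url"].str.replace("^" + re.escape(BASE) + "/", "", regex=True)

print(df_demo[["class", "asset"]])
print(f"{len(df_demo)} labels across {df_demo['asset'].nunique()} samples")
```

Unlike the loading cell, this sketch escapes the base URL before using it as a regular expression, so a `.` in the host name only matches a literal dot.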


In [None]:
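import numpy as np
import ipywidgets as widgets
from IPython.display import display
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.calibration import CalibratedClassifierCV

# NOTE: df_classes and result_update are not defined in this notebook;
#   they are assumed to carry over from the earlier notebooks (e.g. notebook B).

def classify_av(feature="combined", score_calibrate=False):
    """Cross-validate a logistic regression for each class on the chosen feature column."""
    cv_folds = 5
    cv_jobs = -1  # -1 uses all available cores, otherwise specific number
    dict_results = {}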
    for idx in range(len(df_classes)):  # iterate classes for evaluation
        row = df_classes.iloc[idx]
        tag_match = []

        df_label_sub = df_labels[df_labels["class"]==row["class"]]  # subselect for this class
        df_feat = df_avfeature.set_index("asset").copy()   # get slice of right features
        df_feat = df_feat.join(df_label_sub.set_index("asset"), how="left").fillna(0)  # join with labels
        df_feat["class"] = df_feat["class"].apply(lambda x: 1 if x != 0 else 0).astype(int)  # blank out text
        
        model = LogisticRegression()  # basic logistic regression
        if score_calibrate:   # try to re-calibrate outputs for better threshold?
            model = CalibratedClassifierCV(model, method="sigmoid")
        probs = cross_val_predict(model, np.vstack(df_feat[feature]), 
                                  df_feat["class"], cv=cv_folds, 
                                  n_jobs=cv_jobs, method='predict_proba')
        df_feat["score"] = probs[:,1]  # grab prediction for second class
        df_feat = df_feat.reset_index().drop(columns=['class'])  # reset index, drop label

        metrics_obj, df_scored = classifier_score(df_feat[["asset", "score"]], df_labels, row['class'])
        dict_results[row['class']] = {'class':row['class'], 'method':f'avfeat', 
                                      'token':['audio', 'video'] if feature=="combined" else [feature], 
                                      "details":f"{feature}_{score_calibrate}",
                                      'metrics': metrics_obj, 'scored': df_scored}
    return dict_results

def result_retrain(run_name, modality, calibrate):
    global df_performance
    details_mode = f"{modality}_{calibrate}"
    if len(df_performance[df_performance["details"]==details_mode]) == 0:
        print("Model condition change detected, retraining classifier...")
        dict_results = classify_av(modality, calibrate)
        df_performance = result_update(df_performance, dict_results)   # save results

# first time run
dict_results = classify_av()
df_performance = result_update(df_performance, dict_results)   # save results

# use widget interaction basic
dropdown = widgets.interactive(result_retrain
    ,run_name=widgets.Dropdown(
        options=list(df_performance[df_performance['method']=="avfeat"].index),  # send run names
        value=list(df_performance[df_performance['method']=="avfeat"].index)[0],  # send run names
        description='Class Name:',
        disabled=False)
    ,modality=widgets.Dropdown(options=['combined', 'audio', 'video'], description='Modality:')
    ,calibrate=widgets.Checkbox(value=False, description='Score re-calibration')
)
output = dropdown.children[-1]  # anti-flicker trick (https://ipywidgets.readthedocs.io/en/stable/examples/Using%20Interact.html#Flickering-and-jumping-output)
# output.layout.height = '750px'  # disable this if you make your output window longer!
display(dropdown)


# Reranking to Improve Efficiency

# Reranking Strategies
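
As a hedged illustration of what an ordering strategy can look like, the sketch below contrasts two common choices: showing the highest-scoring assets first, and showing the least certain assets first (often called uncertainty sampling). The function names and sample scores are invented for illustration and are not part of the workshop code; the scores are assumed to be probabilities in [0, 1], like the `score` column produced by the classifier above.

```python
def rank_by_top_score(scores):
    """Order assets so the most confidently positive predictions come first."""
    return sorted(scores, key=lambda asset: scores[asset], reverse=True)

def rank_by_uncertainty(scores):
    """Order assets so the predictions closest to 0.5 come first."""
    # distance from 0.5 is a simple confidence proxy; small distance = uncertain
    return sorted(scores, key=lambda asset: abs(scores[asset] - 0.5))

# hypothetical classifier scores for four unlabeled assets
pending = {
    "clip_a.mp4": 0.97,
    "clip_b.mp4": 0.52,
    "clip_c.mp4": 0.10,
    "clip_d.mp4": 0.35,
}

print(rank_by_top_score(pending))    # likely positives first
print(rank_by_uncertainty(pending))  # borderline cases first
```

Top-score ordering tends to surface likely positives quickly, while uncertainty ordering spends labeler effort on the borderline cases; which one improves efficiency for a given class is the kind of question this notebook sets out to evaluate.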

# Consensus Measures
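
One simple consensus measure, sketched here under stated assumptions rather than taken from the workshop code, is a majority vote with an agreement threshold: a label is accepted only if more than a given fraction of labelers chose it. The function name, the `min_agreement` parameter, and the example votes are all hypothetical.

```python
from collections import Counter

def consensus_label(votes, min_agreement=0.5):
    """Return (label, agreement) for the majority label.

    The label is None when there are no votes or when the share of votes
    for the most common label does not exceed min_agreement.
    """
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return (label if agreement > min_agreement else None), agreement

# three labelers, two of whom agree
print(consensus_label(["holiday", "holiday", "gift giving"]))
# an even split does not reach consensus at the default threshold
print(consensus_label(["holiday", "gift giving"]))
```

Raising `min_agreement` trades label coverage for label quality: fewer assets reach consensus, but those that do are less likely to be noisy.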


# End of Active Learning Material

This is where the core technical evaluations end, congratulations -- you made it!  Armed with this knowledge, you have a few strategies to map from a new concept into a custom av-centric classifier with several evaluation metrics along the way.  

The next notebook, [notebook E](E_deployment.ipynb) *(that link may not work)*, covers advanced methods for applying and using the models generated in these workbooks.
