# Edge Probing Predictions Sandbox

Use this notebook as a starting point for #datascience on Edge Probing predictions. The code below (from `probing/analysis.py`) loads predictions from a run, does some preprocessing for convenience, and exposes two DataFrames for analysis: `preds.example_df` and `preds.target_df`.

We load the data into Pandas so it's easier to filter by various fields and to select particular columns of interest (such as `label.khot` and `preds.proba` for computing metrics). For an introduction to Pandas, see: https://pandas.pydata.org/pandas-docs/stable/10min.html

In [1]:
import sys, os, re, json
import itertools
import collections
from importlib import reload
import pandas as pd
import numpy as np

In [2]:
import analysis
reload(analysis)

run_dir = "/nfs/jsalt/home/iftenney/exp/edges-20180725/elmo-full-edges-spr2/run"
preds = analysis.Predictions.from_run(run_dir, 'edges-spr2', 'test')
print("Number of examples: %d" % len(preds.example_df))
print("Number of targets:  %d" % len(preds.target_df))

Number of examples: 276
Number of targets:  582


### Top-level example info

`preds.example_df` contains information on the top-level examples. Mostly, this just stores the input text and any metadata fields that were present in the original data. This is useful if you want to link the targets back to the text, but you shouldn't need it to compute most metrics.

In [3]:
preds.example_df.head()

idx (index),idx,info.grammatical,info.sent-id,info.sent_id,info.source,info.split,preds.proba,text
0,0,5.0,1008,1008,SPR2,test,"[[0.9560839533805847, 0.06530793756246567, 0.0...","In a timid voice , he says : &quot; If an airp..."
1,1,5.0,1009,1009,SPR2,test,"[[0.8448460102081299, 0.16005221009254456, 0.0...",&quot; Wonderful ! &quot; Winston beams .
2,2,5.0,1017,1017,SPR2,test,"[[0.9815192222595215, 0.02376113459467888, 0.0...",&quot; Our new lunar transportation system uti...
3,3,2.0,1023,1023,SPR2,test,"[[0.9837549328804016, 0.10073678940534592, 0.0...",They want to use LTS to tie into NASA &apos; s...
4,4,5.0,1024,1024,SPR2,test,"[[0.9833780527114868, 0.02780323289334774, 0.0...",&quot; We are so excited that the White House ...


### Target info and predictions

`preds.target_df` contains the per-target input fields (`span1`, `span2`, and `label`) as well as any metadata associated with individual targets. The `idx` column identifies the row in `example_df` that each target belongs to, in case you need to recover the original text.
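To make the link between the two DataFrames concrete, here is a minimal sketch of joining targets back to their text. It uses toy stand-ins rather than a real run, and it assumes that spans are end-exclusive offsets into the whitespace-tokenized text (this matches the sample rows shown in this notebook, but it is an assumption, not something the loader documents). The helper `span_text` is hypothetical and is not part of `analysis.py`. The sample output suggests `example_df` may also use `idx` as its index name, which pandas would likely reject as ambiguous in a merge, so the sketch drops the index first.

```python
import pandas as pd

# Toy stand-ins for preds.example_df and preds.target_df (values are made up).
example_df = pd.DataFrame({
    "idx": [0, 1],
    "text": ["he says hello .", "Winston beams ."],
})
target_df = pd.DataFrame({
    "idx": [0, 0, 1],
    "span1": [[1, 2], [2, 3], [1, 2]],
    "span2": [[0, 1], [0, 1], [0, 1]],
})

# Drop any named index so that `idx` refers only to the column.
examples = example_df.reset_index(drop=True)[["idx", "text"]]

# Attach the example text to each target via the shared `idx` column.
joined = target_df.merge(examples, on="idx", how="left")

def span_text(row, span_col):
    """Hypothetical helper: tokens covered by an assumed [start, end) span."""
    start, end = row[span_col]
    return " ".join(row["text"].split()[start:end])

joined["span1_text"] = joined.apply(span_text, axis=1, span_col="span1")
print(joined[["idx", "span1_text", "text"]])
```

A left join keeps every target row even if its example is missing, which makes dropped rows easy to spot as `NaN` text.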

The loader code does some preprocessing for convenience. In particular, we add a `label.ids` column which maps the list-of-string `label` column into a list of integer ids for these targets, as well as `label.khot` which contains a K-hot encoding of these ids. 
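As an illustration of what a K-hot encoding looks like, the sketch below converts a list of label ids into a 0/1 vector. This is not the loader's actual code; `to_khot` is a hypothetical helper, and the class count of 20 is an assumption based on the length of the prediction vectors shown in this notebook.

```python
import numpy as np

def to_khot(label_ids, num_classes):
    """Return a length-`num_classes` 0/1 vector with ones at `label_ids`."""
    khot = np.zeros(num_classes, dtype=np.int32)
    khot[label_ids] = 1
    return khot

# Label ids taken from one of the sample rows of preds.target_df.head();
# num_classes=20 is assumed, not read from the vocabulary.
ids = [1, 3, 7, 8]
khot = to_khot(ids, num_classes=20)
print(khot)
```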

Each entry in `label.khot` should align to the corresponding entry in `preds.proba`, which contains the model's predicted probabilities $\hat{y} \in [0,1]$ for each class. These two columns should be sufficient to compute most metrics.
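As one example of a metric built from these two columns, the sketch below computes micro-averaged precision, recall, and F1 after thresholding the probabilities at 0.5. The choice of micro-averaging and the 0.5 threshold are assumptions made for illustration, `micro_prf` is a hypothetical helper, and the arrays are toy values rather than real predictions.

```python
import numpy as np

def micro_prf(y_true, y_proba, threshold=0.5):
    """Micro-averaged precision, recall, and F1 over all (target, class) pairs."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_proba) >= threshold
    tp = np.sum(y_true & y_pred)    # predicted 1, gold 1
    fp = np.sum(~y_true & y_pred)   # predicted 1, gold 0
    fn = np.sum(y_true & ~y_pred)   # predicted 0, gold 1
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return float(precision), float(recall), float(f1)

# Toy data: 2 targets x 3 classes.
y_true = [[1, 0, 1], [0, 1, 0]]
y_proba = [[0.9, 0.2, 0.4], [0.6, 0.8, 0.1]]
precision, recall, f1 = micro_prf(y_true, y_proba)
print(precision, recall, f1)
```

On real data, the inputs could plausibly be built with `np.stack(preds.target_df['label.khot'].tolist())` and `np.stack(preds.target_df['preds.proba'].tolist())`, assuming every row holds a vector of the same length.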

In [4]:
preds.target_df.head()

(index),idx,info.is_pilot,info.pred_lemma,info.span1_text,info.span2_txt,label,preds.proba,span1,span2,label.ids,label.khot
0,0,False,say,says,he,"[awareness, existed_after, existed_before, exi...","[0.9560839533805847, 0.06530793756246567, 0.00...","[6, 7]","[5, 6]","[0, 6, 7, 8, 10, 15, 17, 19]","[1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, ..."
1,0,False,carry,carrying,winston peters,"[awareness, change_of_location, change_of_stat...","[0.8325857520103455, 0.8400908708572388, 0.158...","[12, 13]","[13, 15]","[0, 1, 4, 6, 7, 8, 10, 15, 17, 18, 19]","[1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, ..."
2,0,False,blow,blown,an airplane carrying winston peters,"[change_of_location, change_of_state, existed_...","[0.21247316896915436, 0.7873210310935974, 0.15...","[16, 17]","[10, 15]","[1, 3, 7, 8]","[0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, ..."
3,1,False,beam,beams,winston,"[awareness, change_of_state_continuous, existe...","[0.8448460102081299, 0.16005221009254456, 0.02...","[5, 6]","[4, 5]","[0, 4, 6, 7, 8, 10, 13, 15, 17, 18, 19]","[1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, ..."
4,2,False,tell,told,kistler,"[awareness, existed_after, existed_before, exi...","[0.9815192222595215, 0.02376113459467888, 0.01...","[30, 31]","[29, 30]","[0, 6, 7, 8, 10, 15, 17, 19]","[1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, ..."
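The filtering and column selection mentioned at the top of this notebook can be sketched as follows. The DataFrame here is a toy stand-in that reuses a few column names from `preds.target_df`; the values are made up. Because the column names contain dots, they have to be accessed with brackets rather than attribute syntax.

```python
import pandas as pd

# Toy stand-in for preds.target_df (values are made up).
target_df = pd.DataFrame({
    "idx": [0, 0, 1],
    "info.is_pilot": [False, True, False],
    "info.pred_lemma": ["say", "carry", "beam"],
    "preds.proba": [[0.9, 0.1], [0.8, 0.8], [0.2, 0.7]],
    "label.khot": [[1, 0], [1, 1], [0, 1]],
})

# Boolean mask: keep only targets that are not from the pilot set.
mask = ~target_df["info.is_pilot"]

# Select the rows that pass the mask and only the columns of interest.
columns = ["info.pred_lemma", "label.khot", "preds.proba"]
subset = target_df.loc[mask, columns]
print(subset)
```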


Here's an example of looking at the predictions and labels for a single target:

In [5]:
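ii = 42  # index label of the target to inspect
y_pred = preds.target_df.loc[ii, 'preds.proba']  # predicted probability per class
y_true = preds.target_df.loc[ii, 'label.khot']   # gold K-hot labels per class
print(y_pred)
print(y_true)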

[0.9787541031837463, 0.35202914476394653, 0.009635817259550095, 0.16440175473690033, 0.3447839915752411, 0.002200247021391988, 0.9879361987113953, 0.9941818118095398, 0.9937254786491394, 0.01471084076911211, 0.8769131302833557, 0.0034170907456427813, 0.004444719757884741, 0.3075305223464966, 0.0028424744959920645, 0.9795645475387573, 0.0027781652752310038, 0.9310384392738342, 0.6011472344398499, 0.9610588550567627]
[1 0 0 1 1 0 1 1 1 0 1 0 0 0 0 1 0 1 1 1]


And a nicer way of looking at them:

In [6]:
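# One row per class: predicted probability next to the gold 0/1 label.
_df = pd.DataFrame({'label_id': range(len(y_pred)),
                    'y_pred': y_pred, 'y_true': y_true})
# Map each label id back to its string name via the vocabulary.
_df['label'] = [preds.vocab.get_token_from_index(i, namespace=preds.label_namespace)
                for i in _df['label_id']]
# Threshold the probabilities at 0.5 to get hard 0/1 predictions.
_df['y_pred.discrete'] = (_df['y_pred'] >= 0.5).map(int)
_df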

(index),label_id,y_pred,y_true,label,y_pred.discrete
0,0,0.978754,1,awareness,1
1,1,0.352029,0,change_of_location,0
2,2,0.009636,0,change_of_possession,0
3,3,0.164402,1,change_of_state,0
4,4,0.344784,1,change_of_state_continuous,0
5,5,0.0022,0,changes_possession,0
6,6,0.987936,1,existed_after,1
7,7,0.994182,1,existed_before,1
8,8,0.993725,1,existed_during,1
9,9,0.014711,0,exists_as_physical,0
