## Testing for statistical differences between distributions

We test several statistical properties of the attention distributions to see whether examples with different bias labels differ in any obvious way that we can leverage directly.

In [1]:
import sys; sys.path.append("../../../../..")
import torch 
from src.experiment import AttentionExperiment, ClassificationExperiment
from src.dataset import ExperimentDataset
from src.params import Params
from src.utils.attention_utils import reduce_attention_dist, return_idx_attention_dist
from src.utils.classification_utils import run_bootstrapping
from src.utils.shared_utils import get_bias_predictions

In [2]:
%load_ext autoreload
%autoreload 2

In [6]:
params = Params.read_params("linear-params.json")
print("layers = {}".format(params.intermediary_task["attention"]["layers"]))

layers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]


In [7]:
# Load the dataset used in this experiment;
# typically this is the small set of ground-truth labels
dataset = ExperimentDataset.init_dataset(params.dataset)

03/27/2020 11:34:19 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at ./cache/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
386it [00:00, 3763.51it/s]


### Attention Experiment: 
* A class that wraps useful methods for extracting attention distributions from a given BERT-based model 
* The user has to provide two config files: one to specify how the attention scores should be extracted and combined, and another to specify the intermediary model from which the attention scores should be extracted
* The user needs to instantiate the attention experiment with a function that tells it how to run 
 inference on the given model. The function header is specified below: 
 
 ``` def initialize_attention_experiment(cls, intermediary_task_params, dataset_params, verbose=False) ```
 


In [8]:
attention_dataloader = dataset.return_dataloader(batch_size=params.intermediary_task['attention']['attention_extraction_batch_size']) 
attention_experiment = AttentionExperiment.initialize_attention_experiment(params.intermediary_task, params.dataset, verbose=True)

03/27/2020 11:34:20 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at ./cache/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/27/2020 11:34:20 - INFO - pytorch_pretrained_bert.modeling -   loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at ./cache/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
03/27/2020 11:34:20 - INFO - pytorch_pretrained_bert.modeling -   extracting archive file ./cache/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir /tmp/tmpfixno4au
03/27/2020 11:34:24 - INFO - pytorch_pretrained_bert.modeling -   Model config {
  "attention_probs_d

Instantiated joint model with pretrained weights.
Succesfully loaded in attention experiment!


```extract_attention_scores()``` works out of the box because the attention experiment has the config file saved and therefore knows which BERT model to load, which layers to extract the attention scores from, and which inference function to use on this particular BERT model.

```attention_scores``` is then a list of dictionaries. In each dictionary, the keys are the layers of the BERT model and the values are the attention distributions extracted from the corresponding layer.
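To make the structure described above concrete, here is a minimal sketch with random stand-in data. It assumes one dictionary per batch and one distribution over tokens per example; both are assumptions for illustration (the later concatenation along `dim=0` suggests the per-batch layout, but the source does not state it), and the name `attention_scores_sketch` and all shapes are hypothetical.

```python
import torch

# Hypothetical sizes: 2 batches, 3 layers, 4 examples per batch, 16 tokens.
num_batches, layers, batch_size, seq_len = 2, [0, 1, 2], 4, 16

# One dictionary per batch (assumption): layer index -> attention distributions.
attention_scores_sketch = [
    {
        layer: torch.softmax(torch.randn(batch_size, seq_len), dim=-1)
        for layer in layers
    }
    for _ in range(num_batches)
]

first_batch = attention_scores_sketch[0]
layer_0 = first_batch[0]       # distributions extracted from layer 0
row_sums = layer_0.sum(dim=-1)  # each row is a probability distribution
```

Indexing first by batch and then by layer gives a tensor with one row per example, which is the shape the later cells appear to expect.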

In [9]:
attention_scores = attention_experiment.extract_attention_scores(attention_dataloader)





Next, we get the predictions from the BERT model trained to detect bias and use them to index into the attention scores.

In [10]:
bias_predictions = get_bias_predictions(dataset, params.intermediary_task, params.dataset, batch_size=8)

03/27/2020 11:34:42 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at ./cache/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/27/2020 11:34:42 - INFO - pytorch_pretrained_bert.modeling -   loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at ./cache/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
03/27/2020 11:34:42 - INFO - pytorch_pretrained_bert.modeling -   extracting archive file ./cache/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir /tmp/tmpadz0egzp
03/27/2020 11:34:46 - INFO - pytorch_pretrained_bert.modeling -   Model config {
  "attention_probs_d





In [11]:
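# For each example, take the first position predicted as biased (label 1);
# rows with no such position fall back to index 0.
# The cast to long avoids relying on argmax support for boolean tensors.
bias_indices = torch.argmax((bias_predictions == 1).long(), dim=1).tolist()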
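
The indexing step above can be illustrated on a toy tensor. This sketch assumes `bias_predictions` has shape `(num_examples, seq_len)` with a 1 at positions predicted as biased; the values below are made up. It also shows a caveat: a row with no positive prediction yields index 0, which is indistinguishable from a positive prediction at position 0 unless it is tracked separately.

```python
import torch

# Made-up predictions: one row per example, one column per position.
predictions = torch.tensor([
    [0, 0, 1, 0],  # single positive at position 2
    [1, 0, 1, 0],  # two positives; argmax returns the first maximal value
    [0, 0, 0, 0],  # no positive at all
])

is_positive = predictions == 1
first_positive = torch.argmax(is_positive.long(), dim=1).tolist()

# Track which rows actually contain a positive prediction.
has_positive = is_positive.any(dim=1).tolist()
```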

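Testing which layers show the greatest divergence between the two label groups. The cell below measures this with the Jensen-Shannon distance (the square root of the Jensen-Shannon divergence, a symmetrized relative of the KL divergence).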

In [12]:
attention_dist = return_idx_attention_dist(attention_scores, bias_indices)

In [13]:
attention_dist_dict = {}
for attention_dict in attention_dist: 
    for key, val in attention_dict.items():
        if key not in attention_dist_dict: 
            attention_dist_dict[key] = val
        # otherwise we need to concatenate the distributions together
        else: 
            prev_val = attention_dist_dict[key]
            attention_dist_dict[key] = torch.cat((prev_val, val), dim=0)

In [14]:
labels = dataset.get_val('bias_label')

In [15]:
labels_0_indices = (labels == 0).nonzero()
labels_1_indices = labels.nonzero()

In [16]:
from scipy.spatial import distance
from src.utils.attention_utils import window_attention_dist
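
The implementation of `window_attention_dist` is not shown in this notebook. As a rough sketch of what such a helper might do, the function below extracts `2 * window_size + 1` values centered on a given index for each example, zero-padding at the sequence edges. The name `window_around_index`, the padding choice, and the absence of renormalization are all assumptions for illustration; the only hint from the notebook is that `window_size=1` produces three values per example.

```python
import torch
import torch.nn.functional as F

def window_around_index(dist, center_indices, window_size):
    """Hypothetical windowing helper, not the repository's implementation.

    dist: tensor of shape (num_examples, seq_len)
    center_indices: one center position per example
    Returns a tensor of shape (num_examples, 2 * window_size + 1).
    """
    # Zero-pad both ends so windows near the edges stay in bounds.
    padded = F.pad(dist, (window_size, window_size), value=0.0)
    centers = torch.as_tensor(center_indices, dtype=torch.long)
    offsets = torch.arange(2 * window_size + 1)
    # After padding, original position c sits at c + window_size,
    # so the window c - w .. c + w maps to padded positions c .. c + 2w.
    gather_idx = centers.unsqueeze(1) + offsets.unsqueeze(0)
    return padded.gather(1, gather_idx)

example_dist = torch.tensor([[0.1, 0.2, 0.3, 0.4],
                             [0.4, 0.3, 0.2, 0.1]])
example_windows = window_around_index(example_dist, [1, 0], window_size=1)
```

The second example is centered on position 0, so its window starts with a padded zero.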

In [None]:
# layer_dist avoids shadowing the attention_dist list created in In [12]
for key, layer_dist in attention_dist_dict.items():
    # Window each distribution around its bias index
    windowed_dist = window_attention_dist(layer_dist, bias_indices, window_size=7)
    attention_dist_0 = windowed_dist[labels_0_indices].squeeze()
    attention_dist_1 = windowed_dist[labels_1_indices].squeeze()
    
    # Pair the i-th label-0 example with the i-th label-1 example and average
    # the Jensen-Shannon distance over as many pairs as both groups allow
    avg_jsd = 0
    num_samples = min(attention_dist_0.shape[0], attention_dist_1.shape[0])
    for i in range(num_samples):
        avg_jsd += distance.jensenshannon(attention_dist_0[i], attention_dist_1[i])
    avg_jsd = avg_jsd/num_samples
    #print("Layer {} - JSD: {}".format(key, avg_jsd))
    print(avg_jsd)
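
To show what the quantity computed above means, the following sketch builds the Jensen-Shannon distance from two KL terms by hand and compares it with `scipy.spatial.distance.jensenshannon`. The two example distributions are made up; the helper name `js_distance` is introduced here only for illustration.

```python
import numpy as np
from scipy.spatial import distance
from scipy.special import rel_entr

def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the average KL to the mixture."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so both inputs are valid probability vectors.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    # rel_entr(a, b) gives the elementwise a * log(a / b); its sum is KL(a || b).
    js_divergence = 0.5 * rel_entr(p, m).sum() + 0.5 * rel_entr(q, m).sum()
    return np.sqrt(js_divergence)

p = [0.1, 0.6, 0.3]
q = [0.3, 0.3, 0.4]

manual = js_distance(p, q)
reference = distance.jensenshannon(p, q)  # natural log by default
symmetric = js_distance(q, p)
```

Unlike the KL divergence, the result is symmetric in its arguments, which makes it a reasonable choice when neither label group is a natural reference distribution.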

In [19]:
from sklearn.mixture import GaussianMixture

In [73]:
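for key, layer_dist in attention_dist_dict.items():
    windowed_dist = window_attention_dist(layer_dist, bias_indices, window_size=9)
    # Fit a two-component mixture and treat the component ids as predicted labels
    gmm = GaussianMixture(n_components=2)
    gmm = gmm.fit(windowed_dist)
    # Named cluster_labels so the ground-truth `labels` from In [14] is not overwritten
    cluster_labels = gmm.predict(windowed_dist)
    gt_labels = dataset.get_val('bias_label')
    
    # Component ids are arbitrary, so an accuracy a below 0.5 corresponds to
    # 1 - a under the opposite id-to-label assignment
    correct = (torch.tensor(cluster_labels) == gt_labels.to(dtype=torch.long)).float()
    #print("Layer: {} -- avg {}".format(key, torch.mean(correct)))
    print(torch.mean(correct).item())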

0.6327160596847534
0.6111111044883728
0.3672839403152466
0.3827160596847534
0.48148149251937866
0.5586419701576233
0.5185185074806213
0.4722222089767456
0.48765432834625244
0.4444444477558136
0.422839492559433
0.37962964177131653
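
When reading the accuracies above, keep in mind that a Gaussian mixture assigns component ids arbitrarily, so component 0 need not correspond to label 0. The sketch below, on synthetic and well-separated data rather than the notebook's attention features, shows one simple way to account for this in the two-class case: report the better of the two possible id-to-label assignments. All data and variable names here are made up for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic two-class data with clearly separated means.
class_0 = rng.normal(loc=-3.0, scale=0.5, size=(50, 2))
class_1 = rng.normal(loc=3.0, scale=0.5, size=(50, 2))
features = np.vstack([class_0, class_1])
gt = np.array([0] * 50 + [1] * 50)

gmm = GaussianMixture(n_components=2, random_state=0).fit(features)
cluster_ids = gmm.predict(features)

# Raw agreement depends on which component happened to get id 0.
raw_accuracy = (cluster_ids == gt).mean()

# With two classes, flipping the ids turns accuracy a into 1 - a.
aligned_accuracy = max(raw_accuracy, 1.0 - raw_accuracy)
```

Under this reading, a raw accuracy of roughly 0.37 and one of roughly 0.63 describe the same quality of separation.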


In [64]:
for key, layer_dist in attention_dist_dict.items():
    # window_size=1 yields three values per example, as the output below shows
    windowed_dist = window_attention_dist(layer_dist, bias_indices, window_size=1)
    attention_dist_0 = windowed_dist[labels_0_indices].squeeze()
    attention_dist_1 = windowed_dist[labels_1_indices].squeeze()
    
    # Per-position difference between the mean attention of the two label groups
    print(torch.mean(attention_dist_0, dim=0) - torch.mean(attention_dist_1, dim=0))

tensor([-0.0017,  0.0124, -0.0072])
tensor([-0.0095,  0.0151,  0.0079])
tensor([-0.0045,  0.0192, -0.0001])
tensor([-0.0096,  0.0183,  0.0052])
tensor([-0.0052,  0.0101,  0.0042])
tensor([-0.0058,  0.0165,  0.0110])
tensor([-0.0025,  0.0033,  0.0103])
tensor([-0.0036,  0.0191,  0.0153])
tensor([ 0.0032, -0.0194,  0.0129])
tensor([0.0039, 0.0039, 0.0066])
tensor([ 0.0065, -0.0894,  0.0010])
tensor([ 0.0061, -0.0504, -0.0019])
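
The differences in means above are small, and the notebook does not say whether they exceed sampling noise. One possible way to check, sketched here on synthetic data rather than the notebook's tensors, is a per-position Welch t-test with `scipy.stats.ttest_ind`. The group sizes, means, and the injected shift are invented for illustration, and the choice of test is an assumption rather than part of the original analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two windowed groups: 200 examples, 3 positions.
group_0 = rng.normal(loc=0.30, scale=0.05, size=(200, 3))
group_1 = rng.normal(loc=0.30, scale=0.05, size=(200, 3))
group_1[:, 1] += 0.05  # inject a real shift at the middle position only

# Same statistic as in the cell above: difference of per-position means.
mean_difference = group_0.mean(axis=0) - group_1.mean(axis=0)

# Welch's t-test per position (does not assume equal variances).
result = stats.ttest_ind(group_0, group_1, axis=0, equal_var=False)
p_values = result.pvalue
```

With twelve layers and several positions per layer, many tests are run at once, so some correction for multiple comparisons would be advisable before drawing conclusions.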
