# `diagNNose` demo

This notebook contains various example scripts for running experiments with the `diagNNose` library.

The documentation of the library can be found here: https://diagnnose.readthedocs.io/


-----------
The library is available on `pip` and can be installed as follows:


In [None]:
!pip install diagnnose

Collecting diagnnose
  Downloading https://files.pythonhosted.org/packages/b6/de/4e6697dbf6d59cb0a115c3c54ee4764cbf95501441768b0b95e421ea7dcc/diagNNose-1.1-py3-none-any.whl (76kB)
Collecting transformers>=4.0.0
  Downloading https://files.pythonhosted.org/packages/ed/d5/f4157a376b8a79489a76ce6cfe147f4f3be1e029b7144fa7b8432e8acb26/transformers-4.4.2-py3-none-any.whl (2.0MB)
Collecting torchtext==0.6.0
  Downloading https://files.pythonhosted.org/packages/f2/17/e7c588245aece7aa93f360894179374830daf60d7ed0bbb59332de3b3b61/torchtext-0.6.0-py3-none-any.whl (64kB)
Collecting skorch>=0.7.0
  Downloading https://files.pythonhosted.org/packages/18/c7/2f6434f9360c91a4bf14ae85f634758e5dacd3539cca4266a60be9f881ae/skorch-0.9.0-py3-none-any.whl (125kB)

## Corpus Sample

Let's define a very simple corpus that we will use in this demo. It contains 2 columns: one for the sentence, and one that gives the grammatical number of its subject.

In [None]:
corpus_header = ["sentence", "subject_number"]
corpus_list = [
    ("The cats are chasing the dog.", "plural"),
    ("The dogs are eating their food.", "plural"),
    ("The professor of logic is eating chicken.", "singular"),
    ("A moth walks into a podiatrists office.", "singular"),
    ("That rug really tied the room together.", "singular"),
]

corpus = ["\t".join(corpus_header)]
corpus.extend([
    "\t".join(line) for line in corpus_list
])

with open("corpus.tsv", "w") as f:
    f.write("\n".join(corpus))
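
As a quick sanity check, a tab-separated file with this layout can be read back with Python's standard `csv` module. The sketch below is self-contained: it writes a two-row sample to a temporary directory (so it does not touch `corpus.tsv`) and recovers the labels by column name.

```python
import csv
import tempfile
from pathlib import Path

header = ["sentence", "subject_number"]
rows = [
    ("The dogs are eating their food.", "plural"),
    ("The professor of logic is eating chicken.", "singular"),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "corpus.tsv"
    # Same layout as above: header line first, columns separated by tabs.
    path.write_text("\n".join("\t".join(line) for line in [header, *rows]))

    with path.open(newline="") as f:
        records = list(csv.DictReader(f, delimiter="\t"))

labels = [record["subject_number"] for record in records]
print(labels)
```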

## Activation Extraction

The activations of a model can be extracted using an `Extractor` that takes care of batching and selecting activations of interest.

Fine-grained activation selection is possible by defining a `selection_func` that decides, based on the current token position in the sentence and the corpus item, whether an activation is kept. As an example we will only extract activations for sentences with a `"singular"` subject.
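
To illustrate how a `selection_func` might combine both of its arguments, here is a small library-free sketch. The helper `selected_positions`, the whitespace tokenisation, and the assumption that `w_idx` is a zero-based token position are all made up for this illustration; they are not part of `diagNNose`.

```python
from types import SimpleNamespace
from typing import Callable, List

# Hypothetical stand-in for a corpus item: any object with the needed attributes.
SelectionFunc = Callable[[int, SimpleNamespace], bool]


def singular_from_second_token(w_idx: int, item: SimpleNamespace) -> bool:
    # Assumes w_idx is zero-based, so the 2nd token has w_idx == 1.
    return w_idx >= 1 and item.subject_number == "singular"


def selected_positions(item: SimpleNamespace, selection_func: SelectionFunc) -> List[int]:
    # Illustration only: keep the token positions the selection function accepts.
    return [w_idx for w_idx in range(len(item.sentence)) if selection_func(w_idx, item)]


singular_item = SimpleNamespace(
    sentence="The professor of logic is eating chicken .".split(),
    subject_number="singular",
)
plural_item = SimpleNamespace(
    sentence="The dogs are eating their food .".split(),
    subject_number="plural",
)

print(selected_positions(singular_item, singular_from_second_token))
print(selected_positions(plural_item, singular_from_second_token))
```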


**Configuration** -- 
The scripts can also be run as separate `.py` files, in which case the configuration should be provided via a separate `.json` config file. Here we pass the configuration directly as a dictionary.

In [None]:
config_dict = {
    "model": {
        "transformer_type": "distilgpt2",      # Any of the Huggingface transformer models can be set here
        "mode": "causal_lm"              # This defines the model mode, one of `causal_lm`, `masked_lm`, `sequence_classification`, `token_classification`, or `question_answering`.
    },
    "tokenizer": {
        "path": "distilgpt2",            # For Huggingface models the tokenizer path is the same as the model name (for LSTMs it should point towards a vocab file)
    },
    "corpus": {
        "path": "./corpus.tsv",          # Corpus location
        "header_from_first_line": True,  # We have defined the column headers on the first line
        "sen_column": "sentence",        # The column containing the sentences, that will be tokenized and processed 
    },
}
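
When the scripts are run as standalone `.py` files, a dictionary like this would live in a `.json` file instead. The sketch below only shows the serialisation round trip with the standard `json` module; the file name `config.json` is hypothetical, and how the file is handed to a script is not covered here.

```python
import json
import tempfile
from pathlib import Path

config_dict = {
    "model": {"transformer_type": "distilgpt2", "mode": "causal_lm"},
    "tokenizer": {"path": "distilgpt2"},
    "corpus": {
        "path": "./corpus.tsv",
        "header_from_first_line": True,
        "sen_column": "sentence",
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"  # hypothetical file name
    config_path.write_text(json.dumps(config_dict, indent=2))

    raw = config_path.read_text()
    loaded = json.loads(raw)

# Python's True is written as JSON's lowercase true and read back as True.
print(loaded == config_dict)
```

Note that JSON has no comment syntax, so the inline comments used in the dictionary above cannot be carried over into the file.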

In [None]:
import warnings
warnings.filterwarnings('ignore')

from torchtext.data import Example

from diagnnose.corpus import Corpus
from diagnnose.extract import Extractor
from diagnnose.models import LanguageModel, import_model
from diagnnose.tokenizer.create import create_tokenizer  

# Import tokenizer, corpus & model
tokenizer = create_tokenizer(**config_dict["tokenizer"])
corpus = Corpus.create(tokenizer=tokenizer, **config_dict["corpus"])
model: LanguageModel = import_model(**config_dict["model"])

def selection_func(w_idx: int, item: Example) -> bool:
    return item.subject_number == "singular"

extractor = Extractor(
    model, corpus, selection_func=selection_func
)
activation_reader = extractor.extract()

for activation in activation_reader[:]:
    print(activation.shape)

Using pad_token, but it is not set yet.
  0%|          | 0/1 [00:00<?, ?batch/s]


Starting extraction of 5 sentences...


100%|██████████| 1/1 [00:00<00:00,  4.66batch/s]

Extraction finished, 26 activations have been extracted.
torch.Size([0, 50257])
torch.Size([0, 50257])
torch.Size([8, 50257])
torch.Size([10, 50257])
torch.Size([8, 50257])





The `extract` procedure returns an `ActivationReader` that gives access to the extracted activations, split per sentence. As the output shows, the first two sentences in the corpus (which are `"plural"`) do not return any activations.

In this example the activations have not been written to disk. If they are (by passing an `activation_dir` to `Extractor`), an `ActivationReader` can be used to retrieve the activations later on. This allows for an efficient split between extracting activations and conducting experiments on them, such as probing.
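
One way to picture the per-sentence split is as a list of index ranges into a single flat block of activations. The sketch below uses the sentence lengths printed above (0, 0, 8, 10, 8); the helper `sentence_ranges` is hypothetical and is not a description of how `ActivationReader` is implemented.

```python
from itertools import accumulate
from typing import List, Tuple


def sentence_ranges(lengths: List[int]) -> List[Tuple[int, int]]:
    # Turn per-sentence activation counts into (start, end) offsets
    # into one flat, concatenated block of activations.
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


lengths = [0, 0, 8, 10, 8]  # number of extracted activations per sentence
ranges = sentence_ranges(lengths)

print(ranges)
print(sum(lengths))  # total number of extracted activations
```

The two empty ranges correspond to the `"plural"` sentences, and the lengths add up to the 26 activations reported by the extractor.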

## Probing

The library also supports training Diagnostic Classifiers (DCs, aka "probes") on a corpus containing sentences and labels. A DC can also be trained on the intermediate layers of a model.
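
Conceptually, a probe is a simple classifier that tries to predict a label from activation vectors alone. The toy sketch below trains a perceptron on made-up 2-dimensional "activations"; it is meant only to convey the idea and is not how `DCTrainer` works internally.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    features: Sequence[Sequence[float]], labels: Sequence[int], epochs: int = 10
) -> Tuple[List[float], float]:
    # Classic perceptron rule: nudge the weights whenever an example is misclassified.
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            target = 1 if y == 1 else -1
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if target * score <= 0:
                weights = [w + target * v for w, v in zip(weights, x)]
                bias += target
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0


# Made-up, linearly separable "activations" with binary labels.
activations = [(-1.0, -1.0), (-2.0, -1.0), (-1.0, -2.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_perceptron(activations, labels)
predictions = [predict(weights, bias, x) for x in activations]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)
```

If a probe this simple reaches high accuracy, the label is linearly recoverable from the activations, which is the kind of evidence a DC is meant to provide.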

In this example we will train a DC on an LSTM LM, namely that of Gulordava et al. (2018). We can download this model via the following commands.

We will train the DC on POS tags, for a small sample of Wikipedia sentences; the library provides functionality to POS tag a sentence automatically.

In [None]:
!wget https://dl.fbaipublicfiles.com/colorless-green-rnns/training-data/English/vocab.txt -P .

# LM
!curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=19Lp3AM4NEPycp_IBgoHfLc_V456pmUom" > /dev/null
!curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=19Lp3AM4NEPycp_IBgoHfLc_V456pmUom" -o ./state_dict.pt

# Corpus
!curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=1VswzlkOWcVjWgC1aiQJt-OWJQXXlfGBL" > /dev/null
!curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=1VswzlkOWcVjWgC1aiQJt-OWJQXXlfGBL" -o ./probe_corpus.txt
!head -5 probe_corpus.txt

--2020-11-20 02:09:03--  https://dl.fbaipublicfiles.com/colorless-green-rnns/training-data/English/vocab.txt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.74.142, 172.67.9.4, 104.22.75.142, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.74.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 411459 (402K) [text/plain]
Saving to: ‘./vocab.txt.2’


2020-11-20 02:09:04 (1.18 MB/s) - ‘./vocab.txt.2’ saved [411459/411459]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   408    0   408    0     0   5589      0 --:--:-- --:--:-- --:--:--  5589
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  273M    0  273M    0     0   151M      0 --:--:--  0:00:01 --:--:--  182M
  % Total    % Received % Xferd

In [None]:
config_dict = {
    "model": {
        "rnn_type": "ForwardLSTM",
        "state_dict": "./state_dict.pt"
    },
    "tokenizer": {
        "path": "./vocab.txt",
    },
    "corpus": {
        "path": "./probe_corpus.txt",
        "create_pos_tags": True,
        "header": ["sen"],
        "labels_column": "pos_tags"
    },
    "probe": {
        "save_dir": "./probe_data",
        "verbose": 1
    }
}

Before training can commence we first need to set the data in place with a `DataLoader`. This class allows the evaluation set to be defined in different ways: 1) from a different corpus, 2) based on a separate `selection_func`, or 3) based on a random train/test split. In this example we opt for the last option.
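
For intuition, a random train/test split with a fixed ratio can be sketched as follows. The function `random_split` and its `seed` argument are hypothetical; `DataLoader` may split at a different granularity (for instance per sentence rather than per item), so treat this purely as an illustration of the idea.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T], train_ratio: float, seed: int = 0
) -> Tuple[List[T], List[T]]:
    # Shuffle the indices reproducibly, then cut at the requested ratio.
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    cutoff = round(len(items) * train_ratio)
    train = [items[i] for i in indices[:cutoff]]
    test = [items[i] for i in indices[cutoff:]]
    return train, test


items = list(range(20))
train, test = random_split(items, train_ratio=0.9)
print(len(train), len(test))
```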

Running the following cell might take a while (1-2 minutes), as over 250,000 activations need to be extracted.

In [None]:
from diagnnose.activations.selection_funcs import return_all
from diagnnose.config import create_config_dict
from diagnnose.corpus import Corpus
from diagnnose.models import LanguageModel, import_model, set_init_states
from diagnnose.probe import DataLoader, DCTrainer
from diagnnose.tokenizer import create_tokenizer


tokenizer = create_tokenizer(**config_dict["tokenizer"])
corpus = Corpus.create(tokenizer=tokenizer, **config_dict["corpus"])
model: LanguageModel = import_model(**config_dict["model"])

for ex in corpus:
    ex.pos_tags = " ".join(ex.pos_tags)

data_loader = DataLoader(
    corpus,
    model=model,
    train_test_ratio=0.9,               # 90/10 train/test split
    activation_names=[(1, "hx")],       # We train on the hidden states of the top layer
    train_selection_func=return_all,    # We train on all activations
)

dc_trainer = DCTrainer(
    data_loader,
    **config_dict["probe"],
)

results = dc_trainer.train()

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
Tagging corpus...
Loading pretrained model...
Model initialisation finished.
Extracting activations
train/test: 242755/26626

Starting fitting model on (1, 'hx')...
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.9140       0.8697        0.5003  0.7182
      2        0.4013       0.8793        0.4353  0.7225
      3        0.3235       0.8843        0.4177  0.6680
      4        0.2825       0.8845        0.4135  0.6666
      5        0.2562       0.8834        0.4149  0.6635
      6        0.2374       0.8843        0.4187  0.6735
      7        0.2231       0.8827        0.4239  0.7026

## Feature attributions

Feature attributions are created on a **model-agnostic** basis, but are still based on a Shapley propagation procedure à la SHAP and Contextual Decomposition. This means that any type of model architecture can be explained by the procedure!

This is achieved through the `__torch_function__` functionality that has recently been added to `torch`. This functionality allows the behaviour of individual `torch` operations to be overridden. `diagNNose` employs this to calculate the Shapley values of the input features to an operation, and to propagate these contributions forward to the next operation.

In this demo we'll demonstrate this for RoBERTa, on a masked language modelling example, for the sentence "*The athletes above Barbara <mask>.*", and predicting "*walk*" (correct) or "*walks*" (incorrect). Interestingly enough, RoBERTa wrongly assigns a higher probability to "*walks*", making this an intriguing example to compute the input feature contributions for.

In [None]:
config_dict = {
  "model": {
    "transformer_type": "roberta-base",
    "mode": "masked_lm"
  },
  "tokenizer": {
    "path": "roberta-base"
  }
}

In [None]:
from diagnnose.attribute import ShapleyDecomposer, Explainer
from diagnnose.models import LanguageModel, import_model
from diagnnose.tokenizer import create_tokenizer


model: LanguageModel = import_model(**config_dict["model"])
tokenizer = create_tokenizer(**config_dict["tokenizer"])

decomposer = ShapleyDecomposer(model, num_samples=20)
explainer = Explainer(decomposer, tokenizer)

sens = ["The athletes above Barbara <mask>."]
tokens = ["walk", "walks"]

full_probs, contribution_probs = explainer.explain(sens, tokens)

explainer.print_attributions(full_probs, contribution_probs, sens, tokens)

IndexError: ignored

Here we can clearly see that the presence of "Barbara" has tricked the model into predicting the wrong form of the verb!

Note, however, that calculating Shapley values is computationally challenging (NP-hard!), and we used a polynomial, sampling-based approximation method for this. I am currently still investigating several alternatives (SHAP, DASP) that might be better at approximating Shapley values; this module is still in active development :-)
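
To make the cost of exact computation and the idea of a sampling-based approximation concrete, here is a small stand-alone sketch on a toy value function. It uses the generic permutation-sampling estimator; it is not the procedure implemented in `ShapleyDecomposer`, and `toy_value`, `exact_shapley` and `sampled_shapley` are names invented for this example.

```python
import itertools
import math
import random
from typing import Callable, FrozenSet, List

ValueFn = Callable[[FrozenSet[int]], float]


def exact_shapley(n: int, value: ValueFn) -> List[float]:
    # Enumerates every coalition, so the cost grows exponentially in n.
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def sampled_shapley(n: int, value: ValueFn, num_samples: int, seed: int = 0) -> List[float]:
    # Averages marginal contributions over randomly sampled feature orderings.
    rng = random.Random(seed)
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(num_samples):
        rng.shuffle(players)
        coalition: FrozenSet[int] = frozenset()
        previous = value(coalition)
        for i in players:
            coalition = coalition | {i}
            current = value(coalition)
            phi[i] += current - previous
            previous = current
    return [p / num_samples for p in phi]


def toy_value(coalition: FrozenSet[int]) -> float:
    # Additive weights plus an interaction bonus when features 0 and 1 co-occur.
    weights = [1.0, 2.0, 3.0]
    total = sum(weights[i] for i in coalition)
    if 0 in coalition and 1 in coalition:
        total += 2.0
    return total


exact = exact_shapley(3, toy_value)
approx = sampled_shapley(3, toy_value, num_samples=200)
print(exact)
print(approx)
```

In this toy game the interaction bonus is shared equally between features 0 and 1, while feature 2 only ever contributes its own weight. Both estimators satisfy the efficiency property: the attributions sum to the value of the full coalition.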