## **Environment setup to run the notebook**
This notebook is intended to run on Google Colab. Before running it, complete the following steps:

1.   Make sure your Google Drive contains the following [file](https://drive.google.com/file/d/1QWY78l6kVKk6YE3kT21IVt6N0yNBXIBL/view?usp=sharing) and the [model](https://gitlab.lrz.de/xai-lab-ws2021/edoardo/beyond-simple-word-level-input-relevance-explanations-team-2/-/blob/master/model_weights/grammar_bilstm.p). The cells below read `/content/drive/MyDrive/data.zip` and `/content/drive/MyDrive/biLstm_85.9.p`, so place the files at those paths or adjust the paths accordingly.
2.   Create a directory named '*preprocessing*' and upload [grammar_preprocessing.py](https://gitlab.lrz.de/xai-lab-ws2021/edoardo/beyond-simple-word-level-input-relevance-explanations-team-2/-/blob/master/preprocessing/grammar_preprocessing.py) to it. Create a directory named '*utils*' and upload the file [util.py](https://gitlab.lrz.de/xai-lab-ws2021/edoardo/beyond-simple-word-level-input-relevance-explanations-team-2/-/blob/master/utils/util.py) to it.

Once the setup is done, run the cells one by one. Note that your Google Drive will be mounted to the Colab instance. At the end of the notebook, the explainer is pickled so that it can be reused later to generate explanations.
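
Before running the remaining cells, it can help to confirm that the required files are in place. The following is a minimal sketch, not part of the original notebook: `find_missing` is a hypothetical helper, and the list of paths is an assumption based on the paths that the cells below read. The Drive paths can only be found after the Drive has been mounted.

```python
from pathlib import Path

# Paths read by the cells in this notebook (assumed layout; adjust as needed).
REQUIRED_PATHS = [
    "/content/drive/MyDrive/data.zip",
    "/content/drive/MyDrive/biLstm_85.9.p",
    "preprocessing/grammar_preprocessing.py",
    "utils/util.py",
]


def find_missing(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


missing = find_missing(REQUIRED_PATHS)
if missing:
    print("Missing files:", missing)
else:
    print("All required files were found.")
```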



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!unzip '/content/drive/MyDrive/data.zip'

Archive:  /content/drive/MyDrive/data.zip
   creating: data/
  inflating: data/fields             
  inflating: data/train_data         
  inflating: data/vocab              
  inflating: data/test_data          
  inflating: data/label              
  inflating: data/val_data           


In [None]:
!pip install shap

Collecting shap

In [None]:
!pip install pytorch_lightning 

Collecting pytorch_lightning

## **Import the required libraries**


In [None]:
import dill
import nltk
import numpy as np
import shap
import torch
import torchtext

from preprocessing.grammar_preprocessing import custom_tokenizer, extract_phrases
from utils import util

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## **Load the model**
Here we load the pretrained model. The explainers are then created on top of this model.

In [None]:
model = util.load_model('/content/drive/MyDrive/biLstm_85.9.p')
model.to(device)
model.eval()

LSTMModel(
  (embedding): Embedding(200002, 100, padding_idx=1)
  (lstm): LSTM(100, 7, num_layers=2, dropout=0.1319029367577066, bidirectional=True)
  (fc): Linear(in_features=14, out_features=1, bias=True)
  (dropout): Dropout(p=0.1319029367577066, inplace=False)
  (criterion): BCEWithLogitsLoss()
)

## **Load the data**
The KernelExplainer in the shap library needs background data, so we load the previously saved data.

In [None]:
# Load the vocabulary field, the datasets and the field definitions saved earlier.
with open("/content/data/vocab", "rb") as f:
    textField = dill.load(f)
with open("/content/data/train_data", "rb") as f:
    train_data = dill.load(f)
with open("/content/data/val_data", "rb") as f:
    val_data = dill.load(f)
with open("/content/data/fields", "rb") as f:
    fields = dill.load(f)

# Batch reviews of similar length together to reduce padding.
train_dataloaders = torchtext.data.BucketIterator(
    torchtext.data.Dataset(train_data, fields),
    sort_key=lambda x: len(x.review),
    batch_size=64,
    device=device,
)

**Define a function that can predict the output of the model** <br/>
We define a `predict()` function that takes either a raw sentence or an array of token indices as input and predicts the sentiment. The same function is used by the KernelExplainer when generating the explanations.

In [None]:
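def predict(text):
    with torch.no_grad():
        if isinstance(text, str):
            # Raw sentence: tokenize and map each token to its vocabulary index.
            tokenized = custom_tokenizer(text)
            print(tokenized)
            indexed = [textField.vocab.stoi[t] for t in tokenized]
            # Shape (seq_len, 1): a batch containing a single sentence.
            tensor = torch.from_numpy(np.array(indexed)).unsqueeze(1).to(device)
        else:
            # Array of token indices with one sample per row, as passed by the
            # KernelExplainer; transpose so that the samples lie along the columns.
            indexed = text.T
            tensor = torch.from_numpy(np.array(indexed)).to(device)

        prediction_logits = model(tensor)
        # The logits are negated before being returned.
        preds = -(prediction_logits)
        return preds.squeeze(1).cpu().numpy()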
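
The model is trained with `BCEWithLogitsLoss`, so its raw outputs are logits rather than probabilities, and `predict()` returns them negated. As a sketch of how such scores relate to probabilities, the snippet below applies the logistic sigmoid with NumPy. The `sigmoid` helper and the example values are illustrative and not part of the notebook. Since sigmoid(-z) = 1 - sigmoid(z), negating a logit amounts to scoring the opposite class.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, written via tanh to avoid overflow for large |z|."""
    z = np.asarray(z, dtype=np.float64)
    return 0.5 * (1.0 + np.tanh(0.5 * z))


# Made-up logits, used only to illustrate the identity.
logits = np.array([-2.0, 0.0, 2.0])
probabilities = sigmoid(logits)
flipped = sigmoid(-logits)

# Negating the logit yields the probability of the complementary class.
assert np.allclose(probabilities + flipped, 1.0)
```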

In [None]:
sample = next(iter(train_dataloaders))
background = sample.review
background_numpy = background.cpu().numpy()
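
KernelExplainer treats every row of the background array as one sample and every column as one feature. The warning printed when the explainer is created reports 369 background samples even though the iterator uses a batch size of 64, which suggests that `sample.review` may be laid out as (sequence length, batch) rather than (batch, sequence length). If that is the case, the array would need to be transposed so that each row holds one review, which is the layout `predict()` expects. The sketch below shows one way to normalise the orientation; `as_review_rows` is a hypothetical helper and the example array is made up.

```python
import numpy as np


def as_review_rows(batch, batch_size):
    """Return a 2-D token-id array with one review per row.

    Assumes `batch` is either (batch_size, seq_len) or (seq_len, batch_size).
    """
    batch = np.asarray(batch)
    if batch.ndim != 2:
        raise ValueError("expected a 2-D array of token indices")
    if batch.shape[0] == batch_size:
        return batch      # already one review per row
    if batch.shape[1] == batch_size:
        return batch.T    # reviews were stored along the columns
    raise ValueError("neither axis matches the batch size")


# Made-up batch with 369 token positions for each of 64 reviews.
example = np.zeros((369, 64), dtype=np.int64)
print(as_review_rows(example, batch_size=64).shape)
```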

## **Generate the explainer for the model**
Here we use the KernelExplainer to build an explainer for our model. The explainer is then saved for future use.


In [None]:
explainer = shap.KernelExplainer(predict, background_numpy)

Using 369 background data samples could cause slower run times. Consider using shap.sample(data, K) or shap.kmeans(data, K) to summarize the background as K samples.
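
The warning above suggests summarising the background with `shap.sample` or `shap.kmeans`. Because the background here holds integer token indices, the centroids produced by k-means would generally not be valid vocabulary indices, so picking a random subset of rows is probably the safer option. The sketch below does this in plain NumPy; `subsample_rows` is a hypothetical helper, and the value of `k` as well as the example array are arbitrary.

```python
import numpy as np


def subsample_rows(data, k, seed=0):
    """Return at most `k` rows of `data`, chosen without replacement."""
    data = np.asarray(data)
    if k >= data.shape[0]:
        return data
    rng = np.random.default_rng(seed)
    idx = rng.choice(data.shape[0], size=k, replace=False)
    # Sort the indices so that the original row order is preserved.
    return data[np.sort(idx)]


# Made-up background of 369 rows with 64 token indices each.
example_background = np.random.default_rng(1).integers(0, 1000, size=(369, 64))
small_background = subsample_rows(example_background, k=50)
print(small_background.shape)
```

The reduced array could then be passed to `shap.KernelExplainer` in place of the full background, trading some fidelity for a shorter run time.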


In [None]:
with open("/content/background_data", "wb") as f:
    dill.dump(background_numpy, f)

In [None]:
with open("/content/explainer", "wb") as f:
    dill.dump(explainer, f)
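
A saved object can be restored with the matching `load` call. The sketch below shows a save and load round trip on a small dictionary; `save_object` and `load_object` are hypothetical helpers, and the fallback to the standard `pickle` module exists only so that the example runs where `dill` is not installed. Restoring the explainer itself is assumed to require an environment in which the model code and the `preprocessing` and `utils` modules are importable.

```python
import os
import tempfile

try:
    import dill as serializer    # the serializer used in this notebook
except ImportError:
    import pickle as serializer  # fallback; handles fewer object types than dill


def save_object(obj, path):
    """Serialize `obj` to `path`."""
    with open(path, "wb") as f:
        serializer.dump(obj, f)


def load_object(path):
    """Load an object previously written by `save_object`."""
    with open(path, "rb") as f:
        return serializer.load(f)


# Round trip on a toy object inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_object")
    save_object({"nsamples": 150}, toy_path)
    restored = load_object(toy_path)

print(restored)
```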

**SHAP values are generated for the background data**

In [None]:
shap_values = explainer.shap_values(background_numpy, nsamples=150)

In [None]:
with open("/content/shap_values", "wb") as f:
    dill.dump(shap_values, f)
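
One simple way to inspect the saved values is to rank the input positions by their mean absolute SHAP value. The sketch below assumes that the SHAP values form a 2-D array with one row per explained sample and one column per input position, which is what KernelExplainer is expected to return for a single-output model. `top_positions` is a hypothetical helper and the toy array is made up.

```python
import numpy as np


def top_positions(values, k=5):
    """Return the `k` columns with the largest mean absolute SHAP value.

    The result is a list of (column index, mean absolute value) pairs,
    ordered from most to least influential.
    """
    values = np.asarray(values, dtype=float)
    if values.ndim != 2:
        raise ValueError("expected a 2-D array of SHAP values")
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(importance)[::-1][:k]
    return [(int(i), float(importance[i])) for i in order]


# Toy SHAP values for two samples and three input positions.
toy_values = np.array([[0.1, -2.0, 0.0],
                       [0.3, 1.0, 0.0]])
print(top_positions(toy_values, k=2))
```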