# Interpretable Explanations 

___

Let’s start with a motivating example. Consider a classifier that recognizes whether a photo shows a wolf or a husky (see the image below). The classifier appears highly accurate: it is wrong only once. The model incorrectly classified the image in the lower left corner as a wolf, although it actually shows a husky. The remaining images were classified correctly.
<img src="images/husky_wolf1.png" alt="husky_wolf" style="float: left; margin-right: 10px;"  width="600"/>
Now the question is: __Can we reliably employ this model?__ 


Now look at the illustration below, in which we used an interpretability technique to see which parts of each picture were informative for the classifier. The gray parts are less relevant to the classifier's decision in each image. It turns out that our classifier, instead of identifying wolves, detects **snow**! 
<img src="images/husky_wolf2.png" alt="husky_wolf2" style="float: left; margin-right: 10px;"  width="600"/>

Without an explainability technique, we would not know that the decisions were based on the wrong parts of the photo.
**Do not** trust the model blindly! We need to understand how our models work and try to generate explanations.
In this notebook, we will learn about [**Local Interpretable Model-agnostic Explanations**](https://arxiv.org/abs/1602.04938) or LIME for short. 
<img src="images/lime_logo.jpg" alt="lime_logo" style="float: left; margin-right: 10px;" align="center" width="100"/>



## How does LIME work?

The easiest solution to the interpretability problem is to use so-called “interpretable” models, such as a linear model or a decision tree. Consider, as an example, a perceptron. The weight associated with each input feature is a proxy for how important that feature is. Here, of course, we assume all features are normalized to the same range (say $[0,1]$). Now if, say, the weight of feature 5 is 1000 while the weight of feature 32 is -0.2, you can comfortably say that feature 5 is more important for the decision than feature 32. Just recall that
\begin{align}
f(\mathbf{x}) = \mathrm{sign}\big(w_1x[1] + w_2x[2] + \cdots + w_nx[n]\big) 
\end{align}
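
To make this concrete, here is a minimal sketch of reading feature importance off a perceptron. The weights and the input are hypothetical values chosen for illustration, and the features are assumed to lie in $[0,1]$ so that weight magnitudes are comparable.

```python
import numpy as np

# Hypothetical perceptron weights for four features (illustrative values only).
w = np.array([1000.0, -0.2, 3.5, 0.0])
# A hypothetical input whose features are already normalized to [0, 1].
x = np.array([0.1, 0.9, 0.5, 0.7])

def perceptron(x, w):
    """Return sign(w_1 x[1] + ... + w_n x[n])."""
    return np.sign(w @ x)

prediction = perceptron(x, w)

# With comparable feature ranges, |w_i| serves as a proxy for importance.
importance = np.abs(w)
ranking = np.argsort(-importance)  # feature indices, most important first

print("prediction:", prediction)
print("features ranked by importance:", ranking)
```

The sign of each weight indicates the direction in which the feature pushes the decision, while its magnitude indicates how strongly.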

As long as the model is accurate for the task, and uses a reasonably restricted number of parameters, such approaches provide extremely useful insights. But what about more involved and non-linear models such as Deep Neural Networks (DNNs)?

### Interpretability for black-box models

The figure below illustrates the idea behind LIME. The black-box model’s complex decision function $f$ (unknown to LIME) is represented by the blue/pink background, which cannot be approximated well by a linear model. The bold red cross is the instance being explained. LIME samples instances, gets predictions using $f$, and weights them by their proximity to the instance being explained (represented here by size). The dashed line is the learned explanation, which is locally (but not globally) faithful.

<img src="images/lime.png" alt="lime_expl" style="float: left; margin-right: 10px;"  width="600"/>
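
The procedure in the figure can be sketched in a few lines of numpy. This is a simplified illustration, not the lime package's implementation: the `black_box` function, the Gaussian sampling, the exponential proximity kernel, and all parameter values are assumptions made for this example.

```python
import numpy as np

def black_box(X):
    """Hypothetical non-linear classifier returning P(class 1) for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X[:, 0] ** 2 - 2.0 * np.sin(3.0 * X[:, 1]))))

def local_linear_explanation(f, x, num_samples=2000, sigma=0.5,
                             kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate to f around the instance x."""
    rng = np.random.default_rng(seed)
    # 1. Sample instances in the neighbourhood of x.
    Z = x + sigma * rng.standard_normal((num_samples, x.size))
    # 2. Query the black box on the samples.
    y = f(Z)
    # 3. Weight each sample by its proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Solve the weighted least-squares problem on the design matrix [1, z - x].
    A = np.hstack([np.ones((num_samples, 1)), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]  # intercept (surrogate value at x) and local slopes

x = np.array([0.5, -1.0])
intercept, slopes = local_linear_explanation(black_box, x)
print("black-box output at x:", black_box(x[None, :])[0])
print("surrogate intercept:  ", intercept)
print("local feature weights:", slopes)
```

The slopes play the role of the dashed line in the figure: they describe $f$ near the red cross only, and they change when a different instance is explained.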







## LIME package for Python

In this notebook, we will use the [LIME package](https://github.com/marcotcr/lime) for Python. You can install the package on your machine by running

<pre>
pip install lime
</pre>



As mentioned above, LIME takes a model $f$ and an individual sample $x$ and generates data by perturbing $x$ (switching the features of $x$ on and off). It then computes a similarity score between each perturbed sample and $x$, which measures how close the perturbed sample is to the original. The lime library lets us try different similarity metrics for this purpose.
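
As a rough sketch of this step, the snippet below switches features on and off with random binary masks and turns the distance to the unperturbed sample into a similarity score. The sample values, the "off" baseline of 0, the Euclidean distance, and the exponential kernel are assumptions of this example; the lime package perturbs tabular data in a more elaborate way.

```python
import numpy as np

def sample_masks(num_features, num_samples, seed=0):
    """Random binary masks: 1 keeps a feature, 0 switches it off."""
    rng = np.random.default_rng(seed)
    masks = rng.integers(0, 2, size=(num_samples, num_features))
    masks[0] = 1  # keep the first row as the unperturbed sample
    return masks

def proximity_weights(masks, kernel_width):
    """Similarity of each perturbed sample to the original (all-ones mask)."""
    # Euclidean distance to the all-ones mask: sqrt(number of features switched off).
    dist = np.sqrt((1 - masks).sum(axis=1))
    return np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))

x = np.array([0.3, 0.8, 0.1, 0.5, 0.9])  # hypothetical sample
baseline = np.zeros_like(x)              # hypothetical "off" value per feature

masks = sample_masks(num_features=x.size, num_samples=4)
perturbed = np.where(masks == 1, x, baseline)  # switched-off features take the baseline
weights = proximity_weights(masks, kernel_width=0.75 * np.sqrt(x.size))

print(masks)
print(weights)
```

Samples with more features switched off are farther from $x$ and therefore receive smaller weights when the simple model is fitted.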

The LIME algorithm then obtains predictions for the perturbed data from the black-box model $f$.
It then picks the features that best describe the behaviour of $f$ on the perturbed data. 
The library lets us specify how many features to pick.

The library then fits a simple model (such as linear or logistic regression) to the perturbed data, weighting each sample by the similarity score computed in the earlier step. The lime library lets us choose the simple model we want to use. Typically it is linear or logistic regression, but we can change it.

It then uses the weight that the simple model assigns to each feature to explain how that feature contributed to the prediction the original complex model made for the sample.

The lime package has three main modules, which can be used with different types of datasets:

1. **lime_tabular** - used for generating explanations for structured datasets.
2. **lime_text** - used for generating explanations for text datasets.
3. **lime_image** - used for generating explanations for image datasets.

We work with lime_tabular and lime_image in this notebook.

## LIME for tabular data


<img src="images/Breast-Cancer-Ribbon.png" alt="ribbon_logo" style="float: left; margin-right: 10px;" align="right" width="200"/>
We will use LIME for a binary classification problem using the Wisconsin breast cancer dataset, available from the <a href="https://archive-beta.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+diagnostic">UCI machine learning repository</a>. The dataset contains features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. Below is a brief description of the fields of each sample in the dataset:




**1.** ID number <br>
**2.** Diagnosis (M = malignant, B = benign) <br>
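**3 - 32** Ten real-valued features are computed for each cell nucleus. For each feature, the mean, standard error, and "worst" (largest) value are reported, giving 30 feature columns: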

1. radius (mean of distances from center to points on the perimeter)
2. texture (standard deviation of gray-scale values)
3. perimeter
4. area
5. smoothness (local variation in radius lengths)
6. compactness (perimeter^2 / area - 1.0)
7. concavity (severity of concave portions of the contour)
8. concave points (number of concave portions of the contour)
9. symmetry
10. fractal dimension ("coastline approximation" - 1)

In [None]:
# general packages
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# torch and related packages
import torch
import torch.nn.functional as F
import torch.nn as nn


# import the tabular explainer from lime
from lime import lime_tabular


In [None]:
# Set random seed for reproducibility.
np.random.seed(0)
torch.manual_seed(0) 


device = "cuda:0" if torch.cuda.is_available() else "cpu"

In [None]:
# read the data
wdbc_pd = pd.read_csv(os.path.join("data","breast-cancer.csv"))

# let's check the records
wdbc_pd

### Cleaning data

Some columns carry no useful information, and we need to remove them from our data. 
1. Check the data above, identify these columns, and remove them using the Pandas function [drop](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html).

2. The diagnosis column is categorical, so we need to convert it to 0 for benign and 1 for malignant cases. Basic Pandas functionality is explained [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html).

In [None]:
# remove the id column and the empty "Unnamed: 32" column
wdbc_pd = wdbc_pd.drop(["id","Unnamed: 32"], axis=1)

# replace M with 1 and B with 0 for the diagnosis column
diag_map = {'M':1, 'B':0}
wdbc_pd["diagnosis"] = wdbc_pd["diagnosis"].map(diag_map)

# print the records
wdbc_pd

### Preparing the dataset

Since LIME works with numpy arrays, we convert the Pandas DataFrame to a numpy array below. We also split the data into training and test sets.

In [None]:
# Convert features and labels to numpy arrays.
wdbc_labels = wdbc_pd["diagnosis"].to_numpy()
wdbc_pd = wdbc_pd.drop(["diagnosis"], axis=1)
feature_names = list(wdbc_pd.columns)
wdbc_data = wdbc_pd.to_numpy()

# min-max normalization so that every feature lies in the range [0,1]
wdbc_data -= np.min(wdbc_data, axis=0) # subtract the per-feature minimum
wdbc_data /= np.max(wdbc_data, axis=0) # divide by the per-feature maximum (now equal to the range)


# create training and test sets 
# randomly choose 90% of data as training set
trn_idx = np.random.choice(len(wdbc_labels), int(0.9*len(wdbc_labels)), replace=False) 
trn_data = wdbc_data[trn_idx]        # training data
trn_labels = wdbc_labels[trn_idx]    # training labels
# choose the remaining 10% as test data
tst_idx = list(set(range(len(wdbc_labels))) - set(trn_idx))
tst_data = wdbc_data[tst_idx]
tst_labels = wdbc_labels[tst_idx]
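
The two preprocessing steps above can also be wrapped in small helper functions. The sketch below is one possible variant, not the code used in this notebook: `min_max_scale` and `split_indices` are hypothetical names, the guard for constant columns is an addition, and the toy array is made up for illustration.

```python
import numpy as np

def min_max_scale(data):
    """Scale every column to [0, 1]; constant columns are mapped to 0."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return (data - col_min) / col_range

def split_indices(num_samples, train_fraction=0.9, seed=0):
    """Return disjoint train/test index arrays from a random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)
    num_train = int(train_fraction * num_samples)
    return perm[:num_train], perm[num_train:]

# Toy data: three samples, three features (the last column is constant).
toy = np.array([[1.0, 10.0, 5.0],
                [3.0, 20.0, 5.0],
                [5.0, 40.0, 5.0]])
scaled = min_max_scale(toy)
print(scaled)

trn, tst = split_indices(569)  # 569 is the number of samples in the WDBC dataset
print(len(trn), len(tst))
```

Splitting a single permutation guarantees that the training and test indices do not overlap and that together they cover the whole dataset.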

## MLP to classify breast cancer data

We will use an MLP with one hidden layer to classify our data. Implement an MLP and be courageous with the activation function (e.g., use LeakyReLU).  

In [None]:
class WDBC_MLP(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.fc1  = nn.Linear(30, hidden_size)
        self.relu = nn.LeakyReLU(inplace=True)
        self.fc2  = nn.Linear(hidden_size, 2)
        

    def forward(self, x):
        fc1_out = self.relu(self.fc1(x))
        fc2_out = self.fc2(fc1_out)
        return fc2_out
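
To see what this network computes, the following numpy sketch mirrors its forward pass: a linear layer, a LeakyReLU, and a second linear layer producing two logits. The random weights are placeholders rather than trained parameters, and the negative slope of 0.01 matches the default of `nn.LeakyReLU`.

```python
import numpy as np

def leaky_relu(z, negative_slope=0.01):
    """Pass positive values through and scale negative values by a small slope."""
    return np.where(z > 0, z, negative_slope * z)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer: logits = LeakyReLU(x W1 + b1) W2 + b2."""
    hidden = leaky_relu(x @ W1 + b1)
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
# Placeholder weights for a 30 -> 32 -> 2 network (not trained).
W1, b1 = 0.1 * rng.standard_normal((30, 32)), np.zeros(32)
W2, b2 = 0.1 * rng.standard_normal((32, 2)), np.zeros(2)

batch = rng.random((4, 30))  # four hypothetical samples with 30 features each
logits = mlp_forward(batch, W1, b1, W2, b2)
print(logits.shape)
```

Unlike a plain ReLU, LeakyReLU keeps a small gradient for negative inputs, so hidden units cannot become permanently inactive.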

### Utility functions

1. Write a function that accepts a numpy array and returns the prediction of your MLP for each row of that array. Recall that LIME works with numpy arrays, so you need to convert its data to torch tensors.

2. Write a function to evaluate your network on the training and test data. We will use it while training our network.

In [None]:
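def predict_WDBC_MLP(inp_array_numpy):
    # LIME passes a 2-D numpy array; convert it to a float tensor on the model's device
    net.eval()
    inp_tensor = torch.from_numpy(inp_array_numpy).type(torch.FloatTensor).to(device)    
    with torch.no_grad():  # no gradients are needed for inference
        logits = net(inp_tensor)
    # move the result to the CPU first, since .numpy() fails on CUDA tensors
    probs = F.softmax(logits, dim=1).cpu().numpy()
    return probs  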


def evaluate_WDBC_MLP(trn_data,trn_labels,tst_data,tst_labels):
    out_probs = predict_WDBC_MLP(trn_data)
    out_classes_trn = np.argmax(out_probs, axis=1)
    out_probs = predict_WDBC_MLP(tst_data)
    out_classes_tst = np.argmax(out_probs, axis=1)
    trn_acc = sum(out_classes_trn == trn_labels) / len(trn_labels)
    tst_acc = sum(out_classes_tst == tst_labels) / len(tst_labels)
    return trn_acc, tst_acc
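
LIME calls the prediction function with a whole batch of perturbed samples, so the function must map a 2-D array of shape (samples, features) to class probabilities of shape (samples, classes). The sketch below checks this contract on a stand-in model; `check_predict_fn` and `dummy_predict` are hypothetical helpers written for this illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def check_predict_fn(predict_fn, num_features, num_classes, batch_size=8):
    """Verify that predict_fn maps (batch, features) to (batch, classes) probabilities."""
    batch = np.random.default_rng(0).random((batch_size, num_features))
    probs = np.asarray(predict_fn(batch))
    assert probs.shape == (batch_size, num_classes), probs.shape
    assert np.allclose(probs.sum(axis=1), 1.0), "each row must sum to 1"
    return True

def dummy_predict(batch):
    """Hypothetical stand-in for a trained model: two logits per sample."""
    total = batch.sum(axis=1)
    return softmax(np.stack([total, -total], axis=1))

print(check_predict_fn(dummy_predict, num_features=30, num_classes=2))
```

In this notebook, the same check could be run as `check_predict_fn(predict_WDBC_MLP, 30, 2)` once `net` has been created.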




### Train MLP 

- Train your MLP for a few epochs below
- Use Adam as optimizer for efficiency 

In [None]:
net = WDBC_MLP(hidden_size=64).to(device)



criterion = nn.CrossEntropyLoss()
num_epochs = 50

optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
input_tensor = torch.from_numpy(trn_data).type(torch.FloatTensor).to(device)
label_tensor = torch.from_numpy(trn_labels).to(device)
for epoch in range(num_epochs):   
    net.train()
    output = net(input_tensor)
    loss = criterion(output, label_tensor)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    trn_acc, tst_acc = evaluate_WDBC_MLP(trn_data,trn_labels,tst_data,tst_labels)
    if epoch % 10 == 0:
        print(f"Epoch {epoch+1:>3}/{num_epochs:>3}: Loss = {loss.item():.2f}, "
              f"train accuracy = {100*trn_acc:.2f}, test accuracy = {100*tst_acc:.2f}")

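# make sure the target directory exists before saving the model
os.makedirs('models', exist_ok=True)
torch.save(net, 'models/wdbc_model.pt')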

### Define the LIME explainer

To use LIME for tabular data, we first need to define an explainer object and provide it with relevant information about our task and data. The cell below does this.

In [None]:
WDBC_class_names = ["benign", "malignant"]
explainer = lime_tabular.LimeTabularExplainer(trn_data, mode="classification",
                                              class_names=WDBC_class_names,
                                              feature_names=feature_names,
                                             )



### Use LIME to explain a sample

Now, we can use the method [explain_instance](https://lime-ml.readthedocs.io/en/latest/lime.html#lime.lime_tabular.LimeTabularExplainer.explain_instance) of the explainer object to understand the behaviour of the MLP. Run `explain_instance`, then analyze and discuss the results. 

In [None]:
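idx = 2
# the prediction function expects a 2-D array, so add a batch dimension
inp_explainer = np.expand_dims(tst_data[idx], axis=0)
# explain_instance takes the 1-D sample and a function that returns class probabilities
explanation = explainer.explain_instance(tst_data[idx], predict_WDBC_MLP,
                                         num_features=len(feature_names))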

print("Prediction : ", WDBC_class_names[np.argmax(predict_WDBC_MLP(inp_explainer))])
print("Actual :     ", WDBC_class_names[tst_labels[idx]])

explanation.show_in_notebook()
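
Besides the notebook widget, the explanation can be read as plain text. The sketch below assumes that `explanation.as_list()` returns (feature description, weight) pairs for class 1 (malignant), with positive weights supporting that class; `summarize_explanation` is a hypothetical helper, and the example pairs are made-up values rather than real output.

```python
def summarize_explanation(pairs, class_names=("benign", "malignant"), top_k=3):
    """Describe the top_k contributions with the largest absolute weight."""
    ranked = sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)[:top_k]
    lines = []
    for description, weight in ranked:
        # Positive weights support class 1, negative weights support class 0.
        supported = class_names[1] if weight > 0 else class_names[0]
        lines.append(f"{description}: {weight:+.3f} (pushes towards {supported})")
    return lines

# Made-up pairs in the assumed as_list() format (not real output).
example_pairs = [
    ("area_worst > 0.20", 0.12),
    ("texture_mean <= 0.22", -0.05),
    ("radius_se > 0.10", 0.08),
    ("symmetry_mean <= 0.30", -0.01),
]

lines = summarize_explanation(example_pairs)
for line in lines:
    print(line)
```

In the notebook, the helper would be called as `summarize_explanation(explanation.as_list())`. Keep in mind that the weights describe the model's behaviour around this one sample only, not globally.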
 

## References

1. ["Why Should I Trust You?": Explaining the Predictions of Any Classifier](https://arxiv.org/abs/1602.04938)

2. [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/)

3. [How do machine learning algorithms work? On the example of LIME and model explainability](https://theblue.ai/blog/lime-models-explanation/)