# Explainable AI

It is becoming more and more important to understand why an AI model arrives at a particular decision.
Some regulated industries need to understand black-box models better.
So far, they could only use very simple models (linear regression, decision trees), which are well understood but less powerful.

For others, the GDPR is reason enough to better understand their AI models.
Others again worry about adversarial attacks and explicitly want to understand the decision boundaries of their models.


There are three main Python packages for explaining models.
Most of them perturb the input values of a particular observation, generate many nearby samples and record how the prediction changes, so they are good at showing the effect of changing that observation's inputs:

- https://github.com/oracle/Skater
- https://github.com/marcotcr/lime: local exploration of the model by fitting a sparse linear model around each prediction (faster).
- https://github.com/slundberg/shap: SHAP values have recently become the most popular approach. They are grounded in cooperative game theory: a feature's SHAP value is the average marginal contribution of that feature value over all possible coalitions of features (slower, unless an explainer optimized for the model type is used, such as the one for tree ensembles).
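The coalition definition behind SHAP can be made concrete with a brute-force sketch. It assumes a hypothetical value function `value(S)` that returns the model output when only the features in `S` are known; real SHAP explainers estimate such values from data and avoid enumerating all 2^(n-1) coalitions per feature, which is why this exact version is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Fraction of feature orderings in which exactly s precedes p
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Marginal contribution of p when added to coalition s
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical value function: 'a' and 'b' each add to the output and
# interact when both are known; 'c' has no effect at all.
def toy_value(s):
    return 10 * ('a' in s) + 20 * ('b' in s) + 30 * ('a' in s and 'b' in s)


phi = shapley_values(['a', 'b', 'c'], toy_value)
print(phi)
```

The interaction term of 30 is split equally between 'a' and 'b' (giving 25 and 35), 'c' gets 0, and the three values add up to the output with all features known minus the output with none (60).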

Further reading:
- https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739
- https://blog.dominodatalab.com/shap-lime-python-libraries-part-1-great-explainers-pros-cons/

## Local explanations (LIME)

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score

# Binary task: two of the 20 newsgroups
categories = ['alt.atheism', 'soc.religion.christian']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)
class_names = ['atheism', 'christian']

# TF-IDF features; lowercase=False keeps tokens such as 'Posting' and 'Host' case-sensitive
vectorizer = TfidfVectorizer(lowercase=False)
train_vectors = vectorizer.fit_transform(newsgroups_train.data)
test_vectors = vectorizer.transform(newsgroups_test.data)

# No random_state is set, so scores and explanations vary slightly between runs
rf = RandomForestClassifier(n_estimators=500)
rf.fit(train_vectors, newsgroups_train.target)

pred = rf.predict(test_vectors)
print('F1 score:', f1_score(newsgroups_test.target, pred, average='binary'))
```

We see that this classifier achieves a very high F1 score. [The sklearn guide to 20 newsgroups](https://scikit-learn.org/stable/datasets/#filtering-text-for-more-realistic-training) indicates that Multinomial Naive Bayes overfits this dataset by learning irrelevant features, such as the headers. Let's see if random forests do the same.
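The sklearn guide linked above suggests removing such metadata when loading the data, via `fetch_20newsgroups(..., remove=('headers', 'footers', 'quotes'))`. The idea behind dropping headers can be sketched without downloading anything, assuming a post's header block ends at the first blank line. `strip_header` and the sample post are hypothetical, and this is a simplification rather than sklearn's exact code.

```python
def strip_header(message):
    """Hypothetical helper: drop the header block of a newsgroup post.

    Assumes headers end at the first blank line, as in email-style messages.
    """
    _header, blank_line, body = message.partition('\n\n')
    # If there is no blank line, treat the whole message as body
    return body if blank_line else message


sample = (
    "From: someone@example.edu\n"
    "Subject: Re: faith and reason\n"
    "NNTP-Posting-Host: example.edu\n"
    "\n"
    "I think the argument in the previous post is circular.\n"
)
print(strip_header(sample))
```

With the default token pattern of `TfidfVectorizer`, a header line such as `NNTP-Posting-Host` yields the tokens `NNTP`, `Posting` and `Host`; stripping the header removes them from the features entirely.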



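```python
from sklearn.pipeline import make_pipeline

# LIME perturbs raw text, so chain vectorizer and classifier into one
# model that maps a list of strings to class probabilities
c = make_pipeline(vectorizer, rf)

print(c.predict_proba([newsgroups_test.data[0]]))
```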
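LIME queries the model on many perturbed copies of the raw text, so the function passed to the explainer must map a list of strings to an array of shape (n_samples, n_classes) holding class probabilities. A minimal sketch of that contract on a hypothetical four-document corpus, with a logistic regression swapped in for the random forest to keep it small:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus and labels (0 = atheism, 1 = christian)
docs = ["god does not exist", "no evidence for god",
        "jesus loves you", "pray to jesus every day"]
labels = [0, 0, 1, 1]

toy_pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
toy_pipeline.fit(docs, labels)

# Raw strings in, one row of class probabilities per string out
proba = toy_pipeline.predict_proba(["jesus and god", "no evidence"])
print(proba.shape)
print(proba.sum(axis=1))
```

Any callable with this behaviour, such as `toy_pipeline.predict_proba`, can be handed to the explainer in the same way as `c.predict_proba`.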


Now we create an explainer object, passing class_names as an argument for a prettier display, and use it to explain an arbitrary observation.

```python
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=class_names)

# Explain one test document using (at most) 6 words
idx = 83
exp = explainer.explain_instance(newsgroups_test.data[idx], c.predict_proba, num_features=6)
print('Document id: %d' % idx)
print('Probability(christian) =', c.predict_proba([newsgroups_test.data[idx]])[0, 1])
print('True class: %s' % class_names[newsgroups_test.target[idx]])
```

The classifier got this example right (it predicted atheism).
The explanation is presented below as a list of weighted features.

```python
exp.as_list()
```

These weighted features form a linear model that approximates the behaviour of the random forest classifier in the vicinity of the test example. Roughly, if we remove 'Posting' and 'Host' from the document, the prediction should move towards the opposite class (Christianity) by about 0.27 (the sum of the weights of both features). Let's see if this is the case.

```python
# Probability of 'christian' for the original document
orig_proba = rf.predict_proba(test_vectors[idx])[0, 1]
print('Original prediction:', orig_proba)

# Zero out the TF-IDF values of the two words and predict again
tmp = test_vectors[idx].copy()
tmp[0, vectorizer.vocabulary_['Posting']] = 0
tmp[0, vectorizer.vocabulary_['Host']] = 0
new_proba = rf.predict_proba(tmp)[0, 1]
print('Prediction removing some features:', new_proba)
print('Difference:', new_proba - orig_proba)
```

Pretty close!
The words that explain the model around this document seem very arbitrary: they have little to do with either Christianity or atheism.
In fact, these are words that appear in the email headers (you will see this clearly soon), which make distinguishing between the classes much easier.
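The idea behind these weights is a local linear surrogate: a weighted linear model is fitted to the black box's predictions on perturbed copies of the document, so removing a word shifts the surrogate's prediction by minus that word's weight. The sketch below is a simplified illustration, not the lime library's actual sampling or kernel; the three-word `black_box` function, the uniform random masks and the kernel width are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge


def black_box(masks):
    """Hypothetical black-box model: probability as a function of which of
    three words are present (1) or removed (0)."""
    logit = 2.0 * masks[:, 0] + 1.0 * masks[:, 1] - 1.5 * masks[:, 2]
    return 1.0 / (1.0 + np.exp(-logit))


def local_surrogate(predict_fn, n_features, n_samples=5000, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    # Perturb the instance: each row keeps (1) or drops (0) each word
    masks = rng.integers(0, 2, size=(n_samples, n_features))
    masks[0] = 1  # include the unperturbed instance itself
    # Distance from the original instance (all words present): fraction dropped
    distances = 1.0 - masks.mean(axis=1)
    # Exponential kernel: perturbations closer to the original get more weight
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Weighted linear model approximating predict_fn in this neighbourhood
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, predict_fn(masks), sample_weight=weights)
    return surrogate.coef_, surrogate.intercept_


coef, intercept = local_surrogate(black_box, n_features=3)
for name, w in zip(["word_a", "word_b", "word_c"], coef):
    print(f"{name}: {w:+.3f}")
```

With this hypothetical model, the fitted weights preserve the ordering of the true effects: `word_a` gets the largest positive weight, `word_b` a smaller positive one and `word_c` a negative one.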

### Visualizing explanations
The explanations can be returned as a matplotlib bar plot:

```python
%matplotlib inline
# Bar plot of the word weights
fig = exp.as_pyplot_figure()
```

```python
# HTML view of the explanation, without the document text
exp.show_in_notebook(text=False)
```

Finally, we can also include a visualization of the original document, with the words in the explanations highlighted. Notice how the words that affect the classifier the most are all in the email header.

```python
exp.show_in_notebook(text=True)
```

## SHAP values

```python
import xgboost
import shap

# load JS visualization code to notebook
shap.initjs()

# train XGBoost model
# Note: shap.datasets.boston() may be unavailable in recent shap releases; if so,
# shap.datasets.california() is an alternative (then use e.g. "MedInc" instead of "RM" below)
X, y = shap.datasets.boston()
model = xgboost.train({"learning_rate": 0.01}, xgboost.DMatrix(X, label=y), 100)

# explain the model's predictions using SHAP values
# (same syntax works for LightGBM, CatBoost, and scikit-learn tree models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# visualize the first prediction's explanation (use matplotlib=True to avoid JavaScript)
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :])
```

The explanation above shows how each feature pushes the model output away from the base value (the average model output over the training dataset we passed) to the model's prediction for this observation. Features pushing the prediction higher are shown in red, those pushing it lower in blue.
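The base value and the contributions add up exactly to the prediction. For a linear model with independent features this can be checked by hand, because each feature's SHAP value is then its weight times the feature's deviation from its mean. A numpy sketch under that assumption, with hypothetical random data and weights standing in for the XGBoost model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data and linear model f(x) = x @ w + b
X_toy = rng.normal(size=(200, 3))
w_toy = np.array([2.0, -1.0, 0.5])
b_toy = 3.0
toy_predictions = X_toy @ w_toy + b_toy

# Base value: average model output over the dataset
base_value = toy_predictions.mean()

# SHAP values of a linear model with independent features:
# weight times deviation of the feature from its mean
linear_phi = (X_toy - X_toy.mean(axis=0)) * w_toy

# Additivity: base value + sum of contributions = model output
reconstructed = base_value + linear_phi.sum(axis=1)
print(np.allclose(reconstructed, toy_predictions))
```

Tree models have no such closed form, which is what `shap.TreeExplainer` computes efficiently, but the same additivity holds for its output.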

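We can also visualize the explanations of all training-set predictions at once: the force plots of the individual observations are rotated 90 degrees and stacked horizontally.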

```python
shap.force_plot(explainer.expected_value, shap_values, X)
```

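A SHAP dependence plot shows the effect of a single feature across the whole dataset: each point plots one observation's value of the feature against its SHAP value, and the points are coloured by a second feature, chosen automatically, that appears to interact with it.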

```python
shap.dependence_plot("RM", shap_values, X)
```

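The summary plot summarizes the effects of all features: features are sorted by the sum of their SHAP value magnitudes over all observations, and each point is one observation's SHAP value for that feature, coloured by the feature value (red high, blue low).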

```python
shap.summary_plot(shap_values, X)
```

We can also just take the mean absolute value of the SHAP values for each feature to get a standard bar plot (produces stacked bars for multi-class outputs):

```python
shap.summary_plot(shap_values, X, plot_type="bar")
```
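The ranking in this bar plot is simple to reproduce: average the absolute SHAP values of each feature over all observations and sort. A sketch with a hypothetical 3x3 matrix of SHAP values (rows are observations, columns are features), small enough to check by hand:

```python
import numpy as np

# Hypothetical SHAP values: 3 observations x 3 features
example_shap = np.array([
    [ 0.5, -2.0, 0.1],
    [-0.5,  1.0, 0.0],
    [ 1.0, -1.5, 0.2],
])
feature_names = ["f1", "f2", "f3"]

# Global importance: mean absolute contribution per feature
importance = np.abs(example_shap).mean(axis=0)

# Most important feature first, as in the bar plot
for i in np.argsort(importance)[::-1]:
    print(f"{feature_names[i]}: {importance[i]:.3f}")
```

Taking absolute values matters: the signed contributions of f1 partly cancel (their plain mean is 0.333), yet its mean absolute value of 0.667 shows that it still moves individual predictions noticeably.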