<a class="anchor" id="0"></a>
# **Explain your model predictions with Shapley Values**


Hello friends,


In this kernel, I will introduce you to the **SHAP library** and **Shapley values** in Python. These tools are used to explain your model predictions and to gain insights during model development.

So, let's get started.

**As always, I hope you find this kernel useful and your <font color="red"><b>UPVOTES</b></font> would be highly appreciated**.

<a class="anchor" id="0.1"></a>
# **Table of Contents**

1. [Interpretable Machine Learning](#1)
2. [Introduction to SHAP library and Shapley Values](#2)
  - 2.1 [Shapley Values](#2.1)
  - 2.2 [SHAP Library](#2.2)
3. [Python implementation of model development](#3)
4. [SHAP Explanation Force Plots](#4)
5. [SHAP Feature Importance](#5)
6. [SHAP Summary Plot](#6)
7. [SHAP Dependence Plot](#7)
8. [References](#8) 
  


# **1. Interpretable Machine Learning** <a class="anchor" id="1"></a>

[Table of Contents](#0.1)

- Machine learning has great potential for improving products, processes and services. A dataset is supplied as input and an algorithm produces the desired output. However, many algorithms do not explain their predictions, which is a barrier to the adoption of machine learning. This is where interpretable machine learning comes to the rescue.

- Tim Miller, in “Explanation in Artificial Intelligence: Insights from the Social Sciences”, defines interpretability as:

    **“the degree to which a human can understand the cause of a decision in a model”.** In other words, interpretability is not all-or-nothing: a model can be interpretable to a greater or lesser degree.
       
- In the context of machine learning, interpretability helps us to understand how a model has made a particular decision. 

- Our models should be interpretable, and they should also display the following traits:-

  - 1 **Fairness**  -  Ensuring that predictions are unbiased and do not implicitly or explicitly discriminate against protected groups. An interpretable model can tell you why it has decided that a certain person should not get a loan, and it becomes easier for a human to judge whether the decision is based on a learned demographic (e.g. racial) bias.
  - 2 **Privacy**  -  Ensuring that sensitive information in the data is protected.
  - 3 **Reliability or Robustness**  -  Ensuring that small changes in the input do not lead to large changes in the prediction.
  - 4 **Causality**  -  Checking that only causal relationships are picked up.
  - 5 **Trust**  -  It is easier for humans to trust a system that explains its decisions compared to a black box.   


# **Most commonly used methods for explainability**

- The following methods are model-agnostic: they do not rely on any particularity of the model. Their advantage lies in their flexibility. Machine learning developers are free to use any model they like, and these interpretation methods can still be applied to it. These methods are given below:-

 - 1 Shapley values (explained in this kernel)
 - 2 LIME 
 - 3 Feature importance
 - 4 Feature interaction 
 - 5 Surrogate Models 

# **2. Introduction to SHAP library and Shapley values** <a class="anchor" id="2"></a>

[Table of Contents](#0.1)


- [SHAP (SHapley Additive exPlanations)](https://christophm.github.io/interpretable-ml-book/shap.html), by Lundberg and Lee, is a Python library used to explain individual model predictions. SHAP is based on the game-theoretically optimal [Shapley Values](https://christophm.github.io/interpretable-ml-book/shapley.html#shapley).

- Let's first talk about Shapley Values.

## **2.1 Shapley Values** <a class="anchor" id="2.1"></a>


[Table of Contents](#0.1)


- In terms of [Interpretable Machine Learning - Shapley Values](https://christophm.github.io/interpretable-ml-book/shapley.html#shapley), **Shapley Values** can be defined as-

    **A prediction can be explained by assuming that each feature value of the instance is a “player” in a game where the prediction is the payout. Shapley values – a method from coalitional game theory – tells us how to fairly distribute the “payout” among the features.**


- For an in-depth discussion of Shapley Values, please read the chapter - [Shapley Values](https://christophm.github.io/interpretable-ml-book/shapley.html#shapley).
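The definition above can be made concrete with a brute-force computation. The sketch below is a minimal illustration, not how the `shap` library computes values: it enumerates every coalition of features and averages each feature's marginal contribution using the classical Shapley weights |S|!(M-|S|-1)!/M!. It assumes a simple value function in which features absent from a coalition are replaced by a single baseline vector; `exact_shapley_values` and `toy_model` are hypothetical names chosen for illustration. The cost grows as 2^M, so this only works for a handful of features.

```python
import itertools
from math import factorial

import numpy as np


def exact_shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by enumerating all coalitions.

    Assumption: a feature that is absent from a coalition takes its value
    from `baseline` (a single reference point).
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n_features = len(x)

    def value(coalition):
        # Payout of a coalition: its features come from x, the rest from the baseline.
        z = baseline.copy()
        for j in coalition:
            z[j] = x[j]
        return f(z)

    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions S of this size
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for coalition in itertools.combinations(others, size):
                # Marginal contribution of feature i when it joins coalition S
                phi[i] += weight * (value(coalition + (i,)) - value(coalition))
    return phi


# Hypothetical toy model with an interaction between features 0 and 1
def toy_model(z):
    return z[0] * z[1] + z[2]


phi = exact_shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # the interaction payout is split equally between features 0 and 1
```

The values sum to f(x) - f(baseline) (the efficiency property), which is the same additivity that the SHAP force plots later in this kernel rely on.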


## **2.2 SHAP Library** <a class="anchor" id="2.2"></a>


[Table of Contents](#0.1)


- The SHAP library in Python has built-in functions that use Shapley values to interpret machine learning models. It has optimized functions for interpreting tree-based models and a model-agnostic explainer for interpreting any black-box model, given only access to its predictions.

- Lundberg and Lee implemented SHAP in the [SHAP](https://github.com/slundberg/shap) Python package. Its tree explainer works with tree-based models from the scikit-learn machine learning library for Python, as well as from other libraries such as XGBoost and LightGBM.

- The SHAP authors proposed **KernelSHAP**, an alternative, kernel-based estimation approach for Shapley values inspired by [local surrogate models](https://christophm.github.io/interpretable-ml-book/lime.html#lime). 

- They also proposed **TreeSHAP**, an efficient estimation approach for tree-based models.

- [SHAP](https://github.com/slundberg/shap) comes with many global interpretation methods based on aggregations of Shapley values. We will demonstrate them in this kernel.

- For an in-depth discussion of [SHAP](https://github.com/slundberg/shap), please read the chapter [SHAP](https://christophm.github.io/interpretable-ml-book/shap.html).

- Now, let's get to the implementation.
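Before moving on, here is the idea behind KernelSHAP, mentioned above, in code. KernelSHAP recasts Shapley value estimation as a weighted linear regression: each coalition z (a 0/1 vector saying which features are "present") is a data point, the target is the model payout for that coalition, and the weight is the Shapley kernel (M-1) / (C(M,|z|) |z| (M-|z|)). The sketch below enumerates all coalitions of a tiny model and solves that regression with NumPy; with every coalition included, the solution coincides with the exact Shapley values. The real `shap.KernelExplainer` instead samples coalitions and averages over a background dataset, so the single-baseline value function and the names `kernel_shap_exact` and `shapley_kernel_weight` are simplifying assumptions for illustration only.

```python
import itertools
from math import comb

import numpy as np


def shapley_kernel_weight(n_features, size):
    """Shapley kernel weight for a coalition with 0 < size < n_features members."""
    return (n_features - 1) / (comb(n_features, size) * size * (n_features - size))


def kernel_shap_exact(f, x, baseline):
    """KernelSHAP-style weighted regression over all coalitions (small M only).

    Assumption: absent features take their value from a single `baseline` point.
    Returns (base_value, phi), where base_value = f(baseline).
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n_features = len(x)
    if n_features < 2:
        raise ValueError("this sketch needs at least two features")

    def value(mask):
        # Present features come from x, absent ones from the baseline.
        return f(np.where(mask, x, baseline))

    base_value = value(np.zeros(n_features, dtype=bool))
    total = value(np.ones(n_features, dtype=bool)) - base_value

    rows, targets, weights = [], [], []
    # The empty and full coalitions get infinite weight in KernelSHAP; here they
    # are enforced exactly as constraints instead of being regression rows.
    for size in range(1, n_features):
        for coalition in itertools.combinations(range(n_features), size):
            mask = np.zeros(n_features, dtype=bool)
            mask[list(coalition)] = True
            rows.append(mask.astype(float))
            targets.append(value(mask) - base_value)
            weights.append(shapley_kernel_weight(n_features, size))
    Z, y, w = np.array(rows), np.array(targets), np.array(weights)

    # Constraint sum(phi) == total: substitute phi_last = total - sum(phi[:-1]).
    Z_reduced = Z[:, :-1] - Z[:, [-1]]
    y_reduced = y - Z[:, -1] * total

    # Weighted least squares: scale rows by sqrt(weight), then ordinary lstsq.
    sqrt_w = np.sqrt(w)
    phi_head, *_ = np.linalg.lstsq(Z_reduced * sqrt_w[:, None], y_reduced * sqrt_w, rcond=None)
    phi = np.append(phi_head, total - phi_head.sum())
    return base_value, phi


base_value, phi = kernel_shap_exact(
    lambda z: z[0] * z[1] + z[2], x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]
)
print(base_value, phi)
```

Eliminating the last coefficient keeps the "sum of SHAP values equals prediction minus base value" property exact rather than approximating it with a very large weight.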

# **3. Python Implementation of Model Development** <a class="anchor" id="3"></a>

[Table of Contents](#0.1) 

## **3.1 Initial Set-Up** <a class="anchor" id="3.1"></a>

[Table of Contents](#0.1) 

```python
# This Python 3 environment comes with many helpful analytics libraries installed.
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load:

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt  # data visualization
import seaborn as sns  # statistical data visualization

# Input data files are available in the "/kaggle/input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.
```


```python
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
```

## **3.2 Reading Data** <a class="anchor" id="3.2"></a>

[Table of Contents](#0.1) 

```python
# Load and preview the data
df = pd.read_csv('/kaggle/input/california-housing-prices/housing.csv')
df.head()
```

- The target variable is `median_house_value`.

## **3.3 View Summary of data** <a class="anchor" id="3.3"></a>

[Table of Contents](#0.1)

```python
# View a summary of the data
df.info()
```

- We can see that `total_bedrooms` has missing values.

## **3.4 Missing Value Treatment** <a class="anchor" id="3.4"></a>

[Table of Contents](#0.1)

```python
# Plot the distribution of total bedrooms
# (a histogram suits this continuous variable better than a bar chart of value counts)
df['total_bedrooms'].plot.hist(bins=50)
```

- The distribution of `total_bedrooms` is skewed, so I will use the median, which is less affected by extreme values than the mean, to fill in the missing values.
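Why the median: for a right-skewed variable, a few very large values pull the mean upward, while the median stays near the bulk of the data. The short sketch below illustrates this on a synthetic log-normal sample; the distribution, seed and sample size are arbitrary assumptions, not properties of the housing data. On the real column, `df['total_bedrooms'].skew()` and a comparison of `.mean()` with `.median()` would give the analogous check.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic right-skewed sample (log-normal), standing in for a column like total_bedrooms
sample = rng.lognormal(mean=6.0, sigma=0.8, size=10_000)

sample_mean = sample.mean()
sample_median = np.median(sample)

# Sample skewness: the third standardized moment (positive means a long right tail)
skewness = np.mean(((sample - sample_mean) / sample.std()) ** 3)

print(f"mean={sample_mean:.1f}  median={sample_median:.1f}  skewness={skewness:.2f}")
```

Filling gaps with the larger mean would place the imputed rows in the tail rather than in the typical range, which is the reason for preferring the median here.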

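```python
# Impute missing values in total_bedrooms with the median
# (assigning the result back avoids chained in-place modification, which newer pandas versions warn about)
df['total_bedrooms'] = df['total_bedrooms'].fillna(df['total_bedrooms'].median())
```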

```python
# Now check for missing values again
df.isnull().sum()
```

There are no missing values in the dataset.

## **3.5 Feature Vector and Target Variable** <a class="anchor" id="3.5"></a>

[Table of Contents](#0.1)

```python
# Declare the feature vector and the target variable
X = df[['longitude', 'latitude', 'housing_median_age', 'total_rooms',
        'total_bedrooms', 'population', 'households', 'median_income']]
y = df['median_house_value']
```

## **3.6 Train-Test Split** <a class="anchor" id="3.6"></a>

[Table of Contents](#0.1)

```python
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
```

## **3.7 Build the model** <a class="anchor" id="3.7"></a>

[Table of Contents](#0.1)

```python
# Build the model with a Random Forest Regressor
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(max_depth=6, random_state=0, n_estimators=10)
model.fit(X_train, y_train)
```

## **3.8 Generate Predictions** <a class="anchor" id="3.8"></a>

[Table of Contents](#0.1)

```python
# Predict on the test set
y_pred = model.predict(X_test)
```

## **3.9 Evaluating Performance** <a class="anchor" id="3.9"></a>

[Table of Contents](#0.1)

```python
# Root mean squared error (RMSE) on the test set
from sklearn.metrics import mean_squared_error
rmse = mean_squared_error(y_test, y_pred) ** 0.5
rmse
```
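The metric computed above is the root mean squared error, RMSE = sqrt((1/n) Σ (y_i - ŷ_i)²). Taking the square root puts the error back in the same units as the target, `median_house_value`, which makes it easier to read than the MSE. A from-scratch version with a hand-checkable example; the function name `rmse_from_scratch` is hypothetical and chosen so it does not overwrite the `rmse` variable above.

```python
import numpy as np


def rmse_from_scratch(y_true, y_pred):
    """Root mean squared error: the square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Residuals are [2, 0, -2], so MSE = 8/3 and RMSE = sqrt(8/3)
print(rmse_from_scratch([3.0, 5.0, 7.0], [1.0, 5.0, 9.0]))
```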

# **4. SHAP Explanation Force Plots** <a class="anchor" id="4"></a>

[Table of Contents](#0.1)


- We will use SHAP to explain individual predictions. We can use the fast TreeSHAP estimation method instead of the slower KernelSHAP method, since a random forest is an ensemble of trees.

- Since SHAP computes Shapley values, the interpretation is the same as in the [Shapley value chapter](https://christophm.github.io/interpretable-ml-book/shapley.html#shapley). However, the Python shap package adds a different visualization: feature attributions such as Shapley values can be shown as “forces”. Each feature value is a force that either increases or decreases the prediction. The prediction starts from the baseline, which for Shapley values is the average of all predictions. In the plot, each Shapley value is an arrow that pushes the prediction up (positive value) or down (negative value). These forces balance each other out at the actual prediction of the data instance.

- The following figure shows SHAP explanation force plots for the California Housing Prices dataset.

```python
# Import the shap library
import shap

# Explain the model's predictions using SHAP (TreeSHAP for tree ensembles)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)

# Visualize the explanation of the first training instance
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0, :], X_train.iloc[0, :])
```
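The force plot relies on SHAP's additivity (local accuracy) property: the base value plus the sum of an instance's SHAP values equals the model output for that instance. For a linear model with independent features, the SHAP value of feature j is known in closed form as w_j (x_j - E[x_j]), which makes the property easy to check. The sketch below uses that closed form on synthetic data (the weights, data and feature names are hypothetical) and splits one instance's contributions into the positive "red" and negative "blue" forces a force plot draws. On the real model, the analogous check would be `np.allclose(explainer.expected_value + shap_values.sum(axis=1), model.predict(X_train))`, which should hold up to floating-point tolerance.

```python
import numpy as np

rng = np.random.default_rng(42)
feature_names = ["f0", "f1", "f2", "f3"]   # hypothetical feature names
weights = np.array([3.0, -2.0, 0.5, 0.0])  # hypothetical linear-model weights
intercept = 10.0
X_demo = rng.normal(size=(500, len(weights)))  # synthetic data


def predict(data):
    """Toy linear model standing in for the random forest."""
    return intercept + data @ weights


# Base value: the average model output over the data, where every force plot starts
base_value = predict(X_demo).mean()

# Closed-form SHAP values for a linear model with independent features:
# phi_ij = w_j * (x_ij - mean_j)
shap_demo = (X_demo - X_demo.mean(axis=0)) * weights

# Local accuracy: base value + sum of SHAP values reproduces every prediction
reconstructed = base_value + shap_demo.sum(axis=1)
print(np.allclose(reconstructed, predict(X_demo)))  # True

# Split the first instance's contributions the way a force plot colors them
first = shap_demo[0]
pushes_up = {n: round(float(v), 3) for n, v in zip(feature_names, first) if v > 0}    # red
pushes_down = {n: round(float(v), 3) for n, v in zip(feature_names, first) if v < 0}  # blue
print("base value:", round(float(base_value), 3))
print("pushes up:", pushes_up)
print("pushes down:", pushes_down)
```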

### **Interpretation**

- The above plot shows how each feature contributes to pushing the model output from the base value (the average model output over the training dataset we passed) to the final model output. Features pushing the prediction higher are shown in red and those pushing the prediction lower are shown in blue.

- So, for this instance, `housing_median_age` pushes the prediction higher, while `median_income`, `latitude` and `longitude` push the prediction lower.

- The base value of the `median_house_value` is 2.063e+5 = 206300.

- The output value is 70189.83 with `housing_median_age=52`, `median_income=1.975`, `latitude=36.73` and `longitude=-119.8`.

- If we take many explanations such as the one shown above, rotate them 90 degrees, and then stack them horizontally, we can see explanations for an entire dataset as shown below.

- The following plot is interactive. Hover the mouse over it to see the individual values.

```python
# Visualize the explanations for the whole training set (stacked force plot)
shap.force_plot(explainer.expected_value, shap_values, X_train)
```

# **5. SHAP Feature Importance** <a class="anchor" id="5"></a>

[Table of Contents](#0.1)


- The idea behind SHAP feature importance is simple. Features with large absolute Shapley values are important. Since we want the global importance, we average the absolute Shapley values per feature across the data.

- Next, we sort the features by decreasing importance and plot them. The following figure shows the SHAP feature importance for the trained random forest model.

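```python
# SHAP values for the training set (the same values as in Section 4)
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# Bar chart of the mean absolute SHAP value per feature
shap.summary_plot(shap_values, X_train, plot_type="bar")
```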

- The above plot shows the SHAP feature importance measured as the mean absolute Shapley values. 

- The variable `median_income` was the most important feature: on average, it changes the predicted `median_house_value` by about 56000 in absolute terms, as read from the x-axis.

- SHAP feature importance is based on the magnitude of feature attributions. The feature importance plot is useful, but contains no information beyond the importances. For a more informative plot, we will next look at the summary plot.
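The bar chart in this section is a simple aggregation of the per-instance SHAP matrix (rows are instances, columns are features): the importance of feature j is the mean of |SHAP_ij| over all rows, and the features are then sorted in decreasing order. A minimal NumPy sketch of that aggregation, run on a small hypothetical SHAP matrix rather than on the model's actual values; `mean_abs_shap_importance` is an illustrative name, not a `shap` function.

```python
import numpy as np


def mean_abs_shap_importance(shap_values, feature_names):
    """Global importance = mean absolute SHAP value per feature, sorted descending."""
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[j], float(importance[j])) for j in order]


# Hypothetical SHAP matrix: 4 instances x 3 features
example_shap = np.array([
    [5.0, -1.0, 0.5],
    [-4.0, 2.0, -0.5],
    [6.0, -3.0, 0.0],
    [-5.0, 2.0, 1.0],
])
for name, score in mean_abs_shap_importance(example_shap, ["a", "b", "c"]):
    print(f"{name}: {score:.2f}")
```

Taking absolute values first matters: feature `a` has contributions that nearly cancel in sign, yet it is clearly the most influential. With this notebook's variables, the same ranking would come from `pd.Series(np.abs(shap_values).mean(axis=0), index=X_train.columns).sort_values(ascending=False)`.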

# **6. SHAP Summary Plot** <a class="anchor" id="6"></a>

[Table of Contents](#0.1)


- The summary plot combines feature importance with feature effects. Each point on the summary plot is a Shapley value for a feature and an instance. The position on the y-axis is determined by the feature and on the x-axis by the Shapley value. 

- The color represents the value of the feature from low to high. Overlapping points are jittered in the y-axis direction, so we get a sense of the distribution of the Shapley values per feature. The features are ordered according to their importance.


```python
# Beeswarm summary plot: one dot per instance and feature
shap.summary_plot(shap_values, X_train)
```

- The above plot shows the SHAP summary plot.

- The plot contains one dot per training instance and feature. It demonstrates the following information:

  - *Feature importance*: Variables are ranked in descending order.
  - *Impact*: The horizontal location shows whether the effect of that value is associated with a higher or lower prediction.
  - *Original value*: Color shows whether that variable is high (in red) or low (in blue) for that observation.
  - *Correlation*: A high value of `median_income` has a high and positive impact on the predicted `median_house_value`. The “high” comes from the red color, and the “positive” impact is shown on the x-axis.
  
- Similarly, `housing_median_age` appears to be positively correlated with the predicted `median_house_value`.
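The summary plot shows the direction of each feature's effect visually: red dots on the right mean that high values push the prediction up. One simple way to summarize that direction numerically (an assumption of this sketch, not a `shap` function) is the Pearson correlation between each feature's values and its SHAP values. The helper name `shap_direction` and the synthetic data below are hypothetical.

```python
import numpy as np


def shap_direction(shap_values, feature_values):
    """Pearson correlation between each feature and its own SHAP values.

    Positive: high feature values tend to push the prediction up.
    Negative: high feature values tend to push the prediction down.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    feature_values = np.asarray(feature_values, dtype=float)
    return np.array([
        np.corrcoef(feature_values[:, j], shap_values[:, j])[0, 1]
        for j in range(feature_values.shape[1])
    ])


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))  # synthetic features
shap_demo = np.column_stack([
    2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200),   # increasing effect
    -1.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=200),  # decreasing effect
])
print(shap_direction(shap_demo, X_demo))
```

Applied to this notebook, the call would be `shap_direction(shap_values, X_train.values)`. A constant feature would yield `nan`, because correlation is undefined when one of the variables does not vary.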

# **7. SHAP Dependence Plot** <a class="anchor" id="7"></a>

[Table of Contents](#0.1)


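- The SHAP dependence plot shows the effect of a single feature on the model's predictions: each point plots the feature's value (x-axis) against its SHAP value (y-axis), and the points can be colored by a second feature to reveal interactions. It tells us whether the relationship between the feature and the prediction is linear, monotonic or more complex.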

- We can create a dependence plot as follows:-

```python
shap.dependence_plot('median_income', shap_values, X_train)
```

- The function automatically colors the points by another variable, the one that the chosen variable appears to interact with most. The above plot shows an approximately linear and positive trend between `median_income` and its contribution to the predicted `median_house_value`, and `median_income` appears to interact most with `housing_median_age`.

- Now, suppose we want to examine `longitude` and the variable it interacts with most.

- We can call `shap.dependence_plot('longitude', shap_values, X_train)`.

- The plot below shows an approximately linear but negative relationship between `longitude` and its contribution to the prediction. This negative relationship was already visible in the summary plot. `longitude` appears to interact most with `median_income`.

```python
shap.dependence_plot('longitude', shap_values, X_train)
```
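Whether a dependence plot looks "approximately linear" or only "monotonic" can be backed with numbers computed from the same points the plot draws (feature value on the x-axis, SHAP value on the y-axis). The sketch below fits a straight line (slope and R²) and computes a Spearman rank correlation for monotonicity. `dependence_trend` is a hypothetical helper, not part of `shap`, and the data are synthetic.

```python
import numpy as np


def dependence_trend(feature_column, shap_column):
    """Summarize a dependence plot: linear fit (slope, R^2) and Spearman rank correlation."""
    x = np.asarray(feature_column, dtype=float)
    y = np.asarray(shap_column, dtype=float)

    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    r_squared = 1.0 - residuals.var() / y.var()

    # Spearman correlation = Pearson correlation of the ranks (ties are not handled here)
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    spearman = np.corrcoef(rank_x, rank_y)[0, 1]
    return {"slope": float(slope), "r_squared": float(r_squared), "spearman": float(spearman)}


rng = np.random.default_rng(7)
x_demo = rng.uniform(0.0, 10.0, size=300)
linear_shap = 4.0 * x_demo - 20.0 + rng.normal(scale=1.0, size=300)   # linear, increasing
saturating_shap = np.log1p(x_demo) + rng.normal(scale=0.05, size=300)  # monotonic, not linear

print(dependence_trend(x_demo, linear_shap))
print(dependence_trend(x_demo, saturating_shap))
```

A high Spearman value with a noticeably lower R² (as for the saturating curve) points to a monotonic but non-linear relationship. For this notebook, the inputs would be `X_train['median_income']` and the matching column of `shap_values`.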

# **8. References** <a class="anchor" id="8"></a>

[Table of Contents](#0.1)

The ideas and concepts are taken from the following books and websites:-

- 1 https://github.com/slundberg/shap
- 2 https://www.kaggle.com/dansbecker/shap-values
- 3 https://www.kaggle.com/dansbecker/advanced-uses-of-shap-values
- 4 https://christophm.github.io/interpretable-ml-book/
- 5 https://christophm.github.io/interpretable-ml-book/shapley.html
- 6 https://christophm.github.io/interpretable-ml-book/shap.html
- 7 https://towardsdatascience.com/explain-your-model-with-the-shap-values-bc36aac4de3d



[Go to Top](#0)