## <a name="top"></a>Understanding Vehicle Fuel Economy (MPG) Predictions

In this notebook we will try to understand how a trained [XGBoost](https://xgboost.readthedocs.io/en/stable/) model makes its predictions about vehicle fuel efficiency, i.e. miles per gallon (MPG).

Jump to the relevant section:
- [Library Imports](#imports)
- [Data Import](#data)
- [Building Model](#building_model)
- [shap.KernelExplainer](#shap_ke)
- [shap.Explainer](#shap_exp)
- [Summary Plot - Feature Importance](#summ_fi)
- [Dependence Plot](#dep_plot)
- [Visualising a Single Prediction - Waterfall Plot](#waterfall)
- [Force Plot](#force)

### <a name="imports"></a>Library Imports

In [None]:
import pandas as pd
import numpy as np
from numpy import absolute
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
import xgboost as xgb
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

import sklearn
print("Scikit-Learn Version : {}".format(sklearn.__version__))

import shap
from shap import Explanation
print("SHAP Version : {}".format(shap.__version__))

# Load the JavaScript needed for the interactive charts later on
shap.initjs()

### <a name="data"></a>Data Import

In [None]:
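data = pd.read_csv('../data/auto-mpg.csv')
# Drop rows where horsepower is missing (recorded as '?'), then cast to int.
# `.copy()` avoids assigning into a slice of the original DataFrame.
data = data.loc[data['horsepower'] != '?'].copy()
data['horsepower'] = data['horsepower'].astype('int')
data.info()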

In [None]:
X = data.drop(['mpg', 'car name'], axis=1)
y = data['mpg']

In [None]:
print(X.shape)
print(y.shape)

### <a name="building_model"></a>Building Model

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X,
                                                   y,
                                                   test_size=0.2,
                                                   random_state=42)

In [None]:
model = xgb.XGBRegressor()

In [None]:
# create an xgboost regression model
model = xgb.XGBRegressor(n_estimators=100)

# fit the model
model.fit(X_train, y_train)

In [None]:
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force scores to be positive
scores = absolute(scores)
print('Mean MAE: %.3f (%.3f)' % (scores.mean(), scores.std()) )

In [None]:
# Check Actual Vs Predictions
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111)

# Plot points
x_points = y_test
y_points = model.predict(X_test)

ax.scatter(x_points, y_points)
ax.set_title('Actual Vs Predicted MPG Values')
ax.set_ylabel('Predicted MPG')
ax.set_xlabel('Actual MPG')

ax.plot([0, 40],
       [0, 40],
       color='r',
       linestyle='-',
       linewidth=2)

plt.grid()
plt.show()

### <a name="shap_ke"></a>shap.KernelExplainer

The below is taken from the documentation, available [here](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.KernelExplainer.html)

*"Uses the Kernel SHAP method to explain the output of any function.*

*Kernel SHAP is a method that uses a special weighted linear regression to compute the importance of each feature. The computed importance values are Shapley values from game theory and also coefficients from a local linear regression."*


For `shap.KernelExplainer`, the first parameter must meet the following requirement, which is why we pass `model.predict`: *"User supplied function that takes a matrix of samples (# samples x # features) and computes the output of the model for those samples. The output can be a vector (# samples) or a matrix (# samples x # model outputs)."*

See the documentation for the methods available on [`shap.KernelExplainer`](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.KernelExplainer.html), for example `.shap_values(...)`.
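To make the function contract and the idea of a background dataset more concrete, here is a minimal sketch of exact Shapley values computed by brute force for a tiny model. It is not how `shap.KernelExplainer` is implemented (Kernel SHAP estimates these values with a weighted linear regression); `exact_shapley`, `toy_predict` and the data are hypothetical names and numbers used only for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, background):
    """Brute-force Shapley values for a single sample `x`.

    `predict` takes a list of rows (# samples x # features) and returns
    one output per row, mirroring the contract quoted above.
    """
    n = len(x)

    def value(subset):
        # Fix the features in `subset` to x's values, fill the rest from
        # each background row, and average the model output.
        rows = [[x[j] if j in subset else b[j] for j in range(n)]
                for b in background]
        preds = predict(rows)
        return sum(preds) / len(preds)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)

    # value(set()) is the base value: the mean prediction over the background
    return value(set()), phi


# Hypothetical stand-in for `model.predict`
def toy_predict(rows):
    return [3.0 * r[0] - 2.0 * r[1] + r[0] * r[2] for r in rows]


background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0]]
x = [1.0, 3.0, 2.0]

base_value, phi = exact_shapley(toy_predict, x, background)
prediction = toy_predict([x])[0]
print(base_value, phi, prediction)
```

The base value plus the sum of the Shapley values equals the prediction for `x`. The brute-force loop visits every subset of features, so it is only practical for a handful of features, which is why an estimator such as Kernel SHAP is used in practice.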

In [None]:
# Instantiate a KernelExplainer (using X_test as the background data)
# and compute SHAP values for every row of X_test
kernal_shap_values = shap.KernelExplainer(model.predict,
                                          data=X_test).shap_values(X_test)

In [None]:
# You can then use `shap.summary_plot`.
# However, if you don't specify the `features` argument,
# the plot has no feature names or feature values, as below...
shap.summary_plot(kernal_shap_values)

In [None]:
# Once added in, you get the feature names (left) and values (right)
shap.summary_plot(kernal_shap_values, X_test)

### <a name="shap_exp"></a>shap.Explainer

The below is taken from the documentation, available [here](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.Explainer.html)

*"Uses Shapley values to explain any machine learning model or python function.*

*This is the primary explainer interface for the SHAP library. It takes any combination of a model and masker and returns a callable subclass object that implements the particular estimation algorithm that was chosen."*

In [None]:
# Obtain shap values
shap_values = shap.Explainer(model).shap_values(X_test)
shap_values

In [None]:
# Obtain shap interaction values
shap_interaction_values = shap.Explainer(model).shap_interaction_values(X_test)
shap_interaction_values

### <a name="summ_fi"></a>[Summary Plot - Feature Importance](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.summary_plot.html?highlight=summary%20plot)

In [None]:
shap.summary_plot(shap_values,
                  X_test,
                  plot_type="bar")
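The bar version of the summary plot ranks features by their mean absolute SHAP value. A minimal sketch of that calculation, using a made-up SHAP matrix and assumed feature names rather than output from the model above:

```python
# Made-up SHAP matrix for illustration: one row per sample, one column per feature
feature_names = ["weight", "horsepower", "acceleration"]
shap_matrix = [
    [-3.0, -1.0, 0.2],
    [2.0, 0.5, -0.1],
    [-1.0, 1.5, 0.3],
]

n_samples = len(shap_matrix)

# Mean absolute SHAP value per feature
# (with NumPy this would be np.abs(shap_matrix).mean(axis=0))
importance = {
    name: sum(abs(row[j]) for row in shap_matrix) / n_samples
    for j, name in enumerate(feature_names)
}

# Rank features from most to least important
ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Taking the absolute value first matters: positive and negative contributions would otherwise cancel, and a feature that pushes predictions strongly in both directions would look unimportant.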

In [None]:
# Summary - Beeswarm plot
shap.summary_plot(shap_values,
                  X_test)

In [None]:
# Summary - Violin plot
shap.summary_plot(shap_values,
                  X_test,
                  plot_type="violin")

### <a name="dep_plot"></a>[Dependence Plot](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.dependence_plot.html)

*"Create a SHAP dependence plot, colored by an interaction feature.*

*Plots the value of the feature on the x-axis and the SHAP value of the same feature on the y-axis. This shows how the model depends on the given feature, and is like a richer extension of the classical partial dependence plots. Vertical dispersion of the data points represents interaction effects. Grey ticks along the y-axis are data points where the feature’s value was NaN."*
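To see why vertical dispersion signals an interaction, the sketch below computes exact Shapley values for feature 0 of a hypothetical two-feature model with an interaction term. The model, the background rows and the helper names are assumptions made for illustration; they are not part of the `shap` API.

```python
from statistics import mean


def toy_model(x0, x1):
    # Hypothetical model: the effect of x0 depends on x1 (interaction term)
    return 2.0 * x0 + x0 * x1


# Made-up background rows used to average out "unknown" features
background = [(0.0, 0.0), (1.0, 1.0), (2.0, -1.0), (3.0, 2.0)]


def value(x, known):
    """Mean output with features in `known` fixed to x, the rest from the background."""
    return mean(
        toy_model(x[0] if 0 in known else b0, x[1] if 1 in known else b1)
        for b0, b1 in background
    )


def shap_feature0(x):
    # Exact Shapley value of feature 0 when there are only two features
    return 0.5 * (value(x, {0}) - value(x, set())) \
         + 0.5 * (value(x, {0, 1}) - value(x, {1}))


# Three samples that share the same x0 but differ in x1
samples = [(1.0, -1.0), (1.0, 0.0), (1.0, 2.0)]

# (x-axis, y-axis) pairs as a dependence plot for feature 0 would draw them
points = [(x[0], shap_feature0(x)) for x in samples]
for x_val, phi in points:
    print(x_val, phi)
```

All three points sit at the same x position but at different heights, because the contribution of feature 0 changes with feature 1. Without the `x0 * x1` term they would collapse onto a single point.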

In [None]:
# Let's see the features and their respective index numbers
for idx, name in enumerate(X_test.columns):
    print(f"{idx} - {name}")

In [None]:
# Now create a dependence plot for each feature.
# Remember: the y-axis is the SHAP value for the respective feature value
# and the x-axis is the feature's value.
for idx in range(X_test.shape[1]):
    shap.dependence_plot(idx, shap_values, X_test)

### <a name="waterfall"></a>Visualising a Single Prediction - [Waterfall Plot](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.waterfall_plot.html)

You can use the waterfall plot to inspect a single prediction.
It expects a `shap.Explanation` object, which you get by calling the explainer directly on the data rather than calling `.shap_values(...)`.
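A waterfall plot starts at the base value and adds one feature contribution at a time until it reaches the model's prediction for that row. The sketch below reproduces that arithmetic with made-up numbers and assumed feature names; it is not output from the model in this notebook.

```python
# Made-up values for illustration only
base_value = 23.5            # expected model output
contributions = {            # SHAP value per feature for one row
    "weight": -3.2,
    "horsepower": -1.1,
    "model year": 1.8,
    "displacement": -0.6,
    "acceleration": 0.2,
}

# Apply the smallest contributions first so the largest ones come last
ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]))

running = base_value
steps = []
for name, phi in ordered:
    running += phi
    steps.append((name, phi, running))

# After the last step the running total is the prediction for this row
prediction = running

for name, phi, total in steps:
    print(f"{name:>13}: {phi:+.1f} -> {total:.1f}")
print(f"prediction: {prediction:.1f}")
```

With a real `shap.Explanation` row, the same relationship should hold between its `base_values`, the sum of its `values`, and the model's prediction for that row.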

In [None]:
# Compute SHAP values as a `shap.Explanation` object.
# The `shap_values` variable created above used slightly different params:
# shap_values = shap.Explainer(model).shap_values(X_test)
# Here X_train is passed as background data and the explainer is called directly.
explainer2 = shap.Explainer(model, X_train)
shap_values2 = explainer2(X)

In [None]:
print(type(shap_values2))

# note the different attributes i.e. values, base_values, etc
shap_values2

In [None]:
# idx of value to check
idx = 0
shap.plots.waterfall(shap_values2[idx])

### <a name="force"></a>[Force Plot](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.force_plot.html)

*"Visualize the given SHAP values with an additive force layout."*

In [None]:
e = shap.Explainer(model, X_test)

e.expected_value

In [None]:
# See how the expected value above compares to the average predicted value below
y_pred = model.predict(X_test)
y_pred.mean()

In [None]:
shap_values[0,:]

In [None]:
X_test.iloc[0,:]

In [None]:
# Inspecting a single record using `shap.force_plot`
idx = 0
shap.force_plot(e.expected_value, # base_value i.e. expected value i.e. mean of predictions
                shap_values[idx,:], # shap_values i.e. the row of SHAP values for this record
                X_test.iloc[idx,:]) # features i.e. the same row as used for shap_values, above

In [None]:
# Multiple values
# Interactive plot with 2 different drop downs - left and top
shap.force_plot(e.expected_value,
                shap_values,
                X_test)

#### References Used
- https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/waterfall.html?highlight=waterfall
- https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Fitting%20a%20Linear%20Simulation%20with%20XGBoost.html