Part I of this series is available [here](https://www.kaggle.com/jth359/deep-dive-into-ml-interpretability-part-i/).


## Model Agnostic Interpretability Techniques

All of the previous examples depended on the specific model (for example, we could not compute a feature importance for KNN). Model-agnostic interpretability techniques can be used on any model.

### Partial Dependency Plots

Partial Dependency Plots (PDPs) show how the model's average predicted value changes as a single feature is varied, while the other features keep their observed values.
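As a rough sketch of the computation behind a partial dependency curve: for each value on a grid, overwrite the chosen feature in every row, predict, and average the predictions. The helper `partial_dependence_1d`, the stand-in model `toy_predict`, and the synthetic data are hypothetical names made up for illustration; they are not part of the notebook or of scikit-learn.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature_idx, grid):
    """Average prediction when column `feature_idx` is forced to each grid value."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value            # overwrite the feature in every row
        averages.append(predict(X_mod).mean())   # average over all samples
    return np.array(averages)

# Hypothetical stand-in for a fitted model's predict method (assumption, not the notebook's model):
# linear in feature 0, a step in feature 1, and feature 2 is ignored.
def toy_predict(X):
    return 2.0 * X[:, 0] + np.where(X[:, 1] > 0.5, 1.0, 0.0)

rng = np.random.default_rng(11)
X_demo = rng.uniform(0, 1, size=(200, 3))   # synthetic data for illustration only
grid = np.linspace(0, 1, 5)

pd_feature0 = partial_dependence_1d(toy_predict, X_demo, 0, grid)  # rises linearly
pd_feature1 = partial_dependence_1d(toy_predict, X_demo, 1, grid)  # jumps once past 0.5
```

Because the helper only needs a `predict` callable, the same idea applies to any model, which is what makes the technique model agnostic.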


To start, I am going to look at the partial dependency plot for the random forest on the feature `cont2`, since it appeared to be the most important feature.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
# load in dataset 
df = pd.read_csv('/kaggle/input/tabular-playground-series-jan-2021/train.csv')

# set id to be the index 
df.set_index('id', inplace = True)

# begin with a train test split 
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor

X = df.drop('target', axis = 1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=11)

# make instance of model
rf = RandomForestRegressor()

# fit model
rf.fit(X_train, y_train)

# check root mean squared error 
np.sqrt(mean_squared_error(y_test, rf.predict(X_test)))

In [None]:
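from sklearn.inspection import plot_partial_dependence
# note: plot_partial_dependence was removed in scikit-learn 1.2;
# newer versions use PartialDependenceDisplay.from_estimator instead

# partial dependence plot for the feature at column index 1 (`cont2`)
plot_partial_dependence(rf, X_train, [1])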

We see that at around 3.5 the predicted value jumps from 7.84 to 7.94. If this were a feature we were familiar with, we could check whether this jump in predicted value makes sense.

Because partial dependency plots are model agnostic, we can also use them on neural networks.

In [None]:
from keras.wrappers.scikit_learn import KerasRegressor
import keras

# make Neural net 
def neural_net():
    model = keras.Sequential([
          keras.layers.Dense(64, activation='relu', input_shape=[14]),
          keras.layers.Dense(32, activation='relu'),
          keras.layers.Dense(1, activation = 'linear')
          ])

    model.compile(loss='mse',
          optimizer='adam')
    
    return model

# make neural net with keras wrapper (so I can use partial dependency plotter)
kr = KerasRegressor(build_fn=neural_net ,verbose=0)
kr._estimator_type = "regressor"
kr.fit(X_train,y_train, epochs=10, validation_data = (X_test, y_test))
# needed for partial dependency plotter (I have no dummy variables)
kr.dummy_ = None

# check RMSE 
np.sqrt(mean_squared_error(y_test, kr.predict(X_test))) 

In [None]:
# plot partial dependence for the feature at column index 1 (`cont2`)
plot_partial_dependence(kr, X_train, [1])

We see that the neural network PDP is very different from the random forest PDP. The neural network shows `cont2` having a positive linear relationship with the prediction until about 0.55, after which it levels off. This shows how differently the two models represent the relationship between `cont2` and the target variable.

### Individual Conditional Expectation (ICE) Plot

An ICE plot is very similar to a PDP. A PDP shows the prediction averaged over all samples at each feature value, while an ICE plot draws one curve per sample, showing how that individual prediction changes as the feature changes.
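To make the difference concrete, here is a small sketch that computes ICE curves by hand and then averages them to recover the PDP. The function `ice_curves`, the stand-in model `toy_predict`, and the synthetic data are hypothetical and only for illustration. The toy model has an interaction, so the individual curves slope in opposite directions while their average is much flatter, which is the kind of structure a PDP alone can hide.

```python
import numpy as np

def ice_curves(predict, X, feature_idx, grid):
    """Return an (n_samples, n_grid) array with one prediction curve per row of X."""
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value   # set the feature to this grid value for every row
        curves[:, j] = predict(X_mod)   # keep each sample's prediction (no averaging)
    return curves

# Hypothetical model with an interaction (assumption for illustration):
# the effect of feature 0 flips sign depending on feature 1.
def toy_predict(X):
    return X[:, 0] * np.where(X[:, 1] > 0.5, 1.0, -1.0)

rng = np.random.default_rng(11)
X_demo = rng.uniform(0, 1, size=(400, 2))   # synthetic data
grid = np.linspace(0, 1, 11)

ice = ice_curves(toy_predict, X_demo, 0, grid)
pdp_curve = ice.mean(axis=0)       # the PDP is the pointwise average of the ICE curves
slopes = ice[:, -1] - ice[:, 0]    # total change along each individual curve
```

Here every entry of `slopes` has magnitude 1, with some samples rising and others falling, while `pdp_curve` changes far less from end to end.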

In [None]:
!pip install pdpbox

In [None]:
from pdpbox import pdp

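# compute the isolated partial dependence of `cont2` for the random forest
pdp_cont2 = pdp.pdp_isolate(
    model=rf, dataset=X_train, model_features=X_train.columns, feature='cont2'
)
# plot_lines=True draws the individual (ICE) lines along with the average
fig, axes = pdp.pdp_plot(pdp_cont2, 'cont2', plot_lines=True)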

This dataset is rather large, so a PDP is probably the better choice here. Still, the ICE plot does show how much the change in prediction varies from sample to sample as the `cont2` feature increases.

### Permutation Importance 

Permutation feature importance measures the increase in the prediction error of the model after we permuted the feature's values, which breaks the relationship between the feature and the true outcome.<br> 
[Source](https://christophm.github.io/interpretable-ml-book/feature-importance.html)
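The definition above can be sketched by hand as follows: compute a baseline error, shuffle one column at a time, and record how much the error grows. This is a minimal illustration that uses RMSE as the error. The helper `permutation_importance_manual`, the stand-in model `toy_predict`, and the synthetic data are hypothetical. scikit-learn's `permutation_importance` follows the same idea but reports the drop in a score rather than the rise in an error.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def permutation_importance_manual(predict, X, y, n_repeats=5, seed=11):
    """Increase in RMSE after shuffling each column; shape (n_features, n_repeats)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    baseline = rmse(y, predict(X))
    importances = np.zeros((X.shape[1], n_repeats))
    for col in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            # shuffling breaks the link between this feature and the target
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            importances[col, r] = rmse(y, predict(X_perm)) - baseline
    return importances

# Hypothetical stand-in model (assumption): relies heavily on feature 0,
# a little on feature 1, and ignores feature 2.
def toy_predict(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))   # synthetic data
y_demo = toy_predict(X_demo) + rng.normal(scale=0.1, size=500)

imp = permutation_importance_manual(toy_predict, X_demo, y_demo)
mean_imp = imp.mean(axis=1)   # feature 0 largest, feature 2 exactly zero
```

A feature the model never uses gets an importance of zero, since shuffling it leaves the predictions unchanged.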

Since I was not able to do feature importance earlier on a neural network, I am going to use permutation importance on my neural network.  

In [None]:
from sklearn.inspection import permutation_importance
import matplotlib.pyplot as plt 

result = permutation_importance(kr, X_test, y_test, n_repeats=10,
                                random_state=11)
sorted_idx = result.importances_mean.argsort()

fig, ax = plt.subplots()
ax.boxplot(result.importances[sorted_idx].T,
           vert=False, labels=X_test.columns[sorted_idx])
ax.set_title("Permutation Importances (test set)")
fig.tight_layout()
plt.show()