Explaining model predictions using LIME.

In [1]:
from random import sample
import keras
import pickle
import numpy as np
import pandas as pd
import lime
import lime.lime_tabular

Using TensorFlow backend.


Loading the best model. ***Note: Due to its file size, this model object is not shared on GitHub, but I am happy to share it on request.***

In [2]:
model = keras.models.load_model('../model/model_4.h5') 

Loading the feature matrix for individuals in the validation set. ***Note: This file is not in the repository and I am unable to share it because it contains the challenge data.*** I also load a list of the names of the features (columns of X_test) that are assumed to be categorical.

In [3]:
with open('../../X_test.p', 'rb') as f:
    X_test = pickle.load(f)
with open('../../cat_cols.p', 'rb') as f:
    cat_cols = pickle.load(f)

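Now I can use these data to initialize a LIME explainer. The explainer only needs the data at this stage (it uses them to compute the feature statistics it samples perturbations from); the model is supplied later, when each individual prediction is explained.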

In [4]:
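names = list(X_test.columns)
# LimeTabularExplainer expects categorical_features as column *indices*;
# cat_cols holds column names, so convert them to positions first.
cat_idx = [names.index(col) for col in cat_cols]
explainer = lime.lime_tabular.LimeTabularExplainer(X_test.values,
                                                   feature_names=names,
                                                   categorical_features=cat_idx,
                                                   class_names=['r'],
                                                   verbose=True,
                                                   mode='regression')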
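For readers unfamiliar with what the explainer computes, the sketch below shows the core idea of LIME for tabular regression in plain NumPy: perturb the instance, query the model on the perturbed points, weight those points by their proximity to the instance, and fit a weighted linear model whose coefficients act as the local explanation. This is a simplified illustration under stated assumptions, not the `lime` package's implementation: it draws Gaussian perturbations scaled by the training standard deviations, uses an exponential kernel of width `0.75 * sqrt(n_features)` on standardized distances, fits ordinary weighted least squares instead of LIME's ridge model with feature selection, and skips the quartile discretization and categorical handling that `LimeTabularExplainer` applies by default. The function `lime_sketch` and the toy linear model are hypothetical.

```python
import numpy as np

def lime_sketch(x, predict_fn, training_data, num_samples=5000, seed=0):
    """Hypothetical, simplified LIME-style explanation for one instance x.

    Returns one coefficient per feature from a proximity-weighted linear
    surrogate fitted around x. Not the lime package's implementation.
    """
    rng = np.random.default_rng(seed)
    n_features = training_data.shape[1]
    scale = training_data.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant features

    # 1. Perturb: sample points around x using the spread of the training data.
    Z = x + rng.normal(size=(num_samples, n_features)) * scale
    Z[0] = x  # keep the original instance as the first sample

    # 2. Query the black-box model on the perturbed points.
    y = predict_fn(Z)

    # 3. Weight samples by proximity to x (exponential kernel on standardized distance).
    d = np.sqrt((((Z - x) / scale) ** 2).sum(axis=1))
    width = 0.75 * np.sqrt(n_features)
    w = np.exp(-(d ** 2) / width ** 2)

    # 4. Weighted least squares on standardized offsets, with an intercept column.
    A = np.hstack([np.ones((num_samples, 1)), (Z - x) / scale])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # per-feature local effects, in standardized units

# Usage with a toy linear model standing in for predict_modified:
rng = np.random.default_rng(1)
train = rng.normal(size=(500, 3))
beta = np.array([3.0, -2.0, 0.0])
effects = lime_sketch(train[0], lambda X: X @ beta, train)
# Dividing by the feature scales recovers beta for an exactly linear model.
print(np.allclose(effects / train.std(axis=0), beta))
```

Because the toy model is exactly linear, the surrogate reproduces it; for a nonlinear network such as the one loaded above, the coefficients only describe the model's behaviour near the chosen observation.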

Next, I randomly select 100 observations from the validation set for closer inspection with LIME. This should provide insight into the factors associated with different predicted GPA scores.

In [5]:
num_test_obs = X_test.shape[0]
sampled_indices = sample(range(0, num_test_obs), 100)

In [6]:
print(sampled_indices)

[114, 228, 248, 79, 415, 144, 207, 76, 89, 236, 326, 198, 71, 227, 191, 332, 265, 158, 209, 181, 226, 16, 338, 318, 281, 366, 54, 393, 249, 256, 171, 352, 333, 232, 100, 413, 359, 416, 85, 112, 253, 348, 81, 170, 157, 406, 3, 5, 49, 220, 206, 111, 356, 241, 294, 127, 28, 397, 250, 365, 96, 216, 165, 389, 80, 72, 410, 260, 43, 138, 264, 97, 276, 50, 37, 268, 48, 69, 47, 308, 99, 18, 193, 117, 246, 197, 129, 139, 64, 86, 273, 161, 203, 284, 290, 411, 277, 295, 334, 396]
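`random.sample` is called here without a seed, so rerunning the cell draws a different set of 100 observations; the list printed above is specific to this run. A minimal way to make the selection reproducible, assuming any fixed seed is acceptable (the helper name `sample_indices`, the seed value and the row count below are hypothetical), is to draw from a dedicated `random.Random` instance:

```python
import random

def sample_indices(num_obs, k=100, seed=2018):
    """Draw k distinct row indices from range(num_obs), reproducibly.

    A private Random instance is used so the global random state is untouched.
    """
    rng = random.Random(seed)
    return rng.sample(range(num_obs), k)

# Illustrative row count; in the notebook this would be X_test.shape[0].
first = sample_indices(500, k=100)
second = sample_indices(500, k=100)
print(first == second)  # same seed, same indices
```

Using a separate `Random` object rather than `random.seed` keeps other code that relies on the global generator unaffected.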


In the original script, the LIME explanation used the scikit-learn wrapper around the Keras model. A plain Keras model, as used here, returns predictions in a different shape: `model.predict` returns a 2-D array of shape (n_samples, 1), whereas the LIME explainer in regression mode expects a 1-D array of shape (n_samples,). To resolve this without modifying the source code of either package, I used a wrapper function.

In [None]:
def predict_modified(X):
    """
    Wrapper around the Keras model loaded above, for use as LIME's predict_fn.

    Keras' model.predict returns a 2-D array of shape (n_samples, 1);
    LIME's regression mode expects a 1-D array of shape (n_samples,),
    so the single output column is flattened.
    """
    predicted_vals = model.predict(X)  # X is already a numpy array
    return predicted_vals.ravel()
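To see what the wrapper does, the snippet below mimics the output shape of `model.predict` with a small NumPy array (`fake_keras_output` holds made-up values, not real model output) and checks that taking the first element of each row with a list comprehension and calling `ravel()` give the same 1-D array:

```python
import numpy as np

# Stand-in for model.predict(X): a Keras regression model with a single
# output unit returns a 2-D array of shape (n_samples, 1).
fake_keras_output = np.array([[2.9], [3.4], [2.1]])
print(fake_keras_output.shape)  # (3, 1)

# Element-by-element extraction of the single output per row...
via_list = np.array([x[0] for x in fake_keras_output[:, ]])
# ...and the vectorized equivalent.
via_ravel = fake_keras_output.ravel()

print(via_list.shape, via_ravel.shape)  # (3,) (3,)
print(np.array_equal(via_list, via_ravel))  # True
```

`ravel()` avoids a Python-level loop over the up to several thousand perturbed samples LIME passes to the model for each explanation.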

Finally, I run the LIME explainer on each sampled observation, limiting each explanation to five features.

In [None]:
%%time
X_arr = X_test.values  # convert once instead of on every iteration
explanation_dict = {}
for i in sampled_indices:
    print("Getting explanation for observation", i)
    exp = explainer.explain_instance(X_arr[i, :], predict_modified, num_features=5)
    explanation_dict[i] = exp.as_list()  # list of (feature description, weight) pairs

Getting explanation for observation  114


Only the output of the first print statement appears above because of a known Jupyter issue: output stops being displayed once the browser disconnects from a remote server, even though the kernel keeps running. The full loop took approximately 12 hours to complete.
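Because the loop runs for roughly 12 hours (about seven minutes per observation on average) and its output can be lost on disconnect, one option is to checkpoint results to disk as they are produced and skip already-finished observations on a rerun. The sketch below assumes a hypothetical `explain_fn(i)` that returns the `exp.as_list()` result for observation `i`, and a checkpoint path chosen by the caller; neither is part of the original notebook.

```python
import os
import pickle

def explain_with_checkpoints(indices, explain_fn, path):
    """Run explain_fn over indices, saving all results to `path` after each one.

    On a rerun, results already stored in `path` are loaded and skipped, so an
    interrupted run resumes where it stopped. explain_fn and path are
    placeholders supplied by the caller.
    """
    results = {}
    if os.path.exists(path):
        with open(path, 'rb') as f:
            results = pickle.load(f)
    for i in indices:
        if i in results:
            continue  # already explained in a previous run
        results[i] = explain_fn(i)
        tmp_path = path + '.tmp'
        with open(tmp_path, 'wb') as f:
            pickle.dump(results, f)
        os.replace(tmp_path, path)  # swap in the new file so it is never half-written
    return results

# In the notebook this could be called as, e.g.:
# explanation_dict = explain_with_checkpoints(
#     sampled_indices,
#     lambda i: explainer.explain_instance(X_test.values[i, :], predict_modified,
#                                          num_features=5).as_list(),
#     'lime_explanations_checkpoint.p')
```

Writing to a temporary file and then calling `os.replace` means an interruption during the write leaves the previous checkpoint intact.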

In [None]:
with open('lime_explanations_dict.p', 'wb') as f:
    pickle.dump(explanation_dict, f)
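Once saved, the explanations can be summarized across observations. Each value stored from `exp.as_list()` is a list of `(feature description, weight)` pairs, so one simple summary, offered as a sketch rather than the analysis performed in this notebook, is to count how often each description appears among the top five and average its weight. With LIME's default discretization, continuous features appear as range conditions (strings like `name <= 0.50` or `0.50 < name <= 1.20`), so one variable may show up under several descriptions. The function name and the example dictionary below are made up for illustration.

```python
from collections import defaultdict

def summarize_explanations(explanation_dict):
    """Aggregate explanations stored as {obs_index: [(description, weight), ...]}.

    Returns (description, count, mean_weight) tuples, most frequent first,
    ties broken alphabetically.
    """
    weights = defaultdict(list)
    for pairs in explanation_dict.values():
        for description, weight in pairs:
            weights[description].append(weight)
    summary = [(desc, len(ws), sum(ws) / len(ws)) for desc, ws in weights.items()]
    summary.sort(key=lambda row: (-row[1], row[0]))
    return summary

# Made-up example in the same format as the stored exp.as_list() output:
example = {
    3: [('feat_a <= 0.50', 0.12), ('feat_b=1', -0.05)],
    5: [('feat_a <= 0.50', 0.08), ('feat_c > 2.00', 0.03)],
}
for desc, count, mean_w in summarize_explanations(example):
    print(desc, count, round(mean_w, 3))

# With the real results (pickle is imported at the top of the notebook):
# with open('lime_explanations_dict.p', 'rb') as f:
#     explanation_dict = pickle.load(f)
# summarize_explanations(explanation_dict)
```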