Explaining model predictions using LIME.

In [1]:
import keras
import pickle
import numpy as np
import pandas as pd
import lime
import lime.lime_tabular

Using TensorFlow backend.


Loading the best model. ***Note: Due to its file size, this model object is not shared on GitHub, but I am happy to share it on request.***

In [2]:
model = keras.models.load_model('../models/model_4.h5') 

Loading the matrix of features for individuals in the validation set. ***Note: This file is not in the repository and I am unable to share it, as it contains the challenge data.*** Also loading a list of the names of the features (columns of X_test) that are assumed to be categorical.

In [3]:
with open('X_test.p', 'rb') as f:
    X_test = pickle.load(f)
with open('cat_cols.p', 'rb') as f:
    cat_cols = pickle.load(f)

Now I can use these data to initialize a LIME explainer. The model itself is only needed later, when the explanations are generated.

In [4]:
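names = X_test.columns
# LimeTabularExplainer expects categorical_features as integer column
# indices, not column names, so map the names in cat_cols to positions.
cat_idx = [names.get_loc(c) for c in cat_cols]
explainer = lime.lime_tabular.LimeTabularExplainer(np.array(X_test),
                                                   feature_names=names,
                                                   categorical_features=cat_idx,
                                                   class_names=['r'],
                                                   verbose=True,
                                                   mode='regression')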

Next, predicting the GPA for each person and selecting the 50 observations with the lowest and the 50 with the highest predicted values (100 in total) for further inspection with LIME. This should give some insight into the factors associated with low and high predicted GPA.

In [5]:
y_preds = model.predict(np.array(X_test))

In [6]:
# y_preds has shape (n, 1), so argsort along axis 0 returns an (n, 1) array of indices
lowest = np.argsort(y_preds, axis=0)[:50]
highest = np.argsort(y_preds, axis=0)[::-1][:50]

In [7]:
obs = [x[0] for x in list(lowest)] + [x[0] for x in list(highest)]

In [8]:
len(obs)

100
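The selection above keeps the (n, 1) shape of the predictions, which is why each index has to be unpacked with `x[0]`. A minimal sketch of an equivalent approach that flattens the predictions once is below; the helper name `extreme_indices` is hypothetical and not part of the original analysis.

```python
import numpy as np

def extreme_indices(preds, k):
    """Return row positions of the k lowest and then the k highest predictions.

    `preds` may be 1-D or an (n, 1) column, as returned by a Keras model.
    """
    flat = np.asarray(preds).ravel()     # (n, 1) -> (n,)
    order = np.argsort(flat)             # positions sorted by ascending prediction
    lowest = order[:k].tolist()
    highest = order[::-1][:k].tolist()   # positions sorted by descending prediction
    return lowest + highest

# Toy predictions with the same (n, 1) shape as model.predict output
toy = np.array([[3.1], [2.2], [3.8], [2.9], [2.0]])
print(extreme_indices(toy, 2))  # [4, 1, 2, 0]
```

The two groups are only disjoint when there are at least 2k observations; with fewer rows some positions would appear in both lists.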

In the original script, the LIME explanations were generated with the scikit-learn wrapper around the Keras model. A plain Keras model object, as used here, returns its predictions as an array of shape (n, 1), whereas LIME's regression mode expects a one-dimensional array with one prediction per row, so the model cannot be passed to the explainer directly. To solve this without modifying the source code of either package, I used a wrapper function that flattens the predictions.

In [9]:
def predict_modified(X):
    """
    This wrapper function takes a numpy array X, predicts values for the
    array using the Keras model loaded above, and flattens the (n, 1)
    array of predictions returned by Keras into a 1-D array of shape (n,).
    """
    predicted_vals = model.predict(X)  # X is already a numpy array
    return predicted_vals.ravel()
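To make the shape issue concrete, the sketch below uses a hypothetical stand-in, `fake_keras_predict`, that returns an (n, 1) column in the same way `model.predict` does for a single-output network. Flattening it gives the (n,) vector that LIME's regression mode works with, and the result matches the list comprehension `[x[0] for x in ...]` used in the original version of the wrapper.

```python
import numpy as np

def fake_keras_predict(X):
    """Hypothetical stand-in for model.predict: returns shape (n, 1)."""
    return (X.sum(axis=1) * 0.1)[:, None]

def predict_flat(X):
    """Same idea as predict_modified: one prediction per row, shape (n,)."""
    return fake_keras_predict(X).ravel()

X = np.arange(6, dtype=float).reshape(3, 2)
col = fake_keras_predict(X)
flat = predict_flat(X)
print(col.shape, flat.shape)  # (3, 1) (3,)

# The list comprehension from the original wrapper gives the same values
print(np.array_equal(flat, np.array([x[0] for x in col])))  # True
```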

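Finally, running the LIME explainer on each of the 100 selected observations, keeping the five features with the largest weights (num_features=5) in each explanation.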

In [10]:
%%time
X_arr = np.array(X_test)  # convert once rather than on every iteration
explanation_dict = {}
for i in obs:
    print("Getting explanation for observation ", str(i))
    exp = explainer.explain_instance(X_arr[i, :], predict_modified, num_features=5)
    explanation_dict[i] = exp.as_list()

Getting explanation for observation  54
Intercept 2.8573009835
Prediction_local [ 2.94502686]
Right: 2.66543
Getting explanation for observation  226
Intercept 2.83724764899
Prediction_local [ 2.94126338]
Right: 2.66543
Getting explanation for observation  263
Intercept 2.87411286381
Prediction_local [ 2.87604411]
Right: 2.66543
Getting explanation for observation  210
Intercept 2.81560719253
Prediction_local [ 2.95379783]
Right: 2.66543
Getting explanation for observation  30
Intercept 2.93336423826
Prediction_local [ 2.93550774]
Right: 2.66543
Getting explanation for observation  318
Intercept 2.82960299242
Prediction_local [ 2.93116679]
Right: 2.66543
Getting explanation for observation  335
Intercept 2.8116995074
Prediction_local [ 2.93820374]
Right: 2.66543
Getting explanation for observation  162
Intercept 2.77998123067
Prediction_local [ 2.92331543]
Right: 2.66543
Getting explanation for observation  183
Intercept 2.70648581351
Prediction_local [ 2.94359206]
Right: 2.66543
...
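In the verbose output, `Intercept` is the intercept of LIME's local linear surrogate, `Prediction_local` is that surrogate's prediction at the observation, and `Right` is the Keras model's own prediction for it. The sketch below illustrates the general idea behind these numbers: perturb the observation, weight each perturbed sample by its closeness using LIME's default exponential kernel, and fit a weighted ridge regression to the model's predictions. It is a simplified illustration, not LIME's implementation: it samples around the instance and skips the quartile discretization the tabular explainer applies by default, so its coefficients are per training standard deviation of each feature rather than per bin. The function `local_surrogate` and its parameters are hypothetical.

```python
import numpy as np

def local_surrogate(predict_fn, instance, X_train, num_samples=5000,
                    alpha=1.0, seed=0):
    """Simplified LIME-style local linear explanation (no discretization).

    Returns (intercept, coefficients, model_prediction_at_instance).
    Coefficients are per training standard deviation of each feature.
    """
    rng = np.random.default_rng(seed)
    scale = X_train.std(axis=0)
    scale[scale == 0] = 1.0               # guard against constant columns
    d = instance.shape[0]

    # Perturb in standardized units around the instance; row 0 is the instance
    z = rng.standard_normal((num_samples, d))
    z[0] = 0.0
    y = predict_fn(instance + z * scale)  # must be 1-D, shape (num_samples,)

    # Exponential kernel as in LIME's default: sqrt(exp(-d^2 / width^2)),
    # with width = 0.75 * sqrt(number of features)
    width = 0.75 * np.sqrt(d)
    dist = np.linalg.norm(z, axis=1)
    w = np.sqrt(np.exp(-(dist ** 2) / width ** 2))

    # Weighted ridge regression with an unpenalized intercept:
    # solve (A^T W A + P) beta = A^T W y
    A = np.column_stack([np.ones(num_samples), z])
    penalty = alpha * np.eye(d + 1)
    penalty[0, 0] = 0.0
    beta = np.linalg.solve(A.T @ (A * w[:, None]) + penalty, A.T @ (w * y))
    return beta[0], beta[1:], y[0]

# Usage on a toy linear "model": the surrogate should recover its slopes
X_train = np.array([[-1.0, -1.0], [1.0, 1.0]])   # std = 1 for both columns
f = lambda X: 2 * X[:, 0] - X[:, 1] + 0.5
b0, coefs, right = local_surrogate(f, np.array([1.0, 2.0]), X_train)
print(b0, coefs, right)
```

Because this sketch is centred on the observation, its intercept is close to the surrogate's prediction at that point; in LIME's discretized representation the two differ, which is why the log above reports them separately.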

In [11]:
with open('new_lime_explanations_dict.p', 'wb') as f:
    pickle.dump(explanation_dict, f)
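Each entry of explanation_dict maps an observation index to a list of (condition, weight) pairs such as ('m3j4b_6_1.0 <= 0.00', 0.0826). One way to look for factors associated with low and high predicted GPA is to count how often each feature appears among the top five for each group. The sketch below assumes the condition strings only take the forms seen in this output ('name <= x', 'name > x', 'x < name <= y', with space-separated tokens); the helpers `feature_from_condition` and `feature_counts` are hypothetical.

```python
from collections import Counter

OPERATORS = {"<", "<=", ">", ">=", "="}

def feature_from_condition(condition):
    """Extract the feature name from a LIME condition string.

    Assumes space-separated conditions such as 'name <= 0.00',
    'name > 0.33' or '0.00 < name <= 1.00'.
    """
    for token in condition.split():
        if token in OPERATORS:
            continue
        try:
            float(token)        # numeric thresholds are skipped
        except ValueError:
            return token        # first non-numeric, non-operator token
    raise ValueError(f"no feature name found in {condition!r}")

def feature_counts(explanations, indices):
    """Count how often each feature appears in the explanations for `indices`."""
    counts = Counter()
    for i in indices:
        for condition, _weight in explanations[i]:
            counts[feature_from_condition(condition)] += 1
    return counts

# Usage with the objects defined above: obs holds the 50 lowest, then the 50 highest
# low_counts = feature_counts(explanation_dict, obs[:50])
# high_counts = feature_counts(explanation_dict, obs[50:])
# print(low_counts.most_common(10))
# print(high_counts.most_common(10))
```

Counting ignores the sign and size of the weights; summing the weights per feature instead would show whether a feature tends to push predictions up or down.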

In [13]:
explanation_dict

{15: [('m3j4b_6_1.0 <= 0.00', 0.082601255055925388),
  ('f3d3a_5_1.0 <= 0.00', 0.062078020834230613),
  ('m4d4_4.0 <= 0.00', 0.014356585195862134),
  ('p5j10a_3.0 <= 0.00', 0.010538206245675492),
  ('m4h1l_2.0 <= 0.00', 0.0094738829952781805)],
 19: [('f3i2_9.0 <= 0.00', 0.029054045455762831),
  ('t5b3e_3.0 <= 0.00', 0.028583245177947875),
  ('hv3f5_6.0 <= 0.00', 0.014999446000797939),
  ('f3j2f > 0.33', -0.0094305725743658499),
  ('0.00 < f5k14f_1.0 <= 1.00', -0.0077164551024945655)],
 21: [('m2b18g_6.0 <= 0.00', 0.044521990055136927),
  ('p5i10a_10.0 <= 0.00', 0.033210535574034854),
  ('m1f15_1.0 <= 0.00', 0.031400485711111009),
  ('m5a8f03_1.0 > 0.00', -0.010279181353953523),
  ('f4k10 <= -0.33', 0.0086821913586804921)],
 28: [('f3d1a_4.0 <= 0.00', 0.070615491440118097),
  ('p5q2a_3.0 <= 0.00', 0.036323032580108459),
  ('hv3k1b_5.0 <= 0.00', 0.011213509721147406),
  ('m2d4a_4.0 <= 0.00', 0.0082278754288294276),
  ('m4c13_1.0 <= 0.00', 0.0076683240766350852)],
 29: [('hv3k1a_4.0 <= 0