Explaining model predictions using LIME.

In [1]:
import random
import os
import keras
import pickle
import numpy as np
import pandas as pd
import tensorflow as tf
import lime
import lime.lime_tabular

Using TensorFlow backend.


In [2]:
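# Must equal zero. Note: hash randomization is fixed at interpreter start-up,
# so this assignment only affects subprocesses, not the running kernel.
os.environ['PYTHONHASHSEED'] = '0'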

In [3]:
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
random.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
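tf.set_random_seed(98765)

# Apply the single-threaded configuration defined above; without this step
# session_conf is created but never used.
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
keras.backend.set_session(sess)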
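
As a quick sanity check of what the seeds above do, the following minimal sketch reseeds the core Python and NumPy generators and confirms that the same draws come back. The helper name `draw` is hypothetical and not part of the original notebook, and the check does not cover TensorFlow, whose reproducibility also depends on the session configuration.

```python
import random
import numpy as np

def draw(py_seed, np_seed):
    """Reseed both generators and return one sample from each."""
    random.seed(py_seed)
    np.random.seed(np_seed)
    return random.sample(range(1000), 5), np.random.rand(3)

first_py, first_np = draw(12345, 42)
second_py, second_np = draw(12345, 42)

# Identical seeds give identical draws from both generators.
same_py = first_py == second_py
same_np = np.array_equal(first_np, second_np)
print(same_py, same_np)
```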

Loading the best model. ***Note: Due to its file size, this model object is not shared on GitHub, but I am happy to share it on request.***

In [4]:
model = keras.models.load_model('../../output/models/model_1.h5') 

Loading matrix of features for individuals in the validation set. ***Note: This file is not in the repository and I am unable to share it as it contains the challenge data.*** Also loading a list with the names of features (the names of the columns of X_test) that are assumed to be categorical.

In [None]:
X_test = pd.read_csv('../../data/X_test.csv', index_col='challengeID')
cat_cols = pd.read_csv('../../data/cat_cols.csv', header=None)

In [None]:
cat_cols = list(cat_cols[1])

Now I can use the model and these data to initialize a LIME explainer class.

In [None]:
names = X_test.columns
explainer = lime.lime_tabular.LimeTabularExplainer(np.array(X_test),
                                                   feature_names=names,
                                                   categorical_features=cat_cols,
                                                   class_names=['r'],
                                                   verbose=True,
                                                   mode='regression')
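
The LIME documentation describes `categorical_features` as a list of integer column indices rather than column names. If `cat_cols` holds names, as the text above suggests, one possible conversion is sketched below. The helper `names_to_positions` and the column names are made up for illustration and are not part of the original notebook.

```python
def names_to_positions(all_names, wanted_names):
    """Map column names to their integer positions, sorted by position."""
    position = {name: i for i, name in enumerate(all_names)}
    missing = [name for name in wanted_names if name not in position]
    if missing:
        raise KeyError("Columns not found: {}".format(missing))
    return sorted(position[name] for name in wanted_names)

# Made-up column names for illustration only.
example_names = ['age', 'region', 'income', 'school_type']
example_cat_cols = ['school_type', 'region']

cat_positions = names_to_positions(example_names, example_cat_cols)
print(cat_positions)  # [1, 3]
```

In the notebook this might be called as `names_to_positions(list(X_test.columns), cat_cols)`, assuming every entry of `cat_cols` is a column of `X_test`, with the result passed as `categorical_features`.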

Now randomly selecting 100 observations for further inspection using LIME, which explains the predicted GPA for each selected person. This should provide insight into the factors that are associated with different predicted GPA scores.

In [None]:
num_test_obs = X_test.shape[0]
sampled_indices = random.sample(range(num_test_obs), 100)  # Note: these are positions in X_test, not challengeID index values
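
Because the sampled values are row positions, they need to be mapped back to index labels before an explanation can be tied to a specific individual. A minimal sketch of that mapping follows; the ID values are made up, and in the notebook the labels would presumably come from `X_test.index`.

```python
import random

# Made-up challengeID labels; in the notebook these would come from X_test.index.
example_ids = [1003, 1010, 1021, 1050, 1077, 1102]

random.seed(12345)
positions = random.sample(range(len(example_ids)), 3)

# Translate row positions into the corresponding index labels.
position_to_id = {pos: example_ids[pos] for pos in positions}
print(position_to_id)
```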

In the original script, the LIME explanation was done using the scikit-learn wrapper around the Keras model. The output of a standard Keras model object, as used here, has a different format and does not work with the LIME explainer class. To solve this without modifying the source code of either package, I used a wrapper function.

In [None]:
def predict_modified(X):
    """
    This wrapper function takes a numpy array X, predicts values for the 
    array using the keras model specified above, and then converts these 
    predictions from an array of 1-D arrays returned by Keras into a single
    array.
    """
    predicted_vals = model.predict(X)  # X is already a numpy array
    return predicted_vals[:, 0]  # First column of each row, as a 1-D array
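
To illustrate the shape change the wrapper performs, here is a small sketch that uses a stand-in for `model.predict`. It assumes the model has a single output unit, so that predictions come back with shape `(n_rows, 1)`. The names `fake_keras_predict` and `predict_flat` are hypothetical.

```python
import numpy as np

def fake_keras_predict(X):
    """Stand-in for model.predict: returns one column, shape (n_rows, 1)."""
    return X.sum(axis=1, keepdims=True)

def predict_flat(X):
    """Flatten the (n_rows, 1) output into a 1-D array of length n_rows."""
    return fake_keras_predict(X)[:, 0]

X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
raw = fake_keras_predict(X_demo)
flat = predict_flat(X_demo)
print(raw.shape, flat.shape)  # (3, 1) (3,)
```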

Finally, running the LIME explainer algorithm.

In [None]:
%%time
explanation_dict = {}
for i in sampled_indices:
    print("Getting explanation for observation ", str(i))
    exp = explainer.explain_instance(np.array(X_test)[i,:], predict_modified, num_features=5)
    explanation_dict[i] = exp.as_list()

Getting explanation for observation  213


Note that only the first print statement is shown because of a known problem with Jupyter notebooks: output stops being displayed once the browser disconnects from a remote server. The process took approximately 12 hours to complete.
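
With a run of this length, one option is to save partial results periodically so that an interrupted job can be resumed. The sketch below is not what was done here; the function `explain_with_checkpoints`, its `every` parameter, and the toy explanation function are all hypothetical.

```python
import os
import pickle
import tempfile

def explain_with_checkpoints(indices, explain_fn, path, every=10):
    """Run explain_fn over indices, saving partial results to path."""
    results = {}
    if os.path.exists(path):
        with open(path, 'rb') as f:
            results = pickle.load(f)  # Resume from an earlier run
    for count, i in enumerate(indices, start=1):
        if i in results:
            continue  # Already explained in a previous run
        results[i] = explain_fn(i)
        if count % every == 0:
            with open(path, 'wb') as f:
                pickle.dump(results, f)
    with open(path, 'wb') as f:
        pickle.dump(results, f)  # Final save
    return results

# Toy stand-in for explainer.explain_instance(...).as_list().
def toy_explain(i):
    return [('feature_a <= 1.00', 0.1 * i)]

checkpoint_path = os.path.join(tempfile.mkdtemp(), 'partial_explanations.p')
partial = explain_with_checkpoints([3, 7, 11], toy_explain, checkpoint_path, every=2)
resumed = explain_with_checkpoints([3, 7, 11, 15], toy_explain, checkpoint_path, every=2)
print(sorted(resumed))  # [3, 7, 11, 15]
```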

In [None]:
# Use a context manager so the file is flushed and closed after writing.
with open('../../output/lime_explanations_dict.p', 'wb') as f:
    pickle.dump(explanation_dict, f)
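
The saved dictionary maps each sampled position to the output of `exp.as_list()`, a list of (feature description, weight) pairs. A minimal sketch of reading such a file back and flattening it for later analysis is shown below; it uses made-up explanations and a temporary file rather than the real output.

```python
import os
import pickle
import tempfile

# Made-up explanations in the shape returned by exp.as_list():
# a list of (feature description, weight) pairs per observation.
toy_explanations = {
    213: [('feature_a <= 1.00', -0.12), ('feature_b=1', 0.05)],
    48: [('feature_a <= 1.00', -0.08), ('feature_c > 3.00', 0.02)],
}

toy_path = os.path.join(tempfile.mkdtemp(), 'toy_explanations.p')
with open(toy_path, 'wb') as f:
    pickle.dump(toy_explanations, f)

with open(toy_path, 'rb') as f:
    loaded = pickle.load(f)

# Flatten into (observation, feature description, weight) rows.
rows = [(obs, feature, weight)
        for obs, pairs in sorted(loaded.items())
        for feature, weight in pairs]

# Count how often each feature description appears across observations.
counts = {}
for _, feature, _ in rows:
    counts[feature] = counts.get(feature, 0) + 1
print(counts['feature_a <= 1.00'])  # 2
```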