
numerical data (categorical & continuous) explanation on SVC and NN #73

Closed
yusufazishty opened this issue Jun 13, 2017 · 18 comments

@yusufazishty

I followed your example from https://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html for continuous and categorical data and gave it a try with different models. I used an SVC (from sklearn) and an NN (from Keras). Somehow, both of the models I used crash and restart the kernel when computing the explanation (exp = ...); code below.

```python
# 1) Using SVC
predict_fn = lambda x: svm_linear.predict_proba(encoder.transform(x))
explainer = lime.lime_tabular.LimeTabularExplainer(
    train, feature_names=feature_names, class_names=class_names,
    categorical_features=categorical_features,
    categorical_names=categorical_names, kernel_width=3)

all_explains = {}
for i in range(test.shape[0]):
    exp = explainer.explain_instance(test[i], predict_fn, num_features=5)
    all_explains[i] = exp
```

```python
# 2) Using NN
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(32, input_dim=encoded_train.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

model.fit(encoded_train_toa, labels_train, epochs=30, batch_size=128)
score = model.evaluate(encoded_test_toa, labels_test, batch_size=128)

def trans(x):
    x = encoder.transform(x).toarray()
    return model.predict_proba(x)

import lime
from lime import lime_tabular

explainer = lime.lime_tabular.LimeTabularExplainer(
    train, feature_names=feature_names, class_names=class_names,
    categorical_features=categorical_features,
    categorical_names=categorical_names)

all_explains = {}
predict_fn = lambda x: trans(x)
for i in range(test.shape[0]):
    temp = test[i, :]
    exp = explainer.explain_instance(temp, predict_fn, num_features=5)
    all_explains[i] = exp
```
Are SVM and NN not supported yet for numerical data? I have no problem using them with tree-based classifiers.

@yusufazishty
Author

After re-checking the paper, which states "Formally, we define an explanation as a model g ∈ G, where G is a class of potentially interpretable models, such as linear models, decision trees, or falling rule lists",

I did another trial, this time using sklearn.svm.LinearSVC.

The problem then is that LinearSVC has no predict_proba, but fortunately scikit-learn has sklearn.calibration.CalibratedClassifierCV to provide prediction probabilities.

With that, everything works well :). But still no clue about the NN ._.
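
Roughly, the workaround looks like this (a sketch; `encoder`, `train`, and `labels_train` are the tutorial-style variables from my snippets above):

```python
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# LinearSVC has no predict_proba, so wrap it in a calibrated classifier
svc = LinearSVC()
calibrated_svc = CalibratedClassifierCV(svc, cv=3)
calibrated_svc.fit(encoder.transform(train), labels_train)

# LIME gets its class probabilities from the calibrated wrapper
predict_fn = lambda x: calibrated_svc.predict_proba(encoder.transform(x))
```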

@yusufazishty
Author

I found out that the error with the NN model is in the feature lookup.
I use the same pattern as in the continuous and categorical tutorial, so every categorical feature must be one-hot encoded first; this applies to both the training data and the testing data.

When I apply this to the dataset provided, the 14 features become 108 after encoding.
I use the LimeTabularExplainer class and leave the discretizer param unchanged, so inside __data_inverse it sets categorical_features = range(data_row.shape[0]) (which is 108, because I pass the encoded data_row for testing). It then raises KeyError: 14 when looking up:

```python
for column in categorical_features:
    values = self.feature_values[column]
    freqs = self.feature_frequencies[column]
```

But the problem is, I used the same scheme for the other models I tried, and it works fine there. Is it my fault, or is there a bug? Thanks a lot, sorry for the long post.

@marcotcr
Owner

As in the tutorial, LimeTabularExplainer assumes you give it the non-encoded version of the data, and that the encoding is folded into the predict function (like you did in your pasted code above).

If you have 14 features before encoding, the categorical_features parameter should be based on range(14), rather than range(108). Does this make sense?
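
Roughly (a sketch; the variable names are from your snippet above, with `train` and `test` holding the raw, non-encoded rows):

```python
# train/test hold the raw data, shape (n, 14); the one-hot
# encoding happens only inside the predict function
explainer = lime.lime_tabular.LimeTabularExplainer(
    train,                                        # raw rows, 14 columns
    feature_names=feature_names,
    class_names=class_names,
    categorical_features=categorical_features,    # indices into the 14 raw columns
    categorical_names=categorical_names)

predict_fn = lambda x: svm_linear.predict_proba(encoder.transform(x))
exp = explainer.explain_instance(test[0], predict_fn, num_features=5)  # test[0]: raw 14-feature row
```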

@yusufazishty
Author

Yeah, I think so. I'll give it a shot first. Thanks :)

@IamPR09

IamPR09 commented Jul 3, 2017

Hi Yusuf and Marco,
I also ended up having the same problem. Did it work for you?
When I try it with xgboost it works, but when I try it with a simple MLP it throws an error.
Also, my feature set consists of both a free-flowing text column and some categorical columns.

@rishanki

rishanki commented Apr 4, 2018

I'm trying to do the same thing using Keras. In my final explainer instance:

```python
exp = explainer.explain_instance(X_validation[[100]], Model.predict_proba, num_features=5)
exp.show_in_notebook(show_table=True)
```

I get an error saying "could not broadcast input array from shape (11) into shape (1)". I have 11 features going into my NN. I don't understand why it is doing this.

@marcotcr
Owner

marcotcr commented Apr 4, 2018

@IamPR09: sorry, I didn't see this before. What error are you getting? There is no simple way of combining free-flowing text with categorical data at the moment, though.

@rishanki What is X_validation.shape? Are you using LimeTabularExplainer?

@rishanki

rishanki commented Apr 5, 2018

@marcotcr: Yes, I am using LimeTabularExplainer, and the validation shape is (3000, 11). Do you need to see the entire code to make sense of it?

@rishanki

rishanki commented Apr 5, 2018


```
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
      1 i = 100
----> 2 exp = explainer.explain_instance(X_validation[[11]], Model.predict_proba, num_features=5)
      3 exp.show_in_notebook(show_table=True)

/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/lime/lime_tabular.pyc in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    260             explanations.
    261         """
--> 262         data, inverse = self.__data_inverse(data_row, num_samples)
    263         scaled_data = (data - self.scaler.mean_) / self.scaler.scale_
    264

/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/lime/lime_tabular.pyc in __data_inverse(self, data_row, num_samples)
    411         else:
    412             first_row = self.discretizer.discretize(data_row)
--> 413         data[0] = data_row.copy()
    414         inverse = data.copy()
    415         for column in categorical_features:

ValueError: could not broadcast input array from shape (11) into shape (1)
```

@marcotcr
Owner

marcotcr commented Apr 5, 2018

What is X_validation[[11]].shape? Is it a 1d numpy array?

@rishanki

rishanki commented Apr 5, 2018

It is a numpy array with shape (1, 11). If I use the vanilla way:

```python
i = 100
exp = explainer.explain_instance(X_validation[i], Model.predict_proba, num_features=5)
```

it gives me an error saying IndexError: index 1 is out of bounds for axis 1 with size 1.
The ANN won't give the probability for a single observation if I do Model.predict_proba(X_validation[1]) (throws an error: expected dense_1_input to have shape (11,) but got array with shape (1,)).
Hence I did Model.predict_proba(X_validation[[1]]).

@marcotcr
Owner

marcotcr commented Apr 5, 2018

The input to explain_instance should be a 1D array. You're giving it a 2D array, even if one of the dimensions is of size 1.
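
For example (a sketch, with `explainer`, `predict_fn`, and `X_validation` as in your snippets):

```python
# explain_instance wants one unperturbed instance as a 1D array;
# LIME builds the 2D batch of perturbed samples internally and
# passes that batch to predict_fn
row = X_validation[100]    # shape (11,), not X_validation[[100]] with shape (1, 11)
exp = explainer.explain_instance(row, predict_fn, num_features=5)
```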

@wthrif

wthrif commented May 21, 2018

Hey @rishanki,
Were you able to solve this issue? I'm also looking for a way to get Keras to output predictions as a 1D array so that they can be used by LIME.
Thanks.

@rishanki

rishanki commented May 21, 2018 via email

@wthrif

wthrif commented May 21, 2018

I'm not really a programmer, so bear with me.
If we were able to ravel the predictions, it would work: e.g., for a single output node, model.predict(x) gives an array with shape (#predictions, 1), but np.ravel(model.predict(x)) gives the correct 1D array populated by predictions.
Right now I'm looking into the lime_tabular Python file to see if there is a place where I can insert an np.ravel to flatten the prediction outputs before they're sent on to the next step.

Not sure if this is the best way to solve the problem.
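
An alternative that avoids editing lime_tabular would be to wrap the model instead (a sketch, assuming a Keras binary classifier named `model` with a single sigmoid output): LIME's classification mode expects an (n_samples, n_classes) probability array, so the (n, 1) output can be expanded into two columns.

```python
import numpy as np

def predict_fn(x):
    # model.predict returns shape (n, 1) for a single sigmoid unit:
    # the probability of class 1 for each perturbed sample
    p = model.predict(x)
    # add the complementary column so LIME sees one column per class
    return np.hstack([1 - p, p])   # shape (n, 2): [P(class 0), P(class 1)]

exp = explainer.explain_instance(X_validation[100], predict_fn, num_features=5)
```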

@esther3587

Has anyone found a solution for this? I need to use LIME with a Keras model too, but I can't get past that array-sizing issue. I get the exact same error message as @rishanki ...

@arjunnagpn

arjunnagpn commented Nov 2, 2018

I'm facing the same issue as well. Please let me know if any of you guys were able to figure it out.
@marcotcr - Can you please let me know your thoughts on this?

@rhamnett

rhamnett commented May 6, 2019

Did anyone manage to find a solution to this issue (particularly when using a convnet with embedded categorical features and numerical features together)?

eg. https://www.kaggle.com/rhamnett/predicting-churn-with-keras-and-embedding-99-6

Thank you

marcotcr closed this as completed Jun 3, 2021