
numerical data (categorical & continuous) explanation on SVC and NN #73

Closed
yusufazishty opened this issue Jun 13, 2017 · 18 comments

@yusufazishty

I followed your example from https://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html for continuous and categorical data and gave it a try with different models. I used an SVC (from sklearn) and an NN (from Keras). Somehow, both of the models I used crash and restart the kernel when computing the explanation (exp = ...); code below.

```python
# 1) Using SVC
predict_fn = lambda x: svm_linear.predict_proba(encoder.transform(x))
explainer = lime.lime_tabular.LimeTabularExplainer(
    train, feature_names=feature_names, class_names=class_names,
    categorical_features=categorical_features,
    categorical_names=categorical_names, kernel_width=3)

all_explains = {}
for i in range(test.shape[0]):
    exp = explainer.explain_instance(test[i], predict_fn, num_features=5)
    all_explains[i] = exp
```

```python
# 2) Using NN
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(32, input_dim=encoded_train.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

model.fit(encoded_train_toa, labels_train, epochs=30, batch_size=128)
score = model.evaluate(encoded_test_toa, labels_test, batch_size=128)

def trans(x):
    x = encoder.transform(x).toarray()
    return model.predict_proba(x)

import lime
from lime import lime_tabular

explainer = lime.lime_tabular.LimeTabularExplainer(
    train, feature_names=feature_names, class_names=class_names,
    categorical_features=categorical_features,
    categorical_names=categorical_names)

all_explains = {}
predict_fn = lambda x: trans(x)
for i in range(test.shape[0]):
    temp = test[i, :]
    exp = explainer.explain_instance(temp, predict_fn, num_features=5)
    all_explains[i] = exp
```
Are SVM and NN not supported yet for numerical data? I have no problem using them with tree-based classifiers.

@yusufazishty
Author

After re-checking the paper, which states "Formally, we define an explanation as a model g ∈ G, where G is a class of potentially interpretable models, such as linear models, decision trees, or falling rule lists",

I did another trial, this time using sklearn.svm.LinearSVC.

The problem then is that LinearSVC has no predict_proba, but fortunately scikit-learn has sklearn.calibration.CalibratedClassifierCV to provide prediction probabilities.

With that, everything works well :). But still no clue about the NN ._.
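
Roughly, the workaround looks like this (a sketch; `encoder`, `train`, and `labels_train` are the tutorial-style variables from my snippets above):

```python
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# LinearSVC has no predict_proba, so wrap it in a calibrated classifier
svc = LinearSVC()
calibrated_svc = CalibratedClassifierCV(svc, cv=3)
calibrated_svc.fit(encoder.transform(train), labels_train)

# LIME gets its class probabilities from the calibrated wrapper
predict_fn = lambda x: calibrated_svc.predict_proba(encoder.transform(x))
```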

@yusufazishty
Author

I found out that the error with the NN model is in the feature lookup.
I use the same pattern as in the continuous and categorical tutorial, so every categorical feature must be one-hot encoded first; this applies to both the training data and the testing data.

When I apply this to the dataset provided, the 14 features become 108 after encoding.
I use the LimeTabularExplainer class and leave the discretizer param unchanged, so inside __data_inverse it sets categorical_features = range(data_row.shape[0]) (which is 108, because I pass the encoded data_row for testing). It then raises KeyError: 14 when looking up:

```python
for column in categorical_features:
    values = self.feature_values[column]
    freqs = self.feature_frequencies[column]
```

But the problem is, I used the same scheme for the other models I tried, and it works fine there. Is it my fault, or is there a bug? Thanks a lot, sorry for the long post.

@marcotcr
Owner

As in the tutorial, LimeTabularExplainer assumes you give it the non-encoded version of the data, and that the encoding is folded into the predict function (like you did in your pasted code above).

If you have 14 features before encoding, the categorical_features parameter should be based on range(14), rather than range(108). Does this make sense?
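
Roughly (a sketch; the variable names are from your snippet above, with `train` and `test` holding the raw, non-encoded rows):

```python
# train/test hold the raw data, shape (n, 14); the one-hot
# encoding happens only inside the predict function
explainer = lime.lime_tabular.LimeTabularExplainer(
    train,                                        # raw rows, 14 columns
    feature_names=feature_names,
    class_names=class_names,
    categorical_features=categorical_features,    # indices into the 14 raw columns
    categorical_names=categorical_names)

predict_fn = lambda x: svm_linear.predict_proba(encoder.transform(x))
exp = explainer.explain_instance(test[0], predict_fn, num_features=5)  # test[0]: raw 14-feature row
```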

@yusufazishty
Author

Yeah, I think so. I'll give it a shot first. Thanks :)

@IamPR09

IamPR09 commented Jul 3, 2017

Hi Yusuf and Marco,
I also ended up having the same problem. Did it work for you?
When I try it with xgboost it works, but when I try it with a simple MLP it throws an error.
Also, my feature set consists of both a free-flowing text column and some categorical columns.

@rishanki

rishanki commented Apr 4, 2018

I'm trying to do the same thing using Keras. In my final explainer instance:

```python
exp = explainer.explain_instance(X_validation[[100]], Model.predict_proba, num_features=5)
exp.show_in_notebook(show_table=True)
```

I get an error saying "could not broadcast input array from shape (11) into shape (1)". I have 11 features going into my NN. I don't understand why it is doing this.

@marcotcr
Owner

marcotcr commented Apr 4, 2018

@IamPR09: sorry, I didn't see this before. What error are you getting? There is no simple way of combining free-flowing text with categorical data at the moment, though.

@rishanki What is X_validation.shape? Are you using LimeTabularExplainer?

@rishanki

rishanki commented Apr 5, 2018

@marcotcr: Yes, I am using LimeTabularExplainer, and the validation shape is (3000, 11). Do you need to see the entire code to make sense of it?

@rishanki

rishanki commented Apr 5, 2018


```
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
      1 i = 100
----> 2 exp = explainer.explain_instance(X_validation[[11]], Model.predict_proba, num_features=5)
      3 exp.show_in_notebook(show_table=True)

/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/lime/lime_tabular.pyc in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    260             explanations.
    261         """
--> 262         data, inverse = self.__data_inverse(data_row, num_samples)
    263         scaled_data = (data - self.scaler.mean_) / self.scaler.scale_
    264

/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/lime/lime_tabular.pyc in __data_inverse(self, data_row, num_samples)
    411         else:
    412             first_row = self.discretizer.discretize(data_row)
--> 413         data[0] = data_row.copy()
    414         inverse = data.copy()
    415         for column in categorical_features:

ValueError: could not broadcast input array from shape (11) into shape (1)
```

@marcotcr
Owner

marcotcr commented Apr 5, 2018

What is X_validation[[11]].shape? Is it a 1d numpy array?

@rishanki

rishanki commented Apr 5, 2018

It is a numpy array with shape (1, 11). If I use the vanilla way:

```python
i = 100
exp = explainer.explain_instance(X_validation[i], Model.predict_proba, num_features=5)
```

it gives me an error saying IndexError: index 1 is out of bounds for axis 1 with size 1.
The ANN won't give the probability for a single observation if I do Model.predict_proba(X_validation[1]) (throws an error: expected dense_1_input to have shape (11,) but got array with shape (1,)).
Hence I did Model.predict_proba(X_validation[[1]]).

@marcotcr
Owner

marcotcr commented Apr 5, 2018

The input to explain_instance should be a 1D array. You're giving it a 2D array, even if one of the dimensions is of size 1.
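
For example (a sketch, with `explainer`, `predict_fn`, and `X_validation` as in your snippets):

```python
# explain_instance wants one unperturbed instance as a 1D array;
# LIME builds the 2D batch of perturbed samples internally and
# passes that batch to predict_fn
row = X_validation[100]    # shape (11,), not X_validation[[100]] with shape (1, 11)
exp = explainer.explain_instance(row, predict_fn, num_features=5)
```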

@wthrif

wthrif commented May 21, 2018

Hey @rishanki,
Were you able to solve this issue? I'm also looking for a way to get Keras to output predictions as a 1D array so that they can be used by LIME.
Thanks.

@rishanki

rishanki commented May 21, 2018 via email

@wthrif

wthrif commented May 21, 2018

I'm not really a programmer, so bear with me.
If we were able to ravel the predictions, it would work: e.g., for a single output node, model.predict(x) gives an array with shape (#predictions, 1), but np.ravel(model.predict(x)) gives the correct 1D array populated by predictions.
Right now I'm looking into the lime_tabular Python file to see if there is a place where I can insert an np.ravel to flatten the prediction outputs before they're sent on to the next step.

Not sure if this is the best way to solve the problem.
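
An alternative that avoids editing lime_tabular would be to wrap the model instead (a sketch, assuming a Keras binary classifier named `model` with a single sigmoid output): LIME's classification mode expects an (n_samples, n_classes) probability array, so the (n, 1) output can be expanded into two columns.

```python
import numpy as np

def predict_fn(x):
    # model.predict returns shape (n, 1) for a single sigmoid unit:
    # the probability of class 1 for each perturbed sample
    p = model.predict(x)
    # add the complementary column so LIME sees one column per class
    return np.hstack([1 - p, p])   # shape (n, 2): [P(class 0), P(class 1)]

exp = explainer.explain_instance(X_validation[100], predict_fn, num_features=5)
```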

@esther3587

Has anyone found a solution for this? I need to use LIME with a Keras model too, but I can't get past that array-sizing issue. I get the exact same error message as @rishanki ...

@arjunnagpn

arjunnagpn commented Nov 2, 2018

I'm facing the same issue as well. Please let me know if any of you guys were able to figure it out.
@marcotcr - Can you please let me know your thoughts on this?

@rhamnett

rhamnett commented May 6, 2019

Did anyone manage to find a solution to this issue (particularly when using a convnet with embedded categorical features and numerical features together)?

eg. https://www.kaggle.com/rhamnett/predicting-churn-with-keras-and-embedding-99-6

Thank you

marcotcr closed this as completed Jun 3, 2021