numerical data (categorical & continuous) explanation on SVC and NN #73
Comments
After re-checking the paper, which states "Formally, we define an explanation as a model g ∈ G, where G is a class of potentially interpretable models, such as linear models, decision trees, or falling rule lists", I did another trial, this time using sklearn.svm.LinearSVC. The problem is that it has no predict_proba, but fortunately scikit-learn has sklearn.calibration.CalibratedClassifierCV to provide prediction probabilities, and with that everything works well :). But I still have no clue about the NN ._.
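A minimal sketch of that LinearSVC workaround (an illustration only; `encoded_train`, `labels_train`, and `encoder` follow the names used in the code later in this issue):

```python
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# LinearSVC has no predict_proba, so wrap it in CalibratedClassifierCV,
# which calibrates decision_function scores into probabilities.
svm_linear = CalibratedClassifierCV(LinearSVC())
svm_linear.fit(encoded_train, labels_train)

# predict_proba is now available and can be used as LIME's predict_fn.
predict_fn = lambda x: svm_linear.predict_proba(encoder.transform(x))
```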
I found out that the error I get with the NN model is in the feature lookup. When I apply it to the dataset provided, the 14 features become 108 after encoding. The problem is that I used the same scheme for the other models I tried and it works fine there. Is it my fault, or is there a bug? Thanks a lot, and sorry for the long post.
As in the tutorial, LimeTabularExplainer assumes you give it the non-encoded version of the data, and that the encoding is folded into the predict function (like you did in your pasted code above). If you have 14 features (108 after encoding), the categorical_features parameter should be range(14), not range(108). Does this make sense?
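In code, that looks roughly like this (a sketch; the variable names follow the issue code, `clf` stands for whatever fitted classifier is being explained, and the exact categorical indices depend on your dataset):

```python
import lime.lime_tabular

# `train` and `test` hold the *non-encoded* data with 14 columns;
# the one-hot encoding happens inside predict_fn, not in the explainer.
explainer = lime.lime_tabular.LimeTabularExplainer(
    train,
    feature_names=feature_names,
    class_names=class_names,
    categorical_features=categorical_features,  # indices into the 14 original columns
    categorical_names=categorical_names)

predict_fn = lambda x: clf.predict_proba(encoder.transform(x))
exp = explainer.explain_instance(test[0], predict_fn, num_features=5)
```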
Yeah, I think so. I'll give it a shot first. Thanks :)
Hi Yusuf and Marco,
I'm trying to do the same thing using Keras. In my final explainer instance
@marcotcr: Yes, I am using LimeTabularExplainer and the validation shape is (3000, 11). Do you need to see the entire code to make sense of it?
```
ValueError                                Traceback (most recent call last)
/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/lime/lime_tabular.pyc in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/lime/lime_tabular.pyc in __data_inverse(self, data_row, num_samples)
ValueError: could not broadcast input array from shape (11) into shape (1)
```
What is X_validation[[11]].shape? Is it a 1d numpy array?
It is a numpy array with shape (1, 11). If I use the vanilla way:
The input to explain_instance should be a 1d numpy array (a single row), not a 2d array of shape (1, 11).
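For example (a sketch, assuming `X_validation` is the (3000, 11) array mentioned above and `explainer` / `predict_fn` are set up as before):

```python
row = X_validation[11]       # shape (11,): a single 1d row works
# row = X_validation[[11]]   # shape (1, 11): a 2d slice triggers the broadcast error
exp = explainer.explain_instance(row, predict_fn, num_features=5)
```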
Hey @rishanki
Hey, no, I couldn't do it. I actually gave up on it... didn't have the time to sit on it more. But let's work on it together if you want to resolve the issue.
…On Mon, May 21, 2018 at 1:56 PM, wthrif wrote:
Hey @rishanki, were you able to solve this issue? I'm also looking for a way to get Keras to output predictions as a 1D array so that it can be used by LIME. Thanks.
I'm not really a programmer, so bear with me. Not sure if this is the best way to solve the problem.
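Roughly, the idea is to stack the single sigmoid output into a two-column probability array, since LIME expects one column per class. A sketch under that assumption (a binary Keras model with one sigmoid output; `model`, `encoder`, `explainer`, and `test` refer to the names used elsewhere in this issue):

```python
import numpy as np

def keras_predict_fn(x):
    # Encode the raw rows the same way the model was trained on.
    p = model.predict(encoder.transform(x).toarray())  # shape (n, 1)
    p = p.ravel()
    # LIME expects an (n, num_classes) array of class probabilities.
    return np.column_stack([1.0 - p, p])

exp = explainer.explain_instance(test[0], keras_predict_fn, num_features=5)
```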
Has anyone found a solution for this? I need to use LIME with a Keras model too, but I can't get past that array-sizing issue. I got the exact same error message as @rishanki ...
I'm facing the same issue as well. Please let me know if any of you guys were able to figure it out. |
Did anyone manage to find a solution to this issue (particularly when using a convnet with an embedding, e.g. https://www.kaggle.com/rhamnett/predicting-churn-with-keras-and-embedding-99-6)? Thank you
I followed your example from https://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html for continuous and categorical data and gave it a try with different models. I used an SVC (from sklearn) and a NN (from Keras). Somehow both of them crash and restart the kernel when they try to get the explanation (exp = ...), code below.
```python
# 1: using SVC
predict_fn = lambda x: svm_linear.predict_proba(encoder.transform(x))
explainer = lime.lime_tabular.LimeTabularExplainer(
    train,
    feature_names=feature_names,
    class_names=class_names,
    categorical_features=categorical_features,
    categorical_names=categorical_names,
    kernel_width=3)

all_explains = {}
for i in range(test.shape[0]):
    exp = explainer.explain_instance(test[i], predict_fn, num_features=5)
    all_explains[i] = exp
```
```python
# 2: using NN
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(32, input_dim=encoded_train.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(encoded_train_toa, labels_train, epochs=30, batch_size=128)
score = model.evaluate(encoded_test_toa, labels_test, batch_size=128)

def trans(x):
    x = encoder.transform(x).toarray()
    return model.predict_proba(x)

import lime
from lime import lime_tabular

explainer = lime.lime_tabular.LimeTabularExplainer(
    train,
    feature_names=feature_names,
    class_names=class_names,
    categorical_features=categorical_features,
    categorical_names=categorical_names)

all_explains = {}
predict_fn = lambda x: trans(x)
for i in range(test.shape[0]):
    temp = test[i, :]
    exp = explainer.explain_instance(temp, predict_fn, num_features=5)
    all_explains[i] = exp
```
Are SVM and NN not supported yet for numerical data? I have no problem using tree-based classifiers on the same data.