
LIME vs feature importance #180

Closed
germayneng opened this issue Apr 26, 2018 · 14 comments

germayneng commented Apr 26, 2018

Hi,

I have a question regarding feature importance vs. LIME.

For the adult data set, here is the feature importance from my model:

[feature importance plot]

However, when I plot various LIME explanations (I will post a few):

[LIME plot 1]

[LIME plot 2]

[LIME plot 3]

I ran around 20 plots, and in most of them we can see, for example, the variable marital status being used in the decision. However, its feature importance is fairly low. Is there a reason for this?

Feature importance tells us that the more important features are used for splitting at the higher nodes. LIME, on the other hand, is ordered by the weights it assigns. Is it correct to understand that a more important feature does not necessarily result in a larger gain/loss in the LIME explanation?

@bbennett36

Total 'gain' for capital gain = 0.10

Total 'gain' for capital gain for class >50k = 0.05
Total 'gain' for capital gain for class <50k = 0.05

0.05 + 0.05 = 0.10

The first plot is the total.

The next 2 plots show the total for each class, which, added together, would give the first plot.


germayneng commented Apr 30, 2018

@bbennett36 Are you saying that the feature importance (which is the first plot) is the average of all the gains from the LIME plots? Because the 3 LIME plots are only 3 random points from a test set.

Also, if the feature importance were the total, it would not make sense, because age is the highest there. But across the 20-odd plots of 20 random points, the gain of age is not that high at all.

@bbennett36

No. I just noticed your plots are showing feature importance only for 'Class > 50k' (which I'm assuming is a classification problem). I'm going to guess that if you look at the gain for 'Class < 50k' and add up the gain for both classes, it will equal the totals that you're seeing in the first plot.

Does that make sense? It looks like you're only looking at the gain for 1 class and wondering why it's not equal to the total. You need to plot the other class and see if they add up.


germayneng commented May 2, 2018

@bbennett36

The first plot is the overall plot of feature importance from the model itself.

The subsequent plots are LIME plots based on random points from a test set. As such, I do not follow the logic of adding up the LIME plots to explain the feature importance plot. Each individual prediction can be interpreted with LIME, where the gains/losses sum up to that prediction's probability, but that is per instance.
Also, this is indeed a classification problem for identifying whether class > 50K.

Isn't feature importance based on how high up the tree a feature is used for splitting? The number represents the fraction of the input samples it is involved in splitting (see: http://scikit-learn.org/stable/modules/ensemble.html#random-forest-feature-importance); a higher fraction means more splits are based on this feature. That doesn't translate directly to LIME, which reports weights.
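
For reference, a minimal sketch of how that impurity-based importance is read from a fitted scikit-learn random forest (X_train, y_train and feature_names are assumed placeholders from the usual adult-data preprocessing, not variables from this thread):

```python
# Minimal sketch: impurity-based feature importance from a random forest.
# X_train, y_train and feature_names are assumed placeholders, not variables
# defined anywhere in this thread.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# feature_importances_ aggregates each feature's impurity decrease over all of
# its splits, weighted by the fraction of samples reaching those splits.
for name, imp in sorted(zip(feature_names, rf.feature_importances_),
                        key=lambda t: t[1], reverse=True):
    print(f"{name}: {imp:.3f}")
```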

So, tl;dr: I believe you are interpreting the feature importance plot wrong.

What I cannot make sense of is why the feature importance seems to contradict the LIME plots. If age has higher importance, does this mean that in the local interpretation by LIME each prediction should give AGE a higher weight, or does this not hold true?

@mizukasai

@germayneng
Have you checked how well LIME is explaining your model? What's the approximation error?

@germayneng

@M212 What is the function to obtain the approximation error for LIME?

@mizukasai

@germayneng
The lime.explanation.Explanation class has the attribute score.
For example (where predict_fn stands for your model's prediction function, e.g. model.predict_proba):

explainer = lime.lime_tabular.LimeTabularExplainer(train)
exp = explainer.explain_instance(sample, predict_fn)
exp.score


marcotcr commented May 2, 2018

Adding LIME explanations up should not result in the feature importance weights - @bbennett36 is interpreting the feature importance graph incorrectly.

@germayneng You are correct: more important features according to feature importance in random forests are not necessarily going to show up with higher weights in LIME. Some features may have a lot of impact on individual predictions, but may be fragmented across the tree and thus get low feature importance. One quick thing you can do to check explanations is to test them: for the points you got explanations for, try perturbing capital gain, education and age and see the impact those changes have.
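
As a rough illustration of that check (a sketch under assumptions, not code from this thread): perturb one feature of a single explained instance and watch the predicted probability move.

```python
import numpy as np

# Assumed placeholders: `model` is a fitted classifier with predict_proba,
# `row` is one explained test instance as a 1-D numpy array, and the column
# indices below are hypothetical positions in the adult feature matrix.
CAPITAL_GAIN, EDUCATION_NUM, AGE = 10, 4, 0

base = model.predict_proba(row.reshape(1, -1))[0, 1]
for col, values in [(CAPITAL_GAIN, [0, 5000, 15000]),
                    (EDUCATION_NUM, [9, 13, 16]),
                    (AGE, [25, 40, 60])]:
    for v in values:
        perturbed = row.copy()
        perturbed[col] = v
        p = model.predict_proba(perturbed.reshape(1, -1))[0, 1]
        print(f"feature {col} -> {v}: P(>50K) = {p:.3f} (base {base:.3f})")
```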

@germayneng

@M212 I got an exp.score of 0.38905211797144262. How do I interpret this?

@marcotcr Thank you for your reply! OK, I understand now. Basically, from another example here, I can see that features with higher feature importance do not always produce the highest/lowest gain/loss when it comes to predicting the target. What a higher-importance feature does is handle the splitting at the topmost nodes.

Also, when you say perturbing, is it as in your tutorial examples, where you run through the various values for a particular variable to identify the impact on the target variable while keeping the other variables constant?


marcotcr commented May 3, 2018

Yes, but you can also do it with multiple variables at a time. Basically that is how LIME is coming up with these weights in the first place.


germayneng commented May 3, 2018

@marcotcr Can I say that the concept of LIME's perturbation is similar to a partial dependence plot, except that it is localized?

Also, may I ask what the default range is for perturbing the variables? Do you have a standard practice?

@mizukasai

@germayneng The score you see is the sklearn.Ridge score for the perturbed data and the labels predicted by your model. It is an R² score
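
In rough terms, the score is the R² of a weighted Ridge fit on the perturbed neighbourhood. A minimal sketch of that idea (perturbed_X, model_preds and proximity_weights are hypothetical stand-ins for LIME's internal neighbourhood data, not the library's exact internals):

```python
from sklearn.linear_model import Ridge

# perturbed_X, model_preds and proximity_weights are hypothetical stand-ins
# for the neighbourhood data LIME generates internally.
local_model = Ridge(alpha=1.0)
local_model.fit(perturbed_X, model_preds, sample_weight=proximity_weights)

# The coefficients play the role of the LIME weights in the explanation plot,
# and this weighted R² plays the role of exp.score.
r2 = local_model.score(perturbed_X, model_preds, sample_weight=proximity_weights)
print(local_model.coef_, r2)
```
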
@marcotcr Is there a score limit after which we can no longer consider that LIME is approximating the model well locally?

@germayneng

@M212 That makes sense, since LIME uses ridge regression under the hood. But isn't R² a poor gauge, since adding more variables will increase it?

@marcotcr

@germayneng I don't think it is similar to a PDP: with LIME you don't see how the output changes as a function of the input for numerical features, you don't look at one feature at a time, etc.
For perturbing: if the data is categorical or discretized, we sample from a multinomial with probabilities given by the distribution in the training data. For continuous (non-discretized) data we sample from a normal with mu and sigma estimated from the training data.
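
A loose sketch of that sampling scheme (assuming X_train is a numeric array and categorical_cols is a set of column indices; this is an illustration, not the library's exact code):

```python
import numpy as np

# X_train (numeric 2-D array) and categorical_cols (set of column indices)
# are assumed placeholders; this is a loose sketch, not the library's code.
rng = np.random.default_rng(0)
n_samples = 5000
perturbed = np.empty((n_samples, X_train.shape[1]))

for j in range(X_train.shape[1]):
    col = X_train[:, j]
    if j in categorical_cols:
        # categorical/discretized: multinomial with training-set frequencies
        values, counts = np.unique(col, return_counts=True)
        perturbed[:, j] = rng.choice(values, size=n_samples, p=counts / counts.sum())
    else:
        # continuous: normal with mean/std estimated from the training data
        perturbed[:, j] = rng.normal(col.mean(), col.std(), size=n_samples)
```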

@mizukasai I think that threshold is application dependent, I don't think I can come up with a threshold for everyone : )

marcotcr closed this as completed Jul 2, 2019