Force_plot function with link=logit displays incorrect results for feature importance #1145

Closed
ehuijzer opened this issue Apr 8, 2020 · 2 comments · Fixed by #1148

Comments

@ehuijzer
Contributor

ehuijzer commented Apr 8, 2020

Description
force_plot does not display the correct feature importance when link="logit" is used:

  • identical shap values (in logit space) are drawn with different importance
  • base value + feature effects do not sum to the output value

Steps/Code to Reproduce

import shap
import numpy as np
shap.force_plot(base_value=-3,
                shap_values=np.array([1,1,-1]),
                feature_names=['PosFeat1', 'PosFeat2', 'NegFeat1'],
                matplotlib=True,
                link="logit")

[image]

Expected Results
PosFeat1, PosFeat2 and NegFeat1 all have the same absolute shap value (one unit, either positive or negative).
Each feature should be represented by an effect of the same size in the plot.
The difference in length between the positive and negative effects should equal the difference between the output value and the base value.

Actual Results
The PosFeat2 effect is smaller than the PosFeat1 effect.
Sum of PosFeat effects - NegFeat1 effect < output value - base value

Analysis
The logit link is not linear, so the effect calculated for each feature depends on the order (and the base value) used in the calculation.
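
To illustrate the order dependence, here is a minimal sketch that converts logit-space shap values to probability-space effects one feature at a time. This is not shap's plotting code; the helper names `logistic` and `sequential_effects` are hypothetical, and the sequential accumulation is only one assumed way a per-feature effect could be computed.

```python
import numpy as np


def logistic(x):
    """Inverse of the logit link: maps log-odds to a probability."""
    return 1.0 / (1.0 + np.exp(-x))


def sequential_effects(base_value, shap_values):
    """Probability-space effect of each feature when added in the given order."""
    shap_values = np.asarray(shap_values, dtype=float)
    # Running log-odds after adding 0, 1, 2, ... features to the base value.
    cumulative = base_value + np.concatenate(([0.0], np.cumsum(shap_values)))
    # Each effect is the change in probability caused by adding one feature.
    return np.diff(logistic(cumulative))


base_value = -3.0
# Same three shap values, two different orderings.
effects_a = sequential_effects(base_value, [1.0, 1.0, -1.0])
effects_b = sequential_effects(base_value, [-1.0, 1.0, 1.0])

print(effects_a)
print(effects_b)
```

In this sketch the two +1 shap values get different probability effects within one ordering, and the effects change again when the order changes, even though the total is the same in both cases.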

Proposed solution

  1. Approximate the logistic function by a linear function
  2. Calculate the slope: (proba output value - proba base value) / sum shap values
    (Special case sum=0: use the derivative of the logistic, exp(shap) / (exp(shap) + 1)^2)
  3. Multiply all shap values by the slope

This way identical shap values result in identical probability effects.
Base value and feature effects sum up to the output value.
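
The three steps above could be sketched roughly as follows. This is an illustration only, not the code from the pull request; `logistic` and `linearized_effects` are hypothetical names. For the zero-sum special case the sketch assumes the derivative is evaluated at the base value, which the description above does not state explicitly.

```python
import numpy as np


def logistic(x):
    """Inverse of the logit link: maps log-odds to a probability."""
    return 1.0 / (1.0 + np.exp(-x))


def linearized_effects(base_value, shap_values):
    """Scale logit-space shap values by a single slope into probability space."""
    shap_values = np.asarray(shap_values, dtype=float)
    total = shap_values.sum()
    proba_base = logistic(base_value)
    if np.isclose(total, 0.0):
        # Assumption: use the logistic derivative at the base value,
        # exp(x) / (exp(x) + 1)^2, which equals p * (1 - p).
        slope = proba_base * (1.0 - proba_base)
    else:
        proba_output = logistic(base_value + total)
        slope = (proba_output - proba_base) / total
    return slope * shap_values


effects = linearized_effects(-3.0, [1.0, 1.0, -1.0])
print(effects)
print(logistic(-3.0) + effects.sum(), logistic(-2.0))
```

With a single slope, equal shap values map to equal probability effects, and the base probability plus the effects reproduces the output probability.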

@ehuijzer
Contributor Author

@slundberg I've seen some discussion on this topic in #238 and #29
Do you agree with my comment and the proposed fix in pull request #1148?

@slundberg
Collaborator

Sorry, I have been getting back on top of things after being out sick. Great catch! But I think the right solution is to match the behavior of the JS version of the plot. In the JS version we leave the pixels in logit space, but then change the tick marks to reflect the non-linear progression of values from the logit. Will comment on the PR with more thoughts.
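
A rough sketch of that idea, as I understand it from the comment above: bar positions stay in logit space, and only the tick labels are converted to probabilities. The helper name `probability_tick_labels` and the chosen tick range are hypothetical; this is neither the JS implementation nor the final matplotlib fix.

```python
import numpy as np


def logistic(x):
    """Inverse of the logit link: maps log-odds to a probability."""
    return 1.0 / (1.0 + np.exp(-x))


def probability_tick_labels(logit_ticks):
    """Label evenly spaced logit-space ticks with their probability values."""
    return [f"{logistic(t):.3f}" for t in logit_ticks]


# Tick positions stay evenly spaced in logit space (the pixel axis).
logit_ticks = np.arange(-4.0, 1.0, 1.0)
labels = probability_tick_labels(logit_ticks)

for tick, label in zip(logit_ticks, labels):
    print(f"logit {tick:+.1f} -> probability {label}")

# With matplotlib, the labels could be applied to an existing Axes `ax` via
# ax.set_xticks(logit_ticks) followed by ax.set_xticklabels(labels).
```

Because the bars are drawn in logit space, equal shap values keep equal lengths, and the non-linearity shows up only in the unevenly spaced probability labels.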

ehuijzer pushed a commit to ehuijzer/shap that referenced this issue Apr 23, 2020
This reverts commit 36f33e8
ehuijzer pushed a commit to ehuijzer/shap that referenced this issue Apr 23, 2020
slundberg added a commit that referenced this issue Apr 23, 2020
Fixes #1145: Force_plot function with link=logit & matplotlib=True, displays incorrect results for feature importance