Force_plot function with link=logit displays incorrect results for feature importance #1145

ehuijzer · 2020-04-08T15:30:53Z

Description
force_plot does not display the correct feature importance when link="logit" is used:

Identical logit-space SHAP values are drawn with different effect sizes.
The base value plus the feature effects does not equal the output value.

Steps/Code to Reproduce

import shap
import numpy as np
shap.force_plot(base_value=-3,
                shap_values=np.array([1,1,-1]),
                feature_names=['PosFeat1', 'PosFeat2', 'NegFeat1'],
                matplotlib=True,
                link="logit")

Expected Results
PosFeat1, PosFeat2 and NegFeat1 all have the same SHAP magnitude (either positive or negative).
Each feature should be drawn with the same effect size in the plot.
The difference in total length between the positive and negative effects should equal the difference between the output value and the base value.

Actual Results
The PosFeat2 effect is drawn smaller than the PosFeat1 effect.
Sum of PosFeat effects - NegFeat1 effect < output value - base value

Analysis
The logit link is not linear, so the computed size of each effect depends on the order of the calculation (and on the base value).
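To illustrate the non-linearity, here is a minimal sketch that converts each effect to probability space by stacking the SHAP values one after another from the base value. This is not shap's actual drawing code; `expit` and `stacked_widths` are hypothetical helpers written only for this illustration.

```python
import math


def expit(x):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def stacked_widths(base_value, shap_values):
    """Probability-space width of each effect when the effects are applied in sequence.

    Hypothetical helper: each width depends on where the previous effect ended,
    so equal log-odds contributions get unequal widths.
    """
    widths = []
    current = base_value
    for s in shap_values:
        nxt = current + s
        widths.append(expit(nxt) - expit(current))
        current = nxt
    return widths


# Same inputs as the reproduction snippet above.
widths = stacked_widths(-3, [1, 1, -1])
print(widths)
```

With these inputs the two positive features, which have identical SHAP values, get widths of roughly 0.07 and 0.15, and a different stacking order would give different widths again.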

Proposed solution

Approximate the logistic function with a linear function:
Calculate the slope: (proba output value - proba base value) / sum of shap values
(Special case sum = 0: use the derivative of the logistic function, exp(shap) / (exp(shap) + 1)^2)
Multiply all shap values by the slope

This way identical shap values result in identical probability effects.
Base value and feature effects sum up to the output value.
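The proposal above could be sketched as follows. This is an illustration of the described steps, not the code from the pull request; `linearized_effects` is a hypothetical name, and evaluating the derivative at the base value in the sum = 0 case is an assumption, since the description does not say where the derivative is taken.

```python
import math


def expit(x):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def linearized_effects(base_value, shap_values):
    """Scale log-odds SHAP values into probability space with a single slope."""
    total = sum(shap_values)
    p_base = expit(base_value)
    p_out = expit(base_value + total)
    if total == 0:
        # Assumption: use the logistic derivative at the base value.
        # exp(x) / (exp(x) + 1)^2 is the same as p * (1 - p) with p = expit(x).
        slope = p_base * (1.0 - p_base)
    else:
        slope = (p_out - p_base) / total
    return [s * slope for s in shap_values]


effects = linearized_effects(-3, [1, 1, -1])
print(effects)
```

Because every value is multiplied by the same slope, identical SHAP values map to identical effects, and the effects sum to the difference between the output probability and the base probability.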


ehuijzer · 2020-04-17T12:08:00Z

@slundberg I've seen some discussion on this topic in #238 and #29
Do you agree with my comment and the proposed fix in pull request #1148?

slundberg · 2020-04-22T18:31:51Z

Sorry, I have been getting back on top of things after being out sick. Great catch! But I think the right solution is to match the behavior of the JS version of the plot. In the JS version we leave the pixels in logit space, but change the tick marks to reflect the non-linear progression of values from the logit. Will comment on the PR with more thoughts.
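The idea of keeping the axis in logit space and relabeling the ticks can be sketched like this. It is only an illustration of the approach described in the comment above, not the JS plot's code; `probability_ticks` is a hypothetical helper.

```python
import math


def logit(p):
    """Inverse of the logistic function: maps a probability to log-odds."""
    return math.log(p / (1.0 - p))


def probability_ticks(probabilities):
    """Return (position, label) pairs: positions in logit space, labels as probabilities."""
    return [(logit(p), f"{p:g}") for p in probabilities]


# Evenly spaced probabilities land at unevenly spaced positions on a logit axis.
ticks = probability_ticks([0.1, 0.3, 0.5, 0.7, 0.9])
for position, label in ticks:
    print(f"{label:>4} -> {position:+.3f}")
```

With this layout the bar lengths stay proportional to the SHAP values in log-odds, and only the tick labels carry the non-linear mapping to probabilities.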

ehuijzer pushed a commit to ehuijzer/shap that referenced this issue Apr 10, 2020

Fix shap#1145

36f33e8

ehuijzer mentioned this issue Apr 10, 2020

Fixes #1145: Force_plot function with link=logit & matplotlib=True, displays incorrect results for feature importance #1148

Merged

ehuijzer pushed a commit to ehuijzer/shap that referenced this issue Apr 23, 2020

Revert "Fix shap#1145"

0ba3932

This reverts commit 36f33e8

ehuijzer pushed a commit to ehuijzer/shap that referenced this issue Apr 23, 2020

Update fix shap#1145 following comments

3e42bd3

slundberg closed this as completed in #1148 Apr 23, 2020

slundberg added a commit that referenced this issue Apr 23, 2020

Merge pull request #1148 from ehuijzer/master

575e791

Fixes #1145: Force_plot function with link=logit & matplotlib=True, displays incorrect results for feature importance