
pred_contrib value doesn't add up to prediction in lightgbm regression #3998

yolimonsta opened this issue Feb 18, 2021 · 9 comments

@yolimonsta

I'm using SHAP with LightGBM for regression.

The problem is that the SHAP values don't sum up to the predicted value.

Predicted values are all in the 0 - 10000 range. I have ~9000 features.

[Screenshot of the SHAP output omitted]
The expected value is 60.774 and the sum of the SHAP values is -4040, so there is no way they could add up to the prediction of 2291.

How is that possible? Thank you very much
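
A minimal sketch of the additivity check being described here, assuming the usual shap.TreeExplainer workflow (model and X are placeholders for the reporter's trained LightGBM regressor and feature matrix, not code from this thread):

import numpy as np
import shap

# model: a fitted lgb.LGBMRegressor, X: the feature matrix (placeholders)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# With an identity link, expected_value plus the per-row sum of SHAP values
# should reproduce the model's prediction for each row.
reconstructed = explainer.expected_value + shap_values.sum(axis=1)
print(np.allclose(reconstructed, model.predict(X)))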

@StrikerRUS
Collaborator

@yolimonsta
Hi!
Are you using linear_tree param?
Please provide more details for the issue.
It would be great to have a fully reproducible example (MCVE) so that we can help.

@yolimonsta
Author

Hi! I am not using linear_tree = True. My model is really complex and has a lot of missing values in the features. Could that be the reason?

@StrikerRUS
Collaborator

Please provide either a reproducible example or your model.

@no-response

no-response bot commented Mar 26, 2021

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@no-response no-response bot closed this as completed Mar 26, 2021
@shiyu1994
Collaborator

@yolimonsta Thanks for using LightGBM. Which objective function are you using? Is it Quantile Regression / L1 Regression / MAPE Regression?

@shiyu1994 shiyu1994 reopened this Apr 5, 2021
@no-response

no-response bot commented Apr 5, 2021

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@no-response no-response bot closed this as completed Apr 5, 2021
@mlisovyi
Contributor

mlisovyi commented Nov 2, 2022

I bumped into the same problem today. The problem seems to be related to the choice of objective function for a regression model; in particular, the poisson, gamma and tweedie objectives are affected.

Example:

import lightgbm as lgb
import pandas as pd
from sklearn.datasets import load_boston


data = load_boston()
X = pd.DataFrame(data.data, columns=data.feature_names)
# scale the target to make the difference more pronounced
y = data.target * 1000

# try out different objective functions and for each compare prediction with the sum of feature contributions
for obj in ["l2", "l1", "huber", "fair", "poisson", "mape", "gamma", "tweedie"]:
    model = lgb.LGBMRegressor(objective=obj).fit(X, y)
    preds = model.predict(X)

    shap_values = model.predict(X, pred_contrib=True)
    sum_of_shap = shap_values.sum(axis=1)

    diff = preds - sum_of_shap
    print(
        f"{obj}: Difference= {diff[0] / preds[0]:.1%}.    Prediction = {preds[0]:.1f}, sum of shap values = {sum_of_shap[0]:.1f}"
    )

Output:

l2: Difference= -0.0%.    Prediction = 24274.4, sum of shap values = 24274.4
l1: Difference= -0.0%.    Prediction = 25055.9, sum of shap values = 25055.9
huber: Difference= 0.0%.    Prediction = 22541.8, sum of shap values = 22541.8
fair: Difference= 0.0%.    Prediction = 57641.8, sum of shap values = 57641.8
poisson: Difference= 100.0%.    Prediction = 24978.1, sum of shap values = 10.1
mape: Difference= -0.0%.    Prediction = 25148.8, sum of shap values = 25148.8
gamma: Difference= 100.0%.    Prediction = 24221.7, sum of shap values = 10.1
tweedie: Difference= 100.0%.    Prediction = 24475.2, sum of shap values = 10.1

lightgbm version: 3.3.3
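
A plausible explanation (an assumption on my part, not confirmed in this thread): the poisson, gamma and tweedie objectives model the log of the target, so pred_contrib returns contributions on the raw (log) scale, while predict returns the exponentiated value. A quick check under that assumption, reusing model and X from the example above with one of the affected objectives:

import numpy as np

# Assumption: contributions are reported on the raw (log) scale for these objectives.
raw_preds = model.predict(X, raw_score=True)        # untransformed model output
shap_values = model.predict(X, pred_contrib=True)
sum_of_shap = shap_values.sum(axis=1)

print(np.allclose(raw_preds, sum_of_shap))                   # contributions sum to the raw score
print(np.allclose(np.exp(sum_of_shap), model.predict(X)))    # exp(raw score) matches the prediction

Note that exp(10.1) is roughly 24,000, which lines up with the predictions shown in the output above.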

@jameslamb
Collaborator

Thanks for the example @mlisovyi and for being willing to work with us on this. This issue was previously automatically closed because the people posting in it didn't respond to requests for more information.

I'll re-open this.

@jameslamb jameslamb reopened this Nov 3, 2022