is summary plot only for classification? #41

jayden526 · 2018-03-06T15:37:32Z

Thank you for this amazing work. I want to identify variable importance using the summary plot, but my model is a tree-based regressor. I am not sure I understood the paper correctly: all the examples I found that calculate SHAP values are classification problems. Could you please clarify whether this can be used for regression? Thank you so much!

JuanCorp · 2018-03-06T15:51:56Z

I've used shap and the summary plot for a house list price problem before, which is a regression, and the explanations work just fine and match what I would expect from a logical standpoint. For example, construction area, distance to certain places of interest, and the house's geographical sector were all top features. I don't have the plot at hand, but a mini app that uses an XGBoost model for house list price prediction (at least in my city) is available in my profile, although it still needs some fixes.

From what I've understood, the Shapley value for each feature acts like one term of a linear regression (a coefficient times its feature value), computed separately for each data point. There's also a bias or intercept: the base value for the model's predictions, for example roughly the average predicted price of all houses in the dataset. For a single data point, each Shapley value represents the impact of that feature on the final prediction. For a classifier explained in log-odds, these values and the base value are added, then the sigmoid function is applied to the sum; the result is the prediction the original model gave, a probability between 0 and 1. For regression models the process is the same, except that the sigmoid step is skipped, since the output is continuous rather than bounded between 0 and 1.

@slundberg can give you better details though, so you should wait for his input.
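The additive picture described above can be sketched in a few lines. This is only an illustration with made-up numbers: the feature names, the values, and the helper `model_output` are hypothetical and are not part of the shap API.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic link: maps log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def model_output(base_value, contributions, link=None):
    """Add per-feature contributions to the base value, then apply an optional link."""
    total = base_value + sum(contributions.values())
    return link(total) if link is not None else total


# Classification (hypothetical numbers): contributions are in log-odds,
# and the sigmoid turns the sum into a probability.
probability = model_output(
    base_value=-0.4,
    contributions={"construction_area": 1.1, "distance": -0.3, "sector": 0.2},
    link=sigmoid,
)

# Regression (hypothetical numbers): contributions are already in dollars,
# so no link is applied and the sum is the predicted price.
price = model_output(
    base_value=250_000.0,
    contributions={"construction_area": 40_000.0, "distance": -15_000.0, "sector": 5_000.0},
)

print(f"probability={probability:.4f}, price={price:.0f}")
```

The only difference between the two cases is whether a link function is applied after the sum.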

jayden526 · 2018-03-06T16:10:56Z

Thank you @JuanCorp, I think you are right. Even for classification, the log odds need to be computed first in order to find the probability. The syntax I tried is adapted from the classification example:

shap_values = shap.KernelExplainer(randomforest.predict, X_train).shap_values(X_test)
shap.summary_plot(shap_values, X_test)

Is this the same as yours? At least now I can get the SHAP values.
@slundberg Would you mind clarifying how shap_values work for regression? If it is already covered in your paper, please let me know and I will check there. Thank you.

jayden526 · 2018-03-06T16:43:02Z

Sorry for asking again. I sometimes get a runtime error when I use a different number of samples in my X_test (sometimes it is fine, but if I use only 100 samples of the test set, this error occurs):

Exception in thread Thread-15
RuntimeError: Set changed size during iteration

Could you help me with this? Thank you!
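For context, this is the error Python raises in general when a set is modified while something is iterating over it. The thread does not establish where that happens in this case, so the sketch below only shows what the message means and one common way to avoid it. Both function names are hypothetical and unrelated to shap.

```python
def mutate_while_iterating():
    """Reproduce the error: grow a set inside a loop over that same set."""
    pending = {1, 2, 3}
    try:
        for item in pending:
            pending.add(item + 10)  # changes the set's size mid-iteration
    except RuntimeError as err:
        return str(err)
    return None


def mutate_safely():
    """Avoid the error by looping over a snapshot of the set."""
    pending = {1, 2, 3}
    for item in list(pending):  # list(...) copies the current members
        pending.add(item + 10)
    return pending


print(mutate_while_iterating())
print(sorted(mutate_safely()))
```

When the traceback starts with "Exception in thread", the modification may be coming from a different thread than the one iterating, which would also explain why the error appears only some of the time.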

slundberg · 2018-03-06T17:04:08Z

@jayden526 SHAP values work well with regressions; in fact, the Boston housing example in the README is a least squares regression. The SHAP values are in the same units as the model output (for Tree SHAP in XGBoost this is before the link function, such as a logistic). So if you are predicting dollars, the SHAP values will be in dollars, and together with the base value they will sum to the output of the model.
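To make the "same units, sums to the output" point concrete, here is a minimal brute-force sketch for a toy regression. It is not how the shap library computes values (Tree SHAP and Kernel SHAP use far more efficient algorithms). The model `predict`, the background rows, and all helper names are assumptions made up for illustration.

```python
from itertools import combinations
from math import factorial


def predict(x):
    """Hypothetical regression model returning a price in dollars."""
    area, distance, rooms = x
    return 1000.0 * area - 500.0 * distance + 20.0 * area * rooms


def value(x, subset, background):
    """Mean prediction with features in `subset` taken from x, the rest from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in subset else row[i] for i in range(len(x))]
        total += predict(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley values by enumerating every feature subset (exponential cost)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(x, s | {i}, background) - value(x, s, background))
    return phi


background = [[50.0, 10.0, 2.0], [80.0, 5.0, 3.0], [120.0, 2.0, 4.0]]
x = [100.0, 3.0, 3.0]

base_value = value(x, set(), background)  # mean prediction over the background rows
phi = shapley_values(x, background)

# Each phi[i] is in dollars, and base_value + sum(phi) recovers predict(x).
print(base_value, phi, base_value + sum(phi), predict(x))
```

Because every contribution is a difference between model outputs, the values come out in dollars with no extra scaling.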

As for the error, if there is a simple example of how you got it, please post it and I'll fix it.

FYI: if you are using a tree model, I would suggest using XGBoost and getting the exact SHAP values rather than using the model-agnostic Kernel SHAP on scikit-learn.

jayden526 · 2018-03-06T19:45:42Z

@slundberg Thank you so much! I will definitely try XGBoost to see whether it works for me.

slundberg · 2018-03-07T03:43:13Z

sounds good

andymancodes · 2018-06-06T06:01:19Z

@slundberg Hi, thanks for the great package! I don't understand how to use my own dataset with shap. What is shap.datasets for, and how can I use my own dataset in the form (X, y) with SHAP? Thanks :)

slundberg · 2018-06-06T15:56:12Z

Do you have a model and a dataset or just a dataset representing the output of the model? Perhaps clarifying what doesn't make sense about the examples in the README would be helpful.

slundberg closed this as completed Mar 7, 2018
