ensemble tree shap #532
Conversation
Can you get the SHAP values from Python for these settings? Thanks.
Thanks. But max_leaves should be bigger (at least 100); otherwise, we cannot go deep (max_depth). tree_method should be "exact", which matches our implementation.
Thanks. Can you make max_depth = 20? Keep all other parameters as they are.
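A minimal sketch of what this might look like on the Python side, assuming the xgboost native API on the Boston housing data; max_depth, max_leaves, and tree_method follow the settings above, while the objective, the number of boosting rounds, and the data handling are illustrative assumptions:

```python
import xgboost as xgb
from sklearn.datasets import load_boston  # removed in scikit-learn >= 1.2

# Boston housing data, as used throughout this thread.
boston = load_boston()
dtrain = xgb.DMatrix(boston.data, label=boston.target,
                     feature_names=list(boston.feature_names))

# max_depth, max_leaves, and tree_method follow the requests above; note that
# xgboost's exact tree method does not use max_leaves, so it may simply be
# ignored here. The objective, seed, and number of rounds are assumptions.
params = {
    "tree_method": "exact",
    "max_depth": 20,
    "max_leaves": 100,
    "eta": 0.05,
    "objective": "reg:squarederror",
    "seed": 0,
}
booster = xgb.train(params, dtrain, num_boost_round=100)

# Per-sample, per-feature SHAP values; the last column is the bias term.
shap_values = booster.predict(dtrain, pred_contribs=True)
print(shap_values.shape)  # (n_samples, n_features + 1)
```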
Our order roughly matches, but the values are different, especially LSTAT. How can their LSTAT and RM SHAP values be so big? Given eta = 0.05, the values cannot be that big.
Just in case I made a mistake somewhere (reproducible at https://repl.it/languages/python3), and it seems the
I will ignore the exact values, as XGBoost's training algorithm may be very different from mine. Importantly, the order matches (only NOX and CRIM switch). I checked my code many times yesterday; I am confident that it is correct. Thanks.
BTW, the settings here are not proper for GBM, especially the very deep and large trees (we set them large to create complicated cases for SHAP). Can you get the SHAP values for a random forest? Random forests typically have large and deep trees. Thanks!
you mean from sklearn or smile? |
I have tried random forest. The SHAP values are around 2.2 for the top two features. This makes sense. For GBM, they cannot be that high. Can you get the SHAP values from sklearn for the settings below?
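A minimal sketch, assuming scikit-learn's RandomForestRegressor and the shap package's TreeExplainer; the hyperparameters below are illustrative placeholders, not the exact settings asked for here:

```python
import numpy as np
import shap
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor

boston = load_boston()
X, y = boston.data, boston.target

# Hyperparameters are placeholders. sklearn grows unpruned trees by default
# (max_depth=None), which is the deep, large-tree regime discussed here.
rf = RandomForestRegressor(n_estimators=100, max_depth=None, random_state=0)
rf.fit(X, y)

# TreeExplainer implements Tree SHAP for sklearn tree ensembles.
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Mean absolute SHAP value per feature, the number compared in this thread.
mean_abs = np.abs(shap_values).mean(axis=0)
for name, value in sorted(zip(boston.feature_names, mean_abs), key=lambda t: -t[1]):
    print(f"{name}: {value:.3f}")
```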
When you get the SHAP values in Python, have you reset the RNG seed or restarted your Python session every time? I just want to make sure that you use the same random number generator seed. Thanks.
For xgboost, it is set to 0 by default:
Can you please get the SHAP values for random forest? Thanks. 0 has no entropy and is the worst seed for a random number generator. Anyway, we don't need to worry about it for this task.
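Even though it should not matter much here, a minimal sketch of pinning the seeds explicitly on the Python side (xgboost's `seed` parameter and NumPy's global seed; the remaining values are illustrative):

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import load_boston

# Fix the seeds up front so repeated runs (or fresh sessions) draw the same
# random numbers; xgboost's own `seed` parameter defaults to 0.
np.random.seed(0)

X, y = load_boston(return_X_y=True)
dtrain = xgb.DMatrix(X, label=y)
params = {"seed": 0, "eta": 0.05, "objective": "reg:squarederror"}
booster = xgb.train(params, dtrain, num_boost_round=100)
```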
You are right. RF with sklearn gives a similar result, around 2.2 for the top 2 features.
Are the features in the same order as in the GBM case? Thanks.
ordering for RF:
RAD is missing.
implementation for #515
Similar (although not exactly the same) Python result for the Boston housing dataset:
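As a sketch of how such a Python-side comparison might be produced (assuming xgboost and the shap package; the hyperparameters are taken from the thread or are illustrative). When TreeExplainer is given no background dataset it uses the tree-path-dependent algorithm, so its values should closely match xgboost's built-in pred_contribs output:

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import load_boston

boston = load_boston()
dtrain = xgb.DMatrix(boston.data, label=boston.target,
                     feature_names=list(boston.feature_names))
booster = xgb.train({"eta": 0.05, "max_depth": 20, "tree_method": "exact",
                     "objective": "reg:squarederror", "seed": 0},
                    dtrain, num_boost_round=100)

# Two routes to the Tree SHAP values:
contribs = booster.predict(dtrain, pred_contribs=True)[:, :-1]  # drop bias column
explainer = shap.TreeExplainer(booster)
shap_values = explainer.shap_values(boston.data)
print(np.allclose(contribs, shap_values, atol=1e-4))

# Feature ordering by mean |SHAP|, the quantity compared in this conversation.
order = np.argsort(-np.abs(shap_values).mean(axis=0))
print([boston.feature_names[i] for i in order])
```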