
Catboost bug #7

Closed
nilslacroix opened this issue May 19, 2022 · 3 comments
Comments

@nilslacroix

nilslacroix commented May 19, 2022

CatBoost produces a `'TreeEnsemble' object has no attribute 'num_nodes'` error with the code below. By the way, do you support a background dataset parameter, like shap's "interventional" vs "tree_path_dependent" options? If your underlying code uses the "interventional" method, this might be related to this bug: shap/shap#2557

from catboost import CatBoostRegressor
import fasttreeshap
import shap  # needed for shap.datasets and shap.plots below

X, y = shap.datasets.boston()

model = CatBoostRegressor(task_type="CPU", logging_level="Silent").fit(X, y)
explainer = fasttreeshap.TreeExplainer(model, algorithm="v2", n_jobs=-1)
shap_values = explainer(X)

# visualize the first prediction's explanation
shap.plots.waterfall(shap_values[25])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [131], in <cell line: 13>()
     10 # explain the model's predictions using SHAP
     11 # (same syntax works for LightGBM, CatBoost, scikit-learn, transformers, Spark, etc.)
     12 explainer = fasttreeshap.TreeExplainer(model, algorithm="v2", n_jobs=-1)
---> 13 shap_values = explainer(X)
     15 # visualize the first prediction's explanation
     16 shap.plots.waterfall(shap_values[25])

File ~\miniconda3\envs\Master\lib\site-packages\fasttreeshap\explainers\_tree.py:256, in Tree.__call__(self, X, y, interactions, check_additivity)
    253     feature_names = getattr(self, "data_feature_names", None)
    255 if not interactions:
--> 256     v = self.shap_values(X, y=y, from_call=True, check_additivity=check_additivity, approximate=self.approximate)
    257 else:
    258     assert not self.approximate, "Approximate computation not yet supported for interaction effects!"

File ~\miniconda3\envs\Master\lib\site-packages\fasttreeshap\explainers\_tree.py:379, in Tree.shap_values(self, X, y, tree_limit, approximate, check_additivity, from_call)
    376 algorithm = self.algorithm
    377 if algorithm == "v2":
    378     # check if memory constraint is satisfied (check Section Notes in README.md for justifications of memory check conditions in function _memory_check)
--> 379     memory_check_1, memory_check_2 = self._memory_check(X)
    380     if memory_check_1:
    381         algorithm = "v2_1"

File ~\miniconda3\envs\Master\lib\site-packages\fasttreeshap\explainers\_tree.py:483, in Tree._memory_check(self, X)
    482 def _memory_check(self, X):
--> 483     max_leaves = (max(self.model.num_nodes) + 1) / 2
    484     max_combinations = 2**self.model.max_depth
    485     phi_dim = X.shape[0] * (X.shape[1] + 1) * self.model.num_outputs

AttributeError: 'TreeEnsemble' object has no attribute 'num_nodes'
jlyang1990 added a commit that referenced this issue May 19, 2022
@jlyang1990
Collaborator

Fixed this issue by skipping the "memory check" for CatBoost, since CatBoost is not supported in the current version of fasttreeshap (as mentioned in #6).
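The failure mode is easy to reproduce in isolation. A minimal sketch (hypothetical class and function names, not fasttreeshap's actual internals): the memory check reads `num_nodes` unconditionally, so a model wrapper that never sets it raises exactly the `AttributeError` in the traceback above; guarding with `hasattr` skips the check and falls back to the default algorithm instead.

```python
class ToyEnsemble:
    """Toy stand-in for an internal model wrapper (hypothetical name)."""
    def __init__(self, num_nodes=None, max_depth=6):
        if num_nodes is not None:
            self.num_nodes = num_nodes  # per-tree node counts
        self.max_depth = max_depth

def memory_check(model):
    # Mirrors the failing line in the traceback: assumes the wrapper
    # exposes num_nodes, which the CatBoost wrapper does not.
    max_leaves = (max(model.num_nodes) + 1) / 2
    return max_leaves * 2 ** model.max_depth

def safe_memory_check(model):
    # Guarded version in the spirit of the fix: skip the memory check
    # when num_nodes is missing, signalling a fallback to the caller.
    if not hasattr(model, "num_nodes"):
        return None
    return memory_check(model)

print(safe_memory_check(ToyEnsemble(num_nodes=[15, 31, 7])))  # 1024.0
print(safe_memory_check(ToyEnsemble()))                        # None
```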

fasttreeshap is built only for "tree_path_dependent". You can still run "interventional" in fasttreeshap, but its performance will be the same as in shap. I would suggest posting issues related to "interventional" directly on the shap GitHub page.
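For readers comparing the two modes: "interventional" SHAP needs a background dataset because features outside a coalition are replaced by background samples before averaging the model's output. A brute-force sketch with a toy single-split model (all names hypothetical; real TreeSHAP algorithms avoid this exponential enumeration over orderings):

```python
from itertools import permutations
from math import factorial
from statistics import mean

def stump(x):
    # Toy "tree": a single split on feature 0 (hypothetical model).
    return 10.0 if x[0] > 0.5 else 2.0

def coalition_value(model, x, background, coalition):
    # Interventional expectation: features outside the coalition are
    # replaced by values drawn from the background dataset.
    def hybrid(b):
        return [x[i] if i in coalition else b[i] for i in range(len(x))]
    return mean(model(hybrid(b)) for b in background)

def interventional_shap(model, x, background):
    # Exact Shapley values: average each feature's marginal contribution
    # over all feature orderings (exponential; fine for tiny examples).
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        seen = set()
        for i in order:
            before = coalition_value(model, x, background, seen)
            seen.add(i)
            phi[i] += coalition_value(model, x, background, seen) - before
    return [p / factorial(n) for p in phi]

x = [0.9, 3.0]
background = [[0.1, 5.0], [0.9, 1.0], [0.3, 7.0]]
phi = interventional_shap(stump, x, background)
# phi sums to stump(x) minus the mean background prediction,
# and phi[1] is 0 because feature 1 never affects the model
```

The sketch also shows why the two modes disagree: "tree_path_dependent" instead reuses the cover statistics stored in the trees, so no background dataset is involved.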

@nilslacroix
Author

Is this also true for xgboost and lgbm? From my understanding of the paper, "tree_path_dependent" is the better method for explaining model performance, while "interventional" is used to explain relationships in the data. Also, "interventional" is a lot slower, so wouldn't a fast TreeSHAP method make a lot of sense for it?

@jlyang1990
Collaborator

jlyang1990 commented May 20, 2022

Yes. fasttreeshap accelerates SHAP value computation for xgboost and lgbm only for "tree_path_dependent".

Thanks for your suggestion! It may make sense to accelerate "interventional" as well; however, the algorithms behind "tree_path_dependent" and "interventional" are totally different. "Interventional" is much harder to accelerate (I actually doubt the feasibility of accelerating it on the algorithm side), so it is out of scope for this package.
