[Question] SHAP library: Is it possible to have a node in LightGBM that has no coverage (no samples assigned to it)? #6388
Comments
Given a dataset These checks explicitly prevent the addition of splits that result in 0 samples on one side of the split.
Here's a minimal example using
import lightgbm as lgb
import numpy as np
import json
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=10_000, n_features=1, centers=3)
with open("forced_splits.json", "w") as f:
    f.write(json.dumps(
        {"feature": 0, "threshold": np.max(X) + 1.0}
    ))
# construct the estimator + fit the model
bst = lgb.train(
    params={
        "forcedsplits_filename": "forced_splits.json",
        "objective": "multiclass",
        "min_gain_to_split": 0.0,
        "min_data_in_leaf": 0,
        "num_classes": 3,
        "num_iterations": 10,
        "num_leaves": 31,
        "verbose": 1
    },
    train_set=lgb.Dataset(X, label=y, params={"min_data_in_bin": 1})
)
Ok, so given that I just trained a LightGBM model on dataset
@jameslamb Thank you very much! Really appreciating your help!
Description
In shap/shap#3574 we discuss whether it is possible to have a node in LightGBM that has no samples assigned to it during training. Answering this would clarify whether it is necessary to assert the coverage of the nodes when no background dataset is passed and SHAP computations are based on tree paths. One of the SHAP maintainers asked for clarification on this question.
To summarize the lengthy discussion at shap/shap#3574:
There exists a case (multiclass prediction with LightGBM, interactions=True, data=None, feature_perturbation='tree_path_dependent') where SHAP explanations fail due to an
But since this works in all other (non-multiclass) cases, and also with interactions=False in the multiclass case (which uses the LightGBM implementation), and since it even works if one removes the coverage assert, the question arises whether the coverage assert is necessary at all for this single case.
Asserting coverage would be necessary if it were possible to get "empty" nodes during training, i.e. nodes with no coverage. Hence the question of whether it is possible to have a node in LightGBM that has no samples assigned to it during training.
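To make the notion of an "empty" node concrete, here is a minimal sketch of a coverage check over a nested tree dictionary. It assumes the layout I understand `Booster.dump_model()` to produce, where internal nodes carry `internal_count` and leaves carry `leaf_count`. The function name `find_uncovered_nodes` and the toy tree are hypothetical and only for illustration; this is not how SHAP performs its assert.

```python
def find_uncovered_nodes(node, path="root"):
    """Return the paths of all nodes whose recorded sample count is zero."""
    # Assumption: internal nodes store "internal_count", leaves store "leaf_count".
    count = node.get("internal_count", node.get("leaf_count"))
    uncovered = [path] if count == 0 else []
    # Recurse into whichever children are present; leaves have none.
    for side in ("left_child", "right_child"):
        child = node.get(side)
        if child is not None:
            uncovered += find_uncovered_nodes(child, f"{path}/{side}")
    return uncovered


# Hand-written toy tree (not real model output): every sample goes left,
# so the right leaf has no coverage.
toy_tree = {
    "split_index": 0,
    "internal_count": 100,
    "left_child": {"leaf_index": 0, "leaf_count": 100},
    "right_child": {"leaf_index": 1, "leaf_count": 0},
}

print(find_uncovered_nodes(toy_tree))
```

With a trained booster, the same function could presumably be applied to each entry of `bst.dump_model()["tree_info"]` via its `"tree_structure"` key, assuming that layout holds for the installed LightGBM version.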
Please point me toward any documentation that covers this question. Alternatively, a clarification from one of the maintainers would help us a lot to move forward. Thank you very much in advance!
Reproducible example
Environment info
Win11
Python 3.11.8
LightGBM 4.3.0
SHAP 0.45.0
Additional Comments