Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shap_values for tree-based models doesn't set check_additivity=False as expected #866

Closed
jcreinhold opened this issue Mar 27, 2024 · 3 comments · Fixed by #872
Closed

shap_values for tree-based models doesn't set check_additivity=False as expected #866

jcreinhold opened this issue Mar 27, 2024 · 3 comments · Fixed by #872

Comments

@jcreinhold
Copy link
Contributor

jcreinhold commented Mar 27, 2024

For, e.g., CausalForestDML, the desired behavior (#458) of EconML is to set check_additivity=False when computing shap_values (otherwise, the computation will frequently fail for reasons unbeknownst to me).

This kwarg is set here where, if the Explainer class is Tree, check_additivity is set to False.

This works with shap<=0.42.1; however, in shap>=0.43.0 the name of the class was changed to TreeExplainer.

Given that EconML states it's compatible with shap >= 0.38.1, < 0.44.0, this represents a bug.

Some potential solutions:

  1. Change the maximum version of shap to 0.44.0 to 0.42.1. (This doesn't seem desirable.)
  2. Change the code to:
# Simple, but potentially not future-proof.
if explainer.__class__.__name__ in {"Tree", "TreeExplainer"}:
    shap_out = explainer(F, check_additivity=False)
  1. Change to code to:
# This might have false positives but might be more future proof.
if "Tree" in explainer.__class__.__name__:
    shap_out = explainer(F, check_additivity=False)
  1. Change the code to:
import inspect

# This seems like the most robust option and will always be correct
if "check_additivity" in inspect.signature(explainer).parameters:
    shap_out = explainer(F, check_additivity=False)

(It seems that this parameter is only used in the TreeExplainer and the DeepExplainer, so the signature inspection would probably be fine, and we probably want to set this kwarg to False all the time regardless (at least in this package)).

Happy to submit a PR with whatever approach though.

@kbattocchi
Copy link
Collaborator

Thanks for the feedback - I agree that option 4 seems like the best approach, and we'd be happy to take a PR fixing it. I am surprised that our existing tests in test_shap.py don't catch this (because they do cover CausalForestDML and they run against shap==0.43); please add a test that fails without your fix (or modify at least one of the existing tests to do so).

Also, we'd be happy to allow a higher upper bound (say shap<0.46) if there are not other breaking changes that are hard to work around; the only reason that we have a specific upper bound is that shap releases have had breaking changes that have broken functionality in the past.

@jcreinhold
Copy link
Contributor Author

jcreinhold commented Apr 4, 2024

@kbattocchi I'm not able to consistently reproduce this error. See #445 for another valiant but ultimately futile effort to reproduce it consistently.

For posterity, here is (essentially) the code I was running when I (occasionally) ran into the error:

Versions:

  • Python version: 3.8.17
  • econml version: 0.15.0
  • numpy version: 1.23.5
  • scikit-learn version: 1.3.2
  • scipy version: 1.10.1
  • shap version: 0.43.0
import econml
import numpy
import sklearn.ensemble as ens

seed = 1337
rng = np.random.default_rng(seed)

n = 1_000
n_x = 2

X = rng.uniform(-1.0, 1.0, size=(n, n_x))
A = rng.binomial(1, 0.5 + 0.5 * X[:, 0])
Y = rng.binomial(1, 0.3 + 0.5 * A + 0.2 * X[:, 0])

X_ = np.hstack([X, A[:, None]])
A_ = rng.binomial(1, 0.5, size=(n,))

model = dml.CausalForestDML(
    model_y=ens.RandomForestClassifier(random_state=seed),
    model_t=ens.RandomForestClassifier(random_state=seed),
    discrete_treatment=True,
    discrete_outcome=True,
    random_state=seed,
)

model.fit(Y, A_, X=X_)
shap_values = model.shap_values(X_)

Part of the problem with reproducing this error is that EconML doesn't currently allow you to pass a seed to the Explainer class.

I'd be happy to add in that keyword argument into the shap_values method; however, that'd involve changing a good number of files (e.g., the LinearCateEstimator and anything that overwrites that method).

This change isn't strictly necessary for this issue (although I believe it's generally desirable), and, even if I do find a configuration that triggers the error on my machine, I doubt it'll be reproducible across machines.

Any alternative ideas? I can quickly submit a PR with just the change proposed in option 4 and we could revisit adding the seed keyword argument to shap_values (if that's desirable).

@kbattocchi
Copy link
Collaborator

Part of the problem with reproducing this error is that EconML doesn't currently allow you to pass a seed to the Explainer class.

I'd be happy to add in that keyword argument into the shap_values method; however, that'd involve changing a good number of files (e.g., the LinearCateEstimator and anything that overwrites that method).

This change isn't strictly necessary for this issue (although I believe it's generally desirable), and, even if I do find a configuration that triggers the error on my machine, I doubt it'll be reproducible across machines.

Any alternative ideas? I can quickly submit a PR with just the change proposed in option 4 and we could revisit adding the seed keyword argument to shap_values (if that's desirable).

I agree that allowing the seed to be passed as an optional argument seems beneficial, not just for testing but for reproducibility more broadly. If you don't mind adding these changes to your PR, that would be great, but if that's too much work I'm also happy to merge it as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants