JIRA Issue Details
Jira Issue: PUBDEV-8035
Assignee: Michal Kurka
Reporter: Joseph Granados
State: Resolved
Fix Version: 3.32.1.1
Attachments: N/A
Development PRs: N/A
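For example, the following snippet needs to print True so that the contributions can be used to make SHAP plots. It does not, because XGBoost one-hot encodes the categorical AGE column, so predict_contributions returns one contribution column per level instead of one per predictor: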
{code:python}
import h2o
from h2o.estimators import H2OXGBoostEstimator

h2o.init()

prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")

# Set the predictors and response; set the factors:
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
prostate["AGE"] = prostate["AGE"].asfactor()
predictors = ["ID", "AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON"]
response = "CAPSULE"

model = H2OXGBoostEstimator(ntrees=100, learn_rate=0.01, seed=2019)
model.train(training_frame=prostate, x=predictors, y=response)

# Calculate SHAP values using predict_contributions
contributions = model.predict_contributions(prostate)

# Convert the H2OFrame to use with shap's visualization functions
contributions_matrix = contributions.as_data_frame().values

# SHAP values for all features (the last column is the bias term)
shap_values = contributions_matrix[:, 0:-1]

X = prostate[predictors].as_data_frame()
print(X.values.shape == shap_values.shape)  # prints False: AGE was one-hot encoded
{code}
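To see which predictors caused the mismatch, the contribution column names can be mapped back to the original predictors. The sketch below is plain Python and assumes the same naming convention the workaround relies on when it splits names on ".", i.e. expanded columns named <predictor>.<level>, plus a bias column called BiasTerm. The helper names and the toy column names are hypothetical, not H2O output.

```python
def map_contribution_columns(contrib_columns, predictors, bias_name="BiasTerm"):
    """Map each predictor to the contribution columns that belong to it.

    Hypothetical helper. Assumes one-hot encoded columns are named
    "<predictor>.<level>" and the bias column is called bias_name.
    Returns (mapping, unmatched); unmatched lists columns that could not
    be attributed to any predictor.
    """
    mapping = {p: [] for p in predictors}
    unmatched = []
    for col in contrib_columns:
        if col == bias_name:
            continue  # the bias term is not a feature
        if col in mapping:
            mapping[col].append(col)  # exact match: column was not expanded
            continue
        prefix = col.split(".")[0]
        if prefix in mapping:
            mapping[prefix].append(col)  # one level of an expanded categorical
        else:
            unmatched.append(col)
    return mapping, unmatched


def expanded_predictors(mapping):
    """Return the predictors that map to more than one contribution column."""
    return [p for p, cols in mapping.items() if len(cols) > 1]


# Toy, hypothetical column names (not real H2O output)
cols = ["ID", "AGE.43", "AGE.47", "RACE", "BiasTerm"]
mapping, unmatched = map_contribution_columns(cols, ["ID", "AGE", "RACE"])
print(expanded_predictors(mapping))  # ['AGE']
print(len(cols) - 1, "feature columns vs", len(mapping), "predictors")  # 4 feature columns vs 3 predictors
```

With a real model, contrib_columns would be contributions.columns; any predictor that maps to more than one column is why the shape check prints False.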
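Current workaround: sum the one-hot encoded contribution columns back into a single column per original predictor before converting the frame to a matrix: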
{code:python}
import h2o
from h2o.estimators import H2OXGBoostEstimator

h2o.init()

prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")

# Set the predictors and response; set the factors:
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
prostate["AGE"] = prostate["AGE"].asfactor()
predictors = ["ID", "AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON"]
response = "CAPSULE"

model = H2OXGBoostEstimator(ntrees=100, learn_rate=0.01, seed=2019)
model.train(training_frame=prostate, x=predictors, y=response)

# Calculate SHAP values using predict_contributions
contributions = model.predict_contributions(prostate)

# Sum the one-hot contributions: group columns by the name before the first "."
# (one-hot columns are named <column>.<level>). The dict keeps insertion order,
# so the bias column, which comes last, stays last.
groups = {c: [] for c in [x.split(".")[0] for x in contributions.columns]}
for c in contributions.columns:
    groups[c.split(".")[0]].append(c)

rc = None
for k, v in groups.items():
    # Row-wise sum of all columns in the group, returned as a one-column H2OFrame
    c = contributions[v].sum(axis=1, return_frame=True)
    c.columns = [k]
    if rc is None:
        rc = c
    else:
        rc = rc.cbind(c)

# Convert the H2OFrame to use with shap's visualization functions
contributions_matrix = rc.as_data_frame().values

# SHAP values for all features
shap_values = contributions_matrix[:, 0:-1]

# The expected value (bias term) is the last returned column; it is the same
# for every row, so any single value works
expected_value = contributions_matrix[:, -1].min()

X = prostate[predictors].as_data_frame()
print(X.values.shape == shap_values.shape)
{code}
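The grouping step of the workaround can also be sketched without H2O, on plain Python lists, which makes it easy to check that summing the one-hot columns leaves each row's total (feature contributions plus bias) unchanged. This is a hypothetical helper, not part of the H2O API. It assumes the same <column>.<level> naming and a bias column named BiasTerm, and, like the workaround, it would mis-group a numeric column whose own name contains a ".".

```python
import math


def collapse_contributions(columns, rows, bias_name="BiasTerm"):
    """Sum one-hot contribution columns back into one column per predictor.

    Hypothetical helper. columns: contribution column names; rows: one list
    of floats per row. Columns are grouped by the text before the first ".",
    as in the workaround. Returns (new_columns, new_rows), bias column last.
    """
    groups = {}  # dict keeps insertion order, so predictor order is preserved
    for idx, col in enumerate(columns):
        if col != bias_name:
            groups.setdefault(col.split(".")[0], []).append(idx)
    bias_idx = columns.index(bias_name)
    new_columns = list(groups) + [bias_name]
    new_rows = [
        [sum(row[i] for i in idxs) for idxs in groups.values()] + [row[bias_idx]]
        for row in rows
    ]
    return new_columns, new_rows


# Toy, hypothetical contributions for two rows (not real H2O output)
columns = ["ID", "AGE.43", "AGE.47", "RACE", "BiasTerm"]
rows = [[0.1, 0.2, -0.05, 0.3, -0.5], [-0.2, 0.0, 0.4, 0.1, -0.5]]
new_columns, new_rows = collapse_contributions(columns, rows)
print(new_columns)  # ['ID', 'AGE', 'RACE', 'BiasTerm']

# Summing within groups leaves each row's total (features + bias) unchanged
for old, new in zip(rows, new_rows):
    assert math.isclose(sum(old), sum(new))
```

For an H2OFrame, the inputs would be contributions.columns and contributions.as_data_frame().values.tolist(); the design mirrors the H2O-side loop but keeps the check that totals are preserved explicit.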