H2O - h2o.xgboost support... #6

Closed
coforfe opened this issue Feb 9, 2020 · 5 comments
Comments

@coforfe

coforfe commented Feb 9, 2020

Thanks Yang Liu for your excellent package.

The H2O package includes a function that gives some insight into model interpretability, h2o.partialPlot(), and one of the algorithms it supports is XGBoost (h2o.xgboost()).

It would be very interesting to get the kind of parametrization and outputs that your package produces for these H2O models too.

Thanks in advance,
Carlos.

@liuyanguu
Owner

Hi Carlos, thank you for your kind comment! I have only roughly checked it out: I know H2O but haven't used it myself. Is the partial plot the same as the dependence plot? How does it differ from SHAPforxgboost::shap.plot.dependence()?
For example:
[image not preserved]

@coforfe
Author

coforfe commented Feb 11, 2020

Hi Yang Liu,
Yes, it is roughly equivalent.

A partial dependence plot gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response.

The output of one of these charts looks like this:
[image: ppool_lineas_cliente, a partial dependence plot; not preserved]
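To make the definition above concrete, here is a minimal base-R sketch of how a one-way partial dependence curve can be computed by hand: fix the feature at each grid value for every row, predict, and average the predictions. It uses a linear model on the built-in `mtcars` data as a stand-in for any fitted model, and `partial_dependence()` is a hypothetical helper written for illustration, not H2O's `h2o.partialPlot()` implementation.

```r
# Hypothetical helper: one-way partial dependence by brute force.
# For each grid value v, every row gets feature = v, then predictions are averaged.
partial_dependence <- function(model, data, feature, grid) {
  vapply(grid, function(v) {
    data_mod <- data
    data_mod[[feature]] <- v                     # overwrite the feature for all rows
    mean(predict(model, newdata = data_mod))     # mean response at this grid value
  }, numeric(1))
}

# Stand-in model (assumption): a linear model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
pd <- partial_dependence(fit, mtcars, "wt", grid)

# The partial dependence curve: mean predicted mpg at each value of wt
data.frame(wt = grid, mean_mpg = pd)
```

For a linear model the curve is a straight line whose slope equals the coefficient of `wt`, which makes it a convenient sanity check; for tree ensembles such as GBM or XGBoost the same procedure can produce a non-linear curve.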

@coforfe
Author

coforfe commented Feb 18, 2020

Hi Yang Liu,
A colleague told me about an H2O function that returns SHAP values, h2o.predict_contributions(), and I figured out how to feed those values into your functions to get the same charts.

Attached is a reproducible example. I am using these packages and versions:

  • data.table: 1.12.8
  • h2o: 3.28.0.3
  • SHAPforxgboost: 0.0.3
#----- SHAPforxgboost - Reproducible Example with H2O.

library(data.table)
library(h2o)

h2o.init()
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)
# AGE (column 3) is the response; h2o drops it from x, so the predictors are columns 4:9
prostate_gbm <- h2o.gbm(x = 3:9, y = "AGE", training_frame = prostate)
h2o.predict(prostate_gbm, prostate)

# Get SHAP values with h2o.predict_contributions()
contri_dt <- as.data.table(h2o.predict_contributions(prostate_gbm, prostate))
# Prepare SHAP inputs: drop BiasTerm on a copy (`:=` would otherwise also modify contri_dt by reference)
contri_gd <- copy(contri_dt)[, BiasTerm := NULL]
# Feature values for the same predictor columns, converted to a data.table
dattrain_gd <- as.data.table(prostate[, 4:ncol(prostate)])

library("SHAPforxgboost")
shap_long <- shap.prep(shap_contrib = contri_gd, X_train = dattrain_gd)
# (A data.table warning from `melt.data.table` may appear when integer columns are coerced to double)

# **SHAP summary plot**
shap.plot.summary(shap_long)

# For a quick preview, plot less data to make it faster using `dilute`
shap.plot.summary(shap_long, x_bound = 1.2, dilute = 10)

# Option 2: supply a self-made SHAP values dataset (e.g. output from cross-validation)
shap.plot.summary.wrap2(contri_gd, as.matrix(dattrain_gd))

#------------- Dependence plots ---------------------------
# **SHAP dependence plot**
# Without `y`, this plots the SHAP values of x against x
shap.plot.dependence(data_long = shap_long, x = "PSA")

# Optionally color the plot by assigning `color_feature` (Fig.A)
shap.plot.dependence(data_long = shap_long, x = "PSA",
                     color_feature = "GLEASON")

# Optionally put a different feature's SHAP values on the y axis to view some interaction (Fig.B)
shap.plot.dependence(data_long = shap_long, x = "PSA",
                     y = "GLEASON", color_feature = "GLEASON")

#-------------- Force plot
# Show the top 4 features by setting `top_n = 4`, and use 6 clustering groups
plot_data <- shap.prep.stack.data(shap_contrib = contri_gd, top_n = 4, n_groups = 6)

# Zoom in at location 100 and set the y-axis limit with `y_parent_limit`;
# the y-axis limit for the zoom-in part alone can be set with `y_zoomin_limit`
shap.plot.force_plot(plot_data, zoom_in_location = 100, y_parent_limit = c(-1, 1))

# Plot by each cluster
shap.plot.force_plot_bygroup(plot_data)

h2o.shutdown(prompt = FALSE)
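The example above drops `BiasTerm` before calling `shap.prep()`. This works because SHAP contributions are additive: each row's feature contributions plus the bias term (the expected prediction) reproduce that row's prediction, so the plotting functions only need the per-feature columns, aligned by name with the feature data. The base-R sketch below illustrates that layout under a stated assumption: it uses a linear model on `mtcars` with independent features, where SHAP values have the closed form beta_j * (x_ij - mean(x_j)). It is a stand-in for the H2O GBM, not part of the original example; H2O's tree-based contributions are expected to add up the same way.

```r
# Stand-in model (assumption): linear model, features treated as independent
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
X <- mtcars[, c("wt", "hp", "qsec")]
beta <- coef(fit)[colnames(X)]

# Closed-form SHAP values for a linear model: beta_j * (x_ij - mean(x_j))
centered <- sweep(as.matrix(X), 2, colMeans(X))      # subtract column means
contrib <- sweep(centered, 2, beta, FUN = "*")       # scale each column by its coefficient
bias_term <- mean(predict(fit))                      # expected prediction over the data

# Same layout as h2o.predict_contributions(): one column per feature plus BiasTerm
contrib_with_bias <- cbind(contrib, BiasTerm = bias_term)

# Local accuracy: contributions + BiasTerm reproduce each prediction
stopifnot(isTRUE(all.equal(unname(rowSums(contrib_with_bias)), unname(predict(fit)))))

# Drop BiasTerm so the columns line up one-to-one with the feature data
shap_contrib <- as.data.frame(contrib_with_bias[, colnames(X)])
stopifnot(identical(colnames(shap_contrib), colnames(X)))
```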

Please let me know if you can reproduce it and, if so, whether it would be possible to include it as another working example.

Thanks in advance,
Carlos.

coforfe closed this as completed Feb 18, 2020
@liuyanguu
Owner

Thank you Carlos! This is super awesome. I will include it as an example.

@coforfe
Author

coforfe commented Feb 19, 2020

Thanks to you!
