Improve the Model Explainability interface in R & Python #7836

Open
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments
@exalate-issue-sync

This is just an epic to collect tickets related to improvements in the R and Python interface for H2O Explainability.

A brief list (to be converted into JIRAs):

  • -For AutoML objects, don’t use the whole model id in the labels/axes/legend of the plots produced by h2o.explain(); just use shortened model_id names. The date at the end does not help with visually comparing the different models/algos (the long ids are distracting to the eye), so we could remove it from the display by default, and maybe let the user switch back to the full model_id via plot_overrides? It would be nice to see the model names as just: StackedEnsemble_AllModels, GLM_1, DRF_1, XGBoost_3, GBM_grid__1__model_3, etc.-
  • Model correlation has interpretable models (GLM) highlighted in red text (in Python), but we don’t explain what the red means, and we don’t do it in the other visuals like the Varimp Heatmap. Need to check if this is also the case in R.
  • Move explanation descriptions into a JSON or text file so there’s just one source and read that into R and Python (easier to edit a single source). Then make some updates to the descriptions.
  • I think we would benefit from using a title for the plot name and a subtitle for the model_id in all the R plots, since they are pretty squished when viewed inside RStudio.
    https://www.datanovia.com/en/blog/ggplot-title-subtitle-and-caption/
  • -I wonder if the Leaderboard printed out (specifically when you pass in an AutoML object) should just be top 20 models by default?-
  • We can default the Leaderboard to 20 models, but we should find a way to let the user override this with {{plot_overrides}}, e.g. {{plot_overrides = list(leaderboard=list(nrow=-1))}} to get all rows (or maybe there’s something better than -1 to use here, like “ALL”), or a particular number, like 50.
  • Add more information at the top of the explain print-out for AutoML-specific stats (how many models of each type, and the best score (using the default loss) for each algo type).
  • Visual improvement tweaks/ideas for the printed Leaderboard
    ** is there a way to control the number of decimal places shown? we could reduce to about 5 decimal places to make the table skinnier and fit on the page better
    ** is it easy to left-align the model names in the first column? then it would be easier to read the type of each model.
  • Since the user passes the {{newdata}} test frame explicitly to the {{h2o.explain()}} function, I am wondering whether we should just use the test set leaderboard metrics instead of the default CV metrics. But then it’s delivering a different “view” of the leaderboard than the internal AutoML object has… so there’s going to be some inconsistency either way; we just have to choose which type of inconsistency is better/worse.
  • Add learning curve of leader model (let’s decide if we want to plot train vs CV error or validation error or error for a single fold, etc).
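
A rough sketch of two of the ideas above (shortened model ids and a leaderboard row override), in plain Python with no dependency on h2o. Everything here is hypothetical and only for discussion: the function names, the `DEFAULT_LEADERBOARD_ROWS` constant, and the regular expression are not part of the h2o API, and the assumed id suffix (`_AutoML_<YYYYMMDD>_<HHMMSS>`, optionally with a run counter before the date) is a guess at the naming pattern rather than a documented contract. The `nrow=-1` / `"ALL"` handling mirrors the proposal in the list, not existing behavior.

```python
import re

# ASSUMPTION: AutoML model ids carry a "_AutoML_<YYYYMMDD>_<HHMMSS>" stamp,
# optionally with a run counter before the date. Not a documented contract.
_AUTOML_STAMP = re.compile(r"_AutoML_(?:\d+_)?\d{8}_\d{6}")

# Proposed default from the list above; hypothetical constant name.
DEFAULT_LEADERBOARD_ROWS = 20


def shorten_model_id(model_id: str) -> str:
    """Drop the AutoML stamp from a model id for display purposes.

    The stamp is replaced by a single underscore so that grid ids keep
    their "__model_N" tail; a trailing underscore is then stripped.
    Ids without the stamp are returned unchanged.
    """
    return _AUTOML_STAMP.sub("_", model_id).rstrip("_")


def leaderboard_row_count(total_rows: int, plot_overrides=None) -> int:
    """Resolve how many leaderboard rows to print.

    Reads plot_overrides["leaderboard"]["nrow"] if present. A value of
    -1 or "ALL" (case-insensitive) means every row; any other value is
    treated as a row limit, capped at the number of available rows.
    """
    overrides = (plot_overrides or {}).get("leaderboard", {})
    nrow = overrides.get("nrow", DEFAULT_LEADERBOARD_ROWS)
    if nrow == -1 or (isinstance(nrow, str) and nrow.upper() == "ALL"):
        return total_rows
    return min(int(nrow), total_rows)


if __name__ == "__main__":
    for mid in (
        "StackedEnsemble_AllModels_AutoML_20200827_123456",
        "GLM_1_AutoML_20200827_123456",
        "GBM_grid__1_AutoML_20200827_123456_model_3",
    ):
        print(shorten_model_id(mid))
    print(leaderboard_row_count(75, {"leaderboard": {"nrow": "ALL"}}))
```

Keeping the shortening as a pure display-side function would leave the real model_id untouched for lookups, so a {{plot_overrides}} switch back to full ids would only need to skip this call.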
@exalate-issue-sync
Author

Hud Wahab commented: Not sure if this is related, but {{.explain()}} doesn’t seem to be available for 3.30.1.3. See https://h2oai.atlassian.net/browse/PUBDEV-7850

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7806
Assignee: Tomas Fryda
Reporter: Erin LeDell
State: Open
Fix Version: Backlog
Attachments: N/A
Development PRs: N/A