
Add h2o.make_leaderboard function to score & compare a set of models #12152

Closed
exalate-issue-sync bot opened this issue May 13, 2023 · 5 comments

The idea of this function is to score a bunch of models and compare their performance on a "leaderboard". This doesn't need to be a list of AutoML models, it could be a simple list of models, a grid of models, or an AutoML object (from which we retrieve the models).

You should be able to use this function on a new dataset, or create a leaderboard from stored metrics (train/valid/xval). It's similar to the h2o.performance() function in that sense, but instead of returning the performance objects, it will simply return an H2OFrame of metrics (rows = models, cols = metrics).
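
For reference, this kind of comparison currently has to be done one model at a time with h2o.performance(); a minimal sketch of that per-model workflow (the models list, test frame, and chosen metric here are illustrative placeholders) might look like:

{code:r}
# Illustrative per-model workflow that the proposed function would replace:
# score each model on a holdout frame and collect one metric per model.
perfs <- lapply(models, function(m) h2o.performance(m, newdata = test))
data.frame(model_id = sapply(models, function(m) m@model_id),
           auc      = sapply(perfs, h2o.auc))
{code}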

A few design ideas:

  • user specifies a desired scoring metric and a test_frame or one of train/valid/xval = TRUE, and it returns a two-column table with model_id and score. R example:
    {code}
    lb <- h2o.create_leaderboard(models, newdata = NULL, train = FALSE, valid = FALSE,
    xval = FALSE, metric = "AUTO")
    {code}
  • user specifies a test_frame or one of train/valid/xval = TRUE and it returns a frame with a model_id column and all the available metrics (AUC, MSE, etc.) as additional columns, sorted by a user-defined (or default) sort_metric. R example:
    {code}
    lb <- h2o.create_leaderboard(models, newdata = NULL, train = FALSE, valid = FALSE,
    xval = FALSE, sort_metric = "AUTO")
    {code}
  • user specifies a sort_metric and sort_data (or uses the defaults) and we'd return everything else. Columns would be model_id and then all available metric columns (where some can be null): train_auc, valid_auc, xval_auc, newdata_auc, train_mse, valid_mse, xval_mse, newdata_mse, etc. R example:
    {code}
    lb <- h2o.create_leaderboard(models, newdata = NULL, sort_metric = "AUTO", sort_data = "AUTO")
    {code}

The models argument would support multiple types of input (which could all be translated to a list of model_ids before sending to the backend, if that's easier): a list of models, a list of model ids, a grid (maybe a list of grids), and an AutoML object.

Let's do a check to make sure that the models are all of the same type (binomial, multiclass, regression).
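
A minimal sketch of how the client side could handle both points, assuming hypothetical internal helpers (not part of the proposed API) for flattening the input and checking model types:

{code:r}
# Hypothetical helper: flatten a mixed list of models / grids / automl objects /
# model keys into a plain character vector of model ids.
.flatten_to_model_ids <- function(models) {
  unlist(lapply(models, function(x) {
    if (is.character(x)) x                                          # already a key
    else if (is(x, "H2OGrid")) x@model_ids                          # grid -> its model ids
    else if (is(x, "H2OAutoML")) as.vector(x@leaderboard$model_id)  # automl -> leaderboard ids
    else x@model_id                                                 # plain H2OModel
  }))
}

# Hypothetical check: all models must be of the same category
# (binomial, multiclass, regression), inferred here from the model class.
.check_same_model_type <- function(model_ids) {
  types <- vapply(model_ids, function(id) class(h2o.getModel(id))[1], character(1))
  if (length(unique(types)) > 1)
    stop("All models must be of the same type; got: ",
         paste(unique(types), collapse = ", "))
}
{code}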

@exalate-issue-sync

Erin LeDell commented: Hi [~accountid:557058:9328661f-241f-4a0f-9d9a-d4e78ef05ba0], just tagging you here since this ticket interested you.

@exalate-issue-sync

Ruslan Dautkhanov commented: Thank you, Erin!

@exalate-issue-sync

Sebastien Poirier commented: Client API decision based on discussion:

{code:r}
#'
#' @param models: list of anything among models, grids, automl or list of keys (container objects like automl and grids would be flattened)
#' @param newdata: H2OFrame (optional)
#' @param sort_metric: str. One of H2O supported metrics + AUTO (default).
#' @param scoring_data: str. One of AUTO (default), train, valid, xval. AUTO means "use first available among (newdata, xval, valid, train)".

h2o.make_leaderboard <- function(
  models,
  newdata = NULL,
  sort_metric = "AUTO",
  scoring_data = "AUTO"
)
{code}
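
A hypothetical usage of that proposed signature, mixing a grid and a standalone model (the predictor/response names and frames are illustrative):

{code:r}
# Illustrative call of the proposed API with a grid plus an individual model.
grid <- h2o.grid("gbm", x = predictors, y = response, training_frame = train,
                 hyper_params = list(max_depth = c(3, 5, 7)))
rf   <- h2o.randomForest(x = predictors, y = response, training_frame = train)

lb <- h2o.make_leaderboard(list(grid, rf), newdata = test,
                           sort_metric = "AUTO", scoring_data = "AUTO")
head(lb)  # H2OFrame: one row per model, one column per metric, sorted by sort_metric
{code}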

The backend will generate warnings about inconsistencies, depending on the choice of {{scoring_data}}.
For example:

  • {{train}}: if not all models were trained with the same training_frame
  • {{valid}}: if not all models were trained with the same validation_frame
  • {{xval}}: if not all models were trained with the same training_frame + nfolds/fold-column.

If {{valid}} is set and a model was trained without a validation frame, an error should be raised.

If {{scoring_data='AUTO'}} is set, the first strategy common to ALL models should be chosen. For example, some models may have been trained with {{xval}} and others without, in which case we fall back to {{valid}}; and if some models don't have any validation frame, we fall back to {{train}}.
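
A minimal sketch of that AUTO fallback, assuming a hypothetical has_metrics() helper that reports whether a given model carries xval/valid/train metrics:

{code:r}
# Hypothetical resolution of scoring_data = "AUTO": prefer newdata, then the
# first strategy in (xval, valid, train) that is available for ALL models.
.resolve_scoring_data <- function(models, newdata = NULL) {
  if (!is.null(newdata)) return("newdata")
  for (strategy in c("xval", "valid", "train")) {
    if (all(sapply(models, has_metrics, type = strategy)))  # has_metrics() is hypothetical
      return(strategy)
  }
  stop("No scoring data is available for all models.")
}
{code}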

Finally, at first we will return all the default metrics defined in the AutoML leaderboard.

@exalate-issue-sync
Copy link
Author

Sebastien Poirier commented: Also using the name {{make_leaderboard}} instead of {{create_leaderboard}} for consistency: the R and Python APIs seem to favour the {{make_}} prefix over {{create_}}.

@hasithjp

JIRA Issue Migration Info

Jira Issue: PUBDEV-5280
Assignee: Tomas Fryda
Reporter: Erin LeDell
State: Resolved
Fix Version: 3.38.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#6225
