
Add h2o.make_leaderboard function to score & compare a set of models #12152

Closed
exalate-issue-sync bot opened this issue May 13, 2023 · 5 comments

The idea of this function is to score a bunch of models and compare their performance on a "leaderboard". This doesn't need to be a list of AutoML models, it could be a simple list of models, a grid of models, or an AutoML object (from which we retrieve the models).

You should be able to use this function on a new dataset, or create a leaderboard from stored metrics (train/valid/xval). It's similar to the h2o.performance() function in that sense, but instead of returning the performance objects, it will simply return an H2OFrame of metrics (rows = models, cols = metrics).
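
For reference, this kind of comparison currently has to be done one model at a time with h2o.performance(); a minimal sketch of that per-model workflow (the models list, test frame, and chosen metric here are illustrative placeholders) might look like:

{code:r}
# Illustrative per-model workflow that the proposed function would replace:
# score each model on a holdout frame and collect one metric per model.
perfs <- lapply(models, function(m) h2o.performance(m, newdata = test))
data.frame(model_id = sapply(models, function(m) m@model_id),
           auc      = sapply(perfs, h2o.auc))
{code}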

A few design ideas:

  • user specifies a desired scoring metric and a test_frame or one of train/valid/xval = TRUE, and it returns a two-column table with model_id and score. R example:
    {code}
    lb <- h2o.create_leaderboard(models, newdata = NULL, train = FALSE, valid = FALSE,
    xval = FALSE, metric = "AUTO")
    {code}
  • user specifies a test_frame or one of train/valid/xval = TRUE and it returns a frame with a model_id column and all the available metrics (AUC, MSE, etc.) as additional columns, sorted by a user-defined (or default) sort_metric. R example:
    {code}
    lb <- h2o.create_leaderboard(models, newdata = NULL, train = FALSE, valid = FALSE,
    xval = FALSE, sort_metric = "AUTO")
    {code}
  • user specifies a sort_metric and sort_data (or uses the defaults) and we'd return everything else. Columns would be model_id and then all available metric columns (where some can be null): train_auc, valid_auc, xval_auc, newdata_auc, train_mse, valid_mse, xval_mse, newdata_mse, etc. R example:
    {code}
    lb <- h2o.create_leaderboard(models, newdata = NULL, sort_metric = "AUTO", sort_data = "AUTO")
    {code}

The models argument would support multiple types of input (which could all be translated to a list of model_ids before sending to the backend, if that's easier): a list of models, a list of model ids, a grid (maybe a list of grids), and an AutoML object.

Let's do a check to make sure that the models are all of the same type (binomial, multiclass, regression).
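
A minimal sketch of how the client side could handle both points, assuming hypothetical internal helpers (not part of the proposed API) for flattening the input and checking model types:

{code:r}
# Hypothetical helper: flatten a mixed list of models / grids / automl objects /
# model keys into a plain character vector of model ids.
.flatten_to_model_ids <- function(models) {
  unlist(lapply(models, function(x) {
    if (is.character(x)) x                                          # already a key
    else if (is(x, "H2OGrid")) x@model_ids                          # grid -> its model ids
    else if (is(x, "H2OAutoML")) as.vector(x@leaderboard$model_id)  # automl -> leaderboard ids
    else x@model_id                                                 # plain H2OModel
  }))
}

# Hypothetical check: all models must be of the same category
# (binomial, multiclass, regression), inferred here from the model class.
.check_same_model_type <- function(model_ids) {
  types <- vapply(model_ids, function(id) class(h2o.getModel(id))[1], character(1))
  if (length(unique(types)) > 1)
    stop("All models must be of the same type; got: ",
         paste(unique(types), collapse = ", "))
}
{code}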

@exalate-issue-sync

Erin LeDell commented: Hi [~accountid:557058:9328661f-241f-4a0f-9d9a-d4e78ef05ba0], just tagging you here since this ticket interested you.

@exalate-issue-sync

Ruslan Dautkhanov commented: Thank you, Erin!

@exalate-issue-sync

Sebastien Poirier commented: Client API decision based on discussion:

{code:r}
#'
#' @param models: list of anything among models, grids, automl or list of keys (container objects like automl and grids would be flattened)
#' @param newdata: H2OFrame (optional)
#' @param sort_metric: str. One of H2O supported metrics + AUTO (default).
#' @param scoring_data: str. One of AUTO (default), train, valid, xval. AUTO means "use first available among (newdata, xval, valid, train)".

h2o.make_leaderboard <- function(
  models,
  newdata = NULL,
  sort_metric = "AUTO",
  scoring_data = "AUTO"
)
{code}
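
A hypothetical usage of that proposed signature, mixing a grid and a standalone model (the predictor/response names and frames are illustrative):

{code:r}
# Illustrative call of the proposed API with a grid plus an individual model.
grid <- h2o.grid("gbm", x = predictors, y = response, training_frame = train,
                 hyper_params = list(max_depth = c(3, 5, 7)))
rf   <- h2o.randomForest(x = predictors, y = response, training_frame = train)

lb <- h2o.make_leaderboard(list(grid, rf), newdata = test,
                           sort_metric = "AUTO", scoring_data = "AUTO")
head(lb)  # H2OFrame: one row per model, one column per metric, sorted by sort_metric
{code}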

The backend will generate warnings about inconsistencies, depending on the choice of {{scoring_data}}.
For example:

  • {{train}}: if not all models were trained with the same training_frame
  • {{valid}}: if not all models were trained with the same validation_frame
  • {{xval}}: if not all models were trained with the same training_frame + nfolds/fold-column.

If {{valid}} is set and a model was trained without a validation frame, an error should be raised.

If {{scoring_data='AUTO'}} is set, the first strategy common to ALL models should be chosen. For example, some models may have been trained with {{xval}} and others without, in which case we fall back to {{valid}}; and if some models don't have any validation frame, we fall back to {{train}}.
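
A minimal sketch of that AUTO fallback, assuming a hypothetical has_metrics() helper that reports whether a given model carries xval/valid/train metrics:

{code:r}
# Hypothetical resolution of scoring_data = "AUTO": prefer newdata, then the
# first strategy in (xval, valid, train) that is available for ALL models.
.resolve_scoring_data <- function(models, newdata = NULL) {
  if (!is.null(newdata)) return("newdata")
  for (strategy in c("xval", "valid", "train")) {
    if (all(sapply(models, has_metrics, type = strategy)))  # has_metrics() is hypothetical
      return(strategy)
  }
  stop("No scoring data is available for all models.")
}
{code}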

Finally, at first we will return all the default metrics defined in the AutoML leaderboard.

@exalate-issue-sync
Copy link
Author

Sebastien Poirier commented: Also using the name {{make_leaderboard}} instead of {{create_leaderboard}} for consistency: the R and Python APIs seem to favour the {{make_}} prefix over {{create_}}.

@hasithjp

JIRA Issue Migration Info

Jira Issue: PUBDEV-5280
Assignee: Tomas Fryda
Reporter: Erin LeDell
State: Resolved
Fix Version: 3.38.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#6225
