Cross-validation predictions not saved with h2o.saveModel #12775

exalate-issue-sync · 2023-05-13T04:04:12Z

When a model trained with keep_cross_validation_predictions = TRUE is saved in the binary format (h2o.saveModel), the model is saved but its cross-validation predictions are not. The saved model stores only the key names of the prediction frames, and once the cluster is shut down those keys no longer point to anything. As a result, if you save models and later want to use them to train stacked ensembles, you cannot.

Example:
{code}

fit <- h2o.gbm(y = 5, training_frame = as.h2o(train), nfolds = 3, keep_cross_validation_predictions = TRUE)
h2o.saveModel(fit, path = "/Users/me/Downloads/foocv/")
[1] "/Users/me/Downloads/foocv/GBM_model_R_1536894672242_603"
h2o.shutdown()
Are you sure you want to shutdown the H2O instance running at http://localhost:54321/ (Y/N)? y
[1] TRUE
rm(list=ls())
h2o.init()

H2O is not running yet, starting it now...

Note: In case of errors look at the following log files:
/var/folders/gj/cm0k4b_s42j30zs376cq_5hh0000gn/T//Rtmp7Grlq4/h2o_me_started_from_r.out
/var/folders/gj/cm0k4b_s42j30zs376cq_5hh0000gn/T//Rtmp7Grlq4/h2o_me_started_from_r.err

java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)

Starting H2O JVM and connecting: . Connection successful!

R is connected to the H2O cluster:
H2O cluster uptime: 1 seconds 590 milliseconds
H2O cluster timezone: America/Los_Angeles
H2O data parsing timezone: UTC
H2O cluster version: 3.21.0.99999
H2O cluster version age: 52 minutes
H2O cluster name: H2O_started_from_R_me_zdk319
H2O cluster total nodes: 1
H2O cluster total memory: 3.56 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.5.0 (2018-04-23)

fit <- h2o.loadModel("/Users/me/Downloads/foocv/GBM_model_R_1536894672242_603")
fit@model$cross_validation_predictions
[[1]]
[[1]]$__meta
[[1]]$__meta$schema_version
[1] 3

[[1]]$__meta$schema_name
[1] "FrameKeyV3"

[[1]]$__meta$schema_type
[1] "Key"

[[1]]$name
[1] "prediction_GBM_model_R_1536894672242_603_cv_1"

[[1]]$type
[1] "Key"

[[1]]$URL
[1] "/3/Frames/prediction_GBM_model_R_1536894672242_603_cv_1"

[[2]]
[[2]]$__meta
[[2]]$__meta$schema_version
[1] 3

[[2]]$__meta$schema_name
[1] "FrameKeyV3"

[[2]]$__meta$schema_type
[1] "Key"

[[2]]$name
[1] "prediction_GBM_model_R_1536894672242_603_cv_2"

[[2]]$type
[1] "Key"

[[2]]$URL
[1] "/3/Frames/prediction_GBM_model_R_1536894672242_603_cv_2"

[[3]]
[[3]]$__meta
[[3]]$__meta$schema_version
[1] 3

[[3]]$__meta$schema_name
[1] "FrameKeyV3"

[[3]]$__meta$schema_type
[1] "Key"

[[3]]$name
[1] "prediction_GBM_model_R_1536894672242_603_cv_3"

[[3]]$type
[1] "Key"

[[3]]$URL
[1] "/3/Frames/prediction_GBM_model_R_1536894672242_603_cv_3"

fit@model$cross_validation_holdout_predictions_frame_id
$__meta
$__meta$schema_version
[1] 3

$__meta$schema_name
[1] "FrameKeyV3"

$__meta$schema_type
[1] "Key"

$name
[1] "cv_holdout_prediction_GBM_model_R_1536894672242_603"

$type
[1] "Key"

$URL
[1] "/3/Frames/cv_holdout_prediction_GBM_model_R_1536894672242_603"

{code}

Reported on h2ostream: https://groups.google.com/forum/#!topic/h2ostream/zoW_ewFwJAU

I'm not sure whether we should save the predictions along with the binary model (if they were kept), or write client-side wrapper functions that save the CV prediction frame separately at the same path (e.g. model_id.csv) and load it back when the model is loaded.
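The second option, client-side wrappers, might look roughly like the R sketch below. It is a hypothetical illustration, not part of the h2o package: `cv_preds_path`, `save_model_with_cv_preds` and `load_model_with_cv_preds` are made-up names. It assumes the R client and the H2O cluster see the same filesystem (as with a local `h2o.init()`), since `h2o.exportFile()` and `h2o.importFile()` resolve paths on the H2O side. It only handles the combined holdout frame (`cross_validation_holdout_predictions_frame_id`), not the per-fold `cross_validation_predictions` frames, and it re-imports the CSV under the original frame key so the reference stored in the loaded model resolves again.

```r
# Hypothetical client-side helpers (NOT part of the h2o package).
# Assumption: the R client and the H2O cluster share a filesystem, because
# h2o.exportFile() / h2o.importFile() resolve paths on the H2O side.

# Sidecar CSV for the holdout predictions: "<saved model path>.csv",
# i.e. model_id.csv in the same directory as the binary model.
cv_preds_path <- function(model_path) {
  paste0(model_path, ".csv")
}

save_model_with_cv_preds <- function(model, path, force = FALSE) {
  model_path <- h2o::h2o.saveModel(model, path = path, force = force)
  # Key of the combined holdout predictions frame; NULL when the model was
  # trained without keep_cross_validation_predictions = TRUE.
  frame_key <- model@model$cross_validation_holdout_predictions_frame_id$name
  if (!is.null(frame_key)) {
    holdout <- h2o::h2o.getFrame(frame_key)
    h2o::h2o.exportFile(holdout, path = cv_preds_path(model_path), force = force)
  }
  model_path
}

load_model_with_cv_preds <- function(model_path) {
  model <- h2o::h2o.loadModel(model_path)
  frame_key <- model@model$cross_validation_holdout_predictions_frame_id$name
  csv_path <- cv_preds_path(model_path)
  if (!is.null(frame_key) && file.exists(csv_path)) {
    # Re-import under the original key so the model's stored reference is valid again.
    h2o::h2o.importFile(csv_path, destination_frame = frame_key)
  }
  model
}
```

In the transcript above, this would mean calling `save_model_with_cv_preds(fit, path = "/Users/me/Downloads/foocv/")` before `h2o.shutdown()`, and `load_model_with_cv_preds()` on the returned path after `h2o.init()`. One caveat of the CSV round trip is that column types may not come back exactly as they were (for example, numeric class labels may be parsed as numeric rather than categorical), which is an argument for saving the predictions alongside the binary model instead.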


exalate-issue-sync · 2023-05-13T04:04:13Z

Neema Mashayekhi commented: MOJO version: https://0xdata.atlassian.net/browse/PUBDEV-7506

exalate-issue-sync · 2023-05-13T04:04:15Z

Erin LeDell commented: Another report of this causing issues for a user: https://stackoverflow.com/questions/64985991/is-it-possible-to-use-loaded-h2o-grids-for-stacked-ensembles

hasithjp · 2023-05-15T07:51:10Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-5923
Assignee: Michal Kurka
Reporter: Erin LeDell
State: Resolved
Fix Version: 3.32.0.3
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#5179
#5180

hasithjp closed this as completed May 15, 2023

hasithjp added the fixVersion/3.32.0.3 label May 15, 2023
