Cross-validation predictions not saved with h2o.saveModel #12775

Closed
exalate-issue-sync bot opened this issue May 13, 2023 · 3 comments

Comments

@exalate-issue-sync

When a model trained with keep_cross_validation_predictions = TRUE is saved in the binary format, the model is saved but the cross-validation predictions are not. This means that if you save models and want to train stacked ensembles later, you will not be able to. The model saves the key name of each predictions frame, but once the cluster is shut down, that key is no longer valid.

Example:
{code}

fit <- h2o.gbm(y = 5, training_frame = as.h2o(train), nfolds = 3, keep_cross_validation_predictions = TRUE)
|========================================================================================================| 100%
h2o.saveModel(fit, path = "/Users/me/Downloads/foocv/")
[1] "/Users/me/Downloads/foocv/GBM_model_R_1536894672242_603"
h2o.shutdown()
Are you sure you want to shutdown the H2O instance running at http://localhost:54321/ (Y/N)? y
[1] TRUE
rm(list=ls())
h2o.init()

H2O is not running yet, starting it now...

Note: In case of errors look at the following log files:
/var/folders/gj/cm0k4b_s42j30zs376cq_5hh0000gn/T//Rtmp7Grlq4/h2o_me_started_from_r.out
/var/folders/gj/cm0k4b_s42j30zs376cq_5hh0000gn/T//Rtmp7Grlq4/h2o_me_started_from_r.err

java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)

Starting H2O JVM and connecting: . Connection successful!

R is connected to the H2O cluster:
H2O cluster uptime: 1 seconds 590 milliseconds
H2O cluster timezone: America/Los_Angeles
H2O data parsing timezone: UTC
H2O cluster version: 3.21.0.99999
H2O cluster version age: 52 minutes
H2O cluster name: H2O_started_from_R_me_zdk319
H2O cluster total nodes: 1
H2O cluster total memory: 3.56 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.5.0 (2018-04-23)

fit <- h2o.loadModel("/Users/me/Downloads/foocv/GBM_model_R_1536894672242_603")
fit@model$cross_validation_predictions
[[1]]
[[1]]$__meta
[[1]]$__meta$schema_version
[1] 3

[[1]]$__meta$schema_name
[1] "FrameKeyV3"

[[1]]$__meta$schema_type
[1] "Key"

[[1]]$name
[1] "prediction_GBM_model_R_1536894672242_603_cv_1"

[[1]]$type
[1] "Key"

[[1]]$URL
[1] "/3/Frames/prediction_GBM_model_R_1536894672242_603_cv_1"

[[2]]
[[2]]$__meta
[[2]]$__meta$schema_version
[1] 3

[[2]]$__meta$schema_name
[1] "FrameKeyV3"

[[2]]$__meta$schema_type
[1] "Key"

[[2]]$name
[1] "prediction_GBM_model_R_1536894672242_603_cv_2"

[[2]]$type
[1] "Key"

[[2]]$URL
[1] "/3/Frames/prediction_GBM_model_R_1536894672242_603_cv_2"

[[3]]
[[3]]$__meta
[[3]]$__meta$schema_version
[1] 3

[[3]]$__meta$schema_name
[1] "FrameKeyV3"

[[3]]$__meta$schema_type
[1] "Key"

[[3]]$name
[1] "prediction_GBM_model_R_1536894672242_603_cv_3"

[[3]]$type
[1] "Key"

[[3]]$URL
[1] "/3/Frames/prediction_GBM_model_R_1536894672242_603_cv_3"

fit@model$cross_validation_holdout_predictions_frame_id
$__meta
$__meta$schema_version
[1] 3

$__meta$schema_name
[1] "FrameKeyV3"

$__meta$schema_type
[1] "Key"

$name
[1] "cv_holdout_prediction_GBM_model_R_1536894672242_603"

$type
[1] "Key"

$URL
[1] "/3/Frames/cv_holdout_prediction_GBM_model_R_1536894672242_603"

{code}

Reported on h2ostream: https://groups.google.com/forum/#!topic/h2ostream/zoW_ewFwJAU

I'm not sure whether we should save the predictions along with the binary model (if they were kept), or write client-side wrapper functions that save the CV predictions frame separately at the same path (e.g. model_id.csv) and load it back when the model is loaded.
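A rough sketch of what the client-side wrapper option could look like in R, for discussion only. The helper names (`cv_preds_path`, `save_model_with_cv_preds`, `load_model_with_cv_preds`) and the `<model path>.csv` naming are assumptions made for illustration and are not part of the h2o package. It only handles the combined holdout predictions frame, not the per-fold frames, and I have not verified that a stacked ensemble accepts the re-imported frame (column types may be parsed differently after a CSV round trip).

```r
# Hypothetical helpers: none of these function names exist in the h2o package.
# h2o functions are called as h2o::... so this file can be sourced on its own.

# Derive the sidecar CSV path from the saved model path (assumed convention).
cv_preds_path <- function(model_path) {
  paste0(model_path, ".csv")
}

# Save the binary model, then export its CV holdout predictions next to it.
save_model_with_cv_preds <- function(model, dir) {
  model_path <- h2o::h2o.saveModel(model, path = dir)
  preds <- h2o::h2o.cross_validation_holdout_predictions(model)
  h2o::h2o.exportFile(preds, path = cv_preds_path(model_path), force = TRUE)
  model_path
}

# Load the model, then re-import the predictions under the frame key that the
# loaded model still references, so the key resolves in the new cluster.
load_model_with_cv_preds <- function(model_path) {
  model <- h2o::h2o.loadModel(model_path)
  frame_id <- model@model$cross_validation_holdout_predictions_frame_id$name
  h2o::h2o.importFile(cv_preds_path(model_path), destination_frame = frame_id)
  model
}
```

With the example above, `save_model_with_cv_preds(fit, "/Users/me/Downloads/foocv/")` would be called before shutdown, and `load_model_with_cv_preds()` with the returned path after `h2o.init()`.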

@exalate-issue-sync
Author

Neema Mashayekhi commented: MOJO version: https://0xdata.atlassian.net/browse/PUBDEV-7506

@exalate-issue-sync
Author

Erin LeDell commented: Another report of this causing issues for a user: https://stackoverflow.com/questions/64985991/is-it-possible-to-use-loaded-h2o-grids-for-stacked-ensembles

@hasithjp
Member

JIRA Issue Migration Info

Jira Issue: PUBDEV-5923
Assignee: Michal Kurka
Reporter: Erin LeDell
State: Resolved
Fix Version: 3.32.0.3
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#5179
#5180
