
GAM prediction results in different results with cv on and off. #7015

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 4 comments

@exalate-issue-sync

No description provided.

@exalate-issue-sync
Author

Wendy Wong commented: A user from Gitter reported the following:

Hello, I'm testing out h2o.gam and am coming across something unintuitive. Does it make sense for identical training models (same deviance) to have different predictions (from the training model) when comparing with and without cross-validation?

```r
library(h2o)
h2o.init()

train <- h2o.createFrame(cols = 5, seed = 22, seed_for_column_types = 55,
                         factors = 3, missing_fraction = 0)
train$fold <- h2o.kfold_column(train, nfolds = 3, seed = 11)
train$response <- 50 +
  ifelse(train$C5 == "c4.l0", 10,
         ifelse(train$C5 == "c4.l1", 15,
                ifelse(train$C5 == "c4.12", 20, 25))) +
  0.2 * train$C1 - 0.05 * train$C2 - 0.2 * train$C4 -
  0.005 * train$C4^2 + 0.00005 * train$C4^3 + 5 * h2o.runif(train)

params <- list(x = c("C1", "C2", "C5"), y = "response", training_frame = train,
               lambda = 0, keep_gam_cols = TRUE, gam_columns = c("C4"),
               scale = c(0.05), num_knots = c(5), spline_orders = c(3))

# no cross-validation, bs = 0 (default)
mod <- do.call(what = "h2o.gam", args = params)
h2o.residual_deviance(object = mod, train = TRUE)  # [1] 76080.37
h2o.predict(mod, train)
#    predict
# 1 43.49629
# 2 61.16891
# 3 58.14821
# 4 49.06894
# 5 54.63423
# 6 33.10237
#
# [10000 rows x 1 column]

# cross-validation, bs = 0 (default)
mod2 <- do.call(what = "h2o.gam", args = c(params, fold_column = "fold"))
h2o.residual_deviance(object = mod2, train = TRUE)  # [1] 76080.37
h2o.predict(mod2, train)
#     predict
# 1 115.53379
# 2  71.36690
# 3  44.06435
# 4  90.39080
# 5  66.77768
# 6 104.90591
#
# [10000 rows x 1 column]
```

It might also be worth looking at the models' h2o.residual_analysis_plot output: mod looks pretty normal, but mod2 shows very strange patterns in the residuals. I'm not including the plots here, but different values of bs also produced inconsistencies and strange residuals.

Thanks for any help understanding what is happening!
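For reference, the comparison described above could be reproduced roughly like this (a minimal sketch using h2o's explainability helper h2o.residual_analysis_plot, with mod, mod2, and train taken from the script above):

```r
# Residual analysis for the model trained without cross-validation
# (reported to look roughly normal)
h2o.residual_analysis_plot(mod, train)

# Residual analysis for the model trained with a fold column
# (reported to show strange patterns in the residuals)
h2o.residual_analysis_plot(mod2, train)
```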

@exalate-issue-sync
Author

Wendy Wong commented: I have run Paul’s code and was able to reproduce the error.

I removed the fold column when predicting with model mod2 but still do not get the same results as with mod. Something is wrong here.
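A minimal sketch of that check, reusing mod, mod2, and train from the reproduction above (the expectation that the predictions should match is the point under investigation, not a guarantee):

```r
# Score both models on the training frame with the fold column dropped.
train_no_fold <- train[, setdiff(h2o.colnames(train), "fold")]

p1 <- h2o.predict(mod,  train_no_fold)
p2 <- h2o.predict(mod2, train_no_fold)

# If the fold column alone explained the difference, this would be ~0;
# in this report the predictions still differ.
h2o.max(abs(p1$predict - p2$predict))
```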

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Details

Jira Issue: PUBDEV-8681
Assignee: Wendy Wong
Reporter: Wendy Wong
State: Resolved
Fix Version: 3.36.1.3
Attachments: N/A
Development PRs: Available

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

Linked PRs from JIRA

#6185
