GAM prediction results in different results with cv on and off. #7015
Comments
Wendy Wong commented: A user from gitter reported the following:

Hello, I'm testing out h2o.gam and am coming across something unintuitive. Does it make sense for identical training models (same deviance) to have different predictions (from the training model) when comparing with and without cross-validation?

    train <- h2o.createFrame(cols = 5, seed = 22, seed_for_column_types = 55,
                             factors = 3, missing_fraction = 0)
    train$fold <- h2o.kfold_column(train, nfolds = 3, seed = 11)
    train$response <- 50 +
      ifelse(train$C5 == "c4.l0", 10,
             ifelse(train$C5 == "c4.l1", 15,
                    ifelse(train$C5 == "c4.12", 20, 25))) +
      0.2 * train$C1 - 0.05 * train$C2 -
      0.2 * train$C4 - 0.005 * train$C4^2 + 0.00005 * train$C4^3 +
      5 * h2o.runif(train)

    params <- list(
      x = c("C1", "C2", "C5"),
      y = "response",
      training_frame = train,
      lambda = 0,
      keep_gam_cols = TRUE,
      gam_columns = c("C4"),
      scale = c(.05),
      num_knots = c(5),
      spline_orders = c(3)
    )

    # no cross validation, bs = 0 (default)
    mod <- do.call(what = "h2o.gam", args = params)
    h2o.residual_deviance(object = mod, train = TRUE)
    # [1] 76080.37
    h2o.predict(mod, train)
    #    predict
    # 1 43.49629
    # 2 61.16891
    # 3 58.14821
    # 4 49.06894
    # 5 54.63423
    # 6 33.10237
    #
    # [10000 rows x 1 column]

    # cross validation, bs = 0 (default)
    mod2 <- do.call(what = "h2o.gam", args = c(params, fold_column = "fold"))
    h2o.residual_deviance(object = mod2, train = TRUE)
    # [1] 76080.37
    h2o.predict(mod2, train)
    #     predict
    # 1 115.53379
    # 2  71.36690
    # 3  44.06435
    # 4  90.39080
    # 5  66.77768
    # 6 104.90591
    #
    # [10000 rows x 1 column]

It might also be worth looking at the models' h2o.residual_analysis_plot. mod looks pretty normal, but mod2 shows very strange patterns in the residuals. Not providing here, but different values of bs also had inconsistencies and strange residuals.
Wendy Wong commented: I ran Paul's code and was able to reproduce the error. I removed the fold column before predicting with mod2, but the predictions still do not match those from mod. Something is wrong here.
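The check described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code that was actually run: the helper names `prediction_gap` and `compare_h2o_predictions` are hypothetical, and the wrapper assumes a running H2O session in which `train`, `mod`, and `mod2` from the report already exist.

```r
# Summarise how far apart two numeric prediction vectors are.
# (Hypothetical helper, introduced only for this illustration.)
prediction_gap <- function(p1, p2, tol = 1e-6) {
  stopifnot(length(p1) == length(p2))
  gap <- abs(p1 - p2)
  list(max_abs_diff = max(gap), n_differing = sum(gap > tol))
}

# Hypothetical wrapper: score the same frame with both models after dropping
# the listed columns (here the fold column), then compare the predictions.
# Assumes the h2o package is installed and a cluster is running.
compare_h2o_predictions <- function(mod_a, mod_b, frame, drop_cols = "fold") {
  keep <- setdiff(h2o::h2o.colnames(frame), drop_cols)
  scored <- frame[, keep]
  p_a <- as.data.frame(h2o::h2o.predict(mod_a, scored))$predict
  p_b <- as.data.frame(h2o::h2o.predict(mod_b, scored))$predict
  prediction_gap(p_a, p_b)
}
```

Calling `compare_h2o_predictions(mod, mod2, train)` would then report the largest absolute difference and the number of rows whose predictions differ. If the two models were truly identical, both values should be at or near zero.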
JIRA Issue Details
Jira Issue: PUBDEV-8681
Linked PRs from JIRA