Calling predict() on a Stacked Ensemble emits a warning when the base models were trained with a fold_column. I checked in R, but it probably also happens in Python. Only Stacked Ensemble is affected; other models predict without the warning.
Tomas Fryda commented: Note to myself: This gets emitted from {{hex.Model#adaptTestForTrain}} because stacked ensemble uses {{metalearner_fold_column}} instead of {{fold_column}}, so it affects both R and Python. A simple fix would be to internally set {{fold_column}} to the value of {{metalearner_fold_column}}, but that is not a clean solution.
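The mechanism described in the comment above can be illustrated with a toy Python model. This is not H2O code: `adapt_test_for_train` is a hypothetical stand-in for `hex.Model#adaptTestForTrain`, and its signature and logic are assumptions made for illustration only. The sketch assumes the adapter silently skips a missing column only when it matches the model's `fold_column`, so a model that records the fold under `metalearner_fold_column` falls through to the generic "missing column" warning.

```python
from typing import List, Optional


def adapt_test_for_train(
    train_names: List[str],
    test_names: List[str],
    fold_column: Optional[str] = None,
) -> List[str]:
    """Toy stand-in (hypothetical) for the test-frame adaptation step.

    Returns the warnings that would be emitted for training columns
    that are absent from the test frame.
    """
    warnings = []
    for name in train_names:
        if name in test_names:
            continue
        if name == fold_column:
            # Assumed behavior: a known fold column is not needed for scoring.
            continue
        warnings.append(
            f"Test/Validation dataset is missing column '{name}': "
            "substituting in a column of NaN"
        )
    return warnings


train_cols = ["home.dest", "cabin", "embarked", "fold"]
test_cols = ["home.dest", "cabin", "embarked"]

# Base model: the fold column is registered as fold_column, so it is skipped.
base_warnings = adapt_test_for_train(train_cols, test_cols, fold_column="fold")

# Stacked ensemble: the fold lives in metalearner_fold_column, so fold_column
# is unset here and 'fold' is treated as an ordinary missing predictor.
se_warnings = adapt_test_for_train(train_cols, test_cols, fold_column=None)

print(base_warnings)
print(se_warnings)
```

Under these assumptions the base model produces no warnings, while the stacked ensemble reports the missing 'fold' column, which matches the behavior seen in the repro.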
Repro:
{code:r}library(h2o)
h2o.init()
#Import the titanic dataset
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv"
titanic <- h2o.importFile(f)
# Set response column as a factor
y <- "survived"
titanic[y] <- as.factor(titanic[y])
x <- c('home.dest', 'cabin', 'embarked')
# Split the dataset into train and test & add fold column
splits <- h2o.splitFrame(titanic, seed = 1, ratios = c(0.8))
train <- splits[[1]]
test <- splits[[2]]
train$fold <- h2o.kfold_column(train, nfolds = 5, seed = 1)
aml <- h2o.automl(y = y, x = x,
                  training_frame = train,
                  fold_column = "fold",
                  max_models = 5,
                  seed = 1)
aml@leaderboard
#                                              model_id       auc   logloss     aucpr
# 1    StackedEnsemble_AllModels_AutoML_20210219_165846 0.7246883 0.5861177 0.6115136
# 2 StackedEnsemble_BestOfFamily_AutoML_20210219_165846 0.7240083 0.5863743 0.6115439
# 3                        GLM_1_AutoML_20210219_165846 0.7150777 0.5862183 0.6188071
# 4                    XGBoost_2_AutoML_20210219_165846 0.6925490 0.6000085 0.5713987
# 5                    XGBoost_3_AutoML_20210219_165846 0.6898093 0.6030432 0.5544744
# 6                    XGBoost_1_AutoML_20210219_165846 0.6873683 0.6032365 0.5526302
#   mean_per_class_error      rmse       mse
# 1            0.3303080 0.4467188 0.1995577
# 2            0.3302314 0.4468061 0.1996356
# 3            0.3701358 0.4469886 0.1997988
# 4            0.3778655 0.4539299 0.2060524
# 5            0.3778655 0.4549444 0.2069744
# 6            0.3778655 0.4552295 0.2072339
#
# [7 rows x 7 columns]
pred <- h2o.predict(aml, test)
# Warning message:
# In doTryCatch(return(expr), name, parentenv, handler) :
#   Test/Validation dataset is missing column 'fold': substituting in a column of NaN
# Compare to different models
se <- h2o.getModel(as.data.frame(aml@leaderboard)[1,1])
glm <- h2o.getModel(as.data.frame(aml@leaderboard)[3,1])
xgb <- h2o.getModel(as.data.frame(aml@leaderboard)[5,1])
h2o.predict(se, test)
# Warning message:
# In doTryCatch(return(expr), name, parentenv, handler) :
#   Test/Validation dataset is missing column 'fold': substituting in a column of NaN
# No warning for the base models
h2o.predict(glm, test)
h2o.predict(xgb, test){code}