Support offsets in the Stacked Ensemble metalearner #11793

exalate-issue-sync · 2023-05-12T23:16:35Z

We don't support offsets (offset_column) in Stacked Ensemble. You can use offsets in the base learners, but the Stacked Ensemble metalearner is hardcoded and doesn't honor them. We need to check:

If one of the base models used offset_column, they must all use it.
If it was used, it must be included in the training_frame for Stacked Ensemble, and then we should use it automatically.
This column must be copied from the training frame and added to the level-one (metalearning) frame so it’s available for the metalearner to use.
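
The checks above could be sketched roughly as follows. This is only an illustration on plain R data frames, not the H2O implementation: resolve_offset and add_offset_to_levelone are hypothetical helper names, and the per-model offset names are assumed to be available as a character vector (NA for a model trained without an offset).

```r
# Hypothetical helper (not part of H2O): `offsets` holds each base model's
# offset column name, with NA for a model trained without an offset.
resolve_offset <- function(offsets, training_cols) {
  used <- unique(offsets[!is.na(offsets)])
  if (length(used) == 0) return(NULL)  # no base model used an offset
  # Either every base model uses the same offset column, or none does
  if (anyNA(offsets) || length(used) > 1) {
    stop("All base models must use the same offset_column")
  }
  if (!(used %in% training_cols)) {
    stop(sprintf("Offset column '%s' not found in the training frame", used))
  }
  used
}

# Hypothetical helper: copy the offset column from the training frame onto
# the level-one (metalearning) frame so the metalearner can see it.
add_offset_to_levelone <- function(levelone, training, offset) {
  if (is.null(offset)) return(levelone)
  stopifnot(nrow(levelone) == nrow(training))
  levelone[[offset]] <- training[[offset]]
  levelone
}

# Toy data standing in for the training frame and the cross-validated
# base-model predictions (values are made up for illustration).
training <- data.frame(y = c(1, 0, 2), wts = c(10, 20, 30), V3 = c(0.1, 0.2, 0.3))
levelone <- data.frame(gbm = c(0.9, 0.1, 1.8), rf = c(1.1, 0.2, 2.1), y = training$y)

offset <- resolve_offset(c("wts", "wts"), colnames(training))
levelone <- add_offset_to_levelone(levelone, training, offset)
```

Failing early when only some base models use the offset keeps the metalearner from silently training on inconsistent inputs.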

exalate-issue-sync · 2023-05-12T23:16:37Z

Erin LeDell commented: Note that a potential workaround, adding {{offset_column}} to {{metalearner_params}}, does not work because we do not currently copy the offset column to the level-one frame (thus it's not available for metalearning and the build fails).

{quote}Error: water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GLM model: metalearner_AUTO_StackedEnsemble_model_R_1582665555956_1952. Details: ERRR on field: _offset_column: Offset column 'wts' not found in the training frame{quote}

exalate-issue-sync · 2023-05-12T23:16:39Z

Erin LeDell commented: Here’s some R code for a test:

{noformat}library(h2o)
h2o.init()

set.seed(123)
N = 1000; p = 2
nzc = 2
x = matrix(rnorm(N * p), N, p)
beta = rnorm(nzc)
f = x[, seq(nzc)] %*% beta
mu = exp(f)
y = rpois(N, mu)
wts = sample(1:6, N, TRUE) * 10
data = cbind(y, wts, x)
data = data.frame(data)
hdata = as.h2o(data, destination_frame = "data")

x <- 3:length(colnames(hdata))
y <- 1
train <- hdata
offset <- "wts"
# nfolds was not defined in the original snippet; 5 is an assumed value
nfolds <- 5

# Works for a regular GLM
hh3 <- h2o.glm(x = x,
               y = y,
               training_frame = train,
               offset_column = offset,
               family = "gaussian")

# Train & cross-validate a GBM
gbm <- h2o.gbm(x = x,
               y = y,
               training_frame = train,
               nfolds = nfolds,
               keep_cross_validation_predictions = TRUE,
               seed = 1)

# Train & cross-validate a RF
rf <- h2o.randomForest(x = x,
                       y = y,
                       training_frame = train,
                       nfolds = nfolds,
                       keep_cross_validation_predictions = TRUE,
                       seed = 1)

# Fails: the offset column is not copied to the level-one frame
stack_model <- h2o.stackedEnsemble(x = x,
                                   y = y,
                                   training_frame = train,
                                   base_models = list(gbm, rf),
                                   metalearner_params = list(offset_column = offset)){noformat}

exalate-issue-sync · 2023-05-12T23:16:40Z

Tomas Fryda commented: This is included in the PR for {{weights_column}} ([https://0xdata.atlassian.net/browse/PUBDEV-4916|https://0xdata.atlassian.net/browse/PUBDEV-4916|smart-link]).

hasithjp · 2023-05-15T06:06:00Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-4915
Assignee: Tomas Fryda
Reporter: Erin LeDell
State: Resolved
Fix Version: 3.30.1.1
Attachments: N/A
Development PRs: N/A

hasithjp assigned tomasfryda May 15, 2023

hasithjp closed this as completed May 15, 2023

hasithjp added the fixVersion/3.30.1.1 label May 15, 2023
