Support offsets in the Stacked Ensemble metalearner #11793

Closed
exalate-issue-sync bot opened this issue May 12, 2023 · 4 comments

@exalate-issue-sync

We don't support offsets (offset_column) in Stacked Ensemble. You can use offsets in the base learners, but the Stacked Ensemble metalearner is hardcoded and doesn't honor them. We need to check:

  • If any of the base models used offset_column, that all of them use it
  • If it was used, it must be included in the training_frame passed to Stacked Ensemble, and we should then use it automatically.
  • This column must be copied from the training frame and added to the level-one (metalearning) frame so it’s available for the metalearner to use.
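
The three checks above can be sketched in base R as follows. This is only an illustration under stated assumptions, not H2O's implementation: plain lists and data frames stand in for H2O models and frames, and the helper names `check_offset_consistency` and `add_offset_to_level_one` are hypothetical.

```r
# Hypothetical stand-in for base-model metadata: each entry holds the
# offset_column a model was trained with (NULL when none was used).
check_offset_consistency <- function(base_offsets) {
  used <- Filter(Negate(is.null), base_offsets)
  if (length(used) == 0) return(NULL)  # no base model used an offset
  if (length(used) != length(base_offsets))
    stop("offset_column is set on some base models but not on all of them")
  cols <- unique(unlist(used, use.names = FALSE))
  if (length(cols) != 1)
    stop("base models use different offset columns")
  cols
}

# Copy the offset column from the training frame into the level-one frame
# so the metalearner can see it.
add_offset_to_level_one <- function(level_one, training_frame, offset_col) {
  if (is.null(offset_col)) return(level_one)
  if (!offset_col %in% names(training_frame))
    stop(sprintf("Offset column '%s' not found in the training frame", offset_col))
  level_one[[offset_col]] <- training_frame[[offset_col]]
  level_one
}

# Toy data: plain data frames standing in for the H2O frames.
training_frame <- data.frame(y = c(1, 0, 3), wts = c(10, 20, 30), V3 = c(0.1, 0.2, 0.3))
level_one <- data.frame(model_a_pred = c(0.9, 0.2, 2.5), model_b_pred = c(1.1, 0.1, 2.8))

offset_col <- check_offset_consistency(list(model_a = "wts", model_b = "wts"))
level_one <- add_offset_to_level_one(level_one, training_frame, offset_col)
```

Failing early when only some base models carry an offset keeps the metalearner from silently training on a different scale than the base learners.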
@exalate-issue-sync
Author

Erin LeDell commented: Note that the potential workaround, adding {{offset_column}} to {{metalearner_params}}, does not work because we do not currently copy the offset column to the level-one frame (so it's not available to the metalearner, and training fails).

{quote}Error: water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GLM model: metalearner_AUTO_StackedEnsemble_model_R_1582665555956_1952. Details: ERRR on field: _offset_column: Offset column 'wts' not found in the training frame{quote}

@exalate-issue-sync
Author

Erin LeDell commented: Here’s some R code for a test:

{noformat}library(h2o)
h2o.init()

set.seed(123)
N=1000; p=2
nzc=2
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
f = x[,seq(nzc)] %*% beta
mu=exp(f)
y=rpois(N,mu)
wts = sample(1:6, N, TRUE)*10
data = cbind(y,wts,x)
data = data.frame(data)
hdata = as.h2o(data, destination_frame = "data")

x <- 3:length(colnames(hdata))
y <- 1
train <- hdata
offset <- "wts"
nfolds <- 5  # assumed value; nfolds was not defined in the original snippet

# works for a regular GLM

hh3 <- h2o.glm(x = x,
               y = y,
               training_frame = train,
               offset_column = offset,
               family = "gaussian")

# Train & Cross-validate a GBM

gbm <- h2o.gbm(x = x,
               y = y,
               training_frame = train,
               nfolds = nfolds,
               keep_cross_validation_predictions = TRUE,
               seed = 1)

# Train & Cross-validate a RF

rf <- h2o.randomForest(x = x,
                       y = y,
                       training_frame = train,
                       nfolds = nfolds,
                       keep_cross_validation_predictions = TRUE,
                       seed = 1)

# Fails while the offset column is not copied to the level-one frame

stack_model <- h2o.stackedEnsemble(x = x,
                                   y = y,
                                   training_frame = train,
                                   base_models = list(gbm, rf),
                                   metalearner_params = list(offset_column = offset)){noformat}

@exalate-issue-sync
Author

Tomas Fryda commented: This is included in the PR for {{weights_column}} (https://0xdata.atlassian.net/browse/PUBDEV-4916).

@hasithjp
Member

JIRA Issue Migration Info

Jira Issue: PUBDEV-4915
Assignee: Tomas Fryda
Reporter: Erin LeDell
State: Resolved
Fix Version: 3.30.1.1
Attachments: N/A
Development PRs: N/A
