XGBoost Fails With AssertionError When Specifying Custom Fold Column #7822

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments

@exalate-issue-sync

When I train an XGBoost model with a custom fold column, whether through AutoML or on its own, the build fails with an AssertionError. Using the same custom fold column with other algorithms (e.g., GBM) does not produce this error.

Error:
{code:java}

OSError Traceback (most recent call last)
in
1 from h2o.estimators import H2OXGBoostEstimator
2 xg_model = H2OXGBoostEstimator(seed=1234, stopping_metric='rmse')
----> 3 xg_model.train(x=features, y='reactivity', training_frame=model_train, validation_frame=model_test, weights_column=weights_column, fold_column=fold_column)
/opt/conda/lib/python3.7/site-packages/h2o/estimators/estimator_base.py in train(self, x, y, training_frame, offset_column, fold_column, weights_column, validation_frame, max_runtime_secs, ignored_columns, model_id, verbose)
113 validation_frame=validation_frame, max_runtime_secs=max_runtime_secs,
114 ignored_columns=ignored_columns, model_id=model_id, verbose=verbose)
--> 115 self._train(parms, verbose=verbose)
116
117 def train_segments(self, x=None, y=None, training_frame=None, offset_column=None, fold_column=None,
/opt/conda/lib/python3.7/site-packages/h2o/estimators/estimator_base.py in _train(self, parms, verbose)
205 return
206
--> 207 job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
208 model_json = h2o.api("GET /%d/Models/%s" % (rest_ver, job.dest_key))["models"][0]
209 self._resolve_model(job.dest_key, model_json)
/opt/conda/lib/python3.7/site-packages/h2o/job.py in poll(self, poll_updates)
76 if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)):
77 raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
---> 78 "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
79 else:
80 raise EnvironmentError("Job with key %s failed with an exception: %s" % (self.job_key, self.exception))
OSError: Job with key $03017f00000132d4ffffffff$_b410f1873cf50b6f36a9ad411a4344a5 failed with an exception: java.lang.AssertionError
stacktrace:
java.lang.AssertionError
at hex.tree.xgboost.matrix.DenseMatrixFactory$WriteDenseChunkFun.map(DenseMatrixFactory.java:187)
at water.LocalMR.compute2(LocalMR.java:84)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
{code}

Dataset: [https://www.kaggle.com/c/stanford-covid-vaccine/data|https://www.kaggle.com/c/stanford-covid-vaccine/data]

Code:
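{code:python}
import h2o

h2o.init()  # connect to (or start) a local H2O cluster

# formatted_train is assumed to be the reporter's pandas DataFrame built from the Kaggle training data.
# Create a custom fold column that keeps entire RNA sequences together to avoid leakage
# (7 folds of 15436 consecutive rows each).
folds = [k for k in range(1, 8) for i in range(15436)]
formatted_train['folds'] = folds

# Convert to an H2OFrame
h2f = h2o.H2OFrame(formatted_train)

# Then split the data and run XGBoost to see the error.
{code}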
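The fold column above relies on each RNA sequence occupying a contiguous run of rows that lines up with the 15436-row blocks. A minimal sketch of a group-aware alternative, assuming each row carries some sequence identifier (the `group_folds` helper and its `group_ids` input are hypothetical, not part of the h2o API), assigns every distinct sequence to exactly one fold regardless of row order:

```python
from collections.abc import Hashable, Sequence
from typing import Dict, List


def group_folds(group_ids: Sequence[Hashable], n_folds: int) -> List[int]:
    """Return a 1-based fold label per row; rows sharing a group id get the same fold.

    Hypothetical helper: groups are assigned round-robin in order of first appearance.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    fold_of: Dict[Hashable, int] = {}
    folds: List[int] = []
    for g in group_ids:
        if g not in fold_of:
            # The next unseen group goes to the next fold in round-robin order.
            fold_of[g] = len(fold_of) % n_folds + 1
        folds.append(fold_of[g])
    return folds


if __name__ == "__main__":
    # Hypothetical example: five rows from three sequences, split into two folds.
    print(group_folds(["seq_a", "seq_a", "seq_b", "seq_b", "seq_c"], n_folds=2))
    # prints [1, 1, 2, 2, 1]
```

With pandas this could be used as `formatted_train['folds'] = group_folds(formatted_train['id'], 7)`, where `id` stands in for whichever column identifies a sequence. Round-robin assignment balances the number of sequences per fold, not necessarily the number of rows.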

@exalate-issue-sync
Author

Travis Couture commented: Code used when the error was encountered.

[^competition-model (1).zip]

Note: Remove XGBoost from the excluded algorithms to reproduce the error.

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7819
Assignee: Pavel Pscheidl
Reporter: Travis Couture
State: Closed
Fix Version: 3.32.1.1
Attachments: Available (Count: 2)
Development PRs: Available

Linked PRs from JIRA

#5038
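The Fix Version field above gives 3.32.1.1. A small sketch for checking whether an installed release is at least that version, assuming a purely numeric dotted version string such as the one `h2o.__version__` reports (pre-release or local build strings with non-numeric parts would need extra handling):

```python
from typing import Tuple

FIX_VERSION = "3.32.1.1"  # Fix Version listed for PUBDEV-7819


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '3.32.1.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def includes_fix(installed: str, fix: str = FIX_VERSION) -> bool:
    """True if the installed version is the same as or newer than the fix version."""
    # Tuples compare element by element, matching dotted-version ordering.
    return parse_version(installed) >= parse_version(fix)
```

For example, `includes_fix(h2o.__version__)` would return False on a 3.30.x install, which would still be expected to hit the AssertionError.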

Attachments From Jira

Attachment Name: assert_err.py
Attached By: Jan Sterba
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7819/assert_err.py

Attachment Name: competition-model (1).zip
Attached By: Travis Couture
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7819/competition-model (1).zip
