When training an XGBoost model with a custom fold column, either through AutoML or directly with H2OXGBoostEstimator, the build fails with a java.lang.AssertionError. Using the same custom fold column with other algorithms (e.g. GBM) does not produce this error.
Error:
{code:java}
OSError Traceback (most recent call last)
in
1 from h2o.estimators import H2OXGBoostEstimator
2 xg_model = H2OXGBoostEstimator(seed=1234, stopping_metric='rmse')
----> 3 xg_model.train(x=features, y='reactivity', training_frame=model_train, validation_frame=model_test, weights_column=weights_column, fold_column=fold_column)
/opt/conda/lib/python3.7/site-packages/h2o/estimators/estimator_base.py in train(self, x, y, training_frame, offset_column, fold_column, weights_column, validation_frame, max_runtime_secs, ignored_columns, model_id, verbose)
113 validation_frame=validation_frame, max_runtime_secs=max_runtime_secs,
114 ignored_columns=ignored_columns, model_id=model_id, verbose=verbose)
--> 115 self._train(parms, verbose=verbose)
116
117 def train_segments(self, x=None, y=None, training_frame=None, offset_column=None, fold_column=None,
/opt/conda/lib/python3.7/site-packages/h2o/estimators/estimator_base.py in _train(self, parms, verbose)
205 return
206
--> 207 job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
208 model_json = h2o.api("GET /%d/Models/%s" % (rest_ver, job.dest_key))["models"][0]
209 self._resolve_model(job.dest_key, model_json)
/opt/conda/lib/python3.7/site-packages/h2o/job.py in poll(self, poll_updates)
76 if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)):
77 raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
---> 78 "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
79 else:
80 raise EnvironmentError("Job with key %s failed with an exception: %s" % (self.job_key, self.exception))
OSError: Job with key $03017f00000132d4ffffffff$_b410f1873cf50b6f36a9ad411a4344a5 failed with an exception: java.lang.AssertionError
stacktrace:
java.lang.AssertionError
at hex.tree.xgboost.matrix.DenseMatrixFactory$WriteDenseChunkFun.map(DenseMatrixFactory.java:187)
at water.LocalMR.compute2(LocalMR.java:84)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
{code}
Dataset: [https://www.kaggle.com/c/stanford-covid-vaccine/data|https://www.kaggle.com/c/stanford-covid-vaccine/data]
Code:
{code:python}
import h2o

# Create a custom fold column that keeps entire RNA sequences together to avoid leakage
# (7 folds of 15436 consecutive rows each)
folds = [k for k in range(1, 8) for _ in range(15436)]
formatted_train['folds'] = folds

# Convert to H2OFrame
h2f = h2o.H2OFrame(formatted_train)

# Then split the data and run XGBoost to see the error.
{code}
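The fold construction in the snippet above depends on the rows being ordered by sequence and on every fold holding exactly 15436 rows. As a minimal sketch, assuming each row carries a per-sequence identifier, the same kind of fold column could be built from that identifier instead, with a check that no sequence is split across folds. The names `make_group_folds` and `group_ids` and the toy data are hypothetical; they are not part of the original code or of the H2O API.

```python
from collections import defaultdict

def make_group_folds(group_ids, n_folds):
    """Return a 1-based fold label per row; rows sharing a group id share a fold."""
    # Unique groups in first-seen order (dict preserves insertion order)
    unique_groups = list(dict.fromkeys(group_ids))
    # Number of groups per fold, rounded up so every group gets a fold <= n_folds
    per_fold = -(-len(unique_groups) // n_folds)
    fold_of_group = {g: i // per_fold + 1 for i, g in enumerate(unique_groups)}
    return [fold_of_group[g] for g in group_ids]

# Toy data (hypothetical): 14 sequences with 3 rows each
group_ids = [f"seq_{s}" for s in range(14) for _ in range(3)]
folds = make_group_folds(group_ids, n_folds=7)

# Sanity check: every sequence must map to exactly one fold
folds_per_group = defaultdict(set)
for g, f in zip(group_ids, folds):
    folds_per_group[g].add(f)
assert all(len(v) == 1 for v in folds_per_group.values())
```

The resulting list can be assigned to the training data as a column in the same way as `folds` in the snippet above. This only illustrates how the fold column is built; it does not change the XGBoost behaviour reported here.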