Reproduced on a 4-node cluster in AWS:
{code:python}
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # connect to (or start) an H2O cluster; pass url=... for a remote cluster

train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/diabetes/diabetes_text_train.csv")
y = "diabetesMed"
aml = H2OAutoML(
    exclude_algos=["GLM", "DeepLearning"],  # all other AutoML algorithms, including XGBoost, stay enabled
    max_models=100,
    max_runtime_secs_per_model=120,
    keep_cross_validation_models=False,
    keep_cross_validation_predictions=False,
    seed=1,
)
aml.train(y=y, training_frame=train)
{code}
{code:java}
08-15 09:46:46.914 10.0.0.121:54321 24460 FJ-3-25 INFO: Scoring History:
08-15 09:46:46.914 10.0.0.121:54321 24460 FJ-3-25 INFO: Timestamp Duration Number of Trees Training RMSE Training LogLoss Training AUC Training pr_auc Training Lift Training Classification Error Validation RMSE Validation LogLoss Validation AUC Validation pr_auc Validation Lift Validation Classification Error
08-15 09:46:46.914 10.0.0.121:54321 24460 FJ-3-25 INFO: 2019-08-15 09:46:46 0.064 sec 0 0.50000 0.69315 0.50000 0.00000 1.00000 0.25370 0.50000 0.69315 0.50000 0.00000 1.00000 0.25007
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: XGBoost training iteration failed
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: ml.dmlc.xgboost4j.java.XGBoostError: [09:46:47] /dot/include/xgboost/./tree_model.h:290: Check failed: static_cast(deleted_nodes_.size()) == param.num_deleted (0 vs. 4)
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR:
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: Stack trace returned 8 entries:
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: [bt] (0) /tmp/libxgboost4j_minimal2120870706660257861.so(dmlc::StackTrace(unsigned long)+0x1aa) [0x7fc2ccb4cb6a]
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: [bt] (1) /tmp/libxgboost4j_minimal2120870706660257861.so(+0xf9419) [0x7fc2ccba8419]
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: [bt] (2) /tmp/libxgboost4j_minimal2120870706660257861.so(xgboost::tree::TreeSyncher::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::DMatrix, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x27a) [0x7fc2ccba955a]
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: [bt] (3) /tmp/libxgboost4j_minimal2120870706660257861.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::DMatrix, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >)+0x49e) [0x7fc2ccce813e]
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: [bt] (4) /tmp/libxgboost4j_minimal2120870706660257861.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::ObjFunction)+0x9a9) [0x7fc2ccce93f9]
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: [bt] (5) /tmp/libxgboost4j_minimal2120870706660257861.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x421) [0x7fc2ccc8b3b1]
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: [bt] (6) /tmp/libxgboost4j_minimal2120870706660257861.so(XGBoosterUpdateOneIter+0x35) [0x7fc2ccc24945]
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: [bt] (7) [0x7fc2d95e6d3d]
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR:
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR:
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: at ml.dmlc.xgboost4j.java.Booster.update(Booster.java:177)
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: at ml.dmlc.xgboost4j.java.XGBoostUpdater$UpdateBooster.call(XGBoostUpdater.java:126)
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: at ml.dmlc.xgboost4j.java.XGBoostUpdater$UpdateBooster.call(XGBoostUpdater.java:104)
08-15 09:46:47.394 10.0.0.121:54321 24460 #l_1_cv_2 ERRR: at ml.dmlc.xgboost4j.java.XGBoostUpdater.run(XGBoostUpdater.java:49)
AssertError:Allreduce: boundary error
{code}
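In the Scoring History row logged just before the failure, "Number of Trees" is 0. Training and validation RMSE are both 0.50000, and LogLoss is 0.69315. The check below reproduces those two numbers. It assumes the untrained model predicts probability 0.5 for every row. That assumption is inferred from the logged values and is not taken from the H2O source. The labels are hypothetical, because with p = 0.5 the result does not depend on the class mix.

```python
import math


def constant_prediction_metrics(labels, p=0.5):
    """Binary log loss and RMSE when every row is predicted with probability p.

    Assumption: p = 0.5 stands in for the 0-tree model in the Scoring History.
    """
    n = len(labels)
    logloss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y in labels) / n
    rmse = math.sqrt(sum((y - p) ** 2 for y in labels) / n)
    return logloss, rmse


# Hypothetical 0/1 labels; any mix gives the same result when p = 0.5.
labels = [1, 0, 0, 1, 0, 0, 0, 1]
logloss, rmse = constant_prediction_metrics(labels)
print(f"{logloss:.5f} {rmse:.5f}")  # 0.69315 0.50000
```

Because -ln(0.5) = ln 2 ≈ 0.69315 and |y - 0.5| = 0.5 for every row, these values are consistent with the cross-validation model failing on its first boosting iteration, before any tree was added.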
Jan Sterba commented: This could be fixed in the latest XGBoost; we should try that.
Jan Sterba commented: Upgrading XGBoost fixes this issue.
Nidhi Mehta commented: XGBoost version 0.90.
JIRA Issue Migration Info
Jira Issue: PUBDEV-6793 Assignee: Jan Sterba Reporter: Jan Sterba State: Closed Fix Version: 3.28.0.1 Attachments: N/A Development PRs: N/A
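The issue is closed with Fix Version 3.28.0.1. The helper below checks whether an installed H2O version is at or above that release. It assumes purely numeric dotted version strings. The helper name `includes_fix` is hypothetical and is not part of the H2O API.

```python
def includes_fix(installed: str, fixed_in: str = "3.28.0.1") -> bool:
    """Return True if `installed` is >= `fixed_in`.

    Assumes purely numeric dotted versions such as "3.26.0.2".
    """
    a = [int(part) for part in installed.split(".")]
    b = [int(part) for part in fixed_in.split(".")]
    # Pad the shorter version with zeros so that "3.28" compares like "3.28.0.0".
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b  # lists compare element by element, left to right


print(includes_fix("3.26.0.2"))  # False
print(includes_fix("3.28.0.1"))  # True
```

With the `h2o` Python package installed, `includes_fix(h2o.__version__)` applies the same check to the client version. The server version on the cluster may differ from the client version, so check both.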