Turn on multinode XGBoost support in AutoML by default #9128
Comments
Sebastien Poirier commented: Activation on multinode seems to work most of the time on Linux, but we still need more intensive testing to check its reliability:
If it passes the tests and reliability is confirmed, then nothing will prevent us from activating it by default. Note that users who want to use XGBoost on multinode can always activate it when starting the nodes, using the JVM parameter {{-Dsys.ai.h2o.automl.xgboost.multinode.enabled=true}}
Ruslan Dautkhanov commented: Please let us know once the multinode XGBoost instability is fixed. Thank you, Sebastien.
Jan Sterba commented: Fixed in PUBDEV-6793.
Angela Bartz commented: FYI, the documentation task listed in the description was not addressed. I will ask [~accountid:5d1185d4f46aa30c271c7cc6] to fix this as part of her fix to PUBDEV-7141.
Sebastien Poirier commented: Reopened, as we had to re-disable it for {{3.28.0.1}} due to issues with XGBoost/rabit.
Michal Kurka commented: This was reopened because our automated tests uncovered an issue where XGBoost was crashing one of the H2O nodes. The bug seems to be buried in the XGBoost codebase, not the H2O codebase. We need to investigate further before we can make the switch. We are targeting one of the fix releases in 3.28.0.x.
Let's turn on multinode XGBoost support by default in AutoML. Currently, you have to enable it explicitly by passing the JVM option -Dsys.ai.h2o.automl.xgboost.multinode.enabled=true when launching the H2O process from the command line, on every node of the H2O cluster.
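For reference, a minimal sketch of what that launch looks like today, written as a small Python helper that builds the per-node command line. Only the property name comes from this issue; the helper name, the `h2o.jar` path, and the cluster name are hypothetical placeholders, and the `-name` option is assumed to be the usual H2O cluster-name argument.

```python
import shlex

# Property name as quoted in this issue; everything else is illustrative.
MULTINODE_FLAG = "-Dsys.ai.h2o.automl.xgboost.multinode.enabled=true"


def h2o_launch_command(jar_path="h2o.jar", cluster_name="automl-cluster"):
    """Build the java command line for one H2O node with the flag set.

    jar_path and cluster_name are placeholder values (assumptions).
    """
    # JVM options have to come before -jar: anything placed after the jar
    # path is handed to the application rather than to the JVM.
    return ["java", MULTINODE_FLAG, "-jar", jar_path, "-name", cluster_name]


if __name__ == "__main__":
    # The same command would need to be run on every node of the cluster.
    print(shlex.join(h2o_launch_command()))
```

The flag is placed before `-jar` on purpose, since it is a JVM option rather than an H2O argument.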
We also need to update the documentation to reflect this: