Is there support for multicore processing? #156

Closed
vmgustavo opened this issue Jul 21, 2020 · 2 comments

Comments

@vmgustavo

During execution I see a single core go up to 100% usage. Is ngboost single-core by default? If not, how can I use multiple jobs?

@alejandroschuler
Collaborator

ngboost, like other boosting algorithms, does not lend itself to parallelization, since each boosting iteration depends on the result of the previous one. There is no parallel capability at this time and I'm not sure how there ever would be. We could send different base learners to different nodes at each boosting iteration, but that would only speed up multiparameter distributions. I think xgboost has some parallelization, but they achieve that with some low-level hacking of the base learners. ngboost lets you use whatever base learners you like, so that's not an option for us.
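To make the "different base learners to different nodes" idea concrete, here is a rough sketch of fitting one base learner per distribution parameter in parallel within a single boosting iteration. This is not how ngboost is implemented: the function name `fit_base_learners_in_parallel`, the random stand-in gradients, and the use of threads instead of separate nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_base_learners_in_parallel(X, grads, max_workers=2):
    """Fit one tree per distribution parameter, each in its own worker.

    grads has shape (n_samples, n_params): one gradient column per parameter.
    Hypothetical helper, not part of ngboost.
    """
    def fit_one(k):
        tree = DecisionTreeRegressor(max_depth=3, random_state=0)
        return tree.fit(X, grads[:, k])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fit_one, range(grads.shape[1])))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Stand-in gradients for a two-parameter distribution (e.g. a Normal's loc and scale).
grads = rng.normal(size=(200, 2))

learners = fit_base_learners_in_parallel(X, grads)
```

The number of tasks equals the number of distribution parameters, so a single-parameter distribution gains nothing, and the boosting iterations themselves still have to run one after another. Whether threads give any real speedup also depends on the base learner releasing the GIL while it fits.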

@kmedved
Contributor

kmedved commented Jul 21, 2020

To add to what @alejandroschuler said: XGBoost, LightGBM, and sklearn's HistGradientBoostingClassifier are all multiprocessed in some ways, but this gives a marginal speedup at best in those libraries, and can actually lead to major slowdowns, depending on the number of cores and the dimensionality of your data. Here's a useful Kaggle thread on this: https://www.kaggle.com/c/microsoft-malware-prediction/discussion/79131. More cores are not always better with GBDT models. For instance, 32 cores train about 10x slower than 2 cores on my dataset.

To the extent you have additional cores and want to speed up hyperparameter tuning, you'd usually be better off spending those resources on searching multiple sets of hyperparameters at once, such as via sklearn's RandomizedSearchCV.
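As a minimal sketch of that approach, the snippet below runs scikit-learn's `RandomizedSearchCV` with `n_jobs=2`, so that separate hyperparameter candidates and CV folds are fitted in parallel while each individual model still trains on one core. The synthetic dataset, the search ranges, and the use of `GradientBoostingRegressor` as a stand-in estimator are assumptions for illustration; ngboost's scikit-learn-style estimators should be usable in the same slot, but that is not tested here.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic regression problem (illustrative only).
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

# Search ranges are arbitrary choices for the sketch.
param_distributions = {
    "n_estimators": randint(20, 80),
    "learning_rate": uniform(0.01, 0.2),  # samples from [0.01, 0.21]
    "max_depth": randint(2, 5),
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),  # stand-in for a single-core booster
    param_distributions=param_distributions,
    n_iter=6,      # number of hyperparameter sets sampled
    cv=3,
    n_jobs=2,      # fit candidates/folds in parallel worker processes
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Raising `n_jobs` (or setting it to `-1` to use all cores) scales the number of concurrent fits, up to `n_iter * cv` independent tasks.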
