
[dask] allow customization of num_threads #3714

Closed
jameslamb opened this issue Jan 3, 2021 · 6 comments

@jameslamb
Collaborator

jameslamb commented Jan 3, 2021

Summary

LightGBM training with Dask currently ignores the num_threads setting in user-provided parameters, and overwrites it with the number of CPU cores on each worker contributing to training.

This behavior is a good default, but it should be possible to override it.

Motivation

There might be cases where it's desirable not to use as many threads as cores on each worker during training, and that should be possible.

References

This behavior was recommended in #3515 (comment).

@jameslamb
Collaborator Author

Closing this in favor of tracking it in #2302 with other feature requests. Please leave a comment here if you'd like to work on it.

@jameslamb
Collaborator Author

jameslamb commented Jan 19, 2021

I'm going to re-open this because I'm actively working on it.


@guolinke could you tell me more about this note in the parameter docs for num_threads?

https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_threads

for parallel learning, do not use all CPU cores because this will cause poor performance for the network communication

I'm wondering if this means that I should change the default behavior of the Dask module to use n_cores - 1 threads on each worker for this parameter.

@StrikerRUS
Collaborator

@jameslamb

I'm going to re-open this because I'm actively working on it.

Just want to make sure you're aware that nthreads can/should be used among the additional params for the predict methods as well as for model training.

Here are two examples where users are not happy with the default values:

#1534 (comment)
#2225 (comment)

@jameslamb
Collaborator Author

I'm going to close this again for now, currently focusing on other Dask items. This can be done after 3.2.0 (#3872 ).

@StrikerRUS
Collaborator

Related: dmlc/xgboost#7337.

@StrikerRUS
Collaborator

The following code of joblib's DaskDistributedBackend and MultiprocessingBackend might be useful.
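For reference, joblib exposes the resolution logic those backends use through the public `effective_n_jobs` function, which maps a user-facing value (including `-1` for "all cores") to a concrete count. A quick illustration:

```python
# joblib's effective_n_jobs() resolves the user-facing n_jobs value into a
# concrete worker count under the active backend; -1 typically expands to
# all available cores. LightGBM's Dask module could apply the same idea.
from joblib import effective_n_jobs

print(effective_n_jobs(1))   # explicit positive values pass through unchanged
print(effective_n_jobs(-1))  # all cores under the default (loky) backend
```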
