
[dask] allow customization of num_threads #3714

Closed
jameslamb opened this issue Jan 3, 2021 · 6 comments

@jameslamb
Collaborator

jameslamb commented Jan 3, 2021

Summary

LightGBM training with Dask currently ignores the num_threads setting in user-provided parameters, and overwrites it with the number of CPU cores on each worker contributing to training.

This behavior is a good default, but it should be possible to override it.

Motivation

There might be cases where it's desirable not to use as many threads as cores on each worker during training, and that should be possible.

References

This behavior was recommended in #3515 (comment).

@jameslamb
Collaborator Author

Closing this in favor of tracking it in #2302 with other feature requests. Please leave a comment here if you'd like to work on it.

@jameslamb
Collaborator Author

jameslamb commented Jan 19, 2021

I'm going to re-open this because I'm actively working on it.


@guolinke could you tell me more about this note in the parameter docs for num_threads?

https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_threads

for parallel learning, do not use all CPU cores because this will cause poor performance for the network communication

I'm wondering if this means that I should change the default behavior of the Dask module to use n_cores - 1 threads on each worker for this parameter.

@StrikerRUS
Collaborator

@jameslamb

I'm going to re-open this because I'm actively working on it.

Just want to make sure you're aware that nthreads can/should be used among the additional params for the predict methods as well as for model training.

Here are two examples where users are not happy with the default values:

#1534 (comment)
#2225 (comment)

@jameslamb
Collaborator Author

I'm going to close this again for now, currently focusing on other Dask items. This can be done after 3.2.0 (#3872 ).

@StrikerRUS
Collaborator

Related: dmlc/xgboost#7337.

@StrikerRUS
Collaborator

The following code of joblib's DaskDistributedBackend and MultiprocessingBackend might be useful.
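For reference, joblib exposes the resolution logic those backends use through the public `effective_n_jobs` function, which maps a user-facing value (including `-1` for "all cores") to a concrete count. A quick illustration:

```python
# joblib's effective_n_jobs() resolves the user-facing n_jobs value into a
# concrete worker count under the active backend; -1 typically expands to
# all available cores. LightGBM's Dask module could apply the same idea.
from joblib import effective_n_jobs

print(effective_n_jobs(1))   # explicit positive values pass through unchanged
print(effective_n_jobs(-1))  # all cores under the default (loky) backend
```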
