
perf: improve lightgbm training performance 4x-10x by setting num_threads to be cores-1 by default for single dataset mode #1282

Merged (2 commits) on Dec 3, 2021

Conversation

imatiach-msft
Contributor

In benchmarking, it was discovered that LightGBM training time could be reduced 4x-10x by setting num_threads equal to the number of machine cores minus one in single dataset mode (which is now the default mode).
This actually was already suggested in the lightgbm docs:
https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_threads

for distributed learning, do not use all CPU cores because this will cause poor performance for the network communication

Poor network communication can lead to very slow training execution time.

In one scenario with a 37 GB dataset on disk and parameters:
learning_rate = 0.1, num_leaves = 768, num_trees = 1000, min_data_in_leaf = 15000, max_bin = 512
training without setting num_threads took 5.5 hours, while setting num_threads to (number of machine cores) - 1 reduced the training time to 1.2 hours. The size of the improvement will vary depending on the parameters used.

This PR sets the distributed LightGBM number of threads to (number of cores) - 1 by default when the parameter is not specified; the user can still override it.
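The defaulting behavior described above can be sketched as follows. This is a minimal stdlib-only illustration, not the actual Scala code in LightGBMBase.scala; the function name and signature are hypothetical:

```python
import multiprocessing

def default_num_threads(user_num_threads=None):
    """Sketch of the PR's defaulting logic: if the user did not set
    num_threads, use (machine cores - 1), leaving one core free for
    network communication during distributed training."""
    if user_num_threads is not None:
        return user_num_threads  # an explicit user setting always wins
    cores = multiprocessing.cpu_count()
    return max(cores - 1, 1)  # never drop below one thread
```

The `max(..., 1)` guard simply keeps the sketch sane on a single-core machine; the key point is that one core is reserved for network communication, per the LightGBM docs quoted above.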

@imatiach-msft
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter

codecov-commenter commented Dec 2, 2021

Codecov Report

Merging #1282 (a52e0f8) into master (6ea8a9a) will increase coverage by 0.04%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1282      +/-   ##
==========================================
+ Coverage   83.33%   83.37%   +0.04%     
==========================================
  Files         300      300              
  Lines       13828    13830       +2     
  Branches      672      675       +3     
==========================================
+ Hits        11523    11531       +8     
+ Misses       2305     2299       -6     
Impacted Files Coverage Δ
...osoft/azure/synapse/ml/lightgbm/LightGBMBase.scala 94.94% <100.00%> (+0.05%) ⬆️
...azure/synapse/ml/lightgbm/LightGBMClassifier.scala 91.11% <100.00%> (ø)
...oft/azure/synapse/ml/lightgbm/LightGBMRanker.scala 64.17% <100.00%> (ø)
.../azure/synapse/ml/lightgbm/LightGBMRegressor.scala 74.13% <100.00%> (ø)
...crosoft/azure/synapse/ml/lightgbm/TrainUtils.scala 85.98% <0.00%> (+2.54%) ⬆️
...crosoft/azure/synapse/ml/io/http/HTTPClients.scala 86.66% <0.00%> (+3.33%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6ea8a9a...a52e0f8.

@imatiach-msft
Contributor Author

An added benefit seems to be that the LightGBM tests finish faster as well - the lightgbm1 unit tests finished in 11 min in this build, versus 33 min in the previous build 😅

@imatiach-msft
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

