Why is there almost no speedup with distributed learning? #1316

Open
shuDaoNan9 opened this issue Dec 17, 2021 · 5 comments
Comments

@shuDaoNan9

shuDaoNan9 commented Dec 17, 2021

First, I tried 2 Spark slaves; it took about 11 minutes to train my model.
submit info: spark-submit --master yarn --num-executors 2 --executor-memory 19G --executor-cores 16 --conf spark.dynamicAllocation.enabled=false --jars s3://EMR/jars/synapseml-vw_2.12-0.9.4.jar,s3://EMR/jars/synapseml_2.12-0.9.4.jar,s3://EMR/jars/client-sdk-1.14.0.jar ......
[screenshot]

Second, I tried only one Spark slave; it took about 12 minutes to train my model.
submit info: spark-submit --master yarn --num-executors 1 --executor-memory 19G --executor-cores 16 --conf spark.dynamicAllocation.enabled=false --jars s3://EMR/jars/synapseml-vw_2.12-0.9.4.jar,s3://EMR/jars/synapseml_2.12-0.9.4.jar,s3://EMR/jars/client-sdk-1.14.0.jar ......
[screenshot]

My results show that LightGBM barely speeds up with distributed learning, even though CPU utilization is above 95% on each Spark slave. Why is there almost no speedup with distributed learning?

My cluster/data/code info:
Spark slaves: 2 × (16 vCores, 32 GiB);
Spark version: Spark 3.1.2, Hive 3.1.2, ZooKeeper 3.5.7;
dependency: synapseml_2.12-0.9.4.jar;
training data set: 5377937 rows;
code:

val classifier = new LightGBMClassifier() 
  .setLabelCol("play")
  .setObjective("binary")
  .setCategoricalSlotNames(Array("countrycode_index","itemID_index","uid_index"))
  .setUseBarrierExecutionMode(true) 
  .setFeaturesCol("gbdtFeature")
  .setPredictionCol("predictPlay")
  .setNumIterations(trees) 
  .setNumLeaves(32)
  .setLearningRate(0.006) 
  .setProbabilityCol("probabilitys")
  .setEarlyStoppingRound(200)
  .setBoostingType("gbdt")
  .setLambdaL2(0.002)
  .setMaxDepth(24)
val lgbmModel = classifier.fit(lgbmTrainDF.repartition(repartNum)) // repartNum = the number of spark slaves

Thanks in advance!


@imatiach-msft
Contributor

hi @jwenbin can you please try:
useSingleDatasetMode = True
numThreads = num cores - 1
These two PRs should resolve this:

#1222
#1282

In performance testing we saw a big speedup with the new single dataset mode and numThreads set to num cores - 1.
The two PRs above will be available in 0.9.5, or you can get them with the latest build right now.

For more information on the new single dataset mode please see the PR description:
#1066

This new mode was created after extensive internal benchmarking.
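For reference, a minimal sketch of the suggested settings (assuming the SynapseML 0.9.x Scala API; numCoresPerExecutor is a hypothetical name meant to match --executor-cores):

// Sketch only: the original classifier with the two suggested settings added.
// numCoresPerExecutor is an assumed value matching --executor-cores (16 above).
val numCoresPerExecutor = 16
val classifier = new LightGBMClassifier()
  .setLabelCol("play")
  .setObjective("binary")
  .setFeaturesCol("gbdtFeature")
  .setUseSingleDatasetMode(true)            // one shared native dataset per executor
  .setNumThreads(numCoresPerExecutor - 1)   // leave one core for Spark overhead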

@shuDaoNan9
Author

shuDaoNan9 commented Jan 5, 2022

> hi @jwenbin can you please try: useSingleDatasetMode = True, numThreads = num cores - 1 […]

Thank you for your reply! I tried that just now; the speed improved a lot, but AUC and accuracy became too low (less than 0.6). It looks like if I use 'setUseSingleDatasetMode(true)', I have to change my other params at the same time.

@imatiach-msft
Contributor

imatiach-msft commented Jan 5, 2022

hi @jwenbin
"the speed improved a lot, but AUC and accuracy becomes too low (less than 0.6)"
That is very interesting. In our benchmarking this didn't affect accuracy at all. It only affected memory usage and execution time. I wonder why this might be causing AUC and accuracy to get worse. Essentially we are just reducing the number of datasets being run in parallel, and using more multithreading within each machine while reducing inter-process communication.
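One way to isolate the effect is to A/B the flag on a sample of the data (a sketch, assuming the SynapseML 0.9.x Scala API; lgbmTrainDF and the column names come from the code above):

// Sketch only: train twice on a sample, toggling single dataset mode,
// and compare areaUnderROC to see whether the flag alone moves AUC.
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

val sample = lgbmTrainDF.sample(0.1, 42L).cache()
val evaluator = new BinaryClassificationEvaluator().setLabelCol("play") // default metric: areaUnderROC

for (single <- Seq(false, true)) {
  val model = new LightGBMClassifier()
    .setLabelCol("play")
    .setObjective("binary")
    .setFeaturesCol("gbdtFeature")
    .setUseSingleDatasetMode(single)
    .fit(sample)
  println(s"singleDatasetMode=$single AUC=${evaluator.evaluate(model.transform(sample))}")
}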

@shuDaoNan9
Author

shuDaoNan9 commented Jan 6, 2022

> hi @jwenbin "the speed improved a lot, but AUC and accuracy becomes too low (less than 0.6)" That is very interesting. In our benchmarking this didn't affect accuracy at all. […]

I deleted 'setUseBarrierExecutionMode(true)' while keeping 'setUseSingleDatasetMode(true)' and retrained the model. My AUC returned to its normal level. But I still don't know how 'setUseBarrierExecutionMode(true)' interacts with 'setUseSingleDatasetMode(true)' during training.
I also found that vector features from 'Word2VecModel' may make AUC worse when 'setUseSingleDatasetMode(true)' is set. I don't know how 'setUseSingleDatasetMode(true)' affects vector features during training either.
Thank you very much!
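For reference, a minimal sketch of the combination reported to restore AUC (assuming the SynapseML 0.9.x Scala API): single dataset mode on, barrier execution mode left at its default of false:

// Sketch only: single dataset mode without barrier execution mode.
// setUseBarrierExecutionMode(true) is deliberately omitted (defaults to false).
val classifier = new LightGBMClassifier()
  .setLabelCol("play")
  .setObjective("binary")
  .setFeaturesCol("gbdtFeature")
  .setUseSingleDatasetMode(true)
val lgbmModel = classifier.fit(lgbmTrainDF.repartition(repartNum))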

@shuDaoNan9
Author

With 'setUseBarrierExecutionMode(true)', each task's input data is about 48 MB ± 2 MB, but the Spark history server indicates that only 575989/26320507 rows had been trained after a long time.
[screenshot]
