Why is there almost no speedup with distributed learning? #1316

Open
shuDaoNan9 opened this issue Dec 17, 2021 · 5 comments
Comments

@shuDaoNan9

shuDaoNan9 commented Dec 17, 2021

First, I tried 2 Spark slaves; it took about 11 minutes to train my model.
submit info: spark-submit --master yarn --num-executors 2 --executor-memory 19G --executor-cores 16 --conf spark.dynamicAllocation.enabled=false --jars s3://EMR/jars/synapseml-vw_2.12-0.9.4.jar,s3://EMR/jars/synapseml_2.12-0.9.4.jar,s3://EMR/jars/client-sdk-1.14.0.jar ......
[screenshot]

Second, I tried only one Spark slave; it took about 12 minutes to train my model.
submit info: spark-submit --master yarn --num-executors 1 --executor-memory 19G --executor-cores 16 --conf spark.dynamicAllocation.enabled=false --jars s3://EMR/jars/synapseml-vw_2.12-0.9.4.jar,s3://EMR/jars/synapseml_2.12-0.9.4.jar,s3://EMR/jars/client-sdk-1.14.0.jar ......
[screenshot]

My results show that LightGBM barely speeds up with distributed learning, even though CPU utilization is above 95% on each Spark slave. Why is there almost no speedup with distributed learning?

My cluster/data/code info:
Spark slaves: 2 × (16 vCores, 32 GiB);
Spark version: Spark 3.1.2, Hive 3.1.2, ZooKeeper 3.5.7;
dependency: synapseml_2.12-0.9.4.jar;
training data set: 5377937 rows;
code:

val classifier = new LightGBMClassifier() 
  .setLabelCol("play")
  .setObjective("binary")
  .setCategoricalSlotNames(Array("countrycode_index","itemID_index","uid_index"))
  .setUseBarrierExecutionMode(true) 
  .setFeaturesCol("gbdtFeature")
  .setPredictionCol("predictPlay")
  .setNumIterations(trees) 
  .setNumLeaves(32)
  .setLearningRate(0.006) 
  .setProbabilityCol("probabilitys")
  .setEarlyStoppingRound(200)
  .setBoostingType("gbdt")
  .setLambdaL2(0.002)
  .setMaxDepth(24)
val lgbmModel = classifier.fit(lgbmTrainDF.repartition(repartNum)) // repartNum = the number of spark slaves

Thanks in advance!


@imatiach-msft
Contributor

hi @jwenbin can you please try:
useSingleDatasetMode = True
numThreads = num cores - 1
These two PRs should resolve this:

#1222
#1282

In performance testing we saw a big speedup with the new single dataset mode and numThreads set to num cores - 1.
The two PRs above will be available in 0.9.5, or you can get them with the latest build right now.

For more information on the new single dataset mode please see the PR description:
#1066

This new mode was created after extensive internal benchmarking.
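For reference, a minimal sketch of the suggested settings (assuming the SynapseML 0.9.x Scala API; numCoresPerExecutor is a hypothetical name meant to match --executor-cores):

// Sketch only: the original classifier with the two suggested settings added.
// numCoresPerExecutor is an assumed value matching --executor-cores (16 above).
val numCoresPerExecutor = 16
val classifier = new LightGBMClassifier()
  .setLabelCol("play")
  .setObjective("binary")
  .setFeaturesCol("gbdtFeature")
  .setUseSingleDatasetMode(true)            // one shared native dataset per executor
  .setNumThreads(numCoresPerExecutor - 1)   // leave one core for Spark overhead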

@shuDaoNan9
Author

shuDaoNan9 commented Jan 5, 2022

> hi @jwenbin can you please try: useSingleDatasetMode = True, numThreads = num cores - 1 […]

Thank you for your reply! I tried that just now; the speed improved a lot, but AUC and accuracy became too low (less than 0.6). It looks like if I use 'setUseSingleDatasetMode(true)', I have to change my other params at the same time.

@imatiach-msft
Contributor

imatiach-msft commented Jan 5, 2022

hi @jwenbin
"the speed improved a lot, but AUC and accuracy becomes too low (less than 0.6)"
That is very interesting. In our benchmarking this didn't affect accuracy at all. It only affected memory usage and execution time. I wonder why this might be causing AUC and accuracy to get worse. Essentially we are just reducing the number of datasets being run in parallel, and using more multithreading within each machine while reducing inter-process communication.
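One way to isolate the effect is to A/B the flag on a sample of the data (a sketch, assuming the SynapseML 0.9.x Scala API; lgbmTrainDF and the column names come from the code above):

// Sketch only: train twice on a sample, toggling single dataset mode,
// and compare areaUnderROC to see whether the flag alone moves AUC.
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

val sample = lgbmTrainDF.sample(0.1, 42L).cache()
val evaluator = new BinaryClassificationEvaluator().setLabelCol("play") // default metric: areaUnderROC

for (single <- Seq(false, true)) {
  val model = new LightGBMClassifier()
    .setLabelCol("play")
    .setObjective("binary")
    .setFeaturesCol("gbdtFeature")
    .setUseSingleDatasetMode(single)
    .fit(sample)
  println(s"singleDatasetMode=$single AUC=${evaluator.evaluate(model.transform(sample))}")
}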

@shuDaoNan9
Author

shuDaoNan9 commented Jan 6, 2022

> hi @jwenbin "the speed improved a lot, but AUC and accuracy becomes too low (less than 0.6)" That is very interesting. In our benchmarking this didn't affect accuracy at all. […]

I deleted 'setUseBarrierExecutionMode(true)' while keeping 'setUseSingleDatasetMode(true)' and retrained the model. My AUC returned to its normal level. But I still don't know how 'setUseBarrierExecutionMode(true)' interacts with 'setUseSingleDatasetMode(true)' during training.
I also found that vector features from 'Word2VecModel' may make AUC worse when 'setUseSingleDatasetMode(true)' is set. I don't know how 'setUseSingleDatasetMode(true)' affects vector features during training either.
Thank you very much!
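For reference, a minimal sketch of the combination reported to restore AUC (assuming the SynapseML 0.9.x Scala API): single dataset mode on, barrier execution mode left at its default of false:

// Sketch only: single dataset mode without barrier execution mode.
// setUseBarrierExecutionMode(true) is deliberately omitted (defaults to false).
val classifier = new LightGBMClassifier()
  .setLabelCol("play")
  .setObjective("binary")
  .setFeaturesCol("gbdtFeature")
  .setUseSingleDatasetMode(true)
val lgbmModel = classifier.fit(lgbmTrainDF.repartition(repartNum))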

@shuDaoNan9
Author

With 'setUseBarrierExecutionMode(true)', each task's input data is about 48 MB ± 2 MB, but the Spark history server indicates that only 575989/26320507 rows had been trained after a long time.
[screenshot]
