
Setting number of threads per executor #950

Open

perrital opened this issue Oct 31, 2020 · 3 comments

Comments

perrital commented Oct 31, 2020

I'm using version com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc2 on a k8s cluster, specifically the LightGBM classifier with a binary response.
I'm setting the number of executors to 10 and allocating spark.executor.cores=80 and spark.task.cpus=80, so that each machine runs exactly one task with all 80 cores available to it.
I was expecting full utilisation of the 80 cores; instead only ~8 cores are used.
My best guess is that this is related to the num_threads parameter, which was exposed as numThreads a long time ago but is not present in the current param set.
My dataset has 30,000,000 samples and a large number of features; each tree takes around 10 seconds to build, and re-bagging also takes a long time and runs on a single core.
Please advise.
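For reference, a minimal sketch (plain Python, not Spark API; the figures are the ones from this issue) of the task-slot arithmetic behind this configuration, and why the per-task thread count still has to be set separately:

```python
# Illustrative only: Spark runs floor(spark.executor.cores / spark.task.cpus)
# concurrent tasks per executor. The function name is ours, not a Spark call.

def task_slots_per_executor(executor_cores: int, task_cpus: int) -> int:
    """Concurrent task slots Spark derives from the two settings."""
    return executor_cores // task_cpus

executor_cores = 80  # spark.executor.cores
task_cpus = 80       # spark.task.cpus

slots = task_slots_per_executor(executor_cores, task_cpus)
print(slots)  # 1 -> exactly one task per executor, as intended

# That one task has all 80 cores *available*, but LightGBM itself only
# spawns as many worker threads as its num_threads setting; if that
# defaults to a small value, only a handful of the 80 cores stay busy.
cores_available_per_task = executor_cores // slots
print(cores_available_per_task)  # 80
```

So reserving the cores at the Spark level is necessary but not sufficient: the trainer's own thread count has to match them.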

@brunocous

Same problem as in #292; not fixed yet.

Try playing with the number of partitions and the partition key(s). That got me to ~20% utilisation instead of 10%, so still not there.
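As a rough illustration (plain Python, not Spark API) of why the partition count matters with this setup: with 10 executors each running a single task at a time, partitions are processed in "waves", and a count that is not a multiple of the executor count leaves machines idle in the last wave.

```python
# Illustrative sketch: sequential task "waves" for a given partition count,
# assuming the 10 single-slot executors from this issue
# (spark.task.cpus == spark.executor.cores).
import math

def waves(num_partitions: int, executors: int, slots_per_executor: int = 1) -> int:
    """Number of sequential waves needed to process all partitions."""
    total_slots = executors * slots_per_executor
    return math.ceil(num_partitions / total_slots)

executors = 10
# 25 partitions on 10 single-slot executors -> 3 waves, with half the
# executors idle in the last wave; 30 partitions also take 3 waves but
# keep every executor busy in each one.
print(waves(25, executors))  # 3
print(waves(30, executors))  # 3
```

Repartitioning to a multiple of the executor count balances the waves, which may explain the partial improvement, but it does not by itself raise the per-task thread count.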

@shuDaoNan9

Same problem +1

@imatiach-msft
Contributor

@jwenbin have you tried the new single dataset mode parameter on the latest master? See #1066.
In our benchmarking it resolved the low CPU utilization issue.
