
Setting number of threads per executor #950

Open

perrital opened this issue Oct 31, 2020 · 3 comments

Comments

perrital commented Oct 31, 2020

I'm using version com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc2 on a k8s cluster, specifically the LightGBM classifier with a binary response.
I'm setting the number of executors to 10 and allocating spark.executor.cores=80 and spark.task.cpus=80, so that each machine runs exactly one task with all 80 cores available to it.
I was expecting full utilisation of the 80 cores; instead only ~8 cores are used.
My best guess is that this is related to the num_threads parameter, which was exposed as numThreads a long time ago but is not present in the current param set.
My dataset has 30,000,000 samples and a large number of features; each tree takes around 10 seconds to build, and re-bagging also takes a long time and runs on a single core.
Please advise.
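For reference, a minimal sketch (plain Python, not Spark API; the figures are the ones from this issue) of the task-slot arithmetic behind this configuration, and why the per-task thread count still has to be set separately:

```python
# Illustrative only: Spark runs floor(spark.executor.cores / spark.task.cpus)
# concurrent tasks per executor. The function name is ours, not a Spark call.

def task_slots_per_executor(executor_cores: int, task_cpus: int) -> int:
    """Concurrent task slots Spark derives from the two settings."""
    return executor_cores // task_cpus

executor_cores = 80  # spark.executor.cores
task_cpus = 80       # spark.task.cpus

slots = task_slots_per_executor(executor_cores, task_cpus)
print(slots)  # 1 -> exactly one task per executor, as intended

# That one task has all 80 cores *available*, but LightGBM itself only
# spawns as many worker threads as its num_threads setting; if that
# defaults to a small value, only a handful of the 80 cores stay busy.
cores_available_per_task = executor_cores // slots
print(cores_available_per_task)  # 80
```

So reserving the cores at the Spark level is necessary but not sufficient: the trainer's own thread count has to match them.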

@brunocous

Same problem as in #292; not fixed yet.

Try playing with the number of partitions and the partition key(s). That got me to ~20% utilisation instead of 10%, so still not there.
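As a rough illustration (plain Python, not Spark API) of why the partition count matters with this setup: with 10 executors each running a single task at a time, partitions are processed in "waves", and a count that is not a multiple of the executor count leaves machines idle in the last wave.

```python
# Illustrative sketch: sequential task "waves" for a given partition count,
# assuming the 10 single-slot executors from this issue
# (spark.task.cpus == spark.executor.cores).
import math

def waves(num_partitions: int, executors: int, slots_per_executor: int = 1) -> int:
    """Number of sequential waves needed to process all partitions."""
    total_slots = executors * slots_per_executor
    return math.ceil(num_partitions / total_slots)

executors = 10
# 25 partitions on 10 single-slot executors -> 3 waves, with half the
# executors idle in the last wave; 30 partitions also take 3 waves but
# keep every executor busy in each one.
print(waves(25, executors))  # 3
print(waves(30, executors))  # 3
```

Repartitioning to a multiple of the executor count balances the waves, which may explain the partial improvement, but it does not by itself raise the per-task thread count.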

@shuDaoNan9

Same problem +1

@imatiach-msft
Contributor

@jwenbin have you tried the new single dataset mode parameter on the latest master? See #1066.
In our benchmarking it resolved the low CPU utilization issue.
