[WIP] Number of threads in KMeans should not be bigger than number of chunks #17210
Conversation
Curious, how much of a gain is this? And should it go in 0.23.1?
It is the regression reported in #17208. It seems like a ~10x slowdown on very small datasets.
Then if you've tested and this is fixing the issue, I'm happy. I don't think we can easily write a test for it.
It's an attempt to fix #17208, but it's still WIP. I can reproduce the slowdown on my laptop and this PR fixes it, but it seems not to work for the person who opened the issue. I need to investigate further with him.
The fix sounds like a good improvement in any case. Though he has 4 cores, so I would have imagined spawning 8 threads shouldn't be too costly performance-wise?
I agree
In the reproducible snippet, there are only 150 samples, which means there will only be one chunk. On my laptop with 4 cores, it spawns 4 threads and it makes a huge difference. The thing is that it only concerns very small datasets, for which the whole fit time is ~0.005 sec. So I guess that the overhead of thread creation becomes non-negligible.
Let's merge this as is; if necessary you could create a new PR with more improvements. Please add a what's new entry. BTW, can this affect other parts of the code that do parallel chunking, where we could also apply this fix?
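To make the "only one chunk" point concrete, here is a minimal sketch of the chunk-count arithmetic. The chunk size of 256 and the `n_chunks` helper are illustrative assumptions, not scikit-learn's actual internals:

```python
import math

# Assumed chunk size, for illustration only.
CHUNK_SIZE = 256

def n_chunks(n_samples, chunk_size=CHUNK_SIZE):
    # Number of fixed-size chunks needed to cover all samples.
    return math.ceil(n_samples / chunk_size)

# With 150 samples there is a single chunk of work, so spawning
# 4 threads leaves 3 of them idle while still paying spawn overhead.
print(n_chunks(150))  # -> 1
```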
Not that I can think of.
It can't hurt, and it's still an improvement.
Some profiling showed that it's threadpoolctl that takes 90% of the time on these very small problems. It's called at each iteration. I'll make a PR to move it outside of the loop.
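The hoisting idea above can be sketched with a mock context manager (the real call would be threadpoolctl's `threadpool_limits`; the sleep is a stand-in for its per-call inspection cost, and all names here are hypothetical):

```python
import contextlib
import time

@contextlib.contextmanager
def mock_threadpool_limits():
    # Stand-in for threadpoolctl's fixed per-call overhead.
    time.sleep(0.001)
    yield

def fit_per_iteration(n_iter):
    # Limits re-applied inside the loop: overhead paid n_iter times.
    for _ in range(n_iter):
        with mock_threadpool_limits():
            pass  # one cheap Lloyd iteration

def fit_hoisted(n_iter):
    # Limits applied once, outside the loop: overhead paid once.
    with mock_threadpool_limits():
        for _ in range(n_iter):
            pass
```

When each iteration is only microseconds long, the per-iteration variant is dominated by the fixed overhead, which matches the 90% profiling figure quoted above.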
I guess @rth is also happy to have this merged. Merging; hopefully the other PR improving the threadpoolctl overhead will get in quickly too.
…hunks (scikit-learn#17210) * num threads not bigger than num chunks * what's new
Related to #17208.
When the number of chunks is smaller than the number of cores (i.e. very small datasets), KMeans launches as many threads as there are cores anyway. It should use n_chunks threads instead.
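A minimal sketch of the proposed fix, clamping the thread count to the number of chunks of work. The `effective_n_threads` helper and the chunk size of 256 are illustrative assumptions, not the PR's actual code:

```python
import math
import os

# Assumed chunk size, for illustration only.
CHUNK_SIZE = 256

def effective_n_threads(n_samples, n_threads=None, chunk_size=CHUNK_SIZE):
    if n_threads is None:
        n_threads = os.cpu_count() or 1
    n_chunks = math.ceil(n_samples / chunk_size)
    # Never spawn more threads than there are chunks of work.
    return min(n_threads, n_chunks)

print(effective_n_threads(150, n_threads=4))     # -> 1 (tiny dataset, 1 chunk)
print(effective_n_threads(10_000, n_threads=4))  # -> 4 (plenty of chunks)
```

On large datasets the clamp is a no-op, which is why the change is safe: it only kicks in when there is less work than there are cores.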