ENH Move threadpoolctl outside of iteration loop in KMeans #17235
ping @adrinjalali
The uncovered lines are for the verbose mode.
Can you help me understand what made you change your mind about https://github.com/scikit-learn/scikit-learn/pull/16499/files#r382599621, @jeremiedbb?
Just trying to fully understand where we're at: is this sentence still up to date considering that the overhead of
Haha I was sure you'd remember :D Reading my comment again, I recall what was the issue.
If we put the context inside the loop we have the overhead and if we put it outside of the loop, the inner
We still don't exactly recover the timings from 0.22, but the explanation is wrong; we still need to figure out why
Actually, there's no need for the threadpoolctl context for elkan, since elkan does not call BLAS in nested parallel regions. I just removed it. The timings are now:
I just removed it
These are the best fixes
LGTM as long as the answer to my question below is yes ;)
thanks @jeremiedbb
print("Iteration {0}, inertia {1}".format(i, inertia))
# Threadpoolctl context to limit the number of threads in second level of
# nested parallelism (i.e. BLAS) to avoid oversubscription.
with threadpool_limits(limits=1, user_api="blas"):
OK, so you confirm that no BLAS calls are protected by this CM apart from the ones in lloyd_iter?
Yes I just double checked
Also, this will need a whatsnew entry.
@NicolasHug I merged the what's new entry with the entry for #17210, if you don't mind. They are two attempts at fixing the same thing.
As another datapoint:
0.22: 0.0229
0.23: 0.16056
master: 0.1285
this pr: 0.0249
LGTM Thank you @jeremiedbb !
Related to #17208
Profiling shows that the threadpoolctl context manager has a significant overhead when running KMeans on very small datasets. This PR moves the call to threadpoolctl outside of the iteration loop.
The timings on my laptop from the snippet in #17208 are now:
0.22: 0.0138
0.23: 0.0562
master: 0.0431
this pr: 0.0214
Although it's a nice improvement, we don't fully recover the perf of 0.22. See explanation and discussion in #17208.
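For readers who want to see the shape of the change, here is a minimal sketch of the two placements of the context manager. It is not the actual KMeans code: fake_iteration, context_inside_loop and context_outside_loop are hypothetical names used only for illustration, and the fallback threadpool_limits is a do-nothing stand-in so the snippet still runs when threadpoolctl is not installed. Only the call threadpool_limits(limits=1, user_api="blas") is taken from the diff above.

```python
import time
from contextlib import contextmanager

try:
    from threadpoolctl import threadpool_limits
except ImportError:
    # Assumption: do-nothing stand-in so the sketch runs without threadpoolctl.
    # It limits nothing and only mimics the context-manager interface.
    @contextmanager
    def threadpool_limits(limits=None, user_api=None):
        yield


def fake_iteration(x):
    # Hypothetical stand-in for one cheap iteration on a very small dataset.
    return sum(v * v for v in x)


def context_inside_loop(x, n_iter):
    # Before: the context manager is entered and exited at every iteration.
    total = 0.0
    for _ in range(n_iter):
        with threadpool_limits(limits=1, user_api="blas"):
            total += fake_iteration(x)
    return total


def context_outside_loop(x, n_iter):
    # After: the context manager is entered once, around the whole loop.
    total = 0.0
    with threadpool_limits(limits=1, user_api="blas"):
        for _ in range(n_iter):
            total += fake_iteration(x)
    return total


def timed(func, *args):
    # Return the result of func(*args) and the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


x = [1.0, 2.0, 3.0]
_, t_inside = timed(context_inside_loop, x, 300)
_, t_outside = timed(context_outside_loop, x, 300)
print("context inside loop: ", t_inside)
print("context outside loop:", t_outside)
```

Both functions compute the same result; only the number of times the context is entered differs (once per iteration versus once in total). The printed timings depend on the machine and on which BLAS libraries are loaded, so they should not be compared with the numbers reported above.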