Release GIL where possible in watershed algorithm #3233
Conversation
@lagru could you please present the performance change on some sample data?
Yes, I thought I'd added a checklist item for adding benchmarks to
@jni Yes, you did. However, two things made me hesitate.
Anyway, I'll put benchmarks back on the menu. 😄
I considered a full benchmark suite too ambitious for any one PR. Instead I suggest that any PR that touches some function should provide a benchmark for that function. This is *especially* true if the PR is about performance! ;)

I agree that asv might not be able to capture multicore behavior appropriately. This is certainly true if the benchmark machine only has one core associated with the benchmarking task. @TomAugspurger any comments?

Regarding the structure: there should be one file per scikit-image top-level module, at least until we outgrow this structure.
The benchmark machine has 8 cores. I *think* the runner has all 8 available, but it's possible it's being limited somewhere.
@soupault Here is a code snippet (using IPython!) that you can use to compare the effect multithreading has with and without releasing the GIL.

```python
In [1]: from multiprocessing.pool import ThreadPool
   ...: from skimage.morphology import watershed
   ...: from skimage.data import coins
   ...: image = coins()

In [2]: %%time
   ...: with ThreadPool(4) as pool:
   ...:     pool.starmap(watershed, ((image, 200) for _ in range(500)))
```

Opening a system monitor, one can easily see that only the GIL-releasing version is able to use more than one CPU. On my machine (4 × Intel Core i5-4200U CPU @ 1.60GHz) the results are:
I'm not sure how to present this otherwise. At least this way you can quickly check the qualitative results yourself if you want.
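The same qualitative contrast can be reproduced without scikit-image. This is only an illustrative, stdlib-only sketch (the function name `releases_gil` is made up): `time.sleep` releases the GIL while waiting, so four threaded jobs overlap, whereas a pure-Python busy loop would hold the GIL and run the threads one after another.

```python
import time
from multiprocessing.pool import ThreadPool

def releases_gil(seconds):
    # time.sleep drops the GIL while waiting, so the threads overlap;
    # a pure-Python busy loop would hold the GIL and serialize instead.
    time.sleep(seconds)

with ThreadPool(4) as pool:
    start = time.perf_counter()
    pool.map(releases_gil, [0.2] * 4)
    parallel_time = time.perf_counter() - start

# Four 0.2 s jobs finish in roughly 0.2 s rather than 0.8 s,
# because each thread releases the GIL while it waits.
print(parallel_time < 0.5)
```

A GIL-holding workload in the same pool would take close to the sum of the individual runtimes, which is exactly the difference the snippet above exposes for `watershed`.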
Nice, thanks! I'm also getting a 2X speedup on my machine with the provided code snippet.
@lagru thanks for the ping!
Oh, you are right. I'll do that!
@lagru this PR is almost ready. Do you have time to add the additional benchmark, or would you prefer someone else take over?
Force-pushed from 6526c87 to dafcf0d (compare)
@emmanuelle I already had the benchmark in my offline repository but didn't upload it for a while because I encountered several problems/hiccups (see review comments) which I wanted to fix before uploading. Sorry for the delay.
```python
self.images = ((image, 100) for _ in range(4))

def time_watershed_parallel(self):
    with ThreadPool(4) as pool:
```
Initially I created the ThreadPool inside the setup method so that this initialization wouldn't taint the benchmark. However, the benchmark would always fail with `ValueError: Pool not running` when using `asv run`, despite passing with `asv dev`. Moving the initialization into the `time_` function seems to work now. I figure asv does some threading behind the scenes which interfered with the ThreadPool?
```python
def setup(self):
    image = filters.sobel(data.coins())
    self.images = ((image, 100) for _ in range(4))
```
I'm not happy with the adaptation of the informal benchmark to a smaller runtime. For the current configuration the GIL-releasing version is faster, but only by a factor of 0.86. Increasing the number of "jobs" leads to the GIL-releasing version unexpectedly being slower! The same thing happens if I change the pool size, meaning the current pool and job size of 4 seems to be a small sweet spot where the GIL-releasing version is faster.
It would be interesting to know whether this only happens on my local hardware or whether this behavior is universal. Right now I have rather low confidence in the results of this particular benchmark...
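Putting the diff fragments above together, the benchmark would look roughly like the sketch below. To keep it runnable without scikit-image installed, `segment` is a hypothetical stand-in for `watershed(image, markers)` and the fake image is a plain list; the real file would import `data`, `filters`, and `watershed` from scikit-image, and `WatershedSuite` is an invented class name.

```python
from multiprocessing.pool import ThreadPool


def segment(image, markers):
    # Hypothetical stand-in for skimage.morphology.watershed(image, markers)
    # so this sketch runs without scikit-image installed.
    return [markers] * len(image)


class WatershedSuite:
    # asv times any method whose name starts with "time_".

    def setup(self):
        # Stands in for filters.sobel(data.coins()) in the real benchmark.
        image = list(range(100))
        # A list rather than a generator expression: asv calls the timed
        # method repeatedly, and a generator would be exhausted after the
        # first repeat, leaving later repeats with nothing to process.
        self.images = [(image, 100) for _ in range(4)]

    def time_watershed_parallel(self):
        # The pool is created inside the timed method on purpose; creating
        # it in setup() failed with "ValueError: Pool not running" under
        # asv run, as discussed above.
        with ThreadPool(4) as pool:
            pool.starmap(segment, self.images)


suite = WatershedSuite()
suite.setup()
suite.time_watershed_parallel()  # runs without error
```

Note that building `self.images` as a list rather than a generator may also matter for the repeated-run behavior asv relies on; the diff above uses a generator expression.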
Thank you @lagru! Just to be sure I understand: what would be a typical use case where performance could be increased thanks to your modifications? When the watershed function is called on the same image with different parameters in different threads, for example?
Correct, that could be a use case! However, my thinking was more along the lines of batch processing of multiple different images. Say the watershed function is part of a larger workflow that can be applied to multiple images in parallel; then releasing the GIL has the advantage that the watershed function no longer blocks the interpreter. The more parts of this workflow release the GIL, the more of the workflow can be parallelized. Because this has no negative impact for the single-threaded case (as long as the GIL is not released and reacquired extensively in short order), there shouldn't be a reason not to do so. :) Does that answer your question?
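A minimal sketch of that batch-processing pattern: one worker function runs a small per-image workflow, and a thread pool maps it over a batch of images. Both `process_one` and its "filter"/"segment" steps are hypothetical pure-Python stand-ins; in the real case, one step would be the `watershed` call, which now releases the GIL while it runs so the threads can genuinely overlap.

```python
from multiprocessing.pool import ThreadPool

def process_one(image):
    # Hypothetical per-image workflow; in the real case one of these
    # steps would call watershed(), which releases the GIL while running.
    edges = [abs(b - a) for a, b in zip(image, image[1:])]   # "filter" step
    labels = [1 if v > 0 else 0 for v in edges]              # "segment" step
    return sum(labels)

# A batch of fake 10-pixel "images"; real code would load actual images.
images = [list(range(i, i + 10)) for i in range(8)]

with ThreadPool(4) as pool:
    results = pool.map(process_one, images)

print(results)  # → [9, 9, 9, 9, 9, 9, 9, 9]
```

With a GIL-holding watershed, the pool above would still work but the threads would take turns; releasing the GIL is what lets this pattern actually use multiple cores.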
Also relevant, the contribution guide recommends this as well: Lines 228 to 229 in 942ed29
Kind ping to @emmanuelle and @jni on this: is this PR ready for you?
Releasing the GIL makes using the watershed algorithm in parallel easier / possible. As this is a computationally intensive function, this has no noticeable performance penalty for the non-threaded case. Furthermore, add some documentation in heap_general and heap_watershed.
Force-pushed from dafcf0d to 95204b0 (compare)
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #3233      +/-   ##
==========================================
- Coverage   87.88%   86.86%    -1.02%
==========================================
  Files         325      340       +15
  Lines       27462    27485       +23
==========================================
- Hits        24134    23876      -258
- Misses       3328     3609      +281
```

Continue to review the full report at Codecov.
So I guess merging this doesn't make sense anymore, as #3490 contains the changes proposed here (except for the benchmark).
Description
Releasing the GIL makes using the watershed algorithm in parallel easier / possible. As this is a computationally intensive function this should be worth it and have no performance penalties.