Release GIL where possible in watershed algorithm #3233
Conversation
@lagru could you please present the performance change on some sample data?
Yes, I thought I'd added a checklist item for adding benchmarks to
@jni Yes, you did. However, two things made me hesitate.
Anyway, I'll put benchmarks back on the menu. 😄
I considered a full benchmark suite too ambitious for any one PR. Instead I suggest that any PR that touches some function should provide a benchmark for that function. This is *especially* true if the PR is about performance! ;)

I agree that asv might not be able to capture multicore behavior appropriately. This is certainly true if the benchmark machine only has one core associated with the benchmarking task. @TomAugspurger any comments?

Regarding the structure: there should be one file per scikit-image top-level module, at least until we outgrow this structure.
The benchmark machine has 8 cores. I *think* the runner has all 8 available, but it's possible it's being limited somewhere.
@soupault Here is a code snippet (using IPython!) that you can use to compare the effect multithreading has with and without releasing the GIL.

```python
In [1]: from multiprocessing.pool import ThreadPool
   ...: from skimage.morphology import watershed
   ...: from skimage.data import coins
   ...: image = coins()

In [2]: %%time
   ...: with ThreadPool(4) as pool:
   ...:     pool.starmap(watershed, ((image, 200) for _ in range(500)))
```

Opening a system monitor, one can easily see that only the GIL-releasing version is able to use more than one CPU. On my machine (4 × Intel Core i5-4200U CPU @ 1.60GHz) the results are:
I'm not sure how to present this otherwise. At least this way you can quickly check the qualitative results yourself if you want.
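The same qualitative contrast can be reproduced without scikit-image. This is only an illustrative, stdlib-only sketch (the function name `releases_gil` is made up): `time.sleep` releases the GIL while waiting, so four threaded jobs overlap, whereas a pure-Python busy loop would hold the GIL and run the threads one after another.

```python
import time
from multiprocessing.pool import ThreadPool

def releases_gil(seconds):
    # time.sleep drops the GIL while waiting, so the threads overlap;
    # a pure-Python busy loop would hold the GIL and serialize instead.
    time.sleep(seconds)

with ThreadPool(4) as pool:
    start = time.perf_counter()
    pool.map(releases_gil, [0.2] * 4)
    parallel_time = time.perf_counter() - start

# Four 0.2 s jobs finish in roughly 0.2 s rather than 0.8 s,
# because each thread releases the GIL while it waits.
print(parallel_time < 0.5)
```

A GIL-holding workload in the same pool would take close to the sum of the individual runtimes, which is exactly the difference the snippet above exposes for `watershed`.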
Nice, thanks! I'm also getting a 2X speedup on my machine with the provided code snippet.
@lagru thanks for the ping!
Oh, you are right. I'll do that!
@lagru this PR is almost ready. Do you have time to add the additional benchmark, or would you prefer someone else take over?
Force-pushed from 6526c87 to dafcf0d (compare)
@emmanuelle I already had the benchmark in my offline repository but didn't upload it for a while because I encountered several problems/hiccups (see review comments) which I wanted to fix before uploading. Sorry for the delay.
```python
self.images = ((image, 100) for _ in range(4))

def time_watershed_parallel(self):
    with ThreadPool(4) as pool:
```
Initially I created the ThreadPool inside the setup method so that this initialization wouldn't taint the benchmark. However, the benchmark would always fail with `ValueError: Pool not running` when using `asv run`, despite passing with `asv dev`. Moving the initialization into the `time_` function seems to work now. I figure asv does some threading behind the scenes which interfered with the ThreadPool?
```python
def setup(self):
    image = filters.sobel(data.coins())
    self.images = ((image, 100) for _ in range(4))
```
I'm not happy with the adaptation of the informal benchmark to a smaller runtime. For the current configuration the GIL-releasing version is faster, but only by a factor of 0.86. Increasing the number of "jobs" leads to the GIL-releasing version unexpectedly being slower! The same thing happens if I change the pool size, meaning the current pool and job size of 4 seems to be a small sweet spot where the GIL-releasing version is faster.
It would be interesting to know whether this only happens on my local hardware or whether this behavior is universal. Right now I have rather low confidence in the results of this particular benchmark...
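Putting the diff fragments above together, the benchmark would look roughly like the sketch below. To keep it runnable without scikit-image installed, `segment` is a hypothetical stand-in for `watershed(image, markers)` and the fake image is a plain list; the real file would import `data`, `filters`, and `watershed` from scikit-image, and `WatershedSuite` is an invented class name.

```python
from multiprocessing.pool import ThreadPool


def segment(image, markers):
    # Hypothetical stand-in for skimage.morphology.watershed(image, markers)
    # so this sketch runs without scikit-image installed.
    return [markers] * len(image)


class WatershedSuite:
    # asv times any method whose name starts with "time_".

    def setup(self):
        # Stands in for filters.sobel(data.coins()) in the real benchmark.
        image = list(range(100))
        # A list rather than a generator expression: asv calls the timed
        # method repeatedly, and a generator would be exhausted after the
        # first repeat, leaving later repeats with nothing to process.
        self.images = [(image, 100) for _ in range(4)]

    def time_watershed_parallel(self):
        # The pool is created inside the timed method on purpose; creating
        # it in setup() failed with "ValueError: Pool not running" under
        # asv run, as discussed above.
        with ThreadPool(4) as pool:
            pool.starmap(segment, self.images)


suite = WatershedSuite()
suite.setup()
suite.time_watershed_parallel()  # runs without error
```

Note that building `self.images` as a list rather than a generator may also matter for the repeated-run behavior asv relies on; the diff above uses a generator expression.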
Thank you @lagru! Just to be sure I understand: what would be a typical use case where performance could be increased thanks to your modifications? When the watershed function is called on the same image with different parameters in different threads, for example?
Correct, that could be a use case! However, my thinking was more along the lines of batch processing of multiple different images. Say the watershed function is part of a larger workflow that can be applied to multiple images in parallel; then releasing the GIL has the advantage that the watershed function no longer blocks the interpreter. The more parts of this workflow release the GIL, the more of the workflow can be parallelized. Because this has no negative impact for the single-threaded case (as long as the GIL is not released and reacquired extensively in short order), there shouldn't be a reason not to do so. :) Does that answer your question?
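A minimal sketch of that batch-processing pattern: one worker function runs a small per-image workflow, and a thread pool maps it over a batch of images. Both `process_one` and its "filter"/"segment" steps are hypothetical pure-Python stand-ins; in the real case, one step would be the `watershed` call, which now releases the GIL while it runs so the threads can genuinely overlap.

```python
from multiprocessing.pool import ThreadPool

def process_one(image):
    # Hypothetical per-image workflow; in the real case one of these
    # steps would call watershed(), which releases the GIL while running.
    edges = [abs(b - a) for a, b in zip(image, image[1:])]   # "filter" step
    labels = [1 if v > 0 else 0 for v in edges]              # "segment" step
    return sum(labels)

# A batch of fake 10-pixel "images"; real code would load actual images.
images = [list(range(i, i + 10)) for i in range(8)]

with ThreadPool(4) as pool:
    results = pool.map(process_one, images)

print(results)  # → [9, 9, 9, 9, 9, 9, 9, 9]
```

With a GIL-holding watershed, the pool above would still work but the threads would take turns; releasing the GIL is what lets this pattern actually use multiple cores.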
Also relevant, the contribution guide recommends this as well: Lines 228 to 229 in 942ed29
Kind ping to @emmanuelle and @jni on this: is this PR ready for you?
Releasing the GIL makes using the watershed algorithm in parallel easier / possible. As this is a computationally intensive function, this has no noticeable performance penalty for the non-threaded case. Furthermore, add some documentation in heap_general and heap_watershed.
Force-pushed from dafcf0d to 95204b0 (compare)
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #3233      +/-   ##
==========================================
- Coverage   87.88%   86.86%    -1.02%
==========================================
  Files         325      340       +15
  Lines       27462    27485       +23
==========================================
- Hits        24134    23876      -258
- Misses       3328     3609      +281
```

Continue to review the full report at Codecov.
So I guess merging this doesn't make sense anymore, as #3490 contains the changes proposed here (except for the benchmark).
Description
Releasing the GIL makes using the watershed algorithm in parallel easier / possible. As this is a computationally intensive function this should be worth it and have no performance penalties.