
Release GIL where possible in watershed algorithm #3233

Closed · wants to merge 2 commits

Conversation

@lagru (Member) commented Jun 30, 2018

Description

Releasing the GIL makes it possible to use the watershed algorithm from multiple threads in parallel. As this is a computationally intensive function, releasing the GIL should be worthwhile and incur no performance penalty in the single-threaded case.

Checklist

For reviewers

(Don't remove the checklist below.)

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

@soupault (Member) commented

@lagru, could you please present the performance change on some sample data?

@jni (Member) commented Jul 1, 2018

Yes, I thought I'd added a checklist item for adding benchmarks to benchmarks/... ;)

@lagru (Member, Author) commented Jul 1, 2018

@jni Yes, you did. However, two things made me hesitate.

  • I'm not sure asv is able to handle and measure multiprocessing correctly, but I guess it might still be useful for catching performance regressions.
  • The current state of the benchmarks directory compared to SciPy or NumPy made me think that you are still setting up asv for scikit-image and not ready to take PRs yet. You seemed to be missing the asv.conf.json (I have now found it at the project's top level) and seem to be using one file for all benchmarks? Is the latter a temporary solution until there are enough benchmarks to start using a structure that more closely resembles scikit-image's namespaces?

Anyway, I'll put benchmarks back on the menu. 😄

@jni (Member) commented Jul 1, 2018

I considered a full benchmark suite too ambitious for any one PR. Instead I suggest that any PR that touches some function should provide a benchmark for that function. This is especially true if the PR is about performance! ;)

I agree that asv might not handle multicore measurements appropriately. This is certainly true if the benchmark machine has only one core associated with the benchmarking task. @TomAugspurger, any comments?

Regarding the structure: there should be one file per scikit-image top-level module, at least until we outgrow this structure.
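For concreteness, a minimal benchmark file following that convention might look like the sketch below. The file name (benchmark_morphology.py) and class name are illustrative, not taken from the repository; asv discovers methods named time_* and runs setup() before timing.

from skimage import data, filters
from skimage.morphology import watershed


class WatershedSuite:
    def setup(self):
        # The Sobel gradient of the coins image gives a typical
        # watershed input; setup() is not included in the timing.
        self.image = filters.sobel(data.coins())

    def time_watershed(self):
        # Integer markers, as used in the snippets in this thread.
        watershed(self.image, 100)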

@TomAugspurger (Contributor) commented Jul 1, 2018 via email

@lagru (Member, Author) commented Jul 1, 2018

@soupault Here is a code snippet (using IPython!) that you can use to compare the effect multithreading has with and without releasing the GIL.

In[1]: from multiprocessing.pool import ThreadPool
  ...: from skimage.morphology import watershed
  ...: from skimage.data import coins
  ...: image = coins()

In[2]: %%time
  ...: with ThreadPool(4) as pool:
  ...:     pool.starmap(watershed, ((image, 200) for _ in range(500)))

Opening a system monitor, one can easily see that only the GIL-releasing version is able to use more than one CPU. On my machine (4 x Intel Core i5-4200U CPU @ 1.60GHz) the results are:

  • with GIL (f21c52b):
    CPU times: user 17.8 s, sys: 1.4 s, total: 19.2 s
    Wall time: 16.5 s
    
  • without GIL (6526c87):
    CPU times: user 25.1 s, sys: 1.08 s, total: 26.2 s
    Wall time: 7.33 s
    

I'm not sure how to present this otherwise. At least this way you can quickly check the qualitative results yourself if you want.
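For anyone who wants to reproduce this outside IPython, an equivalent plain-Python timing harness could look like the following sketch (it assumes the same workload as the %%time cell above):

import time
from multiprocessing.pool import ThreadPool

from skimage.data import coins
from skimage.morphology import watershed

image = coins()

start = time.perf_counter()
with ThreadPool(4) as pool:
    # Same workload as the IPython snippet: 500 watershed calls on the
    # coins image, spread over 4 threads.
    pool.starmap(watershed, ((image, 200) for _ in range(500)))
print(f"Wall time: {time.perf_counter() - start:.2f} s")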

@soupault (Member) left a comment

Nice, thanks! I'm also getting a 2X speedup on my machine with the provided code snippet.

@soupault requested a review from jni on July 7, 2018.
@lagru changed the title from "MAINT: Release GIL where possible in watershed algorithm" to "Release GIL where possible in watershed algorithm" on Aug 11, 2018.
@lagru (Member, Author) commented Aug 16, 2018

@jni With #3234 closed, are there any concerns left?

@jni (Member) commented Aug 16, 2018

@lagru thanks for the ping!

  1. Well, the idea of #3234 (Add benchmarks for morphology.watershed) was to rebase this one after that one was merged. ;) A rebase is a good idea to get proper CI now that it's working. And,
  2. It just occurred to me that #3234 doesn't cover the multi-threaded case. After rebasing, could you add such a benchmark? It can be part of this PR, and it can more or less follow your informal benchmark above (threadpool.starmap), but with a smaller total runtime. =)

@lagru (Member, Author) commented Aug 16, 2018

Oh, you are right. I'll do that!

@emmanuelle (Member) commented

@lagru this PR is almost ready. Do you have the time to add the additional benchmark, or would you prefer someone else to take over?

@lagru (Member, Author) commented Sep 16, 2018

@emmanuelle I already had the benchmark in my offline repository but didn't upload it for a while because I encountered several problems / hiccups (see the review comments) that I wanted to fix before uploading. Sorry for the delay.

        self.images = ((image, 100) for _ in range(4))

    def time_watershed_parallel(self):
        with ThreadPool(4) as pool:

@lagru (Member, Author) commented on this diff:

Initially I created the ThreadPool inside the setup method so that this initialization wouldn't taint the benchmark. However, the benchmark would always fail under asv run with ValueError: Pool not running, despite passing with asv dev. Moving the initialization into the time_ function seems to work now. I figure asv does some threading behind the scenes which interfered with the ThreadPool?
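Put together, the working arrangement described here might look like the following sketch (the class name is illustrative; a list is used instead of a generator, on the assumption that repeated timing runs would otherwise exhaust the argument sequence):

from multiprocessing.pool import ThreadPool

from skimage import data, filters
from skimage.morphology import watershed


class WatershedParallelSuite:
    def setup(self):
        image = filters.sobel(data.coins())
        # A list rather than a generator, so the arguments survive
        # repeated invocations of the timed method.
        self.images = [(image, 100) for _ in range(4)]

    def time_watershed_parallel(self):
        # The pool is created inside the timed method; creating it in
        # setup() failed under `asv run` with "ValueError: Pool not
        # running", as described above.
        with ThreadPool(4) as pool:
            pool.starmap(watershed, self.images)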


    def setup(self):
        image = filters.sobel(data.coins())
        self.images = ((image, 100) for _ in range(4))

@lagru (Member, Author) commented on this diff:

I'm not happy with the adaptation of the informal benchmark to a smaller runtime. For the current configuration the GIL-releasing version is faster, but only by a factor of 0.86. Increasing the number of "jobs" leads to the GIL-releasing version unexpectedly being slower! The same thing happens if I change the pool size, meaning the current pool and job size of 4 seems to be a small sweet spot where the GIL-releasing version is faster.

It would be interesting to know whether this only happens on my local hardware or whether this behavior is universal. Right now I have rather low confidence in the results of this particular benchmark...
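One way to investigate that sweet spot would be asv's parameterization mechanism, sketched below: asv times every combination of pool size and job count, so the crossover point could be compared across machines. The class name and parameter values are illustrative.

from multiprocessing.pool import ThreadPool

from skimage import data, filters
from skimage.morphology import watershed


class WatershedParallelParams:
    # asv passes each combination of these values to setup() and to
    # the timed method, and reports a timing per combination.
    params = ([2, 4, 8], [4, 8, 16])
    param_names = ['pool_size', 'num_jobs']

    def setup(self, pool_size, num_jobs):
        image = filters.sobel(data.coins())
        self.images = [(image, 100) for _ in range(num_jobs)]

    def time_watershed_parallel(self, pool_size, num_jobs):
        with ThreadPool(pool_size) as pool:
            pool.starmap(watershed, self.images)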

@emmanuelle (Member) commented

Thank you @lagru! Just to be sure I understand: what would be a typical use case where performance could be increased thanks to your modifications? When the watershed function is called on the same image with different parameters in different threads, for example?

@lagru (Member, Author) commented Sep 16, 2018

what would be a typical use case where performance could be increased thanks to your modifications? When the watershed function is called on the same image with different parameters in different threads, for example? - @emmanuelle

Correct, that could be a use case! However, my thinking was more along the lines of batch processing of multiple different images. Say the watershed function is part of a larger workflow that can be applied to multiple images in parallel; then releasing the GIL has the advantage that the watershed function no longer blocks the interpreter. The more parts of this workflow release the GIL, the more of the workflow can be parallelized, as in the sketch below.

Because this has no negative impact on the single-threaded case (as long as the GIL is not released and reacquired extensively in short order), there shouldn't be a reason not to do so. :)
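For illustration, such a batch workflow might be sketched as follows. The helper names are hypothetical, and only watershed's GIL behavior is what this PR changes:

from multiprocessing.pool import ThreadPool

from skimage import filters
from skimage.morphology import watershed


def segment(image):
    # A small per-image pipeline; with this PR, the watershed step no
    # longer holds the GIL while flooding, so calls running in
    # different threads can overlap on multiple cores.
    edges = filters.sobel(image)
    return watershed(edges, 100)


def segment_batch(images, workers=4):
    with ThreadPool(workers) as pool:
        return pool.map(segment, images)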

Does that answer your question?

@lagru (Member, Author) commented Oct 9, 2018

Also relevant: the contribution guide recommends this practice as well:

* For Cython functions, release the GIL whenever possible, using
``with nogil:``.

@sciunto (Member) commented Oct 12, 2018

A kind ping to @emmanuelle and @jni on this: is this PR ready for you?

Commit message:

Releasing the GIL makes using the watershed algorithm in parallel easier / possible. As this is a computationally intensive function, this has no noticeable performance penalty for the non-threaded case.

Furthermore, add some documentation in heap_general and heap_watershed.
@codecov-io commented
Codecov Report

Merging #3233 into master will decrease coverage by 1.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #3233      +/-   ##
==========================================
- Coverage   87.88%   86.86%   -1.02%     
==========================================
  Files         325      340      +15     
  Lines       27462    27485      +23     
==========================================
- Hits        24134    23876     -258     
- Misses       3328     3609     +281
Impacted Files Coverage Δ
skimage/io/_plugins/imread_plugin.py 69.23% <0%> (-15.39%) ⬇️
skimage/io/tests/test_fits.py 83.01% <0%> (-13.21%) ⬇️
skimage/io/_plugins/fits_plugin.py 81.13% <0%> (-9.44%) ⬇️
skimage/io/tests/test_imread.py 70.58% <0%> (-3.93%) ⬇️
skimage/segmentation/random_walker_segmentation.py 92.01% <0%> (-1.88%) ⬇️
skimage/draw/_random_shapes.py 97.89% <0%> (-1.06%) ⬇️
skimage/feature/match.py 96.96% <0%> (-0.26%) ⬇️
skimage/morphology/extrema.py 94.89% <0%> (-0.25%) ⬇️
skimage/morphology/tests/test_extrema.py 99.4% <0%> (-0.08%) ⬇️
skimage/transform/_warps.py 99.47% <0%> (-0.01%) ⬇️
... and 40 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29a22e1...95204b0.

@lagru (Member, Author) commented Nov 20, 2018

So I guess merging this doesn't make sense anymore as #3490 contains the changes proposed here (except for the benchmark).

@lagru closed this on Nov 20, 2018.
@lagru deleted the watershed-nogil branch on July 22, 2023.