
Allow rank filters to work with nogil #5399

Merged (2 commits) on May 17, 2021

Conversation

klaussfreire (Contributor)

Description

Allow rank filters to work with nogil

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

@grlee77 (Contributor) commented May 16, 2021

Thanks @klaussfreire, but I think these are already nogil. Does it seem that these are not releasing the gil? One way to try running these in parallel is via skimage.util.apply_parallel. I haven't tested whether or not that accelerates things much for these rank filters, but suspect it would provide some benefit on large enough images.

@hmaarrfk (Member) commented May 16, 2021

@grlee77 I was going through this and trying to think about how best to respond to this PR.

The _core function declaration takes a function pointer, kernel. That one is declared nogil, but _core itself is not declared nogil:

cdef void _core(void kernel(dtype_t_out*, Py_ssize_t, Py_ssize_t[::1], double,
                            dtype_t, Py_ssize_t, Py_ssize_t, double,
                            double, Py_ssize_t, Py_ssize_t) nogil,
                dtype_t[:, ::1] image,
                char[:, ::1] selem,
                char[:, ::1] mask,
                dtype_t_out[:, :, ::1] out,
                signed char shift_x, signed char shift_y,
                double p0, double p1,
                Py_ssize_t s0, Py_ssize_t s1,
                Py_ssize_t n_bins) except *:

Finally, the callers of _core are not declared nogil, and they don't call _core with:

with nogil:
    _core(....)

Therefore, I think that this PR is indeed adding multi-threaded capabilities to the rank filters.

@hmaarrfk (Member)

Personally, I prefer a code organization where low-level algorithmic functions are declared nogil and high-level functions call the low-level function inside a with nogil block.

In this case, I would prefer _core being declared nogil, and callers like _autolevel should issue the with nogil call:

https://github.com/scikit-image/scikit-image/blob/main/skimage/filters/rank/generic_cy.pyx#L432
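For illustration, a minimal sketch of that layout (simplified signatures and a placeholder body, not the actual generic_cy.pyx code; the real _core signature is the much longer one quoted above):

cdef void _core(unsigned char[:, ::1] image,
                unsigned char[:, ::1] out) nogil:
    # low-level loop working only on typed memoryviews and C values;
    # no Python/NumPy calls are allowed in here
    cdef Py_ssize_t r, c
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = image[r, c]


def _autolevel(unsigned char[:, ::1] image, unsigned char[:, ::1] out):
    # high-level wrapper: any NumPy allocations happen here, while the
    # GIL is held, and the GIL is released only around the hot loop
    with nogil:
        _core(image, out)

The point of this organization is that everything touching the Python C-API stays in the def wrapper, so the typed loop itself can run GIL-free.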

@grlee77 (Contributor) commented May 16, 2021

Yes, I just tried running a couple of the rank filter functions with apply_parallel and it does seem to use only 1 core and ends up being slower than just calling the function directly. This is in contrast with denoise_nl_means, for example, which shows large acceleration when using apply_parallel.

@grlee77 (Contributor) commented May 16, 2021

> The _core function declaration takes a function pointer, kernel. That one is declared nogil, but _core itself is not declared nogil.

Yes, I see that now. I was not looking closely enough!

@grlee77 (Contributor) commented May 16, 2021

> In this case, I would prefer _core being declared nogil, and callers like _autolevel should issue the with nogil call:

Right, but I think the issue here is that _core calls NumPy functions like np.empty and np.diff, so it can't currently be nogil.

@klaussfreire (Contributor, author)

> In this case, I would prefer _core being declared nogil, and callers like _autolevel should issue the with nogil call:

> Right, but I think the issue here is that _core calls NumPy functions like np.empty and np.diff, so it can't currently be nogil.

Exactly.

@grlee77 (Contributor) commented May 16, 2021

I compiled this branch, but using apply_parallel still doesn't seem to result in the expected multithreaded CPU usage. I don't immediately see the issue, though. @klaussfreire, were you able to observe an improvement in your application with this change?

@klaussfreire (Contributor, author) commented May 16, 2021

> I compiled this branch, but using apply_parallel still doesn't seem to result in the expected multithreaded CPU usage. I don't immediately see the issue, though. @klaussfreire, were you able to observe an improvement in your application with this change?

Yes. I have a program that uses local entropy to do HDR stretching on several images in parallel, and it does work effectively there.

Of course a single invocation isn't parallelized with this patch. Parallelization of a single invocation would require OpenMP or some other parallelization strategy inside _core, but in my use case (several invocations on several large images) it works fine.

Edit: let me gather some timings...
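For illustration, a rough sketch of that multi-invocation pattern (hypothetical image list and worker count; it assumes the local-entropy step corresponds to skimage.filters.rank.entropy). Because the kernel now releases the GIL, a plain thread pool can keep several cores busy at once:

from concurrent.futures import ThreadPoolExecutor

from skimage import data, img_as_ubyte
from skimage.filters import rank
from skimage.morphology import disk

# hypothetical stand-in for "several large images": the same test image
# reused a few times
images = [img_as_ubyte(data.camera())] * 8
selem = disk(5)

def local_entropy(img):
    # the Cython kernel releases the GIL, so these calls can run
    # concurrently on separate threads
    return rank.entropy(img, selem)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(local_entropy, images))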

@hmaarrfk (Member)

I also have a feeling that it won't show up in our tests, which use small images (and thus things are fast).

@klaussfreire (Contributor, author)

With patch:

(astro) claudiofreire@linux:~/Pictures/Astro/Cederblad122/2021-05-12> time python -m cvastrophoto combine --mode vvrgb --args lum_w=95,rgb_w=44 --luma-rops nr:tv:weight=0.0002,levels=2 stretch:hdr:steps=9 --color-rops color:wb:wb_set=qhy163m-orion5561 color:scnr nr:tv:weight=0.0003,levels=2 stretch:hdr:steps=9 -- cederblad122.vrgb.tiff cederblad122.{l.deconv,r,g,b}.tiff    

real    10m55.821s
user    72m40.140s
sys     1m41.398s

Without patch:

(astro) claudiofreire@linux:~/Pictures/Astro/Cederblad122/2021-05-12> time python -m cvastrophoto combine --mode vvrgb --args lum_w=95,rgb_w=44 --luma-rops nr:tv:weight=0.0002,levels=2 stretch:hdr:steps=9 --color-rops color:wb:wb_set=qhy163m-orion5561 color:scnr nr:tv:weight=0.0003,levels=2 stretch:hdr:steps=9 -- cederblad122.vrgb.tiff cederblad122.{l.deconv,r,g,b}.tiff

real    52m44.981s
user    58m24.521s
sys     1m47.336s

@grlee77 (Contributor) commented May 17, 2021

I tried again and did see the speedup this time, so I must not have refreshed something properly last time. I can confirm a substantial speedup for a simple example like the one below that uses apply_parallel with rank.median.

Computation time is 1.58 s when run in the normal fashion, but 324 ms when run with apply_parallel, so about a 5x speedup on the 10-core CPU I tested it on.

import dask
import numpy as np
from skimage.filters import rank
from skimage import data, img_as_float
from skimage.util import apply_parallel

# tile a small image to be able to see the benefit of multithreading
img = img_as_float(data.camera())
img = np.tile(img, (4, 8))
# use original non-tiled shape as the chunk size
chunks = data.camera().shape

selem = np.ones((5, 5), dtype=np.uint8)
depth = selem.shape[0] // 2
img_uint = (np.clip(img, 0, 1) * 255).astype(np.uint8)
extra_keywords = dict(selem=selem)
with dask.config.set(scheduler='threads'):  # or scheduler='processes'
    %timeit rank_median = apply_parallel(rank.median, img_uint, chunks=chunks, depth=depth, extra_keywords=extra_keywords, dtype=img.dtype, compute=True)
# %timeit rank.median(img_uint, **extra_keywords)

@grlee77 (Contributor) commented May 17, 2021

I just happened to be trying apply_parallel with some skimage.restoration functions today and noticed that two of those suffer from the same issue, so I will go ahead and open a similar PR for those.

@grlee77 (Contributor) left a comment

Thanks @klaussfreire, this looks good to me. I confirmed multithreading seems to be working properly now for both 2d and 3d (tested with rank.median and apply_parallel).

@jni (Member) left a comment

Thank you @klaussfreire!
