
fast, non-Cython implementation for correlate_sparse #5171

Merged (7 commits) on Jan 17, 2021

Conversation

@grlee77 (Contributor) commented on Jan 4, 2021:

Description

This PR replaces the recently added correlate_sparse Cython version with a simple slicing-based variant. This was done so that this function can be used with other array types implementing the NumPy API, such as CuPy. There should be no change in behavior or API.

A secondary benefit is that this also profiled modestly faster on 2d and 3d cases. See the threshold_sauvola benchmarks for v0.18.1 vs. this PR below. I tried out varying sparsity levels as well and wasn't able to find a case where it is worse than the current implementation.
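For readers unfamiliar with the approach, here is a minimal, hypothetical sketch of slicing-based sparse correlation (the function name is invented for illustration; this is not the PR's actual code): each nonzero kernel entry contributes one vectorized add of a shifted view of the pre-padded image, so only NumPy-API operations are needed and the same code runs unchanged on CuPy arrays.

```python
import numpy as np

def correlate_sparse_sketch(padded_image, kernel):
    # For each nonzero kernel entry at offset ``idx`` with value ``v``,
    # accumulate ``v`` times the shifted view of the (pre-padded) image.
    # A mostly-zero kernel thus costs only a few vectorized adds.
    out_shape = tuple(s - k + 1
                      for s, k in zip(padded_image.shape, kernel.shape))
    out = np.zeros(out_shape, dtype=padded_image.dtype)
    for idx in zip(*np.nonzero(kernel)):
        shifted = padded_image[tuple(slice(i, i + n)
                                     for i, n in zip(idx, out_shape))]
        out += kernel[idx] * shifted
    return out

# tiny check with a difference-of-corners kernel
img = np.arange(36, dtype=float).reshape(6, 6)
kern = np.zeros((3, 3))
kern[0, 0], kern[2, 2] = 1.0, -1.0
res = correlate_sparse_sketch(img, kern)  # every entry equals -14.0
```

Because the loop runs over nonzero entries only, the cost scales with the kernel's sparsity rather than its size.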

Benchmark Results

v0.18.1
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_home_lee8rx_miniconda3_envs_skimage_dev_bin_python3.7
[ 25.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola 252±0ms
[ 50.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola_3d 458±0ms

This PR
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_home_lee8rx_miniconda3_envs_skimage_dev_bin_python3.7
[ 25.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola 188±0ms
[ 50.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola_3d 387±0ms

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

Replace correlate_sparse Cython version with a simple slicing-based variant in _mean_std

This slicing variant has two benefits:
1.) profiled 30-40% faster on 2d and 3d cases I tested with
2.) can immediately be used by other array backends such as CuPy
@rfezzani (Member) left a comment:

Thank you @grlee77, I like the idea of reducing code base 😉

skimage/filters/_sparse.py (outdated, resolved)
values = kernel[indices].astype(padded_image.dtype, copy=False)
indices = list(zip(*indices))
kernel_indices_and_values = [(idx, v) for idx, v in zip(indices, values)]
if (0, ) * kernel.ndim not in indices:
Member:

Suggested change
if (0, ) * kernel.ndim not in indices:
if kernel.reshape(-1)[0] == 0:

@grlee77 (Contributor, author):

I don't think this is equivalent. What I want is: if the sparse kernel does not have an entry at the (0,) * ndim corner location, an entry should be added so that there will not be a shift in the output of _correlate_sparse. (Internal to _correlate_sparse, the output array is initialized from this first point, which must be the corner location to avoid an unexpected spatial shift in the output.)

See if e4cff50 makes things any clearer
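To illustrate the point being discussed, here is a hedged sketch (variable names invented for illustration, not the PR's code): if no nonzero kernel entry sits at the (0,) * ndim corner, a zero-valued entry can be prepended so the output is anchored at the corner view.

```python
import numpy as np

# a kernel with no nonzero value at the (0, 0) corner
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0

indices = list(zip(*np.nonzero(kernel)))
values = [kernel[idx] for idx in indices]
corner = (0,) * kernel.ndim
if corner not in indices:
    # zero-valued entry: contributes nothing to the sum, but ensures the
    # output array is initialized from the corner view (no spatial shift)
    indices.insert(0, corner)
    values.insert(0, 0.0)
```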

Member:

Isn't (0,) * ndim the index of the first entry of kernel? kernel.reshape(-1)[0] is the first entry of kernel. If it is 0 then (0,) * ndim is not in indices.

@grlee77 (Contributor, author):

Yes, I agree now that what you suggest is equivalent and would be fine with changing to that.

@grlee77 (Contributor, author):

On closer look, I would prefer to leave this check as-is so that it avoids a device->host transfer if using a CuPy backend. (For CuPy, kernel.reshape(-1)[0] will give a 0-dimensional cupy.ndarray rather than a host scalar and comparing to 0 requires overhead of transferring that value from the GPU back to the host to do the comparison)

remove duplicate declarations

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>
@pep8speaks commented on Jan 4, 2021:

Hello @grlee77! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-01-07 13:19:11 UTC

skimage/filters/_sparse.py (outdated, resolved)
@emmanuelle (Member) left a comment:

Thanks @grlee77! I had the same comment as @rfezzani regarding assertion vs. error, but it's not in a public-API function, and calling the helper from the public-API function will not cause problems, so this should not block the PR. Also, I noted that correlate_sparse is in the public API but is not used in any gallery example, so I'm not sure it is very discoverable for users (but once again, this should not block the PR).

@jni (Member) left a comment:

Just some minor comments from me. Thanks @grlee77, very nice!

# implementation assumes this corner is first in kernel_indices_and_values
assert tuple(idx) == (0, ) * image.ndim
out = _get_view(image, kernel_shape, idx, val)
if not out.flags.owndata:
Member:

Given that one of the goals of this PR is to work with alternative array implementations, wouldn't it be safer to just assume that we need a copy here? ie we should aim to target the Array API rather than just NumPy and CuPy?

@grlee77 (Contributor, author):

I am fine with that and don't think it will substantially reduce performance, but given the reliance on scipy.ndimage elsewhere in the library for many things related to filtering/morphology/labeling (as well as usage of FFTs or sparse matrices in various places), we will already not be able to realistically do many things purely with the Array API. I think NumPy+SciPy, Dask+dask-image and CuPy are the only implementations that provide most of the needed features currently.

Member:

Well, dask arrays don't have .flags either...

In [1]: arr = da.random.random((5,))
In [2]: arr.flags
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-c469d1119f64> in <module>
----> 1 arr.flags

AttributeError: 'Array' object has no attribute 'flags'
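One defensive pattern along the lines suggested here (a sketch with a hypothetical helper name, not the PR's actual code) is to treat any array lacking `.flags` as a view and copy it unconditionally:

```python
import numpy as np

def ensure_owned(out):
    # Arrays without ``.flags`` (e.g. dask) are conservatively treated as
    # views and copied; NumPy/CuPy views are likewise copied.
    flags = getattr(out, 'flags', None)
    if flags is None or not flags.owndata:
        out = out.copy()
    return out

view = np.arange(10)[2:6]   # a view into a larger array
owned = ensure_owned(view)  # a fresh copy owning its data
```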

Comment on lines 37 to 38
elif val == -1:
return -v
Member:

Is this faster than -1 * v? I presume yes — then maybe add a comment above the start of the if/else to say we optimise to avoid multiplication/copy for these values?

@grlee77 (Contributor, author):

I suspected it might be faster but hadn't timed it. I tried just now: it was faster on very small arrays but the same on large arrays, so it is probably not important to keep as a special case.

import numpy as np
a = np.ones((4096, 4096))

%timeit (-a)
22.6 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit (-1 * a)
22.6 ms ± 33.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

a = np.ones((16, 16))

%timeit (-a)
400 ns ± 2.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit (-1 * a)
913 ns ± 3.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@grlee77 (Contributor, author) commented on Jan 6, 2021:

The same relative performance pattern occurs for CuPy as reported for NumPy above (it is necessary to synchronize to measure the actual time since CUDA kernels run asynchronously, so you will only measure the CPU overhead without that)

import cupy as cp
a = cp.ones((4096, 4096))
d = cp.cuda.Device()

%timeit (-a); d.synchronize()
701 µs ± 621 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit (-1 * a); d.synchronize()
705 µs ± 713 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

a = cp.ones((16, 16))

%timeit (-a); d.synchronize()
8.89 µs ± 86.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit (-1 * a); d.synchronize()
12.8 µs ± 228 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

This also demonstrates how CuPy is much faster when working with large arrays, while NumPy is much faster on tiny arrays because it doesn't incur the kernel launch overhead.

Member:

Coolio, let's remove that special case then?

sum_sq_full = correlate_sparse(integral_sq, kern, mode='valid')
g2 = sum_sq_full / total_window_size
kernel_shape = tuple(_w + 1 for _w in w)
m = _correlate_sparse(integral, kernel_shape, kernel_indices_and_values)
Member:

As a minor point, from an API niceness standpoint I'd rather kernel_indices and kernel_values be two separate arrays that are zipped inside the function. Not critical though.
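As a hypothetical sketch of that suggestion (the function name is invented for illustration), the helper would accept two parallel sequences and pair them internally:

```python
def pair_kernel_entries(kernel_indices, kernel_values):
    # pair up the two parallel sequences inside the helper,
    # so callers never build (index, value) tuples themselves
    return list(zip(kernel_indices, kernel_values))

pairs = pair_kernel_entries([(0, 0), (2, 2)], [1.0, -1.0])
```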

@grlee77 (Contributor, author) commented on Jan 6, 2021:

The CI failures seen here should be resolved by #5175

@stefanv merged commit c799c7d into scikit-image:master on Jan 17, 2021
@stefanv (Member) commented on Jan 17, 2021:

Thanks, Greg!

rfezzani added a commit that referenced this pull request Jan 22, 2021
* resize using scipy.ndimage.zoom

* Clip negative values in output of RGB to HED conversion. Fixes #5164

* Fixed stain separation tests.

- Round trips start with stains, such that the roundtrip does not require
negative values.
- Using a blue+red+orange stain combination, the test previously used
a 2-stain combination that made it impossible to do a correct roundtrip.

* Improve IHC stain separation example.

* BENCH: add benchmarks for resize and rescale

* move mode conversion code to _shared.utils.py

use grid-constant and grid-wrap for SciPy >= 1.6.0

closes gh-5064

* separate grid mode conversion utility

* Update doc/examples/color_exposure/plot_ihc_color_separation.py

Improvement to the IHC example.

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

* add TODO for removing legacy code paths

* avoid pep8 warning in benchmark_interpolation.py

* keep same ValueError on unrecognized modes

* test_rank.py: fetch reference data once into a dictionary of arrays (#5175)

This should avoid stochastic test failures that have been occurring on GitHub Actions

* Add normalized mutual information metric (#5158)

* Add normalized mutual information metric

* Add tests for NMI

* Add NMI to metrics init

* Fix error message for incorrect target shape

* Add 3D NMI test

* properly add nmi to metrics init

* _true, _test -> 0, 1

* ravel -> reshape(-1)

* Apply suggestions from @mkcor's code review

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

* Fix warning and exception messages

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

* segment Other section of TODO by date

based on NEP-29 24 month window for dropping upstream package version

* Fix bug in optical flow code and examples

mode 'nearest' is a scipy.ndimage mode, but warp only accepts the numpy.pad equivalent of 'edge'

* fix additional cases of warp with 'nearest'->'edge'

* DOC: Fixup to linspace in example (#5181)

Right now the data is not sampled every half degree. Also, -pi/2 to pi/2.

* Minor fixups to Hough line transform code and examples (#5182)

* Correction to linspace call in hough_line default

`theta = np.pi / 2 - np.arange(180) / 180.0 * np.pi` done correctly below.

* BUG: Fixup to image orientation in example

* DOC: Tiny typo

* Changed range to conform to documentation

The original was [pi/2, -pi/2), not [-pi/2, pi/2). Updated wording in docs.

* Fixed x-bounds of hough image in example

* Fixed long line

* Fixes to tests

Number of lines in checkerboard, longer line in peak ordering assertion, updated angle

* Accidentally modified the wrong value.

* Updated example to include three lines

As requested in review

* Fixed off-by one error in pixel bins in hough line transform (#5183)

* Added 1/2 pixel bounds to extent of displayed images (#5184)

* fast, non-Cython implementation for correlate_sparse (#5171)

* Replace correlate_sparse Cython version with a simple slicing-based variant in _mean_std

This slicing variant has two benefits:
1.) profiled 30-40% faster on 2d and 3d cases I tested with
2.) can immediately be used by other array backends such as CuPy

* remove intermediate padded_sq variable to reduce memory footprint

* Update skimage/filters/_sparse.py

remove duplicate declarations

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

* move _to_np_mode and _to_ndimage_mode to _shared.utils.py

* add comment about corner_index needing to be present

* raise RuntimeError on missing corner index

* update code style based on reviewer comments

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

* Add release step on github to RELEASE.txt (See #5185) (#5187)

Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>

* Prevent integer overflow in EllipseModel (#5179)

* Change dtype of CircleModel and EllipseModel to float, to avoid integer overflows

* remove failing test and add code review notes

* Apply suggestions from code review

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

* Remove unused pytest import

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>

* Add saturation parameter to color.label2rgb (#5156)

* Remove reference to opencv in threshold_local documentation (#5191)

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
Co-authored-by: Riadh <r.fezzani@vitadx.com>

* resize using scipy.ndimage.zoom

* BENCH: add benchmarks for resize and rescale

* move mode conversion code to _shared.utils.py

use grid-constant and grid-wrap for SciPy >= 1.6.0

closes gh-5064

* separate grid mode conversion utility

* add TODO for removing legacy code paths

* avoid pep8 warning in benchmark_interpolation.py

* keep same ValueError on unrecognized modes

* segment Other section of TODO by date

based on NEP-29 24 month window for dropping upstream package version

* Fix bug in optical flow code and examples

mode 'nearest' is a scipy.ndimage mode, but warp only accepts the numpy.pad equivalent of 'edge'

* fix additional cases of warp with 'nearest'->'edge'

* Apply suggestions from code review

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

Co-authored-by: Cris Luengo <cris.l.luengo@gmail.com>
Co-authored-by: Cris Luengo <crisluengo@users.noreply.github.com>
Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>
Co-authored-by: Alexandre de Siqueira <alex.desiqueira@igdore.org>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
Co-authored-by: Joseph Fox-Rabinovitz <madphysicist@users.noreply.github.com>
Co-authored-by: François Boulogne <fboulogne@sciunto.org>
Co-authored-by: Mark Boer <m.h.boer.2@gmail.com>
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Carlos Andrés Álvarez Restrepo <charlie_cha@outlook.com>
Co-authored-by: Riadh <r.fezzani@vitadx.com>
@grlee77 deleted the fast_mean_std branch on July 8, 2021