fast, non-Cython implementation for correlate_sparse #5171
Conversation
…ariant in _mean_std — This slicing variant has two benefits: 1) it profiled 30-40% faster on the 2D and 3D cases I tested; 2) it can immediately be used by other array backends such as CuPy.
Thank you @grlee77, I like the idea of reducing code base 😉
skimage/filters/_sparse.py
Outdated
values = kernel[indices].astype(padded_image.dtype, copy=False)
indices = list(zip(*indices))
kernel_indices_and_values = [(idx, v) for idx, v in zip(indices, values)]
if (0, ) * kernel.ndim not in indices:
Suggested change:
- if (0, ) * kernel.ndim not in indices:
+ if kernel.reshape(-1)[0] == 0:
I don't think this is equivalent? What I want is: if the sparse kernel does not have an entry at the (0,) * ndim corner location, then an entry should be added so that there will not be a shift in the output of _correlate_sparse. (Internal to _correlate_sparse, the output array is initialized from this first point, which needs to be the corner location in order to avoid an unexpected spatial shift in the output.)
See if e4cff50 makes things any clearer
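The slicing mechanism under discussion can be sketched as follows. This is a minimal NumPy illustration under my own assumptions, not the PR's actual code (the name correlate_sparse_sketch is hypothetical); it shows why the first entry must be the (0,) * ndim corner: the output array is initialized from that entry's slice, so any other starting index would spatially shift the result.

```python
import numpy as np

def correlate_sparse_sketch(padded, kernel_shape, indices_and_values):
    """Minimal sketch of a slicing-based sparse correlation.

    ``indices_and_values`` holds (index_tuple, value) pairs for the nonzero
    kernel entries.  The first pair must be the (0,) * ndim corner: the
    output is initialized from that entry's slice, so any other choice
    would spatially shift the result.
    """
    out_shape = tuple(p - k + 1 for p, k in zip(padded.shape, kernel_shape))
    idx, val = indices_and_values[0]
    assert tuple(idx) == (0,) * padded.ndim  # corner entry must come first
    # Multiplying by the value produces a fresh array, not a view.
    out = val * padded[tuple(slice(i, i + s) for i, s in zip(idx, out_shape))]
    for idx, val in indices_and_values[1:]:
        view = padded[tuple(slice(i, i + s) for i, s in zip(idx, out_shape))]
        out += val * view
    return out
```

For example, with a 2x2 kernel that has entries 1 at (0, 0) and 2 at (1, 1), each output pixel is `img[i, j] + 2 * img[i + 1, j + 1]`.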
Isn't (0,) * ndim the index of the first entry of kernel? kernel.reshape(-1)[0] is the first entry of kernel. If it is 0, then (0,) * ndim is not in indices.
Yes, I agree now that what you suggest is equivalent and would be fine with changing to that.
On closer look, I would prefer to leave this check as-is so that it avoids a device->host transfer when using a CuPy backend. (For CuPy, kernel.reshape(-1)[0] gives a 0-dimensional cupy.ndarray rather than a host scalar, and comparing it to 0 requires the overhead of transferring that value from the GPU back to the host to do the comparison.)
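A minimal NumPy illustration of the distinction (a hypothetical standalone snippet, not the PR's code): the membership test operates on plain Python tuples, whereas the scalar comparison would, under CuPy, evaluate the truth value of a 0-d device array and force a GPU->host copy.

```python
import numpy as np

kernel = np.zeros((3, 3))
kernel[0, 0] = 1.0
kernel[1, 2] = -2.0

# zip(*nonzero) yields plain Python tuples, so the membership test itself
# runs entirely on the host.
indices = list(zip(*np.nonzero(kernel)))
has_corner = (0,) * kernel.ndim in indices

# By contrast, kernel.reshape(-1)[0] == 0 would produce a 0-d device array
# under CuPy, and evaluating its truth value forces a device->host transfer.
assert has_corner
```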
remove duplicate declarations Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>
Thanks @grlee77! I had the same comment as @rfezzani with respect to assertion vs. error, but it's not in a function of the public API, and calling the helper from the public-API function will not cause problems, so this should not block the PR. Also, I noted that correlate_sparse is in the public API but is not used in any gallery example; I'm not sure it is very discoverable for users (but again, this should not block the PR).
Just some minor comments from me. Thanks @grlee77, very nice!
skimage/filters/_sparse.py
Outdated
# implementation assumes this corner is first in kernel_indices_and_values
assert tuple(idx) == (0, ) * image.ndim
out = _get_view(image, kernel_shape, idx, val)
if not out.flags.owndata:
Given that one of the goals of this PR is to work with alternative array implementations, wouldn't it be safer to just assume that we need a copy here? I.e., we should aim to target the Array API rather than just NumPy and CuPy?
I am fine with that and don't think it will substantially reduce performance. However, given the reliance on scipy.ndimage elsewhere in the library for many things related to filtering/morphology/labeling (as well as the use of FFTs or sparse matrices in various places), we already cannot realistically do many things purely with the Array API. I think NumPy+SciPy, Dask+dask-image, and CuPy are currently the only implementations that provide most of the needed features.
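For context, a common dispatch pattern for targeting the Array API while keeping NumPy as a fallback looks roughly like the sketch below (a hypothetical helper, not part of this PR; the name get_namespace is assumed):

```python
import numpy as np

def get_namespace(x):
    """Hypothetical dispatch helper: use the array's Array API namespace
    when it advertises one, otherwise fall back to plain NumPy."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np

# All subsequent operations go through ``xp`` so that NumPy, CuPy, or any
# other conforming backend can be substituted transparently.
xp = get_namespace(np.ones(3))
total = xp.asarray([1.0, 2.0]).sum()
```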
Well, dask arrays don't have .flags either...
In [1]: arr = da.random.random((5,))
In [2]: arr.flags
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-c469d1119f64> in <module>
----> 1 arr.flags
AttributeError: 'Array' object has no attribute 'flags'
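Given that, a defensive sketch that works whether or not the backend exposes NumPy-style .flags might look like this (the ensure_owned helper is hypothetical, not the PR's code; backends without .flags, such as dask, simply take the copy path):

```python
import numpy as np

def ensure_owned(arr):
    """Hypothetical helper: return ``arr`` if it provably owns its data,
    otherwise return a copy.  Backends without NumPy-style ``.flags``
    (e.g. dask) fall through to the unconditional copy."""
    flags = getattr(arr, "flags", None)
    if flags is not None and getattr(flags, "owndata", False):
        return arr
    return arr.copy()

base = np.arange(10)
view = base[2:5]              # a view that does not own its data
owned = ensure_owned(view)    # -> a copy
owned[0] = 99                 # safe: cannot clobber ``base``
```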
skimage/filters/_sparse.py
Outdated
elif val == -1:
    return -v
Is this faster than -1 * v? I presume yes; if so, maybe add a comment above the start of the if/else to say we optimize to avoid a multiplication/copy for these values?
I suspected it might be faster but hadn't timed it. I tried just now: it was faster on very small arrays but the same on large arrays, so it's probably not important to have it as a special case:
import numpy as np
a = np.ones((4096, 4096))
%timeit (-a)
22.6 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit (-1 * a)
22.6 ms ± 33.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
a = np.ones((16, 16))
%timeit (-a)
400 ns ± 2.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit (-1 * a)
913 ns ± 3.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The same relative performance pattern occurs for CuPy as reported for NumPy above. (It is necessary to synchronize to measure the actual time, since CUDA kernels run asynchronously; without that you only measure the CPU-side launch overhead.)
import cupy as cp
a = cp.ones((4096, 4096))
d = cp.cuda.Device()
%timeit (-a); d.synchronize()
701 µs ± 621 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit (-1 * a); d.synchronize()
705 µs ± 713 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = cp.ones((16, 16))
%timeit (-a); d.synchronize()
8.89 µs ± 86.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit (-1 * a); d.synchronize()
12.8 µs ± 228 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This also demonstrates how CuPy is much faster on large arrays, while NumPy is much faster on tiny arrays because it doesn't have the kernel launch overhead.
Coolio, let's remove that special case then?
skimage/filters/thresholding.py
Outdated
sum_sq_full = correlate_sparse(integral_sq, kern, mode='valid')
g2 = sum_sq_full / total_window_size
kernel_shape = tuple(_w + 1 for _w in w)
m = _correlate_sparse(integral, kernel_shape, kernel_indices_and_values)
As a minor point, from an API niceness standpoint I'd rather kernel_indices and kernel_values be two separate arrays that are zipped inside the function. Not critical, though.
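That suggested shape could look roughly like the sketch below (hypothetical; the name sparse_kernel_arrays is assumed, and the point is that pairing indices with values happens inside the function rather than at every call site):

```python
import numpy as np

def sparse_kernel_arrays(kernel):
    """Hypothetical variant of the suggested API: return the nonzero
    indices and values as two parallel sequences, doing any pairing
    (zipping) internally rather than at the call site."""
    indices = np.nonzero(kernel)
    values = kernel[indices]
    index_tuples = list(zip(*indices))   # pair up only where needed
    return index_tuples, values

k = np.array([[1.0, 0.0],
              [0.0, -2.0]])
idx, vals = sparse_kernel_arrays(k)
```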
The CI failures seen here should be resolved by #5175
Thanks, Greg!
* resize using scipy.ndimage.zoom
* Clip negative values in output of RGB to HED conversion. Fixes #5164
* Fixed stain separation tests: round trips start with stains, so that the roundtrip does not require negative values; use a blue+red+orange stain combination (the test previously used a 2-stain combination that made a correct roundtrip impossible).
* Improve IHC stain separation example.
* BENCH: add benchmarks for resize and rescale
* move mode conversion code to _shared.utils.py; use grid-constant and grid-wrap for SciPy >= 1.6.0; closes gh-5064
* separate grid mode conversion utility
* Update doc/examples/color_exposure/plot_ihc_color_separation.py (improvement to the IHC example). Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>
* add TODO for removing legacy code paths
* avoid pep8 warning in benchmark_interpolation.py
* keep same ValueError on unrecognized modes
* test_rank.py: fetch reference data once into a dictionary of arrays (#5175); this should avoid stochastic test failures that have been occurring on GitHub Actions
* Add normalized mutual information metric (#5158): add tests for NMI; add NMI to metrics init; fix error message for incorrect target shape; add 3D NMI test; _true, _test -> 0, 1; ravel -> reshape(-1); apply suggestions from @mkcor's code review; fix warning and exception messages. Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
* segment Other section of TODO by date, based on the NEP-29 24-month window for dropping upstream package versions
* Fix bug in optical flow code and examples: mode 'nearest' is a scipy.ndimage mode, but warp only accepts the numpy.pad equivalent, 'edge'
* fix additional cases of warp with 'nearest'->'edge'
* DOC: fixup to linspace in example (#5181); the data was not being sampled every half degree, and the range should be -pi/2 to pi/2
* Minor fixups to Hough line transform code and examples (#5182): correct the linspace call in the hough_line default (theta = np.pi / 2 - np.arange(180) / 180.0 * np.pi, done correctly below); BUG: fix image orientation in example; DOC: tiny typo; change the angle range to conform to the documentation (the original was [pi/2, -pi/2), not [-pi/2, pi/2)); fix x-bounds of the hough image in the example; fix long line; fixes to tests (number of lines in checkerboard, longer line in peak-ordering assertion, updated angle); update the example to include three lines, as requested in review
* Fixed off-by-one error in pixel bins in hough line transform (#5183)
* Added 1/2 pixel bounds to extent of displayed images (#5184)
* fast, non-Cython implementation for correlate_sparse (#5171): replace the correlate_sparse Cython version with a simple slicing-based variant in _mean_std (profiled 30-40% faster on the 2D and 3D cases tested, and can immediately be used by other array backends such as CuPy); remove intermediate padded_sq variable to reduce memory footprint; remove duplicate declarations; move _to_np_mode and _to_ndimage_mode to _shared.utils.py; add comment about corner_index needing to be present; raise RuntimeError on missing corner index; update code style based on reviewer comments. Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>
* Add release step on GitHub to RELEASE.txt (see #5185) (#5187). Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
* Prevent integer overflow in EllipseModel (#5179): change dtype of CircleModel and EllipseModel to float to avoid integer overflows; remove failing test and add code-review notes; remove unused pytest import. Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
* Add saturation parameter to color.label2rgb (#5156)
* Remove reference to opencv in threshold_local documentation (#5191). Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>, Riadh <r.fezzani@vitadx.com>
* Apply suggestions from code review

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>, Cris Luengo <cris.l.luengo@gmail.com>, Alexandre de Siqueira <alex.desiqueira@igdore.org>, Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>, Marianne Corvellec <marianne.corvellec@ens-lyon.org>, Joseph Fox-Rabinovitz <madphysicist@users.noreply.github.com>, François Boulogne <fboulogne@sciunto.org>, Mark Boer <m.h.boer.2@gmail.com>, Stefan van der Walt <sjvdwalt@gmail.com>, Carlos Andrés Álvarez Restrepo <charlie_cha@outlook.com>, Riadh <r.fezzani@vitadx.com>
Description

This PR replaces the recently added correlate_sparse Cython version with a simple slicing-based variant. This was done so that the function can be used with other array types implementing the NumPy API, such as CuPy. There should be no change in behavior or API.

A secondary benefit is that the new version also profiled modestly faster on 2D and 3D cases; see the threshold_sauvola benchmarks for v0.18.1 vs. this PR below. I also tried varying sparsity levels and wasn't able to find a case where it is worse than the current implementation.

Benchmark Results
v0.18.1
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_home_lee8rx_miniconda3_envs_skimage_dev_bin_python3.7
[ 25.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola 252±0ms
[ 50.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola_3d 458±0ms
This PR
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_home_lee8rx_miniconda3_envs_skimage_dev_bin_python3.7
[ 25.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola 188±0ms
[ 50.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola_3d 387±0ms
Checklist
- Gallery example in ./doc/examples (new features only)
- Benchmark in ./benchmarks, if your changes aren't covered by an existing benchmark

For reviewers
- Check that the PR title will still make sense later.
- Check that new functions are imported in the corresponding __init__.py.
- Check that new features are mentioned in doc/release/release_dev.rst.