
fast, non-Cython implementation for correlate_sparse #5171

Merged (7 commits) on Jan 17, 2021

Conversation

@grlee77 (Contributor) commented on Jan 4, 2021:

Description

This PR replaces the recently added correlate_sparse Cython version with a simple slicing-based variant. This was done so that this function can be used with other array types implementing the NumPy API, such as CuPy. There should be no change in behavior or API.

A secondary benefit is that this also profiled modestly faster on 2d and 3d cases. See the threshold_sauvola benchmarks for v0.18.1 vs. this PR below. I tried out varying sparsity levels as well and wasn't able to find a case where it is worse than the current implementation.
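For readers unfamiliar with the approach, here is a minimal, hypothetical sketch of slicing-based sparse correlation (the function name is invented for illustration; this is not the PR's actual code): each nonzero kernel entry contributes one vectorized add of a shifted view of the pre-padded image, so only NumPy-API operations are needed and the same code runs unchanged on CuPy arrays.

```python
import numpy as np

def correlate_sparse_sketch(padded_image, kernel):
    # For each nonzero kernel entry at offset ``idx`` with value ``v``,
    # accumulate ``v`` times the shifted view of the (pre-padded) image.
    # A mostly-zero kernel thus costs only a few vectorized adds.
    out_shape = tuple(s - k + 1
                      for s, k in zip(padded_image.shape, kernel.shape))
    out = np.zeros(out_shape, dtype=padded_image.dtype)
    for idx in zip(*np.nonzero(kernel)):
        shifted = padded_image[tuple(slice(i, i + n)
                                     for i, n in zip(idx, out_shape))]
        out += kernel[idx] * shifted
    return out

# tiny check with a difference-of-corners kernel
img = np.arange(36, dtype=float).reshape(6, 6)
kern = np.zeros((3, 3))
kern[0, 0], kern[2, 2] = 1.0, -1.0
res = correlate_sparse_sketch(img, kern)  # every entry equals -14.0
```

Because the loop runs over nonzero entries only, the cost scales with the kernel's sparsity rather than its size.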

Benchmark Results

v0.18.1
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_home_lee8rx_miniconda3_envs_skimage_dev_bin_python3.7
[ 25.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola 252±0ms
[ 50.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola_3d 458±0ms

This PR
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_home_lee8rx_miniconda3_envs_skimage_dev_bin_python3.7
[ 25.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola 188±0ms
[ 50.00%] ··· benchmark_filters.ThresholdSauvolaSuite.time_sauvola_3d 387±0ms

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

Replace correlate_sparse Cython version with a simple slicing-based variant in _mean_std

This slicing variant has two benefits:
1.) profiled 30-40% faster on 2d and 3d cases I tested with
2.) can immediately be used by other array backends such as CuPy
@rfezzani (Member) left a comment:

Thank you @grlee77, I like the idea of reducing code base 😉

skimage/filters/_sparse.py (outdated, resolved)
values = kernel[indices].astype(padded_image.dtype, copy=False)
indices = list(zip(*indices))
kernel_indices_and_values = [(idx, v) for idx, v in zip(indices, values)]
if (0, ) * kernel.ndim not in indices:
Member:

Suggested change
if (0, ) * kernel.ndim not in indices:
if kernel.reshape(-1)[0] == 0:

@grlee77 (Contributor, author):

I don't think this is equivalent. What I want is: if the sparse kernel does not have an entry at the (0,) * ndim corner location, an entry should be added so that there will not be a shift in the output of _correlate_sparse. (Internal to _correlate_sparse, the output array is initialized from this first point, which must be the corner location to avoid an unexpected spatial shift in the output.)

See if e4cff50 makes things any clearer
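To illustrate the point being discussed, here is a hedged sketch (variable names invented for illustration, not the PR's code): if no nonzero kernel entry sits at the (0,) * ndim corner, a zero-valued entry can be prepended so the output is anchored at the corner view.

```python
import numpy as np

# a kernel with no nonzero value at the (0, 0) corner
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0

indices = list(zip(*np.nonzero(kernel)))
values = [kernel[idx] for idx in indices]
corner = (0,) * kernel.ndim
if corner not in indices:
    # zero-valued entry: contributes nothing to the sum, but ensures the
    # output array is initialized from the corner view (no spatial shift)
    indices.insert(0, corner)
    values.insert(0, 0.0)
```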

Member:

Isn't (0,) * ndim the index of the first entry of kernel? kernel.reshape(-1)[0] is the first entry of kernel. If it is 0 then (0,) * ndim is not in indices.

@grlee77 (Contributor, author):

Yes, I agree now that what you suggest is equivalent and would be fine with changing to that.

@grlee77 (Contributor, author):

On closer look, I would prefer to leave this check as-is so that it avoids a device->host transfer if using a CuPy backend. (For CuPy, kernel.reshape(-1)[0] will give a 0-dimensional cupy.ndarray rather than a host scalar and comparing to 0 requires overhead of transferring that value from the GPU back to the host to do the comparison)

remove duplicate declarations

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>
@pep8speaks commented on Jan 4, 2021:

Hello @grlee77! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-01-07 13:19:11 UTC

skimage/filters/_sparse.py (outdated, resolved)
@emmanuelle (Member) left a comment:

Thanks @grlee77! I had the same comment as @rfezzani regarding assertion vs. error, but it's not in a public-API function, and calling the helper from the public-API function will not cause problems, so this should not block the PR. Also, I noted that correlate_sparse is in the public API but is not used in any gallery example, so I'm not sure it is very discoverable for users (but once again, this should not block the PR).

@jni (Member) left a comment:

Just some minor comments from me. Thanks @grlee77, very nice!

# implementation assumes this corner is first in kernel_indices_and_values
assert tuple(idx) == (0, ) * image.ndim
out = _get_view(image, kernel_shape, idx, val)
if not out.flags.owndata:
Member:

Given that one of the goals of this PR is to work with alternative array implementations, wouldn't it be safer to just assume that we need a copy here? ie we should aim to target the Array API rather than just NumPy and CuPy?

@grlee77 (Contributor, author):

I am fine with that and don't think it will substantially reduce performance, but given the reliance on scipy.ndimage elsewhere in the library for many things related to filtering/morphology/labeling (as well as usage of FFTs or sparse matrices in various places), we will already not be able to realistically do many things purely with the Array API. I think NumPy+SciPy, Dask+dask-image and CuPy are the only implementations that provide most of the needed features currently.

Member:

Well, dask arrays don't have .flags either...

In [1]: arr = da.random.random((5,))
In [2]: arr.flags
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-c469d1119f64> in <module>
----> 1 arr.flags

AttributeError: 'Array' object has no attribute 'flags'
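One defensive pattern along the lines suggested here (a sketch with a hypothetical helper name, not the PR's actual code) is to treat any array lacking `.flags` as a view and copy it unconditionally:

```python
import numpy as np

def ensure_owned(out):
    # Arrays without ``.flags`` (e.g. dask) are conservatively treated as
    # views and copied; NumPy/CuPy views are likewise copied.
    flags = getattr(out, 'flags', None)
    if flags is None or not flags.owndata:
        out = out.copy()
    return out

view = np.arange(10)[2:6]   # a view into a larger array
owned = ensure_owned(view)  # a fresh copy owning its data
```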

Comment on lines 37 to 38
elif val == -1:
return -v
Member:

Is this faster than -1 * v? I presume yes — then maybe add a comment above the start of the if/else to say we optimise to avoid multiplication/copy for these values?

@grlee77 (Contributor, author):

I suspected it might be faster but hadn't timed it. I tried just now: it was faster on very small arrays but the same on large arrays, so it is probably not important to keep as a special case.

import numpy as np
a = np.ones((4096, 4096))

%timeit (-a)
22.6 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit (-1 * a)
22.6 ms ± 33.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

a = np.ones((16, 16))

%timeit (-a)
400 ns ± 2.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit (-1 * a)
913 ns ± 3.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@grlee77 (Contributor, author) commented on Jan 6, 2021:

The same relative performance pattern occurs for CuPy as reported for NumPy above (it is necessary to synchronize to measure the actual time since CUDA kernels run asynchronously, so you will only measure the CPU overhead without that)

import cupy as cp
a = cp.ones((4096, 4096))
d = cp.cuda.Device()

%timeit (-a); d.synchronize()
701 µs ± 621 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit (-1 * a); d.synchronize()
705 µs ± 713 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

a = cp.ones((16, 16))

%timeit (-a); d.synchronize()
8.89 µs ± 86.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit (-1 * a); d.synchronize()
12.8 µs ± 228 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

This also demonstrates how CuPy is much faster when working with large arrays, while NumPy is much faster on tiny arrays because it doesn't incur the kernel launch overhead.

Member:

Coolio, let's remove that special case then?

sum_sq_full = correlate_sparse(integral_sq, kern, mode='valid')
g2 = sum_sq_full / total_window_size
kernel_shape = tuple(_w + 1 for _w in w)
m = _correlate_sparse(integral, kernel_shape, kernel_indices_and_values)
Member:

As a minor point, from an API niceness standpoint I'd rather kernel_indices and kernel_values be two separate arrays that are zipped inside the function. Not critical though.
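As a hypothetical sketch of that suggestion (the function name is invented for illustration), the helper would accept two parallel sequences and pair them internally:

```python
def pair_kernel_entries(kernel_indices, kernel_values):
    # pair up the two parallel sequences inside the helper,
    # so callers never build (index, value) tuples themselves
    return list(zip(kernel_indices, kernel_values))

pairs = pair_kernel_entries([(0, 0), (2, 2)], [1.0, -1.0])
```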

@grlee77 (Contributor, author) commented on Jan 6, 2021:

The CI failures seen here should be resolved by #5175

@stefanv merged commit c799c7d into scikit-image:master on Jan 17, 2021
@stefanv (Member) commented on Jan 17, 2021:

Thanks, Greg!

rfezzani added a commit that referenced this pull request Jan 22, 2021
* resize using scipy.ndimage.zoom

* Clip negative values in output of RGB to HED conversion. Fixes #5164

* Fixed stain separation tests.

- Round trips start with stains, such that the roundtrip does not require
negative values.
- Using a blue+red+orange stain combination, the test previously used
a 2-stain combination that made it impossible to do a correct roundtrip.

* Improve IHC stain separation example.

* BENCH: add benchmarks for resize and rescale

* move mode conversion code to _shared.utils.py

use grid-constant and grid-wrap for SciPy >= 1.6.0

closes gh-5064

* separate grid mode conversion utility

* Update doc/examples/color_exposure/plot_ihc_color_separation.py

Improvement to the IHC example.

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

* add TODO for removing legacy code paths

* avoid pep8 warning in benchmark_interpolation.py

* keep same ValueError on unrecognized modes

* test_rank.py: fetch reference data once into a dictionary of arrays (#5175)

This should avoid stochastic test failures that have been occurring on GitHub Actions

* Add normalized mutual information metric (#5158)

* Add normalized mutual information metric

* Add tests for NMI

* Add NMI to metrics init

* Fix error message for incorrect target shape

* Add 3D NMI test

* properly add nmi to metrics init

* _true, _test -> 0, 1

* ravel -> reshape(-1)

* Apply suggestions from @mkcor's code review

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

* Fix warning and exception messages

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>

* segment Other section of TODO by date

based on NEP-29 24 month window for dropping upstream package version

* Fix bug in optical flow code and examples

mode 'nearest' is a scipy.ndimage mode, but warp only accepts the numpy.pad equivalent of 'edge'

* fix additional cases of warp with 'nearest'->'edge'

* DOC: Fixup to linspace in example (#5181)

Right now the data is not sampled every half degree. Also, -pi/2 to pi/2.

* Minor fixups to Hough line transform code and examples (#5182)

* Correction to linspace call in hough_line default

`theta = np.pi / 2 - np.arange(180) / 180.0 * np.pi` done correctly below.

* BUG: Fixup to image orientation in example

* DOC: Tiny typo

* Changed range to conform to documentation

The original was [pi/2, -pi/2), not [-pi/2, pi/2). Updated wording in docs.

* Fixed x-bounds of hough image in example

* Fixed long line

* Fixes to tests

Number of lines in checkerboard, longer line in peak ordering assertion, updated angle

* Accidentally modified the wrong value.

* Updated example to include three lines

As requested in review

* Fixed off-by one error in pixel bins in hough line transform (#5183)

* Added 1/2 pixel bounds to extent of displayed images (#5184)

* fast, non-Cython implementation for correlate_sparse (#5171)

* Replace correlate_sparse Cython version with a simple slicing-based variant in _mean_std

This slicing variant has two benefits:
1.) profiled 30-40% faster on 2d and 3d cases I tested with
2.) can immediately be used by other array backends such as CuPy

* remove intermediate padded_sq variable to reduce memory footprint

* Update skimage/filters/_sparse.py

remove duplicate declarations

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

* move _to_np_mode and _to_ndimage_mode to _shared.utils.py

* add comment about corner_index needing to be present

* raise RuntimeError on missing corner index

* update code style based on reviewer comments

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

* Add release step on github to RELEASE.txt (See #5185) (#5187)

Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>

* Prevent integer overflow in EllipseModel (#5179)

* Change dtype of CircleModel and EllipseModel to float, to avoid integer overflows

* remove failing test and add code review notes

* Apply suggestions from code review

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

* Remove unused pytest import

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>

* Add saturation parameter to color.label2rgb (#5156)

* Remove reference to opencv in threshold_local documentation (#5191)

Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
Co-authored-by: Riadh <r.fezzani@vitadx.com>

* resize using scipy.ndimage.zoom

* BENCH: add benchmarks for resize and rescale

* move mode conversion code to _shared.utils.py

use grid-constant and grid-wrap for SciPy >= 1.6.0

closes gh-5064

* separate grid mode conversion utility

* add TODO for removing legacy code paths

* avoid pep8 warning in benchmark_interpolation.py

* keep same ValueError on unrecognized modes

* segment Other section of TODO by date

based on NEP-29 24 month window for dropping upstream package version

* Fix bug in optical flow code and examples

mode 'nearest' is a scipy.ndimage mode, but warp only accepts the numpy.pad equivalent of 'edge'

* fix additional cases of warp with 'nearest'->'edge'

* Apply suggestions from code review

Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>

Co-authored-by: Cris Luengo <cris.l.luengo@gmail.com>
Co-authored-by: Cris Luengo <crisluengo@users.noreply.github.com>
Co-authored-by: Riadh Fezzani <rfezzani@gmail.com>
Co-authored-by: Alexandre de Siqueira <alex.desiqueira@igdore.org>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
Co-authored-by: Joseph Fox-Rabinovitz <madphysicist@users.noreply.github.com>
Co-authored-by: François Boulogne <fboulogne@sciunto.org>
Co-authored-by: Mark Boer <m.h.boer.2@gmail.com>
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Carlos Andrés Álvarez Restrepo <charlie_cha@outlook.com>
Co-authored-by: Riadh <r.fezzani@vitadx.com>
@grlee77 deleted the fast_mean_std branch on July 8, 2021