Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Richardson-Lucy deconvolution: allow single-precision computation #4880

Merged

Conversation

grlee77
Copy link
Contributor

@grlee77 grlee77 commented Aug 2, 2020

Description

This PR improves the performance of deconvolution by keeping the computations in single-precision when the input image is single-precision.

It also avoids unnecessary copies of the input arrays by adding copy=False to existing .astype() calls.

This can give up to a factor 2 performance improvement when calling Richardson Lucy either single-threaded or in various multi-threaded scenarios. See the timings for double-precision in #4083 (comment) vs. single-precision in #4083 (comment).

If this seems reasonable to others, I can also adapt the example from the comments into an ASV benchmark.

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

@emmanuelle
Copy link
Member

Thanks for the PR! An ASV benchmark would be great if you find the time :-).

@grlee77
Copy link
Contributor Author

grlee77 commented Aug 5, 2020

I added some simple ASV benchmarks for richardson_lucy. There is also a separate commit that fixed the existing restoration benchmarks, which were currently failing to run due to a misnamed variable!

As expected, float32 operation is now faster. This PR also reduced peak memory use a bit in the float64 case by avoiding internal copies.

Result of benchmarks for release 0.17.2

[  0.00%] ·· Benchmarking existing-py_home_lee8rx_miniconda3_envs_pyir_bin_python3.7
[ 40.00%] ··· Running (benchmark_restoration.DeconvolutionSuite.time_denoise_nl_means_f32--)..
[ 60.00%] ··· benchmark_restoration.DeconvolutionSuite.peakmem_denoise_nl_means_f32                       293M
[ 70.00%] ··· benchmark_restoration.DeconvolutionSuite.peakmem_richardson_lucy_f64                        293M
[ 80.00%] ··· benchmark_restoration.DeconvolutionSuite.peakmem_setup                                      124M
[ 90.00%] ··· benchmark_restoration.DeconvolutionSuite.time_denoise_nl_means_f32                    2.32±0.01s
[100.00%] ··· benchmark_restoration.DeconvolutionSuite.time_richardson_lucy_f64                     2.33±0.02s

Result of benchmarks for this PR

[  0.00%] ·· Benchmarking existing-py_home_lee8rx_miniconda3_envs_pyir_bin_python3.7
[ 40.00%] ··· Running (benchmark_restoration.DeconvolutionSuite.time_denoise_nl_means_f32--)..
[ 60.00%] ··· benchmark_restoration.DeconvolutionSuite.peakmem_richardson_lucy_f32                        200M
[ 70.00%] ··· benchmark_restoration.DeconvolutionSuite.peakmem_richardson_lucy_f64                        276M
[ 80.00%] ··· benchmark_restoration.DeconvolutionSuite.peakmem_setup                                      124M
[ 90.00%] ··· benchmark_restoration.DeconvolutionSuite.time_richardson_lucy_f32                     1.64±0.01s
[100.00%] ··· benchmark_restoration.DeconvolutionSuite.time_richardson_lucy_f64                     2.30±0.01s

@grlee77
Copy link
Contributor Author

grlee77 commented Aug 5, 2020

For the benchmarks, I ended up making a new DeconvolutionSuite class since the setup method is different to the one in the existing RestorationSuite class (need to store a PDF and convolve it with the input image).

@grlee77
Copy link
Contributor Author

grlee77 commented Aug 6, 2020

I think maybe the doc failure might be because I haven't rebased on master since that PR was merged?

@grlee77 grlee77 force-pushed the richardson_lucy_preserve_single branch from 073ec41 to c1c2c1c Compare August 6, 2020 01:11
Copy link
Member

@emmanuelle emmanuelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @grlee77 !

@jni
Copy link
Member

jni commented Aug 12, 2020

@grlee77 why is time RL f32 not shown in your benchmarks result?

@grlee77
Copy link
Contributor Author

grlee77 commented Aug 14, 2020

@grlee77 why is time RL f32 not shown in your benchmarks result?

It is there, but was misnamed as "time_denoise_nl_means_f32" due to a copy paste error that was later fixed in bf38f4c.

I will edit the comment above to reflect the correct naming.

@jni
Copy link
Member

jni commented Aug 16, 2020

@grlee77 this is then ready after that fix and a rebase/merge on master!

If the input image is single precision, all computations and the output will be in
single precision.

Internal copies are avoided when possible.
rgb2gray is not needed for the grayscale camera image.

img_as_float used to rescale to [0, 1] so clip=False is not needed
in the richardson_lucy call.
@grlee77 grlee77 force-pushed the richardson_lucy_preserve_single branch from bf38f4c to 97f0546 Compare August 17, 2020 05:29
@alexdesiqueira alexdesiqueira merged commit 8202192 into scikit-image:master Aug 17, 2020
@alexdesiqueira
Copy link
Member

Thank you @grlee77!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants