keep cufftPlan2d across ConvolveImpl::convolve calls #3386

Merged
merged 1 commit into from Dec 14, 2022

@r2d3 r2d3 commented Nov 30, 2022

On some CUDA versions, creating and destroying a cufftPlan2d is very time consuming. We now create the plans (destroying any existing ones first) in ConvolveImpl::create() when dft_size changes, and destroy them in the destructor.

This solves issue #3385.
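
The idea described above, keeping a plan alive across calls and rebuilding it only when the DFT size changes, can be sketched as follows. This is a minimal illustration and not the code from this patch: `PlanCache`, `createPlan`, `destroyPlan`, and the two counters are hypothetical names, and the plan handle is a plain integer so the sketch builds without CUDA. In a real build, `createPlan` would presumably wrap `cufftPlan2d` and `destroyPlan` would wrap `cufftDestroy`.

```cpp
#include <cassert>

// Hypothetical counters so the effect of caching can be observed.
int g_plans_created = 0;
int g_plans_destroyed = 0;

// Hypothetical stand-ins for the expensive calls (assumption: in real code
// these would wrap cufftPlan2d and cufftDestroy).
int createPlan(int /*rows*/, int /*cols*/) { return ++g_plans_created; }
void destroyPlan(int /*handle*/) { ++g_plans_destroyed; }

// Owns at most one plan and rebuilds it only when the requested size changes.
class PlanCache {
public:
    PlanCache() = default;
    PlanCache(const PlanCache&) = delete;
    PlanCache& operator=(const PlanCache&) = delete;
    ~PlanCache() { release(); }  // the plan is destroyed with its owner

    // Returns a plan for the given DFT size, reusing the cached one if it matches.
    int get(int rows, int cols) {
        assert(rows > 0 && cols > 0);
        if (handle_ != 0 && rows == rows_ && cols == cols_)
            return handle_;          // same size: skip the expensive creation
        release();                   // size changed: drop the old plan first
        handle_ = createPlan(rows, cols);
        rows_ = rows;
        cols_ = cols;
        return handle_;
    }

private:
    void release() {
        if (handle_ != 0) {
            destroyPlan(handle_);
            handle_ = 0;
        }
    }

    int handle_ = 0;  // 0 means "no plan yet"
    int rows_ = 0;
    int cols_ = 0;
};
```

With this shape, repeated calls with the same size pay the creation cost once; only a size change or the destruction of the owning object triggers a destroy.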

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@r2d3 r2d3 changed the base branch from master to 4.x November 30, 2022 22:09
@r2d3 r2d3 marked this pull request as draft December 1, 2022 16:06
On some CUDA versions, creating/destroying cufftPlan2d is very time consuming.
We now create them in ConvolveImpl::create() and destroy them in the dtor.

This solves issue opencv#3385.
@r2d3 r2d3 marked this pull request as ready for review December 1, 2022 16:43
@asmorkalov asmorkalov self-requested a review December 14, 2022 11:16
@asmorkalov asmorkalov self-assigned this Dec 14, 2022
@asmorkalov asmorkalov left a comment

👍 Thanks a lot for the contribution. Tested manually with Ubuntu 18.04, CUDA 10.2, GF 1080. Performance improvement is visible even with existing OpenCV perf tests:

Geometric mean (ms)

Name of Test                                         default  after cufft fix  after cufft fix vs default (x-factor)
Convolve::Sz_KernelSz_Ccorr::(1280x720, 17, false)   1.242    0.558            2.23
Convolve::Sz_KernelSz_Ccorr::(1280x720, 17, true)    1.233    0.567            2.18
Convolve::Sz_KernelSz_Ccorr::(1280x720, 27, false)   1.200    0.641            1.87
Convolve::Sz_KernelSz_Ccorr::(1280x720, 27, true)    1.206    0.523            2.30
Convolve::Sz_KernelSz_Ccorr::(1280x720, 32, false)   1.197    0.527            2.27
Convolve::Sz_KernelSz_Ccorr::(1280x720, 32, true)    1.181    0.600            1.97
Convolve::Sz_KernelSz_Ccorr::(1280x720, 64, false)   1.178    0.524            2.25
Convolve::Sz_KernelSz_Ccorr::(1280x720, 64, true)    1.197    0.524            2.28
Convolve::Sz_KernelSz_Ccorr::(1280x1024, 17, false)  1.423    0.816            1.74
Convolve::Sz_KernelSz_Ccorr::(1280x1024, 17, true)   1.419    0.813            1.74
Convolve::Sz_KernelSz_Ccorr::(1280x1024, 27, false)  1.428    0.825            1.73
Convolve::Sz_KernelSz_Ccorr::(1280x1024, 27, true)   1.428    0.825            1.73
Convolve::Sz_KernelSz_Ccorr::(1280x1024, 32, false)  1.419    0.820            1.73
Convolve::Sz_KernelSz_Ccorr::(1280x1024, 32, true)   1.439    0.821            1.75
Convolve::Sz_KernelSz_Ccorr::(1280x1024, 64, false)  1.416    0.829            1.71
Convolve::Sz_KernelSz_Ccorr::(1280x1024, 64, true)   1.433    0.821            1.74
Convolve::Sz_KernelSz_Ccorr::(1920x1080, 17, false)  1.978    1.278            1.55
Convolve::Sz_KernelSz_Ccorr::(1920x1080, 17, true)   1.972    1.275            1.55
Convolve::Sz_KernelSz_Ccorr::(1920x1080, 27, false)  1.993    1.289            1.55
Convolve::Sz_KernelSz_Ccorr::(1920x1080, 27, true)   1.980    1.281            1.55
Convolve::Sz_KernelSz_Ccorr::(1920x1080, 32, false)  1.979    1.280            1.55
Convolve::Sz_KernelSz_Ccorr::(1920x1080, 32, true)   1.978    1.276            1.55
Convolve::Sz_KernelSz_Ccorr::(1920x1080, 64, false)  1.985    1.281            1.55
Convolve::Sz_KernelSz_Ccorr::(1920x1080, 64, true)   1.984    1.278            1.55
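
For readers checking the last column: the x-factor appears to be the default time divided by the time after the fix, so values above 1 mean the patched build is faster. A small sketch of that arithmetic follows; `xFactor` and `round2` are hypothetical helper names, and since the table shows rounded times, recomputed values may differ from the table in the last digit.

```cpp
#include <cassert>
#include <cmath>

// Speedup factor: how many times faster the "after" run is than the "default" run.
double xFactor(double default_ms, double after_ms) {
    assert(after_ms > 0.0);
    return default_ms / after_ms;
}

// Rounds to two decimals, the precision used in the table.
double round2(double value) {
    return std::round(value * 100.0) / 100.0;
}
```

For the first row, `round2(xFactor(1.242, 0.558))` gives 2.23, and for the last row, `round2(xFactor(1.984, 1.278))` gives 1.55.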

@asmorkalov asmorkalov merged commit c0133e5 into opencv:4.x Dec 14, 2022
@r2d3 r2d3 deleted the convolve_cuda branch December 14, 2022 15:27
@alalek alalek mentioned this pull request Jan 8, 2023