Added CUDA 12.4+ support #3744

asmorkalov · 2024-05-24T12:16:17Z

Tries to fix #3690 for CUDA 12.4+
Related patch to main repo: opencv/opencv#25658

Changes:

Added branches to support the new variadic implementation of thrust::tuple.
Added a branch that uses std::array instead of std::tuple in split-merge and grid operations. The new branch removes a namespace clash between cv::cuda in OpenCV and ::cuda in the CUDA standard library (injected by Thrust). The old tuple branches are kept for compatibility with existing code and with CUDA versions before 12.4.
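
The namespace clash described above can be reproduced without CUDA. The following is a minimal sketch with stand-in namespaces; `stdx`, `tuple_tag`, `GpuMatTag`, `global_id` and `local_id` are hypothetical names used only for illustration, not identifiers from OpenCV or the CUDA standard library.

```cpp
#include <cassert>

// Stand-in for the global ::cuda namespace that the CUDA standard library provides.
// `stdx` and `tuple_tag` are hypothetical placeholders.
namespace cuda { namespace stdx {
    struct tuple_tag { static constexpr int id = 1; };
}}

// Stand-in for OpenCV's nested cv::cuda namespace.
namespace cv { namespace cuda {
    struct GpuMatTag { static constexpr int id = 2; };

    // Inside cv::cuda, the unqualified name `cuda` resolves to cv::cuda,
    // so `cuda::stdx::tuple_tag` would not compile here (cv::cuda has no `stdx`).
    // A leading `::` makes lookup start at the global namespace instead.
    inline int global_id() { return ::cuda::stdx::tuple_tag::id; }

    // Without the leading `::`, `cuda::` finds cv::cuda.
    inline int local_id() { return cuda::GpuMatTag::id; }
}}
```

Code that avoids naming tuple types from inside cv::cuda, for example by passing a std::array, does not need to spell out either namespace and so cannot hit this lookup problem.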

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

Sha-x2-nk · 2024-05-24T15:35:02Z

Hi. Your repo seems to have an error. I am compiling on CUDA 12.5

asmorkalov · 2024-05-24T17:52:32Z

Yes, the solution is not complete yet.

cudawarped · 2024-05-24T19:12:10Z

> Hi. Your repo seems to have an error. I am compiling on CUDA 12.5 ...image instead of text for some reason...

Your error has nothing to do with this PR. You are building against a commit in the contrib repo that predates #3378, which removed the line the error mentions:

typedef texture<T, cudaTextureType2D, cudaReadModeElementType> TexRef;

Sha-x2-nk · 2024-05-25T02:48:20Z

> > Hi. Your repo seems to have an error. I am compiling on CUDA 12.5 ...image instead of text for some reason...
>
> Your error has nothing to do with this PR. You are building against a commit in the contrib repo before #3378 was added when the line it mentions
> typedef texture<T, cudaTextureType2D, cudaReadModeElementType> TexRef;
> was removed.

Hey. So you're saying his PR is not on the latest opencv_contrib?

cudawarped · 2024-05-25T04:57:34Z

> Hey. So you're saying his pr not on latest opencv_contrib??

Exactly. This PR is open, not merged, so it lives on asmorkalov's private branch. Additionally, you are not building against the latest commit of opencv_contrib 4.x either, as your error indicates.

Sha-x2-nk · 2024-05-25T05:00:24Z

> > Hey. So you're saying his pr not on latest opencv_contrib??
>
> Exactly this PR is Open not Merged so it is on asmorkalov's private branch. Additionally you are not building against the latest commit from opencv_contrib 4.x either as indicated by your error.

About that: I cloned and built only his repo.

cudawarped · 2024-05-25T05:08:37Z

> about that, I cloned and built his repo only.

You must be building against a commit in his repo that predates the PR (#3378) which removed the code behind your error. You need to check out his as/variadic_tuple branch.

Either way, you will still get errors with CUDA 12.5 at the moment. The point is that they will be different errors, ones that are actually related to this PR.

asmorkalov · 2024-05-29T17:34:00Z

@cudawarped @vrabaud could you try the PR on your side and provide comments? I tested with Ubuntu 22.04, CUDA 12.5, cuDNN 9.1 and a GeForce 2080. Please let me know if you observe any issues.

cudawarped · 2024-05-29T19:42:48Z

> @cudawarped @vrabaud could you try the PR on your side and provide comments. I tested with Ubuntu 22.04, CUDA 12.5, cuDNN 9.1 and GeForce 2080. Please let me know, if you observe issues.

I get the following error when building with Windows 11, CUDA 12.5, cuDNN 9.1

[1761/3991] Building CUDA object modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj
FAILED: modules/photo/CMakeFiles/opencv_photo.dir/Debug/src/cuda/nlm.cu.obj
C:\PROGRA~1\NVIDIA~2\CUDA\v12.5\bin\nvcc.exe -forward-unknown-to-host-compiler -DCVAPI_EXPORTS -D_USE_MATH_DEFINES -D_VARIADIC_MAX=10 -D_WIN32_WINNT=0x0601 -D__OPENCV_BUILD=1 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DCMAKE_INTDIR=\"Debug\" -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\icv\include -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\iw\include -ID:\repos\opencv\opencv\modules\photo\src -ID:\repos\opencv\opencv\modules\photo\include -ID:\build\opencv\cuda_12_5_t\modules\photo -ID:\repos\opencv\contrib\modules\cudev\include -ID:\repos\opencv\opencv\modules\core\include -ID:\repos\opencv\contrib\modules\cudaarithm\include -ID:\repos\opencv\opencv\modules\imgproc\include -ID:\repos\opencv\contrib\modules\cudafilters\include -ID:\repos\opencv\contrib\modules\cudaimgproc\include -ID:\repos\opencv\opencv\modules\ts\include -ID:\repos\opencv\opencv\modules\imgcodecs\include -ID:\repos\opencv\opencv\modules\videoio\include -ID:\repos\opencv\opencv\modules\highgui\include -isystem D:\build\opencv\cuda_12_5_t -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -D_WINDOWS -Xcompiler=" /GR /EHsc" -Xcompiler=" -Zi -Ob0 -Od /RTC1" -std=c++14 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-MDd -D_FORCE_INLINES -Xcompiler=-DCVAPI_EXPORTS -Xcudafe --display_error_number --diag-suppress 1394,1388 "-Xcompiler= /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise /FS      /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819  /Zi /Ob0 /Od /RTC1    " -MD -MT modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -MF modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj.d -x cu -c D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu -o modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -Xcompiler=-Fdlib\Debug\opencv_photo4100d.pdb,-FS
D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
                  reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
                  ^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(71): note #3327-D: candidate function template "cv::cuda::device::reduce<N,P0,P1,P2,P3,P4,P5,P6,P7,P8,P9,R0,R1,R2,R3,R4,R5,R6,R7,R8,R9,Op0,Op1,Op2,Op3,Op4,Op5,Op6,Op7,Op8,Op9>(const thrust::THRUST_200400_860_NS::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9> &, const thrust::THRUST_200400_860_NS::tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9> &, unsigned int, const thrust::THRUST_200400_860_NS::tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9> &)" failed deduction
      __declspec(__device__) __forceinline void reduce(const thrust::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
                                                ^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(63): note #3327-D: candidate function template "cv::cuda::device::reduce<N,T,Op>(volatile T *, T &, unsigned int, const Op &)" failed deduction
      __declspec(__device__) __forceinline void reduce(volatile T* smem, T& val, unsigned int tid, const Op& op)
                                                ^
          detected during:
            instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::convolve_window(int, int, const int *, T &) const [with T=uchar]" at line 472
            instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::operator()(cv::cuda::PtrStepSz<T> &) const [with T=uchar]" at line 479
            instantiation of "void cv::cuda::device::imgproc::fast_nlm_kernel(cv::cuda::device::imgproc::FastNonLocalMeans<T>, cv::cuda::PtrStepSz<T>) [with T=uchar]" at line 505
            instantiation of "void cv::cuda::device::imgproc::nlm_fast_gpu<T>(const cv::cuda::PtrStepSzb &, cv::cuda::PtrStepSzb, cv::cuda::PtrStepi, int, int, float, cudaStream_t) [with T=uchar]" at line 511

D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
                  reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
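
The "failed deduction" notes above are consistent with a fixed-arity function template being handed a variadic tuple. As a hedged, host-only illustration (it uses a mock `legacy_tuple` and `std::tuple` rather than Thrust's real types, and `null_type`, `legacy_tuple` and `slots` are hypothetical names for this sketch): a padded tuple always carries the same number of template arguments, so a fixed-arity overload deduces, while a variadic tuple carries only the arguments actually given and needs a variadic overload.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

struct null_type {};  // padding type, in the style of pre-variadic tuple designs

// Mock of a fixed-arity tuple: unused slots default to null_type.
template <class T0 = null_type, class T1 = null_type, class T2 = null_type>
struct legacy_tuple {};

// Fixed-arity overload: every legacy_tuple has exactly three template
// arguments (some may be null_type), so deduction always succeeds.
template <class P0, class P1, class P2>
std::size_t slots(const legacy_tuple<P0, P1, P2>&) { return 3; }

// Variadic overload: required once the tuple's argument count varies.
// A three-parameter overload taking std::tuple<P0, P1, P2> could not be
// deduced from std::tuple<float, float>, which has only two arguments.
template <class... P>
std::size_t slots(const std::tuple<P...>&) { return sizeof...(P); }
```

With these definitions, `slots(legacy_tuple<float, float>{})` goes through the fixed-arity overload and `slots(std::tuple<float, float>{})` through the variadic one.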

asmorkalov · 2024-05-30T09:37:00Z

@cudawarped Thanks for trying it. You also need the corresponding patch in the main repo: opencv/opencv#25658


cudawarped · 2024-05-30T11:43:14Z

> @cudawarped Thanks for the trial. You also need corresponding patch in main repo: opencv/opencv#25658

Builds on Windows 11 with both CUDA 12.3 and 12.5, and passes all CUDA tests on an RTX 3070 except the DNN tests and those that were already failing (#3374).


Add conditional include for <cuda/std/tuple> to support CUDA 12.4+ #3751

Fixes #3752

Adds a preprocessor check to conditionally include <cuda/std/tuple> only for CUDA versions 12.4 and above. This ensures backward compatibility with older legacy CUDA versions. Related to pull request #3744

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
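
A version-guarded include of the kind described in #3751 might look like the sketch below. The exact macro and threshold used in that PR are not shown in this thread, so the use of `CUDART_VERSION` with the value 12040 (CUDA 12.4) is an assumption, and `SKETCH_HAS_CUDA_STD_TUPLE` and `has_cuda_std_tuple` are hypothetical helper names.

```cpp
#include <cassert>

// CUDART_VERSION is defined by the CUDA runtime headers; 12040 encodes CUDA 12.4.
// In a host-only build the macro is undefined and the guarded include is skipped.
#if defined(CUDART_VERSION) && (CUDART_VERSION >= 12040)
#  include <cuda/std/tuple>
#  define SKETCH_HAS_CUDA_STD_TUPLE 1  // hypothetical helper macro
#else
#  define SKETCH_HAS_CUDA_STD_TUPLE 0
#endif

// Reports at run time which branch the preprocessor took.
inline bool has_cuda_std_tuple() { return SKETCH_HAS_CUDA_STD_TUPLE != 0; }
```

Guarding the include this way keeps older toolkits from seeing a header path they may not ship, while newer ones pick it up automatically.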

asmorkalov changed the title ~~Added branch with variadic version of Trust tuple~~ WIP: Added branch with variadic version of Trust tuple May 24, 2024

asmorkalov mentioned this pull request May 28, 2024

Added branch with variadic version of Trust tuple opencv/opencv#25658

Merged

6 tasks

asmorkalov changed the title ~~WIP: Added branch with variadic version of Trust tuple~~ Added branch with variadic version of Trust tuple May 29, 2024

asmorkalov changed the title ~~Added branch with variadic version of Trust tuple~~ Added CUDA 12.4+ support May 29, 2024

asmorkalov requested a review from VadimLevin May 29, 2024 17:34

asmorkalov added category: build/install category: cuda labels May 29, 2024

VadimLevin reviewed May 30, 2024


modules/cudaimgproc/src/histogram.cpp (outdated)

modules/cudev/include/opencv2/cudev/grid/detail/split_merge.hpp (outdated)

modules/cudev/include/opencv2/cudev/grid/detail/transform.hpp

asmorkalov added 7 commits May 30, 2024 14:42

Added branch compatible with variadic tuple version in Trust (since CUDA 12.4).

7e8ae30

More tuple fixes.

153e8e1

Temporary hack.

46dc673

Fixed compilation issue with tuple_size()

350c5e5

Added cuda merge branch with vector instead of tuple.

4fcbc9c

Got rid of tuple in polar-cart operations.

7a41af5

Use std::array instead of std::vector to reduce memory allocations.

56a6cdf

asmorkalov force-pushed the as/variadic_tuple branch from a3caa83 to 56a6cdf Compare May 30, 2024 11:43

Added comments about cuda namespace clash.

4b25fc4

VadimLevin approved these changes May 30, 2024


asmorkalov assigned VadimLevin May 30, 2024

asmorkalov merged commit 1ed3dd2 into opencv:4.x May 30, 2024
11 checks passed

peters mentioned this pull request Jun 3, 2024

Add conditional include for <cuda/std/tuple> to support CUDA 12.4+ peters/opencv_contrib#1

Closed

5 tasks

peters mentioned this pull request Jun 3, 2024

Add conditional include for <cuda/std/tuple> to support CUDA 12.4+ #3751

Merged

6 tasks

mshabunin mentioned this pull request Jun 14, 2024

Merge 4.x -> 5.x #3758

Open
