Added CUDA 12.4+ support #3744

Merged on May 30, 2024 (8 commits)

@asmorkalov
Contributor

asmorkalov commented May 24, 2024

Tries to fix #3690 for CUDA 12.4+
Related patch to main repo: opencv/opencv#25658

Changes:

  • Added branches to support new variadic implementation of thrust::tuple
  • Added a branch that uses std::array instead of std::tuple in split-merge and grid operations. The new branch gets rid of a namespace clash between cv::cuda in OpenCV and ::cuda in the CUDA standard library (injected by Thrust). The old tuple branches are preserved for compatibility with old code and with CUDA versions before 12.4.
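
As a rough illustration of the std::array idea in the second bullet, here is a minimal host-only sketch. It is not the code from this PR: `merge_channels` is a hypothetical name, and a plain CPU loop stands in for the device kernel. It only shows that a fixed number of same-typed channel pointers can travel in a `std::array`, so no tuple type (and therefore nothing from the `::cuda` namespace) has to be named.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a merge kernel: interleave N planar
// channels into one packed buffer. The sources are passed as a std::array
// of pointers instead of a tuple of pointers.
template <typename T, std::size_t N>
std::vector<T> merge_channels(const std::array<const T*, N>& src, std::size_t pixels)
{
    std::vector<T> dst(pixels * N);
    for (std::size_t i = 0; i < pixels; ++i) {
        for (std::size_t c = 0; c < N; ++c) {
            assert(src[c] != nullptr);   // every channel must be provided
            dst[i * N + c] = src[c][i];  // channel c of pixel i
        }
    }
    return dst;
}
```

This works here because all channels share one element type; a tuple is only needed when the element types differ.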

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@Sha-x2-nk

Hi. Your repo seems to have an error. I am compiling with CUDA 12.5.
[image: screenshot of the build error]

@asmorkalov
Contributor Author

Yes, the solution is not complete yet.

@asmorkalov asmorkalov changed the title from "Added branch with variadic version of Trust tuple" to "WIP: Added branch with variadic version of Trust tuple" on May 24, 2024
@cudawarped
Contributor

Hi. Your repo seems to have an error. I am compiling on CUDA 12.5 ...image instead of text for some reason...

Your error has nothing to do with this PR. You are building against a commit in the contrib repo that predates #3378, which removed the line the error mentions:

typedef texture<T, cudaTextureType2D, cudaReadModeElementType> TexRef;

@Sha-x2-nk

Hi. Your repo seems to have an error. I am compiling on CUDA 12.5 ...image instead of text for some reason...

Your error has nothing to do with this PR. You are building against a commit in the contrib repo that predates #3378, which removed the line the error mentions:

typedef texture<T, cudaTextureType2D, cudaReadModeElementType> TexRef;

Hey. So you're saying his PR is not on the latest opencv_contrib?

@cudawarped
Contributor

Hey. So you're saying his PR is not on the latest opencv_contrib?

Exactly. This PR is open, not merged, so it lives on asmorkalov's private branch. Additionally, you are not building against the latest commit from opencv_contrib 4.x either, as your error indicates.

@Sha-x2-nk

Hey. So you're saying his PR is not on the latest opencv_contrib?

Exactly. This PR is open, not merged, so it lives on asmorkalov's private branch. Additionally, you are not building against the latest commit from opencv_contrib 4.x either, as your error indicates.

About that: I cloned and built only his repo.

@cudawarped
Contributor

About that: I cloned and built only his repo.

You must be building against a commit in his repo that predates the PR (#3378) which removed the cause of your error. You need to check out his vardic_tuple branch.

Either way, you will still get errors with CUDA 12.5 at the moment. The point is that they will be different errors, ones that are related to this PR.

@asmorkalov asmorkalov changed the title from "WIP: Added branch with variadic version of Trust tuple" to "Added branch with variadic version of Trust tuple" on May 29, 2024
@asmorkalov asmorkalov changed the title from "Added branch with variadic version of Trust tuple" to "Added CUDA 12.4+ support" on May 29, 2024
@asmorkalov
Contributor Author

@cudawarped @vrabaud could you try the PR on your side and provide comments? I tested with Ubuntu 22.04, CUDA 12.5, cuDNN 9.1 and a GeForce 2080. Please let me know if you observe any issues.

@cudawarped
Contributor

@cudawarped @vrabaud could you try the PR on your side and provide comments? I tested with Ubuntu 22.04, CUDA 12.5, cuDNN 9.1 and a GeForce 2080. Please let me know if you observe any issues.

I get the following error when building with Windows 11, CUDA 12.5 and cuDNN 9.1:

[1761/3991] Building CUDA object modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj
FAILED: modules/photo/CMakeFiles/opencv_photo.dir/Debug/src/cuda/nlm.cu.obj
C:\PROGRA~1\NVIDIA~2\CUDA\v12.5\bin\nvcc.exe -forward-unknown-to-host-compiler -DCVAPI_EXPORTS -D_USE_MATH_DEFINES -D_VARIADIC_MAX=10 -D_WIN32_WINNT=0x0601 -D__OPENCV_BUILD=1 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DCMAKE_INTDIR=\"Debug\" -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\icv\include -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\iw\include -ID:\repos\opencv\opencv\modules\photo\src -ID:\repos\opencv\opencv\modules\photo\include -ID:\build\opencv\cuda_12_5_t\modules\photo -ID:\repos\opencv\contrib\modules\cudev\include -ID:\repos\opencv\opencv\modules\core\include -ID:\repos\opencv\contrib\modules\cudaarithm\include -ID:\repos\opencv\opencv\modules\imgproc\include -ID:\repos\opencv\contrib\modules\cudafilters\include -ID:\repos\opencv\contrib\modules\cudaimgproc\include -ID:\repos\opencv\opencv\modules\ts\include -ID:\repos\opencv\opencv\modules\imgcodecs\include -ID:\repos\opencv\opencv\modules\videoio\include -ID:\repos\opencv\opencv\modules\highgui\include -isystem D:\build\opencv\cuda_12_5_t -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -D_WINDOWS -Xcompiler=" /GR /EHsc" -Xcompiler=" -Zi -Ob0 -Od /RTC1" -std=c++14 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-MDd -D_FORCE_INLINES -Xcompiler=-DCVAPI_EXPORTS -Xcudafe --display_error_number --diag-suppress 1394,1388 "-Xcompiler= /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise /FS      /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819  /Zi /Ob0 /Od /RTC1    " -MD -MT modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -MF modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj.d -x cu -c D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu -o modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -Xcompiler=-Fdlib\Debug\opencv_photo4100d.pdb,-FS
D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
                  reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
                  ^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(71): note #3327-D: candidate function template "cv::cuda::device::reduce<N,P0,P1,P2,P3,P4,P5,P6,P7,P8,P9,R0,R1,R2,R3,R4,R5,R6,R7,R8,R9,Op0,Op1,Op2,Op3,Op4,Op5,Op6,Op7,Op8,Op9>(const thrust::THRUST_200400_860_NS::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9> &, const thrust::THRUST_200400_860_NS::tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9> &, unsigned int, const thrust::THRUST_200400_860_NS::tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9> &)" failed deduction
      __declspec(__device__) __forceinline void reduce(const thrust::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
                                                ^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(63): note #3327-D: candidate function template "cv::cuda::device::reduce<N,T,Op>(volatile T *, T &, unsigned int, const Op &)" failed deduction
      __declspec(__device__) __forceinline void reduce(volatile T* smem, T& val, unsigned int tid, const Op& op)
                                                ^
          detected during:
            instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::convolve_window(int, int, const int *, T &) const [with T=uchar]" at line 472
            instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::operator()(cv::cuda::PtrStepSz<T> &) const [with T=uchar]" at line 479
            instantiation of "void cv::cuda::device::imgproc::fast_nlm_kernel(cv::cuda::device::imgproc::FastNonLocalMeans<T>, cv::cuda::PtrStepSz<T>) [with T=uchar]" at line 505
            instantiation of "void cv::cuda::device::imgproc::nlm_fast_gpu<T>(const cv::cuda::PtrStepSzb &, cv::cuda::PtrStepSzb, cv::cuda::PtrStepi, int, int, float, cudaStream_t) [with T=uchar]" at line 511

D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
                  reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
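
Reading the log above: the only tuple overload of `reduce` takes tuples with ten fixed template parameters (P0..P9), while the arguments are two- and three-element variadic `cuda::std` tuples, so deduction fails. A variadic overload is one way to accept any arity. The sketch below is host-only and makes several assumptions: `std::tuple` stands in for the CUDA tuple type, `sketch::combine` is a hypothetical name, and it only applies one operator per element rather than performing a real shared-memory reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <tuple>
#include <utility>

// Host-only sketch; none of these names come from OpenCV or Thrust.
namespace sketch {

template <class... R, class... Op, std::size_t... I>
void combine_impl(std::tuple<R&...>& acc, const std::tuple<R...>& in,
                  const std::tuple<Op...>& ops, std::index_sequence<I...>)
{
    // Apply the I-th operator to the I-th accumulator/input pair.
    int unused[] = {0, ((std::get<I>(acc) =
                             std::get<I>(ops)(std::get<I>(acc), std::get<I>(in))), 0)...};
    (void)unused;
}

// Variadic entry point: works for tuples of any arity, as long as there is
// exactly one operator per value.
template <class... R, class... Op>
void combine(std::tuple<R&...> acc, const std::tuple<R...>& in,
             const std::tuple<Op...>& ops)
{
    static_assert(sizeof...(R) == sizeof...(Op), "one operator per value");
    combine_impl(acc, in, ops, std::index_sequence_for<R...>{});
}

} // namespace sketch
```

For example, `sketch::combine(std::tie(a, b), std::make_tuple(2.0f, 5.0f), std::make_tuple(std::plus<float>(), std::multiplies<float>()))` updates `a` and `b` in place through the tuple of references.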

@asmorkalov
Contributor Author

@cudawarped Thanks for the trial. You also need the corresponding patch in the main repo: opencv/opencv#25658

@cudawarped
Contributor

cudawarped commented May 30, 2024

@cudawarped Thanks for the trial. You also need the corresponding patch in the main repo: opencv/opencv#25658

It builds on Windows 11 with both CUDA 12.3 and 12.5, and passes all CUDA tests on an RTX 3070 except DNN and the previously failing ones (#3374).

@asmorkalov asmorkalov merged commit 1ed3dd2 into opencv:4.x May 30, 2024
11 checks passed
peters added a commit to peters/opencv_contrib that referenced this pull request Jun 3, 2024
Adds a preprocessor check to conditionally include <cuda/std/tuple> 
only for CUDA versions 12.4 and above. This ensures backward compatibility 
with older legacy CUDA versions.

Related to pull request opencv#3744
asmorkalov pushed a commit that referenced this pull request Jun 6, 2024
Add conditional include for <cuda/std/tuple> to support CUDA 12.4+ #3751

Fixes #3752

Adds a preprocessor check to conditionally include <cuda/std/tuple> only for CUDA versions 12.4 and above. This ensures backward compatibility with older legacy CUDA versions.

Related to pull request #3744
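
A version guard of the kind described above might look roughly like the following. This is a sketch, not the committed change: it assumes the check is made against `CUDART_VERSION` (the macro from the CUDA runtime headers, where 12.4 is encoded as 12040), and the `SKETCH_...` macro and the helper function are hypothetical names added only so the chosen branch can be observed.

```cpp
#include <tuple>  // host fallback so the sketch builds without a CUDA toolkit

// CUDART_VERSION encodes major * 1000 + minor * 10, so CUDA 12.4 is 12040.
// It is only defined if the CUDA runtime headers were included earlier.
#if defined(CUDART_VERSION) && (CUDART_VERSION >= 12040)
#  include <cuda/std/tuple>
#  define SKETCH_USE_CUDA_STD_TUPLE 1
#else
#  define SKETCH_USE_CUDA_STD_TUPLE 0
#endif

// Hypothetical helper that reports which branch the preprocessor took.
constexpr bool sketch_uses_cuda_std_tuple()
{
    return SKETCH_USE_CUDA_STD_TUPLE == 1;
}
```

With the guard in place, toolkits older than 12.4 never see the `<cuda/std/tuple>` include.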

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
@mshabunin mshabunin mentioned this pull request Jun 14, 2024
Development

Successfully merging this pull request may close these issues.

CUDA Toolkit 12.4.0 tuple incompatibility
4 participants