Failed to build against cudnn 8.0.2 #18030

Closed
jiapei100 opened this issue Aug 4, 2020 · 5 comments · Fixed by #18060

Comments

@jiapei100

jiapei100 commented Aug 4, 2020

System information (version)
  • OpenCV => 4.4
  • Operating System / Platform => Ubuntu 20.04
  • Compiler => gcc/g++ 9.3.0
  • Cuda => 11.0
  • NVidia Driver => 450.57
  • GPU: Geforce 2080 Ti
Detailed description
[ 32%] Building NVCC (Device) object modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o
cd ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda && /snap/cmake/513/bin/cmake -E make_directory ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/.
cd ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda && /snap/cmake/513/bin/cmake -D verbose:BOOL=ON -D build_configuration:STRING=Release -D generated_file:STRING=....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_detection_output.cu.o -D generated_cubin_file:STRING=....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_detection_output.cu.o.cubin.txt -P ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.Release.cmake
-- Removing ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_detection_output.cu.o
/snap/cmake/513/bin/cmake -E rm -f ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_detection_output.cu.o
-- Generating dependency file: ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.NVCC-depend
/usr/local/cuda/bin/nvcc -M -D__CUDACC__ ....../opencv/modules/dnn/src/cuda/detection_output.cu -o ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.NVCC-depend -ccbin /usr/bin/cc -m64 -DVK_NO_PROTOTYPES -D__OPENCV_BUILD=1 -D_USE_MATH_DEFINES -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -D__STDC_FORMAT_MACROS -DCV_CUDA4DNN=1 -DOPENCV_DNN_EXTERNAL_PROTOBUF=1 -DHAVE_PROTOBUF=1 -Xcompiler ,\"-fsigned-char\",\"-ffast-math\",\"-W\",\"-Wall\",\"-Werror=return-type\",\"-Werror=non-virtual-dtor\",\"-Werror=address\",\"-Werror=sequence-point\",\"-Wformat\",\"-Werror=format-security\",\"-Winit-self\",\"-Wpointer-arith\",\"-Wuninitialized\",\"-Winit-self\",\"-Wno-comment\",\"-Wno-strict-overflow\",\"-fdiagnostics-show-option\",\"-Wno-long-long\",\"-pthread\",\"-fno-omit-frame-pointer\",\"-pg\",\"-g\",\"-msse\",\"-msse2\",\"-msse3\",\"-fvisibility=hidden\",\"-fopenmp\",\"-Wno-deprecated\",\"-Wno-missing-declarations\",\"-Wno-shadow\",\"-Wno-unused-parameter\",\"-Wno-sign-compare\",\"-Wno-undef\",\"-Wno-invalid-offsetof\",\"-Wno-unused-but-set-variable\",\"-O3\",\"-DNDEBUG\",\"-DNDEBUG\" -gencode arch=compute_75,code=sm_75 -D_FORCE_INLINES --use_fast_math -Xcompiler -DCVAPI_EXPORTS -Xcompiler -fPIC --std=c++14 -DNVCC -I/usr/local/cuda/include -I....../opencv/build -I/usr/include/va -I/usr/include/eigen3/Eigen -I/usr/include/x86_64-linux-gnu -I....../opencv/3rdparty/include -I....../opencv/modules/dnn/include -I....../opencv/build/modules/dnn -I....../opencv_contrib/modules/cudev/include -I....../opencv/modules/core/include -I....../opencv/modules/imgproc/include -I/usr/local/include
-- Generating temporary cmake readable file: ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.depend.tmp
/snap/cmake/513/bin/cmake -D input_file:FILEPATH=....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.NVCC-depend -D output_file:FILEPATH=....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.depend.tmp -D verbose=ON -P /snap/cmake/513/share/cmake-3.18/Modules/FindCUDA/make2cmake.cmake
-- Copy if different ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.depend.tmp to ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.depend
/snap/cmake/513/bin/cmake -E copy_if_different ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.depend.tmp ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.depend
-- Removing ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.depend.tmp and ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.NVCC-depend
/snap/cmake/513/bin/cmake -E rm -f ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.depend.tmp ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o.NVCC-depend
-- Generating ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_detection_output.cu.o
/usr/local/cuda/bin/nvcc ....../opencv/modules/dnn/src/cuda/detection_output.cu -c -o ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_detection_output.cu.o -ccbin /usr/bin/cc -m64 -DVK_NO_PROTOTYPES -D__OPENCV_BUILD=1 -D_USE_MATH_DEFINES -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -D__STDC_FORMAT_MACROS -DCV_CUDA4DNN=1 -DOPENCV_DNN_EXTERNAL_PROTOBUF=1 -DHAVE_PROTOBUF=1 -Xcompiler ,\"-fsigned-char\",\"-ffast-math\",\"-W\",\"-Wall\",\"-Werror=return-type\",\"-Werror=non-virtual-dtor\",\"-Werror=address\",\"-Werror=sequence-point\",\"-Wformat\",\"-Werror=format-security\",\"-Winit-self\",\"-Wpointer-arith\",\"-Wuninitialized\",\"-Winit-self\",\"-Wno-comment\",\"-Wno-strict-overflow\",\"-fdiagnostics-show-option\",\"-Wno-long-long\",\"-pthread\",\"-fno-omit-frame-pointer\",\"-pg\",\"-g\",\"-msse\",\"-msse2\",\"-msse3\",\"-fvisibility=hidden\",\"-fopenmp\",\"-Wno-deprecated\",\"-Wno-missing-declarations\",\"-Wno-shadow\",\"-Wno-unused-parameter\",\"-Wno-sign-compare\",\"-Wno-undef\",\"-Wno-invalid-offsetof\",\"-Wno-unused-but-set-variable\",\"-O3\",\"-DNDEBUG\",\"-DNDEBUG\" -gencode arch=compute_75,code=sm_75 -D_FORCE_INLINES --use_fast_math -Xcompiler -DCVAPI_EXPORTS -Xcompiler -fPIC --std=c++14 -DNVCC -I/usr/local/cuda/include -I....../opencv/build -I/usr/include/va -I/usr/include/eigen3/Eigen -I/usr/include/x86_64-linux-gnu -I....../opencv/3rdparty/include -I....../opencv/modules/dnn/include -I....../opencv/build/modules/dnn -I....../opencv_contrib/modules/cudev/include -I....../opencv/modules/core/include -I....../opencv/modules/imgproc/include -I/usr/local/include
....../opencv/modules/dnn/src/cuda/detection_output.cu: In instantiation of 'typename std::enable_if<(current != 0), void>::type cv::dnn::cuda4dnn::kernels::dispatch_decode_bboxes(int, Args&& ...) [with T = __half; long unsigned int current = 15; Args = {const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<__half>&, cv::dnn::cuda4dnn::csl::Span<const __half>&, cv::dnn::cuda4dnn::csl::Span<const __half>&, bool&, bool&, long unsigned int&, long unsigned int&, float&, float&}; typename std::enable_if<(current != 0), void>::type = void]':
....../opencv/modules/dnn/src/cuda/detection_output.cu:704:32:   required from 'void cv::dnn::cuda4dnn::kernels::decode_bboxes(const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<T>, cv::dnn::cuda4dnn::csl::View<T>, cv::dnn::cuda4dnn::csl::View<T>, std::size_t, bool, std::size_t, bool, bool, bool, bool, bool, float, float) [with T = __half; cv::dnn::cuda4dnn::csl::View<T> = cv::dnn::cuda4dnn::csl::Span<const __half>; std::size_t = long unsigned int]'
....../opencv/modules/dnn/src/cuda/detection_output.cu:707:391:   required from here
....../opencv/modules/dnn/src/cuda/detection_output.cu:687:92: error: no matching function for call to 'launch_decode_boxes_kernel<__half, (15 & 8), (15 & 4), (15 & 2), (15 & 1)>(const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<__half>&, cv::dnn::cuda4dnn::csl::Span<const __half>&, cv::dnn::cuda4dnn::csl::Span<const __half>&, bool&, bool&, long unsigned int&, long unsigned int&, float&, float&)'
  687 |         launch_decode_boxes_kernel<T, current & 8, current & 4, current & 2, current & 1>(std::forward<Args>(args)...);
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~   
....../opencv/modules/dnn/src/cuda/detection_output.cu:666:1: note: candidate: 'template<class T, bool SHARE_LOCATION, bool VARIANCE_ENCODED_IN_TARGET, bool CORNER_TRUE_CENTER_FALSE, bool CLIP_BBOX> void cv::dnn::cuda4dnn::kernels::launch_decode_boxes_kernel(const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<T>, cv::dnn::cuda4dnn::csl::View<T>, cv::dnn::cuda4dnn::csl::View<T>, bool, bool, cv::dnn::cuda4dnn::csl::device::size_type, cv::dnn::cuda4dnn::csl::device::index_type, float, float)'
  666 | void launch_decode_boxes_kernel(const Stream& stream, Span<T> decoded_bboxes, View<T> locations, View<T> priors,
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~
....../opencv/modules/dnn/src/cuda/detection_output.cu:666:1: note:   template argument deduction/substitution failed:
....../opencv/modules/dnn/src/cuda/detection_output.cu:687:92: error: narrowing conversion of '8' from 'long unsigned int' to 'bool' [-Wnarrowing]
  687 |         launch_decode_boxes_kernel<T, current & 8, current & 4, current & 2, current & 1>(std::forward<Args>(args)...);
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~   
....../opencv/modules/dnn/src/cuda/detection_output.cu:687:92: error: narrowing conversion of '4' from 'long unsigned int' to 'bool' [-Wnarrowing]
....../opencv/modules/dnn/src/cuda/detection_output.cu:687:92: error: narrowing conversion of '2' from 'long unsigned int' to 'bool' [-Wnarrowing]
....../opencv/modules/dnn/src/cuda/detection_output.cu: In instantiation of 'typename std::enable_if<(current != 0), void>::type cv::dnn::cuda4dnn::kernels::dispatch_decode_bboxes(int, Args&& ...) [with T = float; long unsigned int current = 15; Args = {const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<float>&, cv::dnn::cuda4dnn::csl::Span<const float>&, cv::dnn::cuda4dnn::csl::Span<const float>&, bool&, bool&, long unsigned int&, long unsigned int&, float&, float&}; typename std::enable_if<(current != 0), void>::type = void]':
....../opencv/modules/dnn/src/cuda/detection_output.cu:704:32:   required from 'void cv::dnn::cuda4dnn::kernels::decode_bboxes(const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<T>, cv::dnn::cuda4dnn::csl::View<T>, cv::dnn::cuda4dnn::csl::View<T>, std::size_t, bool, std::size_t, bool, bool, bool, bool, bool, float, float) [with T = float; cv::dnn::cuda4dnn::csl::View<T> = cv::dnn::cuda4dnn::csl::Span<const float>; std::size_t = long unsigned int]'
....../opencv/modules/dnn/src/cuda/detection_output.cu:708:388:   required from here
....../opencv/modules/dnn/src/cuda/detection_output.cu:687:92: error: no matching function for call to 'launch_decode_boxes_kernel<float, (15 & 8), (15 & 4), (15 & 2), (15 & 1)>(const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<float>&, cv::dnn::cuda4dnn::csl::Span<const float>&, cv::dnn::cuda4dnn::csl::Span<const float>&, bool&, bool&, long unsigned int&, long unsigned int&, float&, float&)'
....../opencv/modules/dnn/src/cuda/detection_output.cu:666:1: note: candidate: 'template<class T, bool SHARE_LOCATION, bool VARIANCE_ENCODED_IN_TARGET, bool CORNER_TRUE_CENTER_FALSE, bool CLIP_BBOX> void cv::dnn::cuda4dnn::kernels::launch_decode_boxes_kernel(const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<T>, cv::dnn::cuda4dnn::csl::View<T>, cv::dnn::cuda4dnn::csl::View<T>, bool, bool, cv::dnn::cuda4dnn::csl::device::size_type, cv::dnn::cuda4dnn::csl::device::index_type, float, float)'
  666 | void launch_decode_boxes_kernel(const Stream& stream, Span<T> decoded_bboxes, View<T> locations, View<T> priors,
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~
....../opencv/modules/dnn/src/cuda/detection_output.cu:666:1: note:   template argument deduction/substitution failed:
....../opencv/modules/dnn/src/cuda/detection_output.cu:687:92: error: narrowing conversion of '8' from 'long unsigned int' to 'bool' [-Wnarrowing]
  687 |         launch_decode_boxes_kernel<T, current & 8, current & 4, current & 2, current & 1>(std::forward<Args>(args)...);
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~   
....../opencv/modules/dnn/src/cuda/detection_output.cu:687:92: error: narrowing conversion of '4' from 'long unsigned int' to 'bool' [-Wnarrowing]
....../opencv/modules/dnn/src/cuda/detection_output.cu:687:92: error: narrowing conversion of '2' from 'long unsigned int' to 'bool' [-Wnarrowing]
....../opencv/modules/dnn/src/cuda/detection_output.cu:685:1: warning: 'typename std::enable_if<(current != 0), void>::type cv::dnn::cuda4dnn::kernels::dispatch_decode_bboxes(int, Args&& ...) [with T = float; long unsigned int current = 14; Args = {const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<float>&, cv::dnn::cuda4dnn::csl::Span<const float>&, cv::dnn::cuda4dnn::csl::Span<const float>&, bool&, bool&, long unsigned int&, long unsigned int&, float&, float&}]' used but never defined
  685 | ::type dispatch_decode_bboxes(int selector, Args&& ...args) {
      | ^~~~~~~~~~~~~~~~~~~~~~
....../opencv/modules/dnn/src/cuda/detection_output.cu:685:1: warning: 'typename std::enable_if<(current != 0), void>::type cv::dnn::cuda4dnn::kernels::dispatch_decode_bboxes(int, Args&& ...) [with T = __half; long unsigned int current = 14; Args = {const cv::dnn::cuda4dnn::csl::Stream&, cv::dnn::cuda4dnn::csl::Span<__half>&, cv::dnn::cuda4dnn::csl::Span<const __half>&, cv::dnn::cuda4dnn::csl::Span<const __half>&, bool&, bool&, long unsigned int&, long unsigned int&, float&, float&}]' used but never defined
-- Removing ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_detection_output.cu.o
/snap/cmake/513/bin/cmake -E rm -f ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_detection_output.cu.o
CMake Error at cuda_compile_1_generated_detection_output.cu.o.Release.cmake:280 (message):
  Error generating file
  ....../opencv/build/modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_detection_output.cu.o


make[2]: *** [modules/dnn/CMakeFiles/opencv_dnn.dir/build.make:3057: modules/dnn/CMakeFiles/cuda_compile_1.dir/src/cuda/cuda_compile_1_generated_detection_output.cu.o] Error 1
make[2]: Leaving directory '....../opencv/build'
make[1]: *** [CMakeFiles/Makefile2:4371: modules/dnn/CMakeFiles/opencv_dnn.dir/all] Error 2
make[1]: Leaving directory '....../opencv/build'
make: *** [Makefile:185: all] Error 2
@jiapei100 changed the title from "Failed against cudnn 8.0.2" to "Failed to build against cudnn 8.0.2" on Aug 4, 2020
@jiapei100
Author

Here we are....

A lot of people have complained about gcc 9.3. Therefore, I tried clang 10, which successfully built opencv 4.4.0-dev after a trivial modification to gkernel.hpp (refer to #17952).

@YashasSamaga
Contributor

YashasSamaga commented Aug 9, 2020

I have reproduced this problem. It happens only with gcc 9; the code compiles without errors with gcc 8 and below.

The following patch fixes the problem.

- launch_decode_boxes_kernel<T, current & 8, current & 4, current & 2, current & 1>(std::forward<Args>(args)...);
+ launch_decode_boxes_kernel<T,
+                            static_cast<bool>(current & 8),
+                            static_cast<bool>(current & 4),
+                            static_cast<bool>(current & 2),
+                            static_cast<bool>(current & 1)
+                           >(std::forward<Args>(args)...);
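To illustrate the patch, here is a minimal, self-contained sketch of the same dispatch pattern. It is not OpenCV's code: `launch_stub` and `dispatch` are hypothetical stand-ins for `launch_decode_boxes_kernel` and `dispatch_decode_bboxes`, and the stub only encodes its four boolean template parameters back into an integer so the selected instantiation can be checked.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the kernel launcher: returns the four boolean
// template parameters packed back into a 4-bit integer.
template <bool A, bool B, bool C, bool D>
int launch_stub() {
    return (A ? 8 : 0) | (B ? 4 : 0) | (C ? 2 : 0) | (D ? 1 : 0);
}

// Base case: only selector 0 is left to try; anything else is unmatched.
template <std::size_t current>
typename std::enable_if<current == 0, int>::type
dispatch(std::size_t selector) {
    return selector == 0 ? launch_stub<false, false, false, false>() : -1;
}

// Recursive case: turn the runtime selector into compile-time flags.
// Each bit test is converted to bool explicitly, as in the patch.
template <std::size_t current>
typename std::enable_if<current != 0, int>::type
dispatch(std::size_t selector) {
    if (selector == current)
        return launch_stub<static_cast<bool>(current & 8),
                           static_cast<bool>(current & 4),
                           static_cast<bool>(current & 2),
                           static_cast<bool>(current & 1)>();
    return dispatch<current - 1>(selector);
}
```

Calling `dispatch<15>(selector)` walks down from 15 until it reaches the instantiation whose `current` equals the runtime selector, so all 16 flag combinations are compiled once.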

@YashasSamaga
Contributor

YashasSamaga commented Aug 9, 2020

Therefore, I tried clang10, which successfully built opencv4.4.0-dev, after some trivial modification (Refer to #17952 ), and the file modified is gkernel.hpp

clang10 is throwing the same error.

MCVE: https://godbolt.org/

MSVC and gcc 8 and below accept the code, but every clang version rejects it. clang is much more pedantic than gcc.

@jiapei100
Author

@YashasSamaga

Hi, your solution must be correct.
I successfully built the OpenCV 4.4.0 release yesterday, so it seems only the current git version needs your fix.
I just realized that the build clang-10 compiled successfully was an OpenCL build, not a CUDA one.

@YashasSamaga
Contributor

I think this issue should stay open. It's a bug. gcc 9 and clang are correct about the error.
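A small sketch of why the compilers complain, assuming the usual rule that a non-type template argument must convert to the parameter type without narrowing. `as_int` and `kCurrent` are hypothetical names used only for illustration. The log above reports errors for the constants 8, 4 and 2 but not for `15 & 1`, which matches this reading: the value 1 is representable in `bool`, while the others are not.

```cpp
#include <cstddef>

// Hypothetical helper: reports the boolean template argument it received.
template <bool Flag>
constexpr int as_int() { return Flag ? 1 : 0; }

// Same value and type as `current` in the failing instantiation.
constexpr std::size_t kCurrent = 15;

// The rejected form would be as_int<kCurrent & 8>(): the constant 8 cannot
// be represented in bool, which gcc 9 reports as a narrowing conversion.

// Value 1 fits in bool, so no explicit conversion is needed here.
constexpr int low_bit = as_int<kCurrent & 1>();

// Two ways to obtain a bool before it reaches the template argument list.
constexpr int high_cast = as_int<static_cast<bool>(kCurrent & 8)>();
constexpr int high_cmp = as_int<(kCurrent & 8) != 0>();

static_assert(low_bit == 1 && high_cast == 1 && high_cmp == 1,
              "every tested bit of 15 is set");
```

The comparison form `(x & mask) != 0` is an alternative to `static_cast<bool>`; both produce a `bool` constant, so no implicit conversion is left for the compiler to reject.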
