cuda4dnn: fix version check diagnostic #19222

YashasSamaga · 2020-12-26T17:07:27Z

The checkVersions() diagnostic introduced in #17788 was only partially useful, because a version mismatch that could cause problems would throw an exception before checkVersions() was reached.

The FP16 and device compatibility checks have been removed from BackendRegistry. BackendRegistry is created as a static local object, so its list of targets is computed only once. As a result, subsequent calls to get the available targets after the device id has changed can return the wrong set of targets. These checks are now done in initCUDA instead. The target switches to FP32 if FP16 is not supported.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
The PR is proposed to proper branch
There is reference to original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

force_builders=Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu-cuda-cc52:18.04
Xbuild_image:Custom=ubuntu-cuda:18.04

crackwitz · 2021-02-03T19:23:28Z

I believe this is related, but might not be: https://forum.opencv.org/t/dnn-gpu-broken-cuda-issues-pls-help/1298
someone reported this:
checkVersions CUDART version 11020 reported by cuDNN 8100 does not match with the version reported by CUDART 11000

YashasSamaga · 2021-02-04T06:42:48Z

> I believe this is related, but might not be: https://forum.opencv.org/t/dnn-gpu-broken-cuda-issues-pls-help/1298
> someone reported this:
> checkVersions CUDART version 11020 reported by cuDNN 8100 does not match with the version reported by CUDART 11000

This is most likely caused by the new compatibility system and versioning scheme introduced in CUDA 11.1. I will look into it and make the appropriate changes in this PR since it's related.

> First introduced in CUDA 11.1, CUDA Enhanced Compatibility provides two benefits:
>
> By leveraging semantic versioning across components in the CUDA Toolkit, an application can be built for one CUDA minor release (such as 11.1) and work across all future minor releases within the major family (such as 11.x).

> By leveraging the semantic versioning starting with CUDA 11, components in the CUDA Toolkit will remain binary compatible across the minor versions of the toolkit. In order to maintain binary compatibility across minor versions, the CUDA runtime no longer bumps up the minimum driver version required for every minor release - this only happens when a major release is shipped. This feature is called CUDA Enhanced Compatibility.

References: https://docs.nvidia.com/deploy/cuda-compatibility/index.html
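
To make the numbers in the reported message concrete: CUDA runtime versions are encoded as 1000 * major + 10 * minor, so 11020 is CUDA 11.2 and 11000 is CUDA 11.0. Below is a minimal sketch of how a diagnostic could tolerate a minor-version mismatch within one major family. The names `CudaVersion`, `decodeCudartVersion` and `cudartVersionsCompatible` are hypothetical, and the policy (same major number from CUDA 11 on, exact match before that) is an assumption for illustration, not necessarily the check this PR ends up with. It also ignores the direction of the mismatch.

```cpp
#include <cassert>

// Decoded form of an encoded CUDA runtime version such as 11020.
struct CudaVersion {
    int majorVer;
    int minorVer;
};

// CUDA runtime versions are encoded as 1000 * major + 10 * minor.
inline CudaVersion decodeCudartVersion(int encoded) {
    assert(encoded >= 0);  // encoded versions are never negative
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical policy (an assumption, not taken from the PR):
// from CUDA 11 on, two runtime versions are treated as compatible when
// they share a major number; older versions must match exactly.
inline bool cudartVersionsCompatible(int builtAgainst, int loaded) {
    const CudaVersion a = decodeCudartVersion(builtAgainst);
    const CudaVersion b = decodeCudartVersion(loaded);
    if (a.majorVer >= 11 && b.majorVer >= 11)
        return a.majorVer == b.majorVer;
    return builtAgainst == loaded;
}
```

Under this policy the pair from the forum report, 11020 and 11000, would be accepted instead of being flagged as a mismatch, while a 10.x pair with different minor versions would still be reported.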

asmorkalov · 2021-02-25T14:10:06Z

@YashasSamaga Friendly reminder about the patch.

YashasSamaga · 2021-03-06T13:38:45Z

modules/dnn/src/dnn.cpp

        {
            backends.push_back(std::make_pair(DNN_BACKEND_CUDA, DNN_TARGET_CUDA));
-            if (cuda4dnn::doesDeviceSupportFP16())
-                backends.push_back(std::make_pair(DNN_BACKEND_CUDA, DNN_TARGET_CUDA_FP16));
+            backends.push_back(std::make_pair(DNN_BACKEND_CUDA, DNN_TARGET_CUDA_FP16));


This might be a breaking change. The issue is that BackendRegistry is a singleton object, but the available targets for CUDA depend on the currently selected device.

If the user initially uses a device without FP16 support and then switches to a device with FP16 support, getAvailableTargets won't return the FP16 target, because the registry was initialized while the device without FP16 support was selected.

So it now always returns both targets, and initCUDABackend falls back to FP32 if FP16 isn't supported.
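
The stale-registry problem and the fallback can be illustrated without any CUDA dependency. This is a simplified sketch, not the OpenCV code: `Target`, `FakeDevice`, `cachedTargets`, `resolveTarget` and `demonstrateStaleRegistry` are hypothetical stand-ins, and the FP16 capability is a plain boolean rather than a real device query.

```cpp
#include <cassert>
#include <vector>

enum class Target { CUDA_FP32, CUDA_FP16 };

// Stand-in for the currently selected CUDA device (hypothetical).
struct FakeDevice {
    bool supportsFP16;
};

// Old behaviour, simplified: the list is built once from whichever device
// is selected on the first call and is never recomputed afterwards.
inline const std::vector<Target>& cachedTargets(const FakeDevice& device) {
    static const std::vector<Target> targets = [&device] {
        std::vector<Target> t{Target::CUDA_FP32};
        if (device.supportsFP16)
            t.push_back(Target::CUDA_FP16);
        return t;
    }();
    return targets;
}

// New behaviour, simplified: both targets are always listed, and the
// requested target is resolved against the device in use at init time.
inline Target resolveTarget(Target requested, const FakeDevice& device) {
    if (requested == Target::CUDA_FP16 && !device.supportsFP16)
        return Target::CUDA_FP32;  // fall back when FP16 is unsupported
    return requested;
}

// Walks through the scenario described above: start on a device without
// FP16 support, then switch to one that has it.
inline void demonstrateStaleRegistry() {
    const FakeDevice noFp16{false};
    const FakeDevice withFp16{true};
    assert(cachedTargets(noFp16).size() == 1);    // only FP32 is listed
    assert(cachedTargets(withFp16).size() == 1);  // stale: FP16 still missing
    assert(resolveTarget(Target::CUDA_FP16, withFp16) == Target::CUDA_FP16);
    assert(resolveTarget(Target::CUDA_FP16, noFp16) == Target::CUDA_FP32);
}
```

The trade-off is the one noted above: a caller can no longer learn from the target list alone whether FP16 will actually be used, which is why this may be a breaking change.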

alalek

Thank you!

YashasSamaga changed the title ~~cuda4dnn: fix version check diagnostic~~ [WIP] cuda4dnn: fix version check diagnostic Dec 27, 2020

YashasSamaga marked this pull request as draft December 27, 2020 09:47

YashasSamaga force-pushed the cuda4dnn-fix-build-diagnostics branch 2 times, most recently from 3a1c0f7 to c42543a Compare March 6, 2021 13:25

YashasSamaga changed the title ~~[WIP] cuda4dnn: fix version check diagnostic~~ cuda4dnn: fix version check diagnostic Mar 6, 2021

fix checkVersions()

d0fe6ad

YashasSamaga force-pushed the cuda4dnn-fix-build-diagnostics branch from c42543a to d0fe6ad Compare March 6, 2021 13:33

YashasSamaga marked this pull request as ready for review March 6, 2021 13:40

alalek approved these changes Mar 10, 2021

opencv-pushbot merged commit fbb38cc into opencv:master Mar 10, 2021

alalek mentioned this pull request Apr 9, 2021

(5.x) Merge 4.x #19885

Merged
