cuda4dnn: fix version check diagnostic #19222

Merged

@YashasSamaga YashasSamaga commented Dec 26, 2020

The checkVersions() function introduced in #17788 was partially ineffective: a version mismatch that could cause problems would throw an exception before checkVersions() was ever reached.

The FP16 and device compatibility checks have been removed from BackendRegistry. BackendRegistry is created as a static local object, so its list of targets is built only once. If the device ID changes afterwards, subsequent calls to get the available targets can return the wrong set of targets. These checks are now done in initCUDABackend instead. The target falls back to FP32 if FP16 is not supported.
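
The stale-registry problem described above can be reproduced in isolation. The following is a minimal sketch, not OpenCV code: `Target`, `g_currentDevice`, `deviceSupportsFP16` and `cachedTargets` are hypothetical stand-ins, and the assumption that device 0 lacks FP16 while device 1 supports it is purely illustrative.

```cpp
#include <vector>

// Hypothetical stand-ins for illustration; these are not OpenCV names.
enum Target { TARGET_CUDA, TARGET_CUDA_FP16 };

// Assumption for this sketch: device 0 lacks FP16, device 1 supports it.
static int g_currentDevice = 0;

bool deviceSupportsFP16() { return g_currentDevice == 1; }

// The list is built once, on the first call, exactly like a
// function-local static registry object would be.
const std::vector<Target>& cachedTargets()
{
    static const std::vector<Target> targets = [] {
        std::vector<Target> t{TARGET_CUDA};
        if (deviceSupportsFP16())
            t.push_back(TARGET_CUDA_FP16);
        return t;
    }();
    return targets;
}
```

If `cachedTargets()` is first called while device 0 is selected, switching `g_currentDevice` to 1 afterwards does not add the FP16 target, because the static initializer never runs again.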

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu-cuda-cc52:18.04
Xbuild_image:Custom=ubuntu-cuda:18.04

@YashasSamaga YashasSamaga changed the title cuda4dnn: fix version check diagnostic [WIP] cuda4dnn: fix version check diagnostic Dec 27, 2020
@YashasSamaga YashasSamaga marked this pull request as draft December 27, 2020 09:47
@crackwitz crackwitz commented Feb 3, 2021

I believe this is related, but might not be: https://forum.opencv.org/t/dnn-gpu-broken-cuda-issues-pls-help/1298
someone reported this:
checkVersions CUDART version 11020 reported by cuDNN 8100 does not match with the version reported by CUDART 11000

@YashasSamaga YashasSamaga commented Feb 4, 2021

> I believe this is related, but might not be: https://forum.opencv.org/t/dnn-gpu-broken-cuda-issues-pls-help/1298
> someone reported this:
> checkVersions CUDART version 11020 reported by cuDNN 8100 does not match with the version reported by CUDART 11000

This is most likely caused by the new compatibility system and versioning scheme introduced in CUDA 11.1. I will look into it and make the appropriate changes here since it is related.

> First introduced in CUDA 11.1, CUDA Enhanced Compatibility provides two benefits:
>
> By leveraging semantic versioning across components in the CUDA Toolkit, an application can be built for one CUDA minor release (such as 11.1) and work across all future minor releases within the major family (such as 11.x).

> By leveraging the semantic versioning starting with CUDA 11, components in the CUDA Toolkit will remain binary compatible across the minor versions of the toolkit. In order to maintain binary compatibility across minor versions, the CUDA runtime no longer bumps up the minimum driver version required for every minor release - this only happens when a major release is shipped. This feature is called CUDA Enhanced Compatibility.

References: https://docs.nvidia.com/deploy/cuda-compatibility/index.html
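
The numbers in the reported message can be decoded by hand: CUDA encodes the runtime version as 1000 * major + 10 * minor, so 11020 is CUDA 11.2 and 11000 is CUDA 11.0. Below is a minimal sketch of a relaxed check that tolerates minor-version differences from CUDA 11 onwards. The names `CudartVersion`, `decodeCudartVersion` and `isCudartCompatible` are hypothetical, and the rule is an assumption drawn from the compatibility notes quoted above, not necessarily what this patch implements.

```cpp
#include <cassert>

// Hypothetical helper type; not part of CUDA, cuDNN or OpenCV.
struct CudartVersion
{
    int majorVer;
    int minorVer;
};

// CUDA encodes runtime versions as 1000 * major + 10 * minor,
// e.g. 11020 -> 11.2 and 10010 -> 10.1.
CudartVersion decodeCudartVersion(int encoded)
{
    assert(encoded >= 0);
    return { encoded / 1000, (encoded % 1000) / 10 };
}

// Assumed rule for this sketch: the major versions must match; from
// CUDA 11 onwards a differing minor version is tolerated, while older
// releases require an exact major.minor match.
bool isCudartCompatible(int builtAgainst, int loaded)
{
    const CudartVersion a = decodeCudartVersion(builtAgainst);
    const CudartVersion b = decodeCudartVersion(loaded);
    if (a.majorVer != b.majorVer)
        return false;
    if (a.majorVer >= 11)
        return true;
    return a.minorVer == b.minorVer;
}
```

Under this rule the reported pair (11020 from cuDNN, 11000 from CUDART) would be accepted, whereas a strict equality check rejects it.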

@asmorkalov

@YashasSamaga Friendly reminder about the patch.

@YashasSamaga YashasSamaga force-pushed the cuda4dnn-fix-build-diagnostics branch 2 times, most recently from 3a1c0f7 to c42543a Compare March 6, 2021 13:25
@YashasSamaga YashasSamaga changed the title [WIP] cuda4dnn: fix version check diagnostic cuda4dnn: fix version check diagnostic Mar 6, 2021
@YashasSamaga YashasSamaga force-pushed the cuda4dnn-fix-build-diagnostics branch from c42543a to d0fe6ad Compare March 6, 2021 13:33
 {
     backends.push_back(std::make_pair(DNN_BACKEND_CUDA, DNN_TARGET_CUDA));
-    if (cuda4dnn::doesDeviceSupportFP16())
-        backends.push_back(std::make_pair(DNN_BACKEND_CUDA, DNN_TARGET_CUDA_FP16));
+    backends.push_back(std::make_pair(DNN_BACKEND_CUDA, DNN_TARGET_CUDA_FP16));
@YashasSamaga YashasSamaga commented

This might be a breaking change. The issue is that BackendRegistry is a singleton object but the available targets for CUDA depend on the current device that has been selected.

If the user initially uses a device without FP16 support and then switches to a device with FP16 support, the FP16 target won't be returned by getAvailableTargets, because the registry was initialized while the device without FP16 support was selected.

So now it always returns both targets, and initCUDABackend falls back to FP32 if FP16 isn't supported.
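
The fallback can be sketched as a small pure function that resolves the requested target against the capability of the device selected at initialization time. This is an illustration under assumptions, not the code in initCUDABackend: `Target` and `resolveTarget` are hypothetical names, and the FP16 capability is passed in as a plain flag rather than queried from the device.

```cpp
// Hypothetical stand-ins for illustration; these are not OpenCV names.
enum Target { TARGET_CUDA, TARGET_CUDA_FP16 };

// Decide at initialization time, using the capability of the device
// that is selected right now, instead of a value cached in a registry.
Target resolveTarget(Target requested, bool deviceSupportsFP16)
{
    if (requested == TARGET_CUDA_FP16 && !deviceSupportsFP16)
        return TARGET_CUDA;  // fall back to FP32
    return requested;
}
```

Because the decision is made per initialization, a later switch to an FP16-capable device gets the FP16 target without touching the registry.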

@YashasSamaga YashasSamaga marked this pull request as ready for review March 6, 2021 13:40
@alalek alalek left a comment

Thank you!

@opencv-pushbot opencv-pushbot merged commit fbb38cc into opencv:master Mar 10, 2021
@alalek alalek mentioned this pull request Apr 9, 2021
Labels
category: dnn category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib