Misleading Kokkos::Cuda::initialize ERROR message when compiled for wrong GPU architecture #1944
Comments
The problem is that Kokkos runs a kernel to detect the architecture the code was compiled for, and the kernel fails to launch, so the architecture is detected incorrectly. The error message is misleading and we should investigate if there is a better way to detect the value of
What's the status of this?
Output from CUDA deviceQuery:
Are there workarounds?
@ian-bertolacci To avoid this error you need to correctly set the architecture flag when you configure Kokkos. For a GeForce GTX 1080 the architecture flag should be set to Pascal.
@dsunder Brilliant! Fixed my blocking issues. Thank you so much for your help.
Fixed the error and warning messages. Note I also strengthened the error criteria. If you compile for X.Y and you run on M.N it will error out for
@micahahoward @rrdrake @sebrowne @vbrunini FYI this will affect Trilinos' CMake options, once the changes hit Trilinos. We'll need more platform specificity on the architecture choice.
Meaning "Pascal60, Volta70, etc.?"
@sebrowne wrote:
Yup, those warnings will become errors. It may just mean more Trilinos build scripts and/or module options.
Awesome, thanks
How can I find the correct architecture flag? Thank you.
The NVIDIA specs for that model tell you it has the Turing architecture. You can also search for its "compute capability", which would tell you "7.5".
@dalg24 Thanks. And AMPERE80 is the Kokkos_ARCH for A100, right?
Yes.
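To make the mapping discussed above concrete, here is a minimal C++ sketch of a lookup from CUDA compute capability to a Kokkos architecture name. The helper `kokkos_arch_for` is hypothetical and not part of Kokkos. The names in the table are assumed to follow the suffixes of the `Kokkos_ARCH_<NAME>` CMake options and cover only the architectures mentioned in this thread, so check them against the documentation for your Kokkos version.

```cpp
#include <string>

// Hypothetical helper (not part of Kokkos): maps a CUDA compute capability
// (major.minor) to the Kokkos architecture name discussed in this thread.
// Returns an empty string for capabilities missing from this small table.
std::string kokkos_arch_for(int major, int minor) {
  struct Entry {
    int major;
    int minor;
    const char* arch;
  };
  // Assumed names; only the architectures mentioned in the thread are listed.
  static const Entry table[] = {
      {5, 0, "MAXWELL50"},  // compute capability 5.0, e.g. Quadro M1000M
      {6, 0, "PASCAL60"},
      {6, 1, "PASCAL61"},   // compute capability 6.1, e.g. GeForce GTX 1080
      {7, 0, "VOLTA70"},
      {7, 5, "TURING75"},
      {8, 0, "AMPERE80"},   // compute capability 8.0, e.g. A100
  };
  for (const Entry& e : table) {
    if (e.major == major && e.minor == minor) {
      return e.arch;
    }
  }
  return "";  // unknown capability: look it up in the Kokkos documentation
}
```

For example, `kokkos_arch_for(7, 5)` yields `"TURING75"`, matching the answer given above for a Turing card with compute capability 7.5.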
I built the kokkos-tutorials Intro-Short/Exercises/02/Solution on a Lenovo P50 laptop with a Maxwell-based Quadro M1000M GPU running RHEL7. When run, it prints the following misleading error message:
Kokkos::Cuda::initialize ERROR: running kernels compiled for compute capability 0.0 (< 5.0) on device with compute capability 5.0 (>=5.0), this would give incorrect results!
Aborted (core dumped)
The problem is caused by KOKKOS_ARCH being set to Volta70 in the Makefile. If KOKKOS_ARCH is changed to Maxwell, the example works properly on the machine. Shouldn't the error message state that the kernel has been compiled for 7.0 rather than 0.0?
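The 0.0 in the message is consistent with the explanation given in the comments: the compiled architecture is detected by a kernel, and if that kernel never launches, the detected value stays at its initial value. The C++ sketch below simulates this on the host. It is not the Kokkos implementation. The function names `probe_compiled_arch` and `describe`, and the encoding of 7.0 as 700, are assumptions made for illustration.

```cpp
#include <string>

// Illustrative stand-in for a device-side probe (not the Kokkos code).
// A real probe kernel would write the architecture it was compiled for
// into `result`. If the launch fails, `result` is never written.
int probe_compiled_arch(bool launch_succeeds, int compiled_arch) {
  int result = 0;  // initial value, left untouched when the kernel does not run
  if (launch_succeeds) {
    result = compiled_arch;  // assumed encoding: 700 means 7.0, 500 means 5.0
  }
  return result;
}

// Formats an encoded architecture such as 750 as "7.5".
std::string format_arch(int arch) {
  return std::to_string(arch / 100) + "." + std::to_string((arch % 100) / 10);
}

// Builds a diagnostic that separates "probe never ran" from a real mismatch.
std::string describe(int detected_arch, int device_arch) {
  if (detected_arch == 0) {
    return "probe kernel did not run on device with compute capability " +
           format_arch(device_arch) +
           "; the binary was probably built for a different architecture";
  }
  if (detected_arch != device_arch) {
    return "kernels compiled for compute capability " +
           format_arch(detected_arch) + " but device has " +
           format_arch(device_arch);
  }
  return "ok";
}
```

With a Volta70 build on the Maxwell laptop, `probe_compiled_arch(false, 700)` returns 0, so a message built from that value alone reports 0.0 rather than 7.0. Treating 0 as "probe did not run", as `describe` does, is one way to produce a less misleading message.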