Misleading Kokkos::Cuda::initialize ERROR message when compiled for wrong GPU architecture #1944

Closed
wcohen opened this issue Dec 18, 2018 · 13 comments
Labels
Enhancement Improve existing capability; will potentially require voting

Comments

@wcohen

wcohen commented Dec 18, 2018

I built the kokkos-tutorials Intro-Short/Exercises/02/Solution on a Lenovo P50 laptop with a Maxwell-based Quadro M1000M GPU running RHEL7. When run, it prints the following misleading error message:

Kokkos::Cuda::initialize ERROR: running kernels compiled for compute capability 0.0 (< 5.0) on device with compute capability 5.0 (>=5.0), this would give incorrect results!
Aborted (core dumped)

The problem is caused by KOKKOS_ARCH being set to Volta70 in the Makefile. If KOKKOS_ARCH is changed to Maxwell, the example works properly on the machine. Shouldn't the error message state that the kernel was compiled for 7.0 rather than 0.0?

@dsunder
Contributor

dsunder commented Dec 18, 2018

The problem is that Kokkos runs a kernel to detect the architecture the code was compiled for. That kernel fails to launch, so the architecture is detected incorrectly. The error message is misleading, and we should investigate whether there is a better way to detect the value of __CUDA_ARCH__ without running a kernel. At a minimum, we can improve the error message to say that we were unable to determine the architecture.
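As a rough illustration of the "at a minimum" suggestion, the sketch below formats the compiled-for architecture and reports it as unknown when detection produced nothing. It is not the Kokkos implementation: `describe_compiled_arch` is a hypothetical helper. It assumes the detected value follows the `__CUDA_ARCH__` convention (700 for compute capability 7.0) and that the detection buffer starts at 0, so a kernel that never launched leaves it at 0.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Kokkos.
// Assumption: `detected_cuda_arch` uses the __CUDA_ARCH__ encoding
// (major * 100 + minor * 10) and is 0 if the detection kernel never ran.
std::string describe_compiled_arch(int detected_cuda_arch) {
  assert(detected_cuda_arch >= 0);
  if (detected_cuda_arch == 0) {
    // Nothing was written by the kernel: say so instead of printing "0.0".
    return "unknown (the detection kernel did not launch; the binary may "
           "contain no code for this device)";
  }
  const int major_num = detected_cuda_arch / 100;
  const int minor_num = (detected_cuda_arch % 100) / 10;
  return std::to_string(major_num) + "." + std::to_string(minor_num);
}
```

With a helper like this, the message from the original report would say the compiled architecture is unknown rather than claiming compute capability 0.0.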

@crtrott crtrott added the Question For Kokkos internal and external contributors and users label Dec 19, 2018
@ndellingwood ndellingwood added this to the 2019 April milestone Feb 7, 2019
@ian-bertolacci

What's the status of this?
I am getting a similar error:

Kokkos::Cuda::initialize ERROR: running kernels compiled for compute capability 3.5 (< 5.0) on device with compute capability 6.1 (>=5.0), this would give incorrect results!

Output from CUDA deviceQuery:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8114 MBytes (8508145664 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1797 MHz (1.80 GHz)
  Memory Clock rate:                             5005 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080
Result = PASS

Are there work-arounds?

@dsunder
Contributor

dsunder commented Jul 11, 2019

@ian-bertolacci To avoid this error, you need to set the architecture flag correctly when you configure Kokkos. For a GeForce GTX 1080, the architecture flag should be set to Pascal.

@ian-bertolacci

@dsunder Brilliant! Fixed my blocking issues. Thank you so much for your help.

@crtrott crtrott added Enhancement Improve existing capability; will potentially require voting and removed Question For Kokkos internal and external contributors and users labels Aug 21, 2019
@crtrott crtrott removed this from the 2019 April milestone Aug 21, 2019
@dsunder dsunder added this to the Tentative 3.1 Release milestone Sep 4, 2019
@crtrott
Member

crtrott commented Mar 12, 2020

Fixed the error and warning messages. Note that I also strengthened the error criteria: if you compile for X.Y and run on M.N, it will error out for X!=M||Y>N, which is what CUDA technically requires. That is, you can run code compiled for an older minor architecture revision on a newer minor revision, but you can't run across major revisions.
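The rule stated in prose above (an older minor revision may run on a newer one, but major revisions must match) can be sketched as a small predicate. This is an illustration of that rule only, not the actual Kokkos source; `ComputeCapability` and `is_arch_mismatch` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical type for illustration; not a Kokkos or CUDA type.
struct ComputeCapability {
  int major_num;
  int minor_num;
};

// Returns true when a binary compiled for `compiled` should be rejected on a
// device with capability `device`: the major revisions differ, or the binary
// targets a newer minor revision than the device provides.
bool is_arch_mismatch(ComputeCapability compiled, ComputeCapability device) {
  assert(compiled.major_num >= 0 && compiled.minor_num >= 0);
  assert(device.major_num >= 0 && device.minor_num >= 0);
  return compiled.major_num != device.major_num ||
         compiled.minor_num > device.minor_num;
}
```

Under this predicate, both cases reported in this thread are mismatches: a Volta 7.0 build on a 5.0 device, and a 3.5 build on a 6.1 device. A 6.0 build on a 6.1 device is accepted.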

@mhoemmen
Contributor

@micahahoward @rrdrake @sebrowne @vbrunini FYI this will affect Trilinos' CMake options, once the changes hit Trilinos. We'll need more platform specificity on the architecture choice.

@sebrowne
Contributor

> @micahahoward @rrdrake @sebrowne @vbrunini FYI this will affect Trilinos' CMake options, once the changes hit Trilinos. We'll need more platform specificity on the architecture choice.

Meaning "Pascal60, Volta70, etc.?"

@mhoemmen
Contributor

@sebrowne wrote:

> Meaning "Pascal60, Volta70, etc.?"

Yup -- those warnings will become errors. It may just mean more Trilinos build scripts and/or module options.

@sebrowne
Contributor

Awesome, thanks

@crtrott crtrott closed this as completed Apr 14, 2020
@jwwtc

jwwtc commented Nov 2, 2022

> @ian-bertolacci To avoid this error, you need to set the architecture flag correctly when you configure Kokkos. For a GeForce GTX 1080, the architecture flag should be set to Pascal.

How can I find the correct architecture flag?
For example, is there a proper flag for Quadro RTX 4000?

Thank you.

@dalg24
Member

dalg24 commented Nov 2, 2022

> How can I find the correct architecture flag? For example, is there a proper flag for Quadro RTX 4000?

The NVIDIA specs for that model tell you it has the Turing architecture. You can also search for its "compute capability", which would tell you "7.5".
Then, if you look at the available arch options in Kokkos for NVIDIA GPUs, you will find Kokkos_ARCH_TURING75.
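The lookup described above (compute capability to Kokkos architecture option) can be sketched as a small table. The function name `kokkos_arch_option` is hypothetical, and the table is deliberately partial: it only covers the capabilities that come up in this thread. Check the Kokkos documentation for the full list supported by your Kokkos version.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper for illustration; not provided by Kokkos.
// Maps a CUDA compute capability (major, minor) to the matching Kokkos CMake
// architecture option. Returns an empty string for capabilities that are not
// in this partial table.
std::string kokkos_arch_option(int major_num, int minor_num) {
  assert(major_num >= 0 && minor_num >= 0);
  static const std::map<std::pair<int, int>, std::string> table = {
      {{5, 0}, "Kokkos_ARCH_MAXWELL50"},  // e.g. Quadro M1000M
      {{6, 0}, "Kokkos_ARCH_PASCAL60"},
      {{6, 1}, "Kokkos_ARCH_PASCAL61"},   // e.g. GeForce GTX 1080
      {{7, 0}, "Kokkos_ARCH_VOLTA70"},
      {{7, 5}, "Kokkos_ARCH_TURING75"},   // e.g. Quadro RTX 4000
      {{8, 0}, "Kokkos_ARCH_AMPERE80"},   // e.g. A100
  };
  const auto it = table.find(std::make_pair(major_num, minor_num));
  return it == table.end() ? std::string() : it->second;
}
```

The compute capability itself can be read from the "CUDA Capability Major/Minor version number" line of the deviceQuery output shown earlier in this thread.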

@jwwtc

jwwtc commented Feb 11, 2023

@dalg24 Thanks. And AMPERE80 is the Kokkos_ARCH for A100, right?

@masterleinad
Contributor

> @dalg24 Thanks. And AMPERE80 is the Kokkos_ARCH for A100, right?

Yes.
