
Can not pass unit test cuda.space without a GT 720 #25

Closed
piyueh opened this issue Jun 17, 2015 · 13 comments
piyueh commented Jun 17, 2015

Hi,

I use the following commands to build Kokkos (with UVM enabled):

$ mkdir build
$ cd build
$ ../generate_makefile.bash --with-cuda --kokkos-path=${HOME}/Downloads/kokkos --compiler=nvcc_wrapper --cxxflags="-DKOKKOS_USE_CUDA_UVM"
$ make build-test
$ make test

There are three CUDA devices on my machine:

  1. Device 0: Tesla K40c
  2. Device 1: Tesla K20c
  3. Device 2: GeForce GT 720

I use the environment variable CUDA_VISIBLE_DEVICES to control which devices are used. I found that the unit tests pass only in the cases that enable the GT 720 together with one or both of the Tesla cards (i.e. CUDA_VISIBLE_DEVICES=0,1,2, 0,2, or 1,2).

Other cases always get stuck at cuda.space.
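The observed pattern can be summarized as a sketch (device indices as listed above; `make test` is the command from the build steps):

```shell
# Combinations reported above (0 = K40c, 1 = K20c, 2 = GT 720):
#   pass:  CUDA_VISIBLE_DEVICES=0,1,2   0,2   1,2   (GT 720 plus a Tesla)
#   hang:  any other combination, stuck at cuda.space
export CUDA_VISIBLE_DEVICES=0,2
echo "devices visible to CUDA: $CUDA_VISIBLE_DEVICES"
# make test
```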

Did I miss something when building Kokkos? Thanks!

crtrott (Member) commented Jun 17, 2015

This is weird. I don't see anything obviously wrong. Let me try to reproduce.

crtrott (Member) commented Jun 17, 2015

Oh, I have an idea. Can you add CUDA_LAUNCH_BLOCKING=1 as an environment variable?

This is what might happen: if you force UVM usage, you generally need to fence kernels. Some tests don't do that, because they test asynchronous behaviour. The need to fence is related to the coherency protocol of the CUDA driver. Now, if you have two visible devices, CUDA doesn't actually do proper UVM; it falls back to zero-copy mode (i.e. all UVM allocations are actually CUDA host-pinned allocations). At that point you don't need to fence anymore. That's why you need two devices in your list: it makes CUDA use its fallback.

CUDA_LAUNCH_BLOCKING=1 is the way to make that work.
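As a sketch of the suggestion (run `make test` as before, or the individual test binary, after setting the variable):

```shell
# Force every kernel launch to complete before the host continues. With
# -DKOKKOS_USE_CUDA_UVM this acts as an implicit fence, so tests that skip
# explicit fencing no longer race with the driver's UVM coherency protocol.
export CUDA_LAUNCH_BLOCKING=1
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
# make test
```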

Let me know how that goes.

crtrott (Member) commented Jun 17, 2015

Btw., if this turns out to be correct, let's open another issue for improved documentation in the Programming guide while closing this one.

ipdemes commented Jun 17, 2015

I would agree that you need to add this to the Programming guide: there were already several people who had a similar issue.

Irina

piyueh (Author) commented Jun 17, 2015

Hi,

I had already set CUDA_LAUNCH_BLOCKING=1. In fact, I ran into another problem (issue #23) before I set CUDA_LAUNCH_BLOCKING.

So it still doesn't work even with CUDA_LAUNCH_BLOCKING=1 set.

Is it possible that the problem comes from my hardware? My machine has two CPU sockets. The K40c and K20c belong to the 1st CPU socket, and the GT 720 belongs to the 2nd socket.

piyueh (Author) commented Jun 17, 2015

Sorry, I made a mistake. The machine I was using only has one CPU socket, and all three GPUs belong to that socket.

crtrott (Member) commented Jun 17, 2015

OK, I cannot reproduce the behaviour on my machine with 3 GPUs using your configure line. Even without CUDA_LAUNCH_BLOCKING it crashes later, not in cuda.space; that test passes.

Can you give more info on your configuration? Output of nvidia-smi, nvcc --help, gcc --version, your OS, etc.?

piyueh (Author) commented Jun 17, 2015

The OS is Arch Linux. The following text files show the information for the machine on which I compiled and ran the unit tests:

gcc_version.txt, nvcc_version.txt, nvcc_help.txt, nvdia-smi.txt, hwloc_lstopo.txt, deviceQuery.txt

I just tried the unit tests on another machine, which has only one GT 720. That machine's NVIDIA driver was updated to 352.21 just now, and all unit tests passed on it! (I recall that before the driver update, the unit tests on that machine also got stuck at cuda.space, but I'm not entirely sure about this.)

I'll ask the administrator to update the NVIDIA driver and then try the unit tests again.

crtrott (Member) commented Jun 17, 2015

Can you post the text from those files directly? Dropbox seems to be blocked from Sandia; at least it doesn't let me access those files from here.

piyueh (Author) commented Jun 17, 2015

Sure.

gcc --version

gcc (GCC) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27

nvidia-smi
(output was attached as a screenshot)

nvcc --help and other commands' output are pretty long, so I put them in a GitHub repository. Hope you can access them.

nvcc --help
https://github.com/piyueh/Temp/blob/master/nvcc_help.txt

lstopo-no-graphics
https://github.com/piyueh/Temp/blob/master/hwloc_lstopo.txt

deviceQuery:
https://github.com/piyueh/Temp/blob/master/deviceQuery.txt

crtrott (Member) commented Jun 17, 2015

OK, nothing comes to mind that could cause this. You could run the individual test under cuda-gdb and see if you get any more info (i.e. which line it crashes in, etc.).
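A minimal sketch of that workflow (the binary name and test filter below are illustrative, not taken from this thread; substitute the actual test binary produced by `make build-test`):

```shell
# Hypothetical invocation: run one unit test under cuda-gdb to find the
# hang/crash site.
#   cuda-gdb --args ./KokkosCore_UnitTest_Cuda --gtest_filter='cuda.space'
# At the (cuda-gdb) prompt:
#   run         # start the test
#   backtrace   # after a crash, or after Ctrl-C during a hang
command -v cuda-gdb >/dev/null 2>&1 && echo "cuda-gdb found" || echo "cuda-gdb not on PATH"
```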

I can't reproduce it here on my machine (which also has 3 GPUs). I saw that you said earlier you are on Arch linux. Does that mean you are actually on the distro Arch Linux?

If so, I am not sure how well supported hardware-close features such as UVM are there. I know they occasionally have issues with it on Fedora, and Fedora actually shows up as a supported distro on NVIDIA's website. Sorry.

piyueh (Author) commented Jun 17, 2015

Now I can confirm the problem comes from the NVIDIA driver. After upgrading the driver to 352.21, the problem was resolved.

There are probably some problems in the older NVIDIA driver provided by the Arch Linux official repository.

Thanks for your help!

crtrott (Member) commented Jun 17, 2015

Great that it got solved. Closing the issue now.

@crtrott crtrott closed this as completed Jun 17, 2015
Rombur added a commit to Rombur/kokkos that referenced this issue Jan 9, 2020
Add parallel_reduce dynamic sized elements of View