
Can not pass unit test cuda.space without a GT 720 #25

Closed
piyueh opened this issue Jun 17, 2015 · 13 comments
piyueh commented Jun 17, 2015

Hi,

I use the following commands to build Kokkos (with UVM enabled):

$ mkdir build
$ cd build
$ ../generate_makefile.bash --with-cuda --kokkos-path=${HOME}/Downloads/kokkos --compiler=nvcc_wrapper --cxxflags="-DKOKKOS_USE_CUDA_UVM"
$ make build-test
$ make test

There are three CUDA devices on my machine:

  1. Device 0: Tesla K40c
  2. Device 1: Tesla K20c
  3. Device 2: GeForce GT 720

I use the environment variable CUDA_VISIBLE_DEVICES to control which devices are used. I found that the unit tests pass only in the cases that enable the GT 720 together with one or both of the Tesla cards (i.e. CUDA_VISIBLE_DEVICES=0,1,2, 0,2, or 1,2).

Other cases always get stuck at cuda.space.
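The observed pattern can be summarized as a sketch (device indices as listed above; `make test` is the command from the build steps):

```shell
# Combinations reported above (0 = K40c, 1 = K20c, 2 = GT 720):
#   pass:  CUDA_VISIBLE_DEVICES=0,1,2   0,2   1,2   (GT 720 plus a Tesla)
#   hang:  any other combination, stuck at cuda.space
export CUDA_VISIBLE_DEVICES=0,2
echo "devices visible to CUDA: $CUDA_VISIBLE_DEVICES"
# make test
```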

Did I miss something when building Kokkos? Thanks!

crtrott (Member) commented Jun 17, 2015

This is weird. I don't see anything obviously wrong. Let me try to reproduce.

crtrott (Member) commented Jun 17, 2015

Oh, I have an idea. Can you add CUDA_LAUNCH_BLOCKING=1 as an environment variable?

This is what might happen: if you force UVM usage, you generally need to fence kernels. Some tests don't do that, because they test asynchronous behaviour. The need to fence is related to the coherency protocol of the CUDA driver. Now, if you have two visible devices, CUDA doesn't actually do proper UVM; it falls back to zero-copy mode (i.e. all UVM allocations are actually CUDA host-pinned allocations). At that point you don't need to fence anymore. That's why you need two devices in your list: it makes CUDA use its fallback.

CUDA_LAUNCH_BLOCKING=1 is the way to make that work.
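As a sketch of the suggestion (run `make test` as before, or the individual test binary, after setting the variable):

```shell
# Force every kernel launch to complete before the host continues. With
# -DKOKKOS_USE_CUDA_UVM this acts as an implicit fence, so tests that skip
# explicit fencing no longer race with the driver's UVM coherency protocol.
export CUDA_LAUNCH_BLOCKING=1
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
# make test
```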

Let me know how that goes.

crtrott (Member) commented Jun 17, 2015

Btw., if this turns out to be correct, let's open another issue for improved documentation in the Programming guide while closing this one.

ipdemes commented Jun 17, 2015

I would agree that you need to add this to the Programming guide: there were already several people who had a similar issue.

Irina

piyueh (Author) commented Jun 17, 2015

Hi,

I had already set CUDA_LAUNCH_BLOCKING=1. In fact, I ran into another problem (issue #23) before I set CUDA_LAUNCH_BLOCKING.

So it still doesn't work even with CUDA_LAUNCH_BLOCKING=1 set.

Is it possible that the problem comes from my hardware? My machine has two CPU sockets. The K40c and K20c belong to the 1st CPU socket, and the GT 720 belongs to the 2nd socket.

piyueh (Author) commented Jun 17, 2015

Sorry, I made a mistake. The machine I was using only has one CPU socket, and all three GPUs belong to that socket.

crtrott (Member) commented Jun 17, 2015

OK, I cannot reproduce the behaviour on my machine with 3 GPUs using your configure line. Even without CUDA_LAUNCH_BLOCKING it crashes later, not in cuda.space; that test passes.

Can you give more info on your configuration? Output of nvidia-smi, nvcc --help, gcc --version, your OS, etc.?

piyueh (Author) commented Jun 17, 2015

The OS is Arch Linux. The following text files show the information for the machine on which I compiled and ran the unit tests:

gcc_version.txt, nvcc_version.txt, nvcc_help.txt, nvdia-smi.txt, hwloc_lstopo.txt, deviceQuery.txt

I just tried the unit tests on another machine, which has only one GT 720. That machine's NVIDIA driver was updated to 352.21 just now, and all unit tests passed on it! (I recall that before the driver update, the unit tests on that machine also got stuck at cuda.space, but I'm not entirely sure about this.)

I'll ask the administrator to update the NVIDIA driver and then try the unit tests again.

crtrott (Member) commented Jun 17, 2015

Can you post the text from those files directly? Dropbox seems to be blocked from Sandia; at least it doesn't let me access those files from here.

piyueh (Author) commented Jun 17, 2015

Sure.

gcc --version

gcc (GCC) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27

nvidia-smi
(output was attached as a screenshot)

nvcc --help and other commands' output are pretty long, so I put them in a GitHub repository. Hope you can access them.

nvcc --help
https://github.com/piyueh/Temp/blob/master/nvcc_help.txt

lstopo-no-graphics
https://github.com/piyueh/Temp/blob/master/hwloc_lstopo.txt

deviceQuery:
https://github.com/piyueh/Temp/blob/master/deviceQuery.txt

crtrott (Member) commented Jun 17, 2015

OK, nothing comes to mind that could cause this. You could run the individual test under cuda-gdb and see if you get any more info (i.e. which line it crashes in, etc.).
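A minimal sketch of that workflow (the binary name and test filter below are illustrative, not taken from this thread; substitute the actual test binary produced by `make build-test`):

```shell
# Hypothetical invocation: run one unit test under cuda-gdb to find the
# hang/crash site.
#   cuda-gdb --args ./KokkosCore_UnitTest_Cuda --gtest_filter='cuda.space'
# At the (cuda-gdb) prompt:
#   run         # start the test
#   backtrace   # after a crash, or after Ctrl-C during a hang
command -v cuda-gdb >/dev/null 2>&1 && echo "cuda-gdb found" || echo "cuda-gdb not on PATH"
```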

I can't reproduce it here on my machine (which also has 3 GPUs). I saw that you said earlier you are on Arch linux. Does that mean you are actually on the distro Arch Linux?

If so, I am not sure how well supported hardware-close features such as UVM are there. I know they occasionally have issues with it on Fedora, and Fedora actually shows up as a supported distro on NVIDIA's website. Sorry.

piyueh (Author) commented Jun 17, 2015

Now I can confirm the problem comes from the NVIDIA driver. After upgrading the driver to 352.21, the problem was resolved.

There are probably some problems in the older NVIDIA driver provided by the Arch Linux official repository.

Thanks for your help!

crtrott (Member) commented Jun 17, 2015

Great that it got solved. Closing the issue now.

@crtrott crtrott closed this as completed Jun 17, 2015
Rombur added a commit to Rombur/kokkos that referenced this issue Jan 9, 2020
Add parallel_reduce dynamic sized elements of View