Can not pass unit test cuda.space without a GT 720 #25
Comments
This is weird. I don't see anything obviously wrong. Let me try and reproduce. |
Oh, I have an idea. Can you add CUDA_LAUNCH_BLOCKING=1 as an environment variable? This is what might be happening: if you force UVM usage, you generally need to fence kernels. There are tests which don't do that because they test asynchronous behaviour. The reason you need to fence is related to the coherency protocol of the CUDA driver. Now, if you have two devices, CUDA doesn't actually do proper UVM; it falls back to zero-copy mode (i.e. all UVM allocations are actually CUDA host-pinned allocations). At that point you don't need to fence anymore. That's why you need two devices in your list: it makes CUDA use its fallback. CUDA_LAUNCH_BLOCKING=1 is the way to make it work without the fallback. Let me know how that goes. |
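For reference, a minimal sketch of how the variable could be set for a single run only. The test binary path (./cuda_unit_test) and the helper name run_blocking are placeholders, not names from the Kokkos build; point TEST_BIN at whatever your build produces.

```shell
#!/bin/sh
# Hypothetical path to the CUDA unit-test binary; override via TEST_BIN.
TEST_BIN="${TEST_BIN:-./cuda_unit_test}"

# Run a command with CUDA kernel launches forced to be synchronous.
# The variable is set only for the child process, so the calling
# shell's environment stays untouched.
run_blocking() {
    CUDA_LAUNCH_BLOCKING=1 "$@"
}

if [ -x "$TEST_BIN" ]; then
    run_blocking "$TEST_BIN"
else
    echo "test binary not found: $TEST_BIN" >&2
fi
```

Setting the variable per command rather than exporting it keeps later runs asynchronous, so you can compare both behaviours from the same shell.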
Btw, if this turns out to be correct, let's open another issue for improved documentation in the Programming guide while closing this one. |
I would agree that you need to add this to the Programming guide: there Irina
|
Hi, I had already set So it doesn't work after I set Is it possible that the problem comes from my hardware? My machine has two CPU sockets. The K40c and K20c belong to the 1st CPU socket, and the GT 720 belongs to the 2nd socket. |
Sorry, I made a mistake. The machine I was using only has one CPU socket, and all three GPUs belong to that socket. |
OK, I cannot reproduce the behaviour on my machine with 3 GPUs using your configure line. Can you give more info on your configuration? Output of nvidia-smi, nvcc --help, gcc --version, your OS, etc.? |
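One way to gather that information in a single pass is sketched below. The helper name report is made up for illustration, and nvcc --version is used here in addition to what was asked for, since it gives the toolkit version compactly. Tools that are not installed are noted instead of aborting the script.

```shell
#!/bin/sh
# Print a command's output under a header, or note that the tool is missing.
report() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command failed: $*)"
    else
        echo "(not found: $1)"
    fi
}

report uname -sr        # kernel name and release
report gcc --version    # host compiler
report nvcc --version   # CUDA toolkit compiler
report nvidia-smi       # driver version and visible GPUs
```

Redirecting the script's output to a file gives plain text that can be pasted directly into a comment.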
The OS is Arch Linux. The following txt files show the information for the machine on which I compiled and ran the unit tests: gcc_version.txt, nvcc_version.txt, nvcc_help.txt, nvdia-smi.txt, hwloc_lstopo.txt, deviceQuery.txt. I just tried the unit tests on another machine, which has only one GT 720. That machine just had its nvidia driver updated to 352.21, and all unit tests passed on it! (I remember on this machine, before updating the driver, the unit tests also got stuck at I'll ask the administrator to update the nvidia driver and then try the unit tests again. |
Can you post the text from those files directly? Dropbox seems to be blocked from Sandia. At least it doesn't allow me to access those files from here. |
Sure.
nvcc --help and other commands' output are pretty long, so I put them in a GitHub repository. Hope you can access them.
deviceQuery: |
OK, nothing comes to mind that could cause this. You could run the individual test with cuda-gdb and see if you get any more info (i.e. which line it crashes in, etc.). I can't reproduce it here on my machine (which also has 3 GPUs). I saw that you said earlier you are on Arch linux. Does that mean you are actually on the distro archlinux? If so, I am not sure how well supported such close-to-hardware features as UVM are. I know that they occasionally have issues with it on Fedora, and Fedora actually shows up as a supported distro on NVIDIA's website. Sorry. |
Now I can confirm the problem comes from the nvidia driver. After upgrading the nvidia driver to 352.21, the problem was resolved. There are probably some problems in the older nvidia driver provided by the archlinux official repository. Thanks for your help! |
Great that it got solved. Closing the issue now. |
Add parallel_reduce dynamic sized elements of View
Hi,
I use the following commands to build Kokkos (with UVM enabled):
There are three CUDA devices on my machine:
I use the environment variable CUDA_VISIBLE_DEVICES to control which device I want to use. I found that only the cases that enable the GT 720 plus one of the Tesla cards (or both Tesla cards) can pass the unit tests (i.e. CUDA_VISIBLE_DEVICES=0,1,2, 0,2, or 1,2). Other cases always get stuck at cuda.space.
Did I miss something when building Kokkos? Thanks!
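A rough sketch of how the device combinations could be checked systematically is given below. It assumes device indices 0, 1 and 2 as in the list above, a placeholder test binary path (./cuda_unit_test), and a made-up helper name run_on_devices. Because the failing cases hang rather than crash, each run is wrapped in the GNU coreutils timeout command; the 600 second limit is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical path to the CUDA unit-test binary; override via TEST_BIN.
TEST_BIN="${TEST_BIN:-./cuda_unit_test}"
# Every non-empty subset of the three device indices.
DEVICE_SETS="0 1 2 0,1 0,2 1,2 0,1,2"

# Run a command with only the given devices visible to CUDA.
# Usage: run_on_devices 0,2 ./some_binary [args...]
run_on_devices() {
    visible="$1"
    shift
    CUDA_VISIBLE_DEVICES="$visible" "$@"
}

for devices in $DEVICE_SETS; do
    if [ ! -x "$TEST_BIN" ]; then
        echo "$devices: skipped (no test binary at $TEST_BIN)"
    elif run_on_devices "$devices" timeout 600 "$TEST_BIN" >/dev/null 2>&1; then
        echo "$devices: PASS"
    else
        # timeout exits with status 124 when the limit is hit (a hang).
        echo "$devices: FAIL or timed out"
    fi
done
```

The summary lines make it easy to see at a glance which device subsets pass, which is the pattern described above.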