Description
This may not be a real issue, but it is at least unexpected behaviour.
Tested with CUDA 7 and Open MPI 1.8.5.
Background: On hosts equipped with only one GPU, you hardly ever call cudaSetDevice in your code, since all CUDA calls default to the only existing device (0).
When allocating CUDA device memory in the main (host) thread and calling MPI_Send / MPI_Recv from other threads, MPI crashes with "CUDA: Error in cuMemcpy".
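A minimal sketch of the scenario described, assuming a CUDA-aware Open MPI build with MPI_THREAD_MULTIPLE support; the buffer size, tags, and rank layout are illustrative, not taken from the original report:

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <pthread.h>

#define N (1024 * 1024)      /* illustrative buffer size */

static void *d_buf;          /* device buffer allocated in the main thread */

/* Worker thread: sends the device buffer via CUDA-aware MPI. */
static void *sender(void *arg)
{
    MPI_Send(d_buf, N, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Device memory is allocated in the main thread without cudaSetDevice():
       on a single-GPU host this implicitly uses device 0. */
    cudaMalloc(&d_buf, N);

    if (rank == 0) {
        pthread_t t;
        pthread_create(&t, NULL, sender, NULL);  /* MPI_Send from another thread */
        pthread_join(t, NULL);
    } else if (rank == 1) {
        MPI_Recv(d_buf, N, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```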
A "workaround" is, to set the cudaDevice in every thread explitly, which definitly is a good advise at all.
Nevertheless, the CUDA documentation states that every thread's current device defaults to device 0 unless changed by the user.
This holds true for CUDA API calls issued from different threads, but Open MPI seems to do some dark magic under the hood that changes this default device.