Add HOROVOD_CUDA_HOME documentation (#911)

* Add HOROVOD_CUDA_HOME documentation

Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>

* Add `HOROVOD_CUDA_INCLUDE` and `HOROVOD_CUDA_LIB`

Signed-off-by: Alex Sergeev <alsrgv@users.noreply.github.com>

* Copyedits

Signed-off-by: Alex Sergeev <alexander.sergeev@live.com>
alsrgv committed Mar 13, 2019
1 parent 088db5a commit 1b3bb7dcc15c2380f70c584133f20e49b92becd9
Showing with 18 additions and 0 deletions.
  1. +18 −0 docs/troubleshooting.md
@@ -310,3 +310,21 @@ will likely cause NCCL and MPI to fail. In order to disable IPC in NCCL and MPI
memory, use:
* `export NCCL_P2P_DISABLE=1` for NCCL.
* `--mca btl_smcuda_use_cuda_ipc 0` flag for OpenMPI and similar flags for other vendors.

### libcudart.so.X.Y: cannot open shared object file: No such file or directory

If your program crashes with a `libcudart.so.X.Y: cannot open shared object file: No such file or directory` error, it's likely that your framework and Horovod were built with different versions of CUDA.
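The `X.Y` in the error message names the CUDA runtime version the loader could not find, so it is a useful hint for which toolkit to build against. The sketch below pulls that version out of an error string and checks for a matching toolkit directory. The helper name `cudart_version_from_error`, the sample message, and the `/usr/local/cuda-<version>` layout are assumptions for illustration; your toolkit may be installed elsewhere.

```bash
# Hypothetical helper (not part of Horovod): extract X.Y from a libcudart loader error.
cudart_version_from_error() {
  printf '%s\n' "$1" | sed -n 's/.*libcudart\.so\.\([0-9][0-9.]*\).*/\1/p'
}

# Example message; substitute the error your program actually printed.
err='ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory'
ver="$(cudart_version_from_error "$err")"
echo "loader is looking for CUDA runtime $ver"

# Assumption: toolkits are installed under /usr/local/cuda-<version>.
candidate="/usr/local/cuda-$ver"
if [ -d "$candidate" ]; then
  echo "found a toolkit directory at $candidate"
else
  echo "no toolkit directory at $candidate; locate the matching CUDA install manually"
fi
```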

To build Horovod with a specific CUDA version, use the `HOROVOD_CUDA_HOME` environment variable during installation:

```bash
$ pip uninstall -y horovod
$ HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_NCCL_HOME=/path/to/nccl HOROVOD_CUDA_HOME=/path/to/cuda pip install --no-cache-dir horovod
```

Alternatively, you can use the `HOROVOD_CUDA_INCLUDE` and `HOROVOD_CUDA_LIB` environment variables to point at the CUDA include and library directories individually:

```bash
$ pip uninstall -y horovod
$ HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_NCCL_HOME=/path/to/nccl HOROVOD_CUDA_INCLUDE=/path/to/cuda/include HOROVOD_CUDA_LIB=/path/to/cuda/lib64 pip install --no-cache-dir horovod
```
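Before reinstalling, it can help to confirm that the directory you plan to pass actually contains the CUDA headers and runtime library. The following is a minimal pre-flight sketch, assuming the usual Linux toolkit layout (`include/cuda_runtime.h` and `lib64/libcudart.so*`); the function name `check_cuda_home` is hypothetical and not part of Horovod.

```bash
# Hypothetical pre-flight check (not part of Horovod).
# Returns 0 if the directory looks like a CUDA toolkit root, 1 otherwise.
check_cuda_home() {
  cuda_home="$1"

  # The runtime API header ships in the toolkit's include directory.
  if [ ! -f "$cuda_home/include/cuda_runtime.h" ]; then
    echo "missing cuda_runtime.h in $cuda_home/include"
    return 1
  fi

  # Accept any libcudart.so or versioned libcudart.so.X.Y in lib64.
  for lib in "$cuda_home"/lib64/libcudart.so*; do
    if [ -e "$lib" ]; then
      echo "ok: $lib"
      return 0
    fi
  done

  echo "missing libcudart.so* in $cuda_home/lib64"
  return 1
}

# Usage (replace the placeholder path with your own):
#   check_cuda_home /path/to/cuda
```

If the check fails for the toolkit root but the headers and libraries exist in separate locations, that is the case where setting `HOROVOD_CUDA_INCLUDE` and `HOROVOD_CUDA_LIB` individually is the better fit.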
