OpenMPI needs to be compiled from source with ./configure --with-cuda
"For Linux 64, Open MPI is built with CUDA awareness but this support is disabled by default.
To enable it, please set the environmental variable OMPI_MCA_opal_cuda_support=true before launching your MPI processes. Equivalently, you can set the MCA parameter in the command line:
mpiexec --mca opal_cuda_support 1
..."
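If you launch the MPI job from a Python driver script, the two options quoted above can be sketched roughly as follows. This is only a sketch: the helper names (cuda_aware_env, mpiexec_command, launch) and the script name in the usage note are made up for illustration, and it assumes mpiexec is on the PATH.

```python
import os
import subprocess


def cuda_aware_env(base_env=None):
    """Return a copy of the environment with Open MPI CUDA support switched on."""
    env = dict(os.environ if base_env is None else base_env)
    # Same variable as in the message quoted above.
    env["OMPI_MCA_opal_cuda_support"] = "true"
    return env


def mpiexec_command(nprocs, script):
    """Build the equivalent command line that passes the MCA parameter directly."""
    return ["mpiexec", "-n", str(nprocs),
            "--mca", "opal_cuda_support", "1",
            "python", script]


def launch(nprocs, script):
    """Run the script under mpiexec with the environment variable set (hypothetical helper)."""
    cmd = ["mpiexec", "-n", str(nprocs), "python", script]
    return subprocess.run(cmd, env=cuda_aware_env(), check=True)
```

For example, launch(2, "my_script.py") would start two ranks of a placeholder script with CUDA support enabled through the environment, while mpiexec_command(2, "my_script.py") gives the command-line form.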
We use features from mpi4py==3.1.0a0, so if 3.1.0 isn't released yet you need to install mpi4py from master:
pip install https://github.com/mpi4py/mpi4py/archive/master.zip
To debug the MPI processes with PyCharm, see https://stackoverflow.com/a/57938838
The short of it is:
- create a Python Debug Server run configuration
- install the pydevd_pycharm pip package (matching your PyCharm version)
- check "Allow parallel runs"
- start as many debug runs as there are MPI processes on the remote machine
- reverse ssh tunnel the ports that the debug servers on your local machine start on, e.g.
  ssh max@localhost -p2222 -R 65300:localhost:65300 -R 65303:localhost:65303
- put
  pydevd_pycharm.settrace("localhost", port=port_mapping[rank], stdoutToServer=True, stderrToServer=True)
  where you want the breakpoint
- run the script using mpiexec
- map the source in the run configuration (you need to map the specific files)
- you can only set one port in the run configuration, and that prevents multiple parallel runs. A better alternative to changing the ports in settrace every time is to change the reverse tunnel instead.
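The port_mapping used in the settrace call is not defined in these notes. One way it could be built is sketched below. The base port and stride are assumptions chosen only to reproduce the 65300 and 65303 ports from the ssh example, and make_port_mapping, attach_debugger, and main are hypothetical helper names.

```python
BASE_PORT = 65300  # assumption: port of the debug server for rank 0
PORT_STRIDE = 3    # assumption: spacing that gives 65300, 65303, ... as in the ssh example


def make_port_mapping(world_size, base_port=BASE_PORT, stride=PORT_STRIDE):
    """Map each MPI rank to the tunnelled port of its own debug server."""
    return {rank: base_port + stride * rank for rank in range(world_size)}


def attach_debugger(rank, port_mapping):
    """Connect this rank to its debug server; acts as the breakpoint."""
    # Imported lazily so the module still loads where pydevd_pycharm is not installed.
    import pydevd_pycharm

    pydevd_pycharm.settrace(
        "localhost",
        port=port_mapping[rank],
        stdoutToServer=True,
        stderrToServer=True,
    )


def main():
    # Imported here so the helpers above can be used without MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    port_mapping = make_port_mapping(comm.Get_size())
    attach_debugger(comm.Get_rank(), port_mapping)
    # ... code to debug goes here ...
```

With this layout each rank always asks for the same local port, so moving a rank to a different debug server only means changing the -R arguments of the tunnel.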
If you get
mpi4py.MPI.Exception: MPI_ERR_TRUNCATE: message truncated
you most likely forgot to pass dtype when creating a cupy array, so the receive buffer is smaller than the message being sent.
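A rough sketch of how the truncation can arise and how an explicit dtype avoids it. It assumes two ranks, a CUDA-aware MPI build, and float32 data. recv_buffer_fits and exchange are hypothetical names, and the byte-size check is only an illustration of the failure mode, not part of mpi4py.

```python
def recv_buffer_fits(send_count, send_itemsize, recv_count, recv_itemsize):
    """True if the receive buffer has room (in bytes) for the incoming message."""
    return recv_count * recv_itemsize >= send_count * send_itemsize


def exchange(n=1024, dtype="float32"):
    """Send n elements from rank 0 to rank 1 with the same dtype on both sides."""
    # Imported lazily so the helper above can be used without a GPU or MPI.
    from mpi4py import MPI
    import cupy

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    if rank == 0:
        # Without dtype=..., cupy.zeros(n) would be float64 (8 bytes per element).
        sendbuf = cupy.zeros(n, dtype=dtype)
        # Make sure the GPU has finished writing the buffer before MPI reads it.
        cupy.cuda.get_current_stream().synchronize()
        comm.Send(sendbuf, dest=1)
        return sendbuf
    if rank == 1:
        # The receive buffer uses the same dtype, so its size matches the message.
        recvbuf = cupy.empty(n, dtype=dtype)
        comm.Recv(recvbuf, source=0)
        return recvbuf
    return None
```

For instance, a float64 send of 1024 elements into a float32 buffer of 1024 elements does not fit (recv_buffer_fits(1024, 8, 1024, 4) is False), which is the kind of mismatch that shows up as MPI_ERR_TRUNCATE.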
For GPUDirect RDMA in OpenMPI, see https://www.open-mpi.org/faq/?category=runcuda#mpi-cuda-dev-opa