
HPCG Cuda Binary with MPI support not working properly for multiple hosts #65

Pl4tiNuM opened this issue Sep 4, 2020 · 1 comment

Pl4tiNuM commented Sep 4, 2020

Hello,

I am trying to run HPCG with cuda support using MPI on multiple hosts. Specifically, I use the binary found in the website (https://www.hpcg-benchmark.org/software/view.html?id=267).

I have set up my cluster with all the required libraries and am able to run the benchmark on one node. The problem appears when I try to use multiple MPI hosts. The instructions say that to run on multiple hosts (e.g. 2 hosts with 2 GPUs each), we should issue a command like the following:

mpirun -np 4 -hostfile hosts2 ./xhpcg-3.1_gcc_485_cuda-10.0.130_ompi-3.1.0_sm_35_sm_50_sm_60_sm_70_sm_75_ver_10_9_18

where hosts2 looks like this:

mpi-worker-0
mpi-worker-1

However, when I issue the command above, all four processes are deployed on the first host listed in the hosts2 file (i.e. mpi-worker-0 in this case) and none on the second one.

Is there anything I can do?

Thanks in advance,
Dimosthenis

viniciusferrao commented

Just add slots=2 after each line in the hosts file.
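With that change, a hostfile granting two slots per node might look like this (a sketch reusing the hostnames from the question; slots tells Open MPI's mpirun how many processes it may place on each host before moving to the next):

```
mpi-worker-0 slots=2
mpi-worker-1 slots=2
```

Running `mpirun -np 4 -hostfile hosts2 ...` against this file should then place two ranks on each worker instead of all four on the first.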
