
HPCG Cuda Binary with MPI support not working properly for multiple hosts #65

Pl4tiNuM opened this issue Sep 4, 2020 · 1 comment

Pl4tiNuM commented Sep 4, 2020

Hello,

I am trying to run HPCG with cuda support using MPI on multiple hosts. Specifically, I use the binary found in the website (https://www.hpcg-benchmark.org/software/view.html?id=267).

I have set up my cluster with all the required libraries and am able to run the benchmark on one node. The problem appears when I try to use multiple MPI hosts. The instructions say that to run on multiple hosts (e.g. 2 hosts with 2 GPUs each), we should issue a command like the following:

mpirun -np 4 -hostfile hosts2 ./xhpcg-3.1_gcc_485_cuda-10.0.130_ompi-3.1.0_sm_35_sm_50_sm_60_sm_70_sm_75_ver_10_9_18

where hosts2 looks like this:

mpi-worker-0
mpi-worker-1

However, when I issue the command above, all four processes are deployed on the first host listed in the hosts2 file (i.e. mpi-worker-0 in this case) and none on the second one.

Is there anything I can do?

Thanks in advance,
Dimosthenis

viniciusferrao commented

Just add slots=2 after each line in the hosts file.
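With that change, a hostfile granting two slots per node might look like this (a sketch reusing the hostnames from the question; slots tells Open MPI's mpirun how many processes it may place on each host before moving to the next):

```
mpi-worker-0 slots=2
mpi-worker-1 slots=2
```

Running `mpirun -np 4 -hostfile hosts2 ...` against this file should then place two ranks on each worker instead of all four on the first.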
