Description
My Open MPI version is:
$ ompi_info --version
Open MPI v4.1.7rc1
It was installed with Spack.
What I want to do is launch an MPMD job (two binaries, progA and progB) with Open MPI's mpiexec
on several Slurm nodes, such that each node runs N copies of progA and Q copies of progB,
and each process uses T OpenMP threads.
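The layout implies two simple constraints, sketched here with hypothetical helper names and assuming one core per OpenMP thread: each node needs (N + Q) × T cores, and the MPMD `-np` values are the per-node counts multiplied by the number of nodes.

```python
# Hypothetical helpers (not part of Open MPI): arithmetic behind the layout,
# assuming one core per OpenMP thread and no oversubscription.
from typing import Tuple


def cores_needed(n_a: int, q_b: int, threads: int) -> int:
    """Cores one node needs for n_a progA ranks plus q_b progB ranks."""
    return (n_a + q_b) * threads


def fits_on_node(n_a: int, q_b: int, threads: int, cores_per_node: int) -> bool:
    """True if the per-node layout fits without oversubscribing cores."""
    return cores_needed(n_a, q_b, threads) <= cores_per_node


def np_counts(nodes: int, n_a: int, q_b: int) -> Tuple[int, int]:
    """Total -np values for the progA and progB app contexts."""
    return nodes * n_a, nodes * q_b


if __name__ == "__main__":
    # 3 x progA + 1 x progB per node, 4 threads each, 16-core nodes, 2 nodes.
    print(cores_needed(3, 1, 4), fits_on_node(3, 1, 4, 16), np_counts(2, 3, 1))
```

For the 16-core example this gives exactly 16 cores per node, so there is no slack for an extra rank.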
As an illustration, take nodes with 16 cores (numbered 0 to 15, no SMT).
I want 4 OpenMP threads per rank, with 3 ranks of progA followed by 1 rank of progB on each node.
So I want to achieve this placement:
host,  rank:  cores
                        111111
              0123456789012345
host0, rank0: AAAA
host0, rank1:     AAAA
host0, rank2:         AAAA
host0, rank3:             BBBB
host1, rank0: AAAA
host1, rank1:     AAAA
host1, rank2:         AAAA
host1, rank3:             BBBB
etc.
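One way this placement might be expressed is an Open MPI rankfile (lines of the form `rank R=host slot=a-b`, as in the 4.x mpirun man page). A minimal sketch of a generator, assuming the MPMD order `progA : progB`: Open MPI then gives progA the lowest global ranks, so the per-node labels above correspond to global ranks 0-2 (host0) and 3-5 (host1) for progA, and 6 and 7 for progB. The function name and the hostnames are hypothetical; on Slurm the real names could come from `scontrol show hostnames $SLURM_JOB_NODELIST`.

```python
# Hypothetical rankfile generator for the layout above (not an Open MPI tool).
# Assumes MPMD order "progA : progB", so progA owns global ranks
# 0 .. len(hosts)*n_a - 1 and progB owns the ranks after that.
from typing import List


def make_rankfile(hosts: List[str], n_a: int, q_b: int, threads: int) -> List[str]:
    lines = []
    rank = 0
    # progA: n_a ranks per host, packed onto the first n_a*threads cores.
    for host in hosts:
        for local in range(n_a):
            first = local * threads
            lines.append(f"rank {rank}={host} slot={first}-{first + threads - 1}")
            rank += 1
    # progB: q_b ranks per host, on the cores right after progA's block.
    for host in hosts:
        for local in range(q_b):
            first = (n_a + local) * threads
            lines.append(f"rank {rank}={host} slot={first}-{first + threads - 1}")
            rank += 1
    return lines


if __name__ == "__main__":
    for line in make_rankfile(["host0", "host1"], n_a=3, q_b=1, threads=4):
        print(line)
```

With the output saved to a file (here the hypothetical name `myrankfile`), the launch would look something like `mpiexec --rankfile myrankfile -np 6 -x OMP_NUM_THREADS=4 ./progA : -np 2 -x OMP_NUM_THREADS=4 ./progB`. I have not verified this against v4.1.7rc1 under Slurm.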
Others have asked the same or similar questions before,
but there does not seem to be a clear answer, e.g.:
I tried both suggestions from that thread, but they did not help.
The suggestion here was to use srun, but srun normally does not allow placing binaries separated by ":" on the same node.
I tried specifying -H for each binary, and also using --hostfile with --mca rmaps seq, but
never succeeded. The error is either
A sequential map was requested, but not enough node entries
were given to support the requested number of processes:
or
No nodes are available for this job, either due to a failure to
allocate nodes to the job, or allocated nodes being marked
as unavailable (e.g., down, rebooting, or a process attempting
to be relocated to another node when none are available).
depending on the exact syntax and numerical values.
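The first error suggests the hostfile had fewer lines than the total number of processes. A sketch of a hostfile for the sequential mapper, under the unverified assumption that `rmaps seq` places global rank r on the host named on line r, counting across both app contexts in `progA : progB` order. The function and hostnames are hypothetical, and the sketch addresses only host placement, not core binding.

```python
# Hypothetical generator for a "--mca rmaps seq" hostfile (one host per line).
# Assumption: line r of the file is the host for global rank r, counted across
# the app contexts in "progA : progB" order.
from typing import List


def make_seq_hostfile(hosts: List[str], n_a: int, q_b: int) -> List[str]:
    lines = []
    for host in hosts:  # progA ranks come first: n_a per host
        lines.extend([host] * n_a)
    for host in hosts:  # then progB ranks: q_b per host
        lines.extend([host] * q_b)
    return lines


if __name__ == "__main__":
    entries = make_seq_hostfile(["host0", "host1"], n_a=3, q_b=1)
    # Must equal the sum of the -np values (6 + 2 here), or the mapper
    # would run out of entries.
    print(len(entries))
    print("\n".join(entries))
```

The corresponding launch would be something like `mpiexec --hostfile seqhosts --mca rmaps seq -np 6 ./progA : -np 2 ./progB`, with `seqhosts` a hypothetical file name.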
Please help.
I hope my description is clear, but if not, I'm happy to provide
the specific commands I tried and their results.
Thank you
Anton