correct syntax for mpmd on same node #13473

@ashterenli

My openmpi is:

$ ompi_info --version
Open MPI v4.1.7rc1

It comes from a spack installation.

What I want to do is launch an MPMD job (2 binaries: progA and progB) with the Open MPI mpiexec
on several Slurm nodes, such that there are N copies of progA and Q copies of progB on each node,
and each process runs T OpenMP threads.

An illustration for nodes with 16 cores (0 to 15, no SMT).
I want to use 4 OpenMP threads per rank, and have 3 ranks of progA, followed by 1 rank of progB, on each node.
So I want to achieve this placement:

host   rank          cores
                        111111
              0123456789012345
host0, rank0: AAAA
host0, rank1:     AAAA
host0, rank2:         AAAA
host0, rank3:             BBBB
host1, rank0: AAAA
host1, rank1:     AAAA
host1, rank2:         AAAA
host1, rank3:             BBBB

etc.
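
To make the arithmetic of this layout explicit, here is a minimal sketch that prints the core range for every rank in Open MPI rankfile syntax (`rank R=host slot=first-last`). The function name `make_rankfile` is made up for illustration. It assumes that global ranks are numbered by application context, so all progA ranks come before all progB ranks; the global rank numbers therefore differ from the per-node labels in the illustration above. Whether the rankfile mapper in v4.1.7rc1 accepts such a file together with colon-separated binaries is not verified here.

```shell
#!/bin/sh
# Hypothetical helper: print rankfile lines for N progA ranks and
# Q progB ranks per host, each rank owning T consecutive cores.
# Usage: make_rankfile N Q T host...
make_rankfile() {
    n=$1; q=$2; t=$3; shift 3
    rank=0
    # Assumption: progA is the first app context, so its ranks come first.
    for host in "$@"; do
        i=0
        while [ "$i" -lt "$n" ]; do
            first=$((i * t))
            echo "rank $rank=$host slot=$first-$((first + t - 1))"
            rank=$((rank + 1)); i=$((i + 1))
        done
    done
    # progB ranks follow; on each host they start after the N progA blocks.
    for host in "$@"; do
        j=0
        while [ "$j" -lt "$q" ]; do
            first=$(((n + j) * t))
            echo "rank $rank=$host slot=$first-$((first + t - 1))"
            rank=$((rank + 1)); j=$((j + 1))
        done
    done
}

# The 16-core example: 3 progA + 1 progB per node, 4 cores per rank.
make_rankfile 3 1 4 host0 host1
```

For the two-host example this gives ranks 0 to 5 (progA) on cores 0-3, 4-7 and 8-11 of each host, and ranks 6 and 7 (progB) on cores 12-15 of host0 and host1 respectively.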

Others have asked the same or similar questions before,
but there seems to be no clear answer, e.g.:

  1. https://www.mail-archive.com/users@lists.open-mpi.org/msg34226.html

I tried both suggestions from that thread, but they did not help.

  2. https://stackoverflow.com/questions/79023892/how-to-use-different-processes-per-resource-ppr-and-processing-elements-pe-f

The suggestion there was to use srun, but srun normally does not allow placing binaries separated by : on the same node.

I tried specifying -H: for each binary, and using --hostfile and --mca rmaps seq, but
never succeeded. The error is either

A sequential map was requested, but not enough node entries
were given to support the requested number of processes:

or

No nodes are available for this job, either due to a failure to
allocate nodes to the job, or allocated nodes being marked
as unavailable (e.g., down, rebooting, or a process attempting
to be relocated to another node when none are available).

depending on the exact syntax and numerical values.

Please help

I hope my description is clear, but if not, I'm happy to provide
specific examples I tried and the results.

Thank you

Anton
