Oversubscription error appears when changing to 5.0.8 (from 4.1.4) #13426

@skwde

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.8

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

5.0.8: micromamba using the conda-forge channel
4.1.4: spack

Please describe the system on which you are running

  • Operating system/version: Rocky Linux 8
  • Computer hardware:
  • Network type: InfiniBand / high-speed Ethernet

Details of the problem

Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.

In a Slurm job I request 2 tasks with 8 CPUs per task, so 16 CPUs are available in total.

While with version 4.1.4 the following worked without any issue:

NUMBA_NUM_THREADS=${SLURM_CPUS_PER_TASK:?}
mpiexec -n "${SLURM_NTASKS:?}" --map-by slot:pe="${NUMBA_NUM_THREADS:?}" python pi_hybrid.py

With 5.0.8 the same command fails with an oversubscription error.

I don't understand why. It seems the MPI processes themselves are now counted as well? When I set NUMBA_NUM_THREADS to 7 it works, but then 2 CPUs are basically unused, because the MPI processes are idle during the Numba parallelization.
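To make the arithmetic explicit, here is a minimal sketch that compares the cores requested by `-n` and `pe=` with the cores in the Slurm allocation. The helper names `core_budget` and `from_slurm_env` are hypothetical, and the sketch does not model how Open MPI itself counts slots; it only shows that 2 processes with 8 cores each should fit exactly into 16 allocated cores, while 7 cores each leaves 2 idle.

```python
import os


def core_budget(n_procs, cores_per_proc, allocated_cores):
    """Return (requested, unused) cores; unused is negative if the request does not fit."""
    requested = n_procs * cores_per_proc
    return requested, allocated_cores - requested


def from_slurm_env(env=os.environ):
    """Read the allocation from the Slurm variables used in the commands above."""
    ntasks = int(env["SLURM_NTASKS"])
    cpus_per_task = int(env["SLURM_CPUS_PER_TASK"])
    return ntasks, cpus_per_task, ntasks * cpus_per_task


if __name__ == "__main__":
    # Values from this report: 2 tasks, 8 CPUs per task.
    fake_env = {"SLURM_NTASKS": "2", "SLURM_CPUS_PER_TASK": "8"}
    ntasks, cpus_per_task, total = from_slurm_env(fake_env)
    for threads in (8, 7):
        requested, unused = core_budget(ntasks, threads, total)
        print(f"pe={threads}: requested {requested} of {total} cores, {unused} unused")
```

Inside a real job, calling `from_slurm_env()` without arguments would read the actual allocation instead of the hard-coded values.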

Any of the following works as a workaround:

mpiexec -n "${SLURM_NTASKS:?}" --cpus-per-proc "${NUMBA_NUM_THREADS:?}" python pi_hybrid.py
mpiexec -n "${SLURM_NTASKS:?}" --bind-to none python pi_hybrid.py
mpiexec -n "${SLURM_NTASKS:?}" --oversubscribe --map-by slot:pe="${NUMBA_NUM_THREADS:?}" python pi_hybrid.py

--cpus-per-proc seems the most obvious to me, but the docs (https://docs.open-mpi.org/en/v5.0.8/man-openmpi/man1/mpirun.1.html#options-old-hard-coded-content-mdash-to-be-audited) say the following about it:

deprecated in favor of --map-by <obj>:PE=n

So I would rather use --map-by, but that only works in combination with --oversubscribe, which seems counterintuitive. No other <obj> seems to work either; using core, for instance, just quits without writing any output to .out / .err.

Using --bind-to none is also not what I want, because it does not ensure that all Numba threads run on the same socket.
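One way to compare what each of the commands above actually does is to print the CPU set every rank is allowed to run on. The following is a minimal sketch, not part of pi_hybrid.py; the function name `describe_binding` and the file name `show_binding.py` are hypothetical. It assumes Linux (for `os.sched_getaffinity`) and relies on the `OMPI_COMM_WORLD_RANK` environment variable that Open MPI's launcher sets for each process.

```python
import os


def describe_binding(env=os.environ):
    """Return a one-line summary of the CPUs this process may run on."""
    # OMPI_COMM_WORLD_RANK is set by Open MPI's launcher; "?" if run outside mpiexec.
    rank = env.get("OMPI_COMM_WORLD_RANK", "?")
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))  # Linux-only call
    else:
        cpus = list(range(os.cpu_count() or 1))  # fallback: no affinity information
    return f"rank {rank}: {len(cpus)} CPU(s) allowed: {cpus}"


if __name__ == "__main__":
    print(describe_binding())
```

Saved as `show_binding.py` and launched in place of `pi_hybrid.py` with each mpiexec variant, it prints one line per rank. It reports logical CPU ids only; mapping those ids to sockets still needs a tool such as lscpu or hwloc.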

So what am I missing here?
