Description
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.8
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
- 5.0.8: micromamba using the conda-forge channel
- 4.1.4: spack
Please describe the system on which you are running
- Operating system/version: Rocky Linux 8
- Computer hardware:
- Network type: InfiniBand / high-speed Ethernet
Details of the problem
Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.
In a Slurm job I request 2 tasks and 8 CPUs per task, for a total of 16 CPUs.
With version 4.1.4 the following worked without any issue:

```shell
NUMBA_NUM_THREADS=${SLURM_CPUS_PER_TASK:?}
mpiexec -n "${SLURM_NTASKS:?}" --map-by slot:pe="${NUMBA_NUM_THREADS:?}" python pi_hybrid.py
```
This throws an error with 5.0.8. I don't understand why; it looks as if the MPI processes themselves are now counted against the CPU budget as well. When I set `NUMBA_NUM_THREADS` to 7 it works, but then 2 CPUs are essentially unused, because the MPI processes are idle during the numba parallelization.
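To make the binding behaviour visible, a small diagnostic like the following can be launched in place of `pi_hybrid.py` (a sketch, not the original script; it reads Open MPI's `OMPI_COMM_WORLD_RANK` environment variable directly so it also runs without mpi4py, and uses the Linux-only `os.sched_getaffinity`):

```python
# affinity_check.py -- hypothetical stand-in for pi_hybrid.py, used only to
# inspect how mpiexec binds each launched process. Linux-only.
import os

# Open MPI exports the rank in the environment; default to 0 when the
# script is run outside of mpiexec.
rank = os.environ.get("OMPI_COMM_WORLD_RANK", "0")

# The set of CPUs this process is allowed to run on, i.e. its binding.
cpus = sorted(os.sched_getaffinity(0))
print(f"rank {rank}: {len(cpus)} CPUs available: {cpus}")
```

Launching it with `mpiexec -n 2 --map-by slot:pe=8 python affinity_check.py` should report 8 CPUs per rank when the mapping succeeds, which makes it easy to compare what 4.1.4 and 5.0.8 actually do.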
Possible solutions are any of the following:

```shell
mpiexec -n "${SLURM_NTASKS:?}" --cpus-per-proc "${NUMBA_NUM_THREADS:?}" python pi_hybrid.py
mpiexec -n "${SLURM_NTASKS:?}" --bind-to none python pi_hybrid.py
mpiexec -n "${SLURM_NTASKS:?}" --oversubscribe --map-by slot:pe="${NUMBA_NUM_THREADS:?}" python pi_hybrid.py
```
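For completeness, the batch-script context I run these in looks roughly like this (a sketch with placeholder resource values; only the final `mpiexec` line varies between the attempts above):

```shell
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=8
# Placeholder walltime; partition/account omitted.
#SBATCH --time=00:10:00

# One numba thread per CPU allocated to each task.
export NUMBA_NUM_THREADS=${SLURM_CPUS_PER_TASK:?}

# The variant under test, e.g. the deprecated --cpus-per-proc form:
mpiexec -n "${SLURM_NTASKS:?}" --cpus-per-proc "${NUMBA_NUM_THREADS:?}" python pi_hybrid.py
```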
`--cpus-per-proc` seems the most obvious to me, but the docs (https://docs.open-mpi.org/en/v5.0.8/man-openmpi/man1/mpirun.1.html#options-old-hard-coded-content-mdash-to-be-audited) mention that it is deprecated in favor of `--map-by <obj>:PE=n`. So I would rather use `--map-by`, which only works when I also pass `--oversubscribe` (no other `<obj>` seems to work; using `core`, for instance, just quits without writing anything to `.out` / `.err`), which seems counterintuitive.
Using `--bind-to none` is also not what I want, because it does not ensure that all numba threads stay on the same socket.
So what am I missing here?