PMIx detection is broken in OMPI 4.x and below when PMIx 3x and above are used #8823
Comments
The Unfortunately, we cannot just call |
I see, thank you Ralph.
See open-mpi#8823 for the details.
See open-mpi#8823 for the details. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
See open-mpi#8823 for more details. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
See open-mpi#8823 for the details. Signed-off-by: Artem Polyakov <artpol84@gmail.com> (cherry picked from commit 0b3c1d9)
See open-mpi#8823 for more details. Signed-off-by: Artem Polyakov <artpol84@gmail.com> (cherry picked from commit 2210251)
See open-mpi#8823 for the details. Signed-off-by: Artem Polyakov <artpol84@gmail.com> (cherry picked from commit 30b29b3)
See open-mpi#8823 for more details. Signed-off-by: Artem Polyakov <artpol84@gmail.com> (cherry picked from commit ea6e2d8)
All committed
Background information
While helping folks from LANL debug issues with their Slurm/PMIx environment, we observed a bug in the PMIx detection logic: with PMIx v3.x and above, PMIx is not correctly detected by OMPI.
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
In the experiments, OMPI 4.1.0 was used with internal PMIx v3.2.2.
Slurm was built with 2 versions of PMIx:
pmix_v2
pmix_v3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From tarball sources.
Details of the problem
When doing a direct launch with the Slurm PMIx plugin built with PMIx v3.1.5, the s1 component is selected:
The issue is not observed with the PMIx v2.x based Slurm PMIx plugin.
The selection logic in PMIx is based on the following environment variables:
So priority 5 is set when the presence of PMIx is not detected.
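The behavior described above can be sketched roughly as follows. This is only an illustration, not the actual Open MPI source: the function name `pmix_query_priority` is hypothetical and the value 100 for the "detected" case is an assumption. Only the `PMIX_SERVER_URI` variable and the fallback priority 5 come from this report.

```c
#include <stdlib.h>

/* Hypothetical helper (not the real OMPI function): pick a component
 * priority depending on whether a PMIx server appears to be present. */
static int pmix_query_priority(void)
{
    /* Exact-name lookup: a variable with any other name, even one that
     * merely extends this name, is not seen by this check. */
    if (getenv("PMIX_SERVER_URI") != NULL) {
        return 100; /* assumed "PMIx detected" priority */
    }
    return 5; /* fallback priority mentioned in this report */
}
```

Because the lookup is by exact name, an environment that does not define `PMIX_SERVER_URI` itself falls through to the low priority, and another component can win the selection.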
Inspecting the environment observed by the application processes for PMIx v3.x and 2.x with the following command:
$ env | grep PMIX
shows the following:
PMIx v2
PMIx v3
This shows that the PMIX_SERVER_URI environment variable, which the PMIx selection logic relies on, is no longer present with PMIx v3.
The ORTE-based launch only works because no other component is available:
As a fix, I suggest checking for any environment variable whose name matches PMIX_SERVER_URI*.
Maybe a better solution can be implemented on the PMIx side.
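One possible shape for such a check is sketched below. It is not a proposed patch: the helper name `env_name_has_prefix` is hypothetical, and it assumes a POSIX system where the process environment is reachable through `environ`. It reports whether any variable name begins with a given prefix, so a name that extends `PMIX_SERVER_URI` with a suffix would also count as a match.

```c
#include <stdbool.h>
#include <string.h>

extern char **environ; /* POSIX: the process environment */

/* Hypothetical helper: true if the NAME of any environment variable
 * starts with the given prefix (for example "PMIX_SERVER_URI"). */
static bool env_name_has_prefix(const char *prefix)
{
    size_t plen = strlen(prefix);

    for (char **e = environ; e != NULL && *e != NULL; ++e) {
        /* Entries look like "NAME=value"; compare only within NAME. */
        const char *eq = strchr(*e, '=');
        size_t nlen = (eq != NULL) ? (size_t)(eq - *e) : strlen(*e);

        if (nlen >= plen && strncmp(*e, prefix, plen) == 0) {
            return true;
        }
    }
    return false;
}
```

A detection routine could then call `env_name_has_prefix("PMIX_SERVER_URI")` instead of looking up the exact name. Whether a prefix match is strict enough is a design question left open here.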