Make automatic distributed initialization work with Open MPI 5.x, PMIx, and PRRTE #14576
Comments
@EricHallahan are you interested in implementing the support? cc: @nvcastet
Prior to filing this issue, I made a patch to the existing implementation to make it work for Open MPI 5.x. I am willing to contribute it, but the question remains as to how to maintain support for earlier versions (something I didn't consider for my personal use), as they are going to remain in use for many years to come.
@EricHallahan Thanks a lot for raising this issue and the thorough discussion!
That is certainly a valid option! I'll go ahead and try that.
@EricHallahan could you contribute your patch, or maybe put it out there? It would be useful for me as well.
(And do you know if there is a way to do the same for MPICH by any chance?) |
You can use auto-detection via mpi4py for that. See #20174 |
I know, but I don't want to. |
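For context on the mpi4py route mentioned above: independent of any auto-detection, the same information can be obtained from mpi4py and passed to `jax.distributed.initialize` explicitly. A minimal sketch, assuming mpi4py is installed and that the hard-coded port 55555 is free on the rank-0 host (both are illustrative assumptions, not part of any JAX or Open MPI API):

```python
import socket

import jax
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Rank 0 advertises its hostname and an arbitrary port as the coordinator
# address; all other ranks receive it via broadcast.
coordinator = f"{socket.gethostname()}:55555" if rank == 0 else None
coordinator = comm.bcast(coordinator, root=0)

jax.distributed.initialize(
    coordinator_address=coordinator,
    num_processes=size,
    process_id=rank,
)
```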
Background
#13929 introduced automatic JAX distributed initialization via the Open MPI Open Run-Time Environment (ORTE) layer and its orterun process launcher (also known by its many aliases: mpirun, mpiexec, oshrun, and shmemrun).
The upcoming Open MPI 5.x series does away with the previous ORTE infrastructure in favor of one built around the PMIx standard, via the OpenPMIx reference PMIx implementation and the complementary PMIx Reference Run-Time Environment (PRRTE); in Open MPI 5.x the mpirun/mpiexec launcher is simply a wrapper for the PRRTE prterun launcher.
PMIx and PRRTE behave differently from ORTE, which makes the implementation introduced in #13929 incompatible with Open MPI 5.x. With Open MPI 5.0 (now in its tenth release candidate) continuing to approach release, there seems to be value in preparing JAX for this change.
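For reference, the ORTE-era approach boils down to parsing launcher-provided environment variables. Below is a rough sketch of that style of detection, not the actual code from #13929; the assumed `OMPI_MCA_orte_hnp_uri` layout (`<numeric job id>.<vpid>;tcp://<address>[,<address>...]:<port>`) and the helper name `detect_orte` are illustrative assumptions.

```python
import os
import re

# Rough sketch of ORTE-style detection (Open MPI < 5.x). The URI layout
# assumed here is "<numeric job id>.<vpid>;tcp://<addr>[,<addr>...]:<port>".
def detect_orte():
    uri = os.environ.get("OMPI_MCA_orte_hnp_uri")
    if uri is None:
        return None  # not launched by orterun / the ORTE-backed mpirun
    job_id = int(uri.split(".", 1)[0])         # numeric identifier before the address
    match = re.search(r"tcp://([^,:]+)", uri)  # first address of the head node process
    host = match.group(1) if match else None
    return job_id, host
```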
Considerations & Challenges
Continued compatibility with ORTE and orterun
The current implementation (as introduced in #13929) is fully usable with Open MPI versions prior to 5.x, and it is important to maintain compatibility with these releases when introducing support for Open MPI 5.x. It is unclear to me whether it would be wiser to make the current implementation compatible with the PRRTE-based launcher, or to create a separate piece of code to handle it.
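One possible way to frame that choice, sketched below purely for illustration (the function name is hypothetical and this is not a proposal for the final structure): leave the existing ORTE probe untouched and add an independent PRRTE probe, trying the newer launcher first so that an Open MPI 5.x mpirun wrapping prterun is recognized before falling back to the legacy path.

```python
import os

# Sketch of a launcher probe that keeps the two code paths separate; the
# returned strings are just tags for dispatching to the matching handler.
def detect_open_mpi_launcher():
    if os.environ.get("PRTE_LAUNCHED") == "1":    # Open MPI 5.x (PRRTE)
        return "prrte"
    if "OMPI_MCA_orte_hnp_uri" in os.environ:     # Open MPI < 5.x (ORTE)
        return "orte"
    return None
```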
New behaviors
PMIx/PRRTE exposes relevant information differently from ORTE:

- `OMPI_MCA_orte_hnp_uri` no longer exists; the server URI is instead exposed to the process via a family of `PMIX_SERVER_URI` environment variables (one for each supported version of the PMIx standard). This means that the current implementation is not activated at all by the PRRTE process launcher. Even if it were, the values of these variables are not the same as `OMPI_MCA_orte_hnp_uri` and require meaningfully different handling: the identifier prior to the address is no longer exclusively numeric and is instead the base of the job namespace (exposed via `PMIX_NAMESPACE`), derived from the server tmpdir (exposed via `PMIX_SERVER_TMPDIR`), which itself is derived from the server hostname (exposed via `PMIX_HOSTNAME`) and the numeric job identifier.
- Open MPI environment variables exposing the world size, world rank, and local rank are unchanged, but PMIx also exposes the world rank itself via `PMIX_RANK`.
- Detecting whether the process was launched with prterun is more convenient than with orterun: `PRTE_LAUNCHED` is set to `1`.
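Putting those behaviors together, a PRRTE-side probe could look roughly like the following sketch. Only the environment variable names come from the observations above; the assumed layout of the `PMIX_SERVER_URI*` value and the helper name `detect_prrte` are illustrative, not a verified specification.

```python
import os

# Rough sketch of PRRTE-era detection (Open MPI 5.x), based on the behaviors
# listed above. The assumed URI layout "<namespace>.<rank>;<transport>://..."
# is an illustration, not a verified specification.
def detect_prrte():
    if os.environ.get("PRTE_LAUNCHED") != "1":
        return None  # not launched by prterun (or an mpirun that wraps it)

    # World size/rank are still exposed through the usual Open MPI variables;
    # PMIX_RANK duplicates the world rank.
    world_size = int(os.environ["OMPI_COMM_WORLD_SIZE"])
    rank = int(os.environ.get("PMIX_RANK", os.environ["OMPI_COMM_WORLD_RANK"]))

    # Take whichever versioned PMIX_SERVER_URI* variable is present and split
    # the namespace-based identifier from the address portion.
    uri = next((v for k, v in sorted(os.environ.items())
                if k.startswith("PMIX_SERVER_URI")), None)
    if uri is None or ";" not in uri:
        return None
    identifier, address = uri.split(";", 1)
    return world_size, rank, identifier, address
```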