Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the connect/accept/spawn code to better utilize PMIx Group support #12398

Closed
wants to merge 3 commits into from

Conversation

rhc54
Copy link
Contributor

@rhc54 rhc54 commented Mar 10, 2024

Add a second method for doing connect/accept

The "old" method relies on PMIx publish/lookup followed by
a call to PMIx_Connect. It then does a "next cid" method
to get the next communicator ID, which has multiple algorithms
including one that calls PMIx_Group.

Simplify this by using PMIx_Group_construct in place of
PMIx_Connect, and have it return the next communicator ID.
This is more scalable and reliable than the prior method.

Retain the "old" method for now as this is new code. Create
a new MCA param "OMPI_MCA_dpm_enable_new_method" to switch
between the two approaches. Default it to "true" for now
for ease of debugging.

NOTE: this includes an update to the submodule pointers
for PMIx and PRRTE to obtain the required updates to
those code bases.

Signed-off-by: Ralph Castain rhc@pmix.org

Use PMIx_Group_destruct for disconnect

It provides a barrier and gives both the host
and the PMIx library a chance to cleanup the
group. Retain the old "fence"-based algorithm
as well for when the old connect/accept
method is used.

Signed-off-by: Ralph Castain rhc@pmix.org

Up the submodule pointers to include required support

Update to get the changes required to support this new
dpm connect/accept/spawn method.

Signed-off-by: Ralph Castain rhc@pmix.org

The "old" method relies on PMIx publish/lookup followed by
a call to PMIx_Connect. It then does a "next cid" method
to get the next communicator ID, which has multiple algorithms
including one that calls PMIx_Group.

Simplify this by using PMIx_Group_construct in place of
PMIx_Connect, and have it return the next communicator ID.
This is more scalable and reliable than the prior method.

Retain the "old" method for now as this is new code. Create
a new MCA param "OMPI_MCA_dpm_enable_new_method" to switch
between the two approaches. Default it to "true" for now
for ease of debugging.

NOTE: this includes an update to the submodule pointers
for PMIx and PRRTE to obtain the required updates to
those code bases.

Signed-off-by: Ralph Castain <rhc@pmix.org>
It provides a barrier and gives both the host
and the PMIx library a chance to cleanup the
group. Retain the old "fence"-based algorithm
as well for when the old connect/accept
method is used.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Update to get the changes required to support this new
dpm connect/accept/spawn method.

Signed-off-by: Ralph Castain <rhc@pmix.org>
@hppritcha
Copy link
Member

Did you figure out the modex data issue that was causing tcp btl to not work?

@rhc54
Copy link
Contributor Author

rhc54 commented Mar 11, 2024

Did you figure out the modex data issue that was causing tcp btl to not work?

I believe so. The initial problem was a bug in the group code that caused us not to store the data from the child job. Once I fixed that, I could pass Lisandro's test.

However, once I started adding more procs and spreading them over multiple nodes, it failed again. I tracked it down to not getting the modex data from all participating procs. Currently, we collect the data from the procs that participate directly in the PMIx_Group_construct operation - and not those that are joining as "add members".

So when the parent and child jobs only have one process, then all is good. Once either has more than one process, then we are missing modex info.

What I intend to do is have mpirun provide the modex data (since it always has a copy of it) instead of assembling it on the backend in PMIx_Group_construct. This will give us all of the modex info, and even allow us to support construction of groups where "add members" includes additional namespaces not directly participating.

I'm tied up today with a medical procedure, but hope to get to it over the next few days.

@janjust janjust marked this pull request as draft March 12, 2024 15:31
@jsquyres
Copy link
Member

(while talking in the Tuesday meeting, we saw Ralph's note that he's still working on this, so we moved the PR to "draft" just to make sure we don't accidentally merge it too early)

@rhc54
Copy link
Contributor Author

rhc54 commented Mar 13, 2024

Ah yes - thanks for moving to draft! As I noted elsewhere, I am not actually proposing this for OMPI to use. Only using this to help drive some work on the PMIx/PRRTE group support and wanted to test it a bit more broadly. If someone else wants to utilize the capability, they are welcome to do so.

Should have included that clarification here as well. Sorry for the confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants