Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

github action: add mpi4py to v50x #12411

Merged
merged 1 commit into from
Mar 15, 2024
Merged

Conversation

hppritcha
Copy link
Member

There were multiple commits involved with getting mpi4py smoke test into github actions for main branch, so just do a not a cherry pick here.

bot:notacherrypick

There were multiple commits involved with getting mpi4py smoke test
into github actions for main branch, so just do a not a cherry pick here.

bot:notacherrypick

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@github-actions github-actions bot added this to the v5.0.3 milestone Mar 15, 2024
@rhc54
Copy link
Contributor

rhc54 commented Mar 15, 2024

This is the intercomm create problem we see in main - not surprising, is it?

@hppritcha
Copy link
Member Author

No its different. The changes to the comm_cid.c code on main that "broke" intercomm support - at least when using OB1 PML - haven't been ported back to the v5.0.x branch. Note the qualifier - i don't observe the intercomm problems if I use the UCX PML or the OFI MTL.

This failure is related to #12407

@rhc54
Copy link
Contributor

rhc54 commented Mar 15, 2024

I realize that is the issue that was filed - my question was if you expect the intercomm test to work given what we see in main. Sounds like you do expect it to work, and it does work in some circumstances - so it doesn't sound like something related to PMIx/PRRTE, or else it wouldn't work in all scenarios.

@hppritcha
Copy link
Member Author

I realize that is the issue that was filed - my question was if you expect the intercomm test to work given what we see in main. Sounds like you do expect it to work, and it does work in some circumstances - so it doesn't sound like something related to PMIx/PRRTE, or else it wouldn't work in all scenarios.

correct the intercomm problem is not a pmix/prrte problem. i would not expect to see the intercomm failures on v5.0.x unless commit 23df181 is cherry-picked back to v5.0.x

@bosilca
Copy link
Member

bosilca commented Mar 15, 2024

Just to make sure that everyone here understand, the comm_cid.c changes in question would have affected all the PML (ob1, ucx, *) in the same way, if the error was in comm_cid. As I said in another comment, it might be time to stop blaming that particular commit and try to understand what really is going on.

@wenduwan
Copy link
Contributor

The failure is related to a pmix commit. See #12408

@wenduwan wenduwan requested a review from janjust March 15, 2024 18:54
Copy link
Contributor

@wenduwan wenduwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hppritcha Thank you very much! I like this a lot.

Want to get opinion from @janjust too

@wenduwan wenduwan merged commit 24be74e into open-mpi:v5.0.x Mar 15, 2024
10 of 11 checks passed
@rhc54
Copy link
Contributor

rhc54 commented Mar 15, 2024

So does this mean that all future PRs to v5.0 will be committed with failed CI? Or are you going to start disabling tests over here to get this to pass? Just asking to determine if it is worth spending time on v5.0 bugs.

@hppritcha
Copy link
Member Author

the mpi4py test harness has a straightforward way to disable a test based on the MPI (and possible version) that was used to build it. i figure we'd just open PRs against mpi4py main to - hopefully temporarily - disable specific tests as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants