Skip to content

pmix3x: use PMIX_VALUE_LOAD() and PMIX_INFO_LOAD() macros #11472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 9, 2023

Conversation

ggouaillardet
Copy link
Contributor

@ggouaillardet ggouaillardet commented Mar 8, 2023

Refs. #10416

bot:notacherrypick

@github-actions
Copy link

github-actions bot commented Mar 8, 2023

Hello! The Git Commit Checker CI bot found a few problems with this PR:

381aabf: pmix3x: use PMIX_VALUE_LOAD() and PMIX_INFO_LOAD()...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@ggouaillardet ggouaillardet force-pushed the topic/v4.1.x/pmix3x_macros branch from 381aabf to 2b97485 Compare March 8, 2023 01:59
@github-actions
Copy link

github-actions bot commented Mar 8, 2023

Hello! The Git Commit Checker CI bot found a few problems with this PR:

2b97485: pmix3x: use PMIX_VALUE_LOAD() and PMIX_INFO_LOAD()...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@ggouaillardet ggouaillardet force-pushed the topic/v4.1.x/pmix3x_macros branch from 2b97485 to 51acc18 Compare March 8, 2023 02:20
@github-actions
Copy link

github-actions bot commented Mar 8, 2023

Hello! The Git Commit Checker CI bot found a few problems with this PR:

51acc18: pmix3x: use PMIX_VALUE_LOAD() and PMIX_INFO_LOAD()...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

Refs. open-mpi#10416

bot:notacherrypick

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
@ggouaillardet ggouaillardet force-pushed the topic/v4.1.x/pmix3x_macros branch from 51acc18 to 6e8e14f Compare March 8, 2023 04:50
@jsquyres
Copy link
Member

@ggouaillardet What's the status of this PR? It's still in "Draft" mode.

@jsquyres jsquyres added this to the v4.1.6 milestone Mar 14, 2023
@ggouaillardet
Copy link
Contributor Author

ggouaillardet commented Mar 14, 2023 via email

@rhc54
Copy link
Contributor

rhc54 commented Mar 19, 2023

Well, it won't be "fixed" in 4.2.4 as it isn't really a "problem" in 4.2.3 😄 - it has to do with the OMPI code not sticking to the official public APIs and instead using internal PMIx library functions (albeit publicly visible symbols). All I've done for 4.2.4 is to "typedef" the old symbols to their new standardized versions to help alleviate breakage.

I'm not sure I understand why all the changes were made to switch to "info_load" instead of "value_load" as the latter remains a perfectly fine, standardized way to load values. You just need to use the official macros instead of the internal functions. You are welcome to make the change if that's what you want - but it certainly isn't required.

@dvzrv
Copy link

dvzrv commented May 23, 2023

Well, it won't be "fixed" in 4.2.4 as it isn't really a "problem" in 4.2.3 smile - it has to do with the OMPI code not sticking to the official public APIs and instead using internal PMIx library functions (albeit publicly visible symbols). All I've done for 4.2.4 is to "typedef" the old symbols to their new standardized versions to help alleviate breakage.

I'm not sure I understand why all the changes were made to switch to "info_load" instead of "value_load" as the latter remains a perfectly fine, standardized way to load values. You just need to use the official macros instead of the internal functions. You are welcome to make the change if that's what you want - but it certainly isn't required.

@rhc54 could you point out which commits you are referring to on https://github.com/openpmix/openpmix/tree/v4.2 so that we can potentially cherry-pick from that branch?

@rhc54
Copy link
Contributor

rhc54 commented May 23, 2023

@rhc54 could you point out which commits you are referring to on https://github.com/openpmix/openpmix/tree/v4.2 so that we can potentially cherry-pick from that branch?

Looks like it was 8e70e37e2

@lahwaacz
Copy link
Contributor

lahwaacz commented May 24, 2023

@rhc54 I actually tested these two approaches:

  1. I built openmpi-4.1.5 with the patch from this pull request applied
  2. I built openpmix-4.2.4rc1 and then rebuilt openmpi-4.1.5 without the patch

But only the first one fixes the issues with MPI_Comm_split_type (see the issue #10416). In the second approach, MPI_Comm_split_type has the same issues (in my case it just deadlocks). This was on Arch Linux before @dvzrv applied the patch distro-wide.

@rhc54
Copy link
Contributor

rhc54 commented May 26, 2023

I built openpmix-4.2.4rc1 and then rebuilt openmpi-4.1.5 without the patch

Quite possible that the fix post-dates the rc1 tag. Might be worth trying it with the head of the PMIx v4.2 branch.

@lahwaacz
Copy link
Contributor

Quite possible that the fix post-dates the rc1 tag. Might be worth trying it with the head of the PMIx v4.2 branch.

The pinpointed commit openpmix/openpmix@8e70e37 is older than the tag - it's 3rd line below:

$ git log --format=oneline v4.2.4rc1 | head
8e71cf0e6753192b075edffc46fe09de597dc7c1 (tag: refs/tags/v4.2.4rc1) Update branch headers for rc1
a00074c2fcf448d2bc6c61453c727f7ac94ed96d Fix fetch of globally unique keys
8e70e37e2ab973b50251bd2704214029ee7d38e4 Cleanup code style, fix bit checks, and fix backward compatibility
b3bc7ec76523a1266b2bb7407dbab02420925d34 Correct cbdata type in pmix_show_help callback function
ec2e2f8776283de5188d3f924eb43c7ec6807800 Fix some Python bindings errors
b3d578dd6c3c3eea081f5ebffa9394e0ce443de4 Complete PMIX_*_FREE() conversion.
f51d343142de3bcdd127edcf8d5fadf94b126b1c Silence couple of nit warnings
d2ea5c67cbd70926b00f6e8467dbc55e748b0a7e Improve PMIX_*_RELEASE() consistency.
22a64fe8d312d6192452e9a31df7b40126af247d Convert the majority of PMIX_*_FREE() macros.
e6328142f66fb5f2c3f8d8ff342c057d830e4f42 Plug memory leaks.

@rhc54
Copy link
Contributor

rhc54 commented May 26, 2023

Could be that there are other commits required - I honestly don't know. Try with the head of the v4.2 branch and see if the problem still exists.

@lahwaacz
Copy link
Contributor

Could be that there are other commits required - I honestly don't know. Try with the head of the v4.2 branch and see if the problem still exists.

Ok, I just did that and the current head of the v4.2 branch (openpmix/openpmix@cd813ef) behaves the same as the v4.2.4rc1 tag (I get a deadlock).

@rhc54
Copy link
Contributor

rhc54 commented May 26, 2023

Then I guess you should use this patch - I have no other advice as this is an OMPI integration issue.

@jsquyres
Copy link
Member

jsquyres commented Jun 7, 2023

@ggouaillardet It looks like this patch is necessary to both fix #11729 and make Open MPI v4.1.x compatible with any PMIx >= v4.2.3.

You still have this patch marked as "Draft". What's the current status?

@rhc54
Copy link
Contributor

rhc54 commented Jun 7, 2023

I believe it is draft solely because I questioned whether all the changes are truly necessary - however, there is no harm done by the larger patch. This patch should be fine as-is.

@jsquyres
Copy link
Member

jsquyres commented Jun 7, 2023

I believe it is draft solely because I questioned whether all the changes are truly necessary - however, there is no harm done by the larger patch. This patch should be fine as-is.

Ok. I see @ggouaillardet's comment from 14 Mar:

I did not have much time to work on it.

It is not fully completed, but i think it can be used.

Is there more to be done here?

@jsquyres
Copy link
Member

jsquyres commented Jun 8, 2023

We have a report from #11729 (comment) that it makes his usage of Open MPI v4.1.x work with PMIx 4.2.3.

@ggouaillardet Is this PR sufficient as-is?

@jsquyres jsquyres marked this pull request as ready for review June 8, 2023 14:37
@jsquyres
Copy link
Member

jsquyres commented Jun 8, 2023

We had some off-PR discussion about this:

  • @ggouaillardet mentioned that he wanted to take another pass through this PR and see if he missed anything.
  • The PR has been shown in at least 2 cases to enable people to use OpenPMIX v4.2.3 with Open MPI v4.1.x.
  • Meaning: this PR seems to fix at least some -- if not all -- of the problem. It seems safe to merge as-is.
  • That being said, it might not be a bad thing for someone to take a pass through this PR and see if we missed anything.

@rhc54
Copy link
Contributor

rhc54 commented Jun 9, 2023

I looked it over and think it is okay. There are additional data types supported by PMIx that are not covered here, but they also are not data types used by OMPI - so no point in worrying about something you'll never see.

@jsquyres
Copy link
Member

jsquyres commented Jun 9, 2023

Thanks @rhc54!

@jsquyres jsquyres merged commit c053cb8 into open-mpi:v4.1.x Jun 9, 2023
@jsquyres jsquyres added the NEWS label Jun 9, 2023
@jsquyres
Copy link
Member

#11472 has been merged, so I'm closing this issue. Thank you for the report!

bartoldeman added a commit to ComputeCanada/easybuild-easyconfigs that referenced this pull request Sep 19, 2023
I needed this to use any MPI application with srun with pmix or
with plain mpirun, or else I would run into OOB/TCP communication
errors for even "mpirun hostname" across two nodes (see
open-mpi/ompi#11729
for another user with the same problem). This patch is taken from
open-mpi/ompi#11472 and with it,
both srun and mpirun run flawlessly without further ado.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants