Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pmi: work with 3rd party pmi libraries #6662

Merged
merged 6 commits into from
Sep 8, 2023
Merged

Conversation

hzhou
Copy link
Contributor

@hzhou hzhou commented Sep 1, 2023

Pull Request Description

Cray PMI does not define PMI2_keyval_t. Check it in configure and typedef INFO_TYPE PMI2_keyval_t if needed. This will allow mpich to compile using Cray PMI. The PMI2_keyval_t is only used in PMI2_Job_Spawn. The name publishing API always uses NULL for its info pointers. It is possible that our INFO_TYPE is incompatible with Cray PMI's internal type, which will break PMI2_Job_spawn. If so, this needs to be fixed in the future, possibly on the Cray side.

A summary:

To build with 3rd party PMI library, e.g. Cray PMI, Slurm PMI, OpenPMIx, use:
   --with-pmi1=path   (libpmi.so)
   --with-pmi2=path   (libpmi2.so)
   --with-pmix=path   (libpmix.so)

As of this commit, following options are more or less independent:
   --with-pmi=default|pmi1|pmi2|pmix                            (PMI Version)
   --with-pmilib=default|no|mpich|install                         (Whether to build/install builtin libpmi)
   --with-pm=default|no|gforker|remshell|hydra              (Whether to build process manager)

Typical users do not need to touch any of these options. 

A special option is --with-pmi=slurm. This is mainly because Slurm installs
header as #include "slurm/pmi.h".

Fixes #6653
[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your companies PR approval manager.

Cray PMI does not define PMI2_keyval_t. Check it in configure and
typedef INFO_TYPE PMI2_keyval_t if needed. This will allow mpich to compile using
Cray PMI. The PMI2_keyval_t is only used in PMI2_Job_Spawn. The name
publishing API always uses NULL for its info pointers. It is possible
that our INFO_TYPE is incompatible with Cray PMI's internal type, which
will break PMI2_Job_spawn. If so, this need be fixed in the future,
possibly on the Cray side.
Skip PMI2_Set_threaded if it does not exist. For example, it does not
exist from Cray's libpmi2.so.
Make sure to set LIBS before checking PMI capability.

Also disable built-in pm if user is configuring with external pmi
library. Hydra can be installed separated if needed.
@hzhou
Copy link
Contributor Author

hzhou commented Sep 5, 2023

test:mpich/pmi ✔️ - https://jenkins-pmrs.cels.anl.gov/job/mpich-review-pmi/242/

@hzhou hzhou requested a review from raffenet September 5, 2023 22:45
@hzhou hzhou mentioned this pull request Sep 6, 2023
4 tasks
Now we can directly link with cray pmi using --with-pmi=path or
--with-pmi2=path.
These configure macros are no longer used since we support runtime PMI
versions.
@hzhou hzhou force-pushed the 2308_pmi_cray branch 2 times, most recently from 713102e to e5572f8 Compare September 6, 2023 18:11
@hzhou hzhou changed the title pmi: check whether PMI2_keyval_t is missing pmi: work with 3rd party pmi libraries Sep 6, 2023
configure.ac Outdated
@@ -1704,22 +1663,39 @@ esac
enable_pmi1="no"
enable_pmi2="no"
enable_pmix="no"
if test "$with_pmi" = "pmi2" ; then
if test "$with_pmi" = "pmi1" ; then
enable_pmi2="yes"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enable_pmi2="yes"
enable_pmi1="yes"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Applied

Allow users to force --with-pm and --with-pmilib options. It allows
users to build hydra even with external pmi library or build embedded
libpmi concurrently with an external pmi library (e.g. libpmix). This
may not work due to conflicts, but it is at user's descretion.

Use may load 3rd party pmi libraries with:
   --with-pmi1=path   (libpmi.so)
   --with-pmi2=path   (libpmi2.so)
   --with-pmix=path   (libpmix.so)

As of this commit, following options are more or less independent:
   --with-pmi=default|pmi1|pmi2|pmix
   --with-pmilib=default|no|mpich|install
   --with-pm=default|no|gforker|remshell|hydra

The default option can be internally overwritten. with_pmilib=no will
not build internal(embedded) libpmi. with_pm=no will not build internal
pm.

A special option is --with-pmi=slurm. This is mainly because Slurm installs
header as #include "slurm/pmi.h".
@hzhou
Copy link
Contributor Author

hzhou commented Sep 6, 2023

test:mpich/pmi ✔️ (except for pmix spawn failures)

@hzhou hzhou requested a review from raffenet September 7, 2023 03:20
Copy link
Contributor

@raffenet raffenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK with me. @sonjahapp I can't request your review since you are not yet part of @pmodels/mpich-partec (invite just sent). Could you comment saying this works for your use-case before we merge? Thanks.

@sonjahapp
Copy link
Contributor

Could you comment saying this works for your use-case before we merge?

For a configuration including support for hydra, PMI1 and PMI2 provided via MPICHs libpmi, and PMIx provided by OpenPMIx I have to configure with --with-pm=hydra --with-pmilib=mpich --with-pmix=<path-to-openpmix>. Right? I tried this combination of parameters and it worked. But I want to be sure that I'm not overlooking something.
If this is the right way, the current solution works for my use case.

You may want to update the documentation doc/wiki/design/PMI.md with respect to the changes introduced by this PR.

A question out of curiosity:
How do you plan to handle potential conflicts between --with-pmix=<path-to-openpmix> --with-pmi=mpich as soon as MPICH's libpmi has PMIx support? You mentioned that you are working on this. Will it still be possible to use MPICH's libpmi for PMI[1,2] and an OpenPMIx for PMIx? In other words, will it be possible to "slice out" the future PMIx support in MPICHs libpmi in favour of an external PMIx lib?

@hzhou
Copy link
Contributor Author

hzhou commented Sep 8, 2023

For a configuration including support for hydra, PMI1 and PMI2 provided via MPICHs libpmi, and PMIx provided by OpenPMIx I have to configure with --with-pm=hydra --with-pmilib=mpich --with-pmix=<path-to-openpmix>. Right? I tried this combination of parameters and it worked. But I want to be sure that I'm not overlooking something. If this is the right way, the current solution works for my use case.

Yes, this is how it supposed to work.

You may want to update the documentation doc/wiki/design/PMI.md with respect to the changes introduced by this PR.

Will do.

A question out of curiosity: How do you plan to handle potential conflicts between --with-pmix=<path-to-openpmix> --with-pmi=mpich as soon as MPICH's libpmi has PMIx support? You mentioned that you are working on this. Will it still be possible to use MPICH's libpmi for PMI[1,2] and an OpenPMIx for PMIx? In other words, will it be possible to "slice out" the future PMIx support in MPICHs libpmi in favour of an external PMIx lib?

I suppose we can pass configure flags to builtin pmi to skip certain protocols. For example, if you do --with-pmilib=mpich --with-pmix=<path-to-openpmix>, we can configure pmi with --disable-pmix.

@hzhou hzhou merged commit ea23f68 into pmodels:main Sep 8, 2023
4 of 5 checks passed
@hzhou hzhou deleted the 2308_pmi_cray branch September 8, 2023 14:20
@raffenet
Copy link
Contributor

raffenet commented Sep 8, 2023

I think something like --with-pmilib=embedded:pmi1,pmi2 to exclude pmix support in the embedded library could work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

build: Error compiling on polaris and theta on gitlab-ci
3 participants