Skip to content
This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Conversation

@hjelmn
Copy link
Member

@hjelmn hjelmn commented Oct 20, 2015

:bot🏷️bug
:bot:assign: @miked-mellanox
:bot:milestone:v1.10.1

This patch removes a priority check that disables cm if the previous
pml had higher priority. The check was incorrect as coded and is
unnecessary as we finalize all but one pml anyway.

Fixes open-mpi/ompi#1035

(cherry picked from open-mpi/ompi@2fd176a)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
@mike-dubman
Copy link
Member

could you please describe what problem this PR solves? and if it change previous component selection logic - what is a new one?

@hjelmn
Copy link
Member Author

hjelmn commented Oct 20, 2015

This PR solves a problem that only occurs during dynamic loading. If ob1 gets loaded before cm then cm never gets selected.

Looks like I need to fix a few bugs in the PR before you can review this.

@hjelmn
Copy link
Member Author

hjelmn commented Oct 20, 2015

@rhc54 Is it ok to change the arguments of an internal API in a release series? I added an argument to mca_base_select. If not I will rename the new version with _new and have the old version call with NULL as the priority argument. I can drop a bunch of changes from this patch if I do that.

@rhc54
Copy link

rhc54 commented Oct 20, 2015

I believe so - @jsquyres: you have any qualms?

@jsquyres
Copy link
Member

We do try to avoid changing major internal APIs so that any developers outside of the OMPI tree (e.g., http://processors.wiki.ti.com/index.php?title=MCSDK_HPC_3.x_OpenMPI) doesn't get unexpectedly screwed with a new Open MPI minor release. I think mca_base_select() qualifies as a major API.

So I'd be in favor of @hjelmn's proposed approach: add a _new version (with a suitable comment explaining why this is happening, yadda yadda yadda) and have the old version just call the new version with a NULL param.

@rhc54
Copy link

rhc54 commented Oct 20, 2015

I can live with that, but then let's do that as a custom patch only for 1.10.1 - I'd hate to have that workaround in the future code base.

@hjelmn
Copy link
Member Author

hjelmn commented Oct 20, 2015

@rhc54 Agreed. Just want to get this functionality into 1.10.1.

@jsquyres
Copy link
Member

@rhc54 Agreed: this is specific for v1.10.x. Master / v2.x will continue to not have the _new workaround.

@hjelmn hjelmn force-pushed the v1.10_cm_priority branch from eaa495b to ff8b6eb Compare October 20, 2015 19:20
@hjelmn
Copy link
Member Author

hjelmn commented Oct 20, 2015

@rhc54 @jsquyres @miked-mellanox Good to go now.

@hjelmn
Copy link
Member Author

hjelmn commented Oct 20, 2015

Shoot, lost a commit. Let me rebase this one more time.

This commit removes code that checks the ob1 priority vs the previous
priority. The previous priority is meaningless here and may only cause
ob1 to disable itself when it shouldn't.

(cherry picked from open-mpi/ompi@bedd802)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.

This is a modification of
open-mpi/ompi@8b5810f . The primary
difference is we left the original mca_base_select in place for
compatibility.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit changes the priority of mtl components to be relative to
pml/ob1 and updates the mtl interface to expose this priority. cm now
sets its own priority based on the priority of the selected mtl
component.

(cherry picked from open-mpi/ompi@53f6b57)

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
@hjelmn hjelmn force-pushed the v1.10_cm_priority branch from ff8b6eb to 0cecbbd Compare October 20, 2015 19:28
@jsquyres
Copy link
Member

👍

@jsquyres
Copy link
Member

Also need @regrant and @tkordenbrock to chime in to see what this whole thing does to portals BTL / MTL.

@mellanox-github
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-v1.8-pr/883/ for details.

@mellanox-github
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-v1.8-pr/884/ for details.

@rhc54
Copy link

rhc54 commented Oct 21, 2015

I'm just going to mark this as pushed-back so I don't accidentally commit it before @regrant and @tkordenbrock can confirm no issue with portals, and @miked-mellanox confirms same for mxm

@hjelmn
Copy link
Member Author

hjelmn commented Oct 22, 2015

Looks good to go with comments from @regrant in open-mpi/ompi#1058

:bot:nolabel:pushed-back

@hjelmn
Copy link
Member Author

hjelmn commented Oct 22, 2015

@yburette Included the ofi priority change.

@mellanox-github
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-v1.8-pr/897/ for details.

rhc54 pushed a commit that referenced this pull request Oct 22, 2015
@rhc54 rhc54 merged commit db12c3a into open-mpi:v1.10 Oct 22, 2015
alinask pushed a commit to alinask/ompi-release that referenced this pull request Dec 10, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants