Conversation

@mike-dubman (Member)

master commit: open-mpi/ompi#973

To make sure no rank calls mxm_ep_disconnect before the others have finished their MPI communication. This could be removed after refactoring the MXM disconnect flow.
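
For illustration, a minimal sketch of the ordering the patch enforces. Only mxm_ep_disconnect and the barrier-before-disconnect idea come from this PR; the wrapper function and the void-pointer signature are assumptions made so the sketch is self-contained:

```c
#include <mpi.h>

/* mxm_ep_disconnect() is the MXM call named above; this declaration is
 * a hypothetical signature for the sketch only. */
extern void mxm_ep_disconnect(void *conn);

static void disconnect_peer(void *conn)
{
    /* Every rank blocks here until all ranks arrive, so no peer can
     * still be mid-communication when endpoints begin disconnecting. */
    MPI_Barrier(MPI_COMM_WORLD);
    mxm_ep_disconnect(conn);
}
```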
@mike-dubman (Member Author)

:bot:milestone:v1.10.1
:bot:assign: @jsquyres
:bot:label:enhancement

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-v1.8-pr/792/ for details.

@mike-dubman (Member Author)

Actually, this commit changed the mpi_finalize behavior and introduced an issue in MXM:

c66ebf6

Before this commit, ompi_comm_finalize() was called before del_procs(), which ensured a barrier before calling del_procs().

@jsquyres (Member) commented Oct 5, 2015

It doesn't seem like a good idea to unconditionally have an extra barrier in finalize, especially for systems that don't require it.

Can this barrier be made conditional somehow, so that systems that require it can ask for it (e.g., via a PML, BTL, or MTL flag that indicates that the underlying point-to-point system needs it)?

@jsquyres (Member) commented Oct 5, 2015

Actually, I looked more closely at master: it's a PMIx fence, and it's not conditional -- but it's OOB, not MPI.

Can you explain further why this patch is necessary? Doesn't the RTE barrier prevent MXM endpoints from disconnecting from each other too early?

FWIW: @rhc54 and I talked about this patch, and we came up with a different theory as to why you need it... but after reading your commit message, I'm not sure that theory is correct (it had to do with direct launch and how the SLURM PMI fence is blocking)...

@mike-dubman (Member Author)

The MXM disconnect flow requires that all processes doing a disconnect wait inside the "mxm disconnect" flow, so an OOB or MPI barrier makes that happen.

Before commit c66ebf6, we had the barrier in place.

Maybe we can have a global flag that is set by any component that requires a barrier in finalize, and MXM would set it?

@yosefe - what do you think?

@jsquyres (Member) commented Oct 6, 2015

@miked-mellanox As I mentioned, there's already an RTE barrier in the v1.10 finalize.

@mike-dubman (Member Author)

@jsquyres (Member) commented Oct 6, 2015

Ah, ok -- you didn't say that before (that it needed to be before del_procs).

Then yes, a global flag indicating "I need an RTE barrier before del_procs in ompi_mpi_finalize" (that defaults to false) would probably be fine. This is basically what I suggested in #632 (comment).
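
A minimal sketch of that flag, with both identifiers invented for illustration:

```c
#include <stdbool.h>

/* Defaults to false; a transport that needs the extra fence flips it
 * during its own component initialization. (Both names hypothetical.) */
bool ompi_need_barrier_before_del_procs = false;

/* E.g., somewhere in the MXM component's init path: */
static void example_mxm_init_hook(void)
{
    /* "I need an RTE barrier before del_procs in ompi_mpi_finalize" */
    ompi_need_barrier_before_del_procs = true;
}
```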

Just curious: why is this only coming up a year and a half after c66ebf6 was committed?

@jsquyres (Member) commented Oct 6, 2015

Actually, a better question is: can the RTE barrier be moved up in the v1.10 ompi_mpi_finalize() to be before del_procs, as it is on master? (I have to run at the moment and can't check this myself.) If so, this whole conditional approach isn't necessary (although we could still do it as an optimization for the transports that don't need the barrier -- but we should probably do that on master first).

@mike-dubman (Member Author)

Please see http://www.open-mpi.org/community/lists/devel/2014/07/15203.php for the why-now.

@mike-dubman (Member Author)

Moving the RTE barrier to before del_procs can work for us; it was there before.
What was the reason in c66ebf6 for moving it to where it is now?

@rhc54 commented Oct 6, 2015

If you look carefully at the referenced commit, it is a revert of a prior commit from @hjelmn that had moved the barrier before del_procs. This was part of a broader finalize change. The reason it was reverted was a long discussion on the telecon, prompted by a spew of errors in the 1.8 series MTT caused by the change. So I reverted it as requested, and we scheduled the matter for discussion at the upcoming OMPI devel conference in June of 2014.

The result of all that was a conclusion that the finalize change proposed by @hjelmn wasn't immediately backwards compatible as it relied on a number of other things. And so we left the 1.8 series alone.

So I remain puzzled as to why this has been in the release series since the 1.7 days, was briefly modified as you now request, and shortly thereafter reverted - and yet only now is being raised again. From the messages of 15 months ago, it looks like this is an intermittent issue at most - has it not been a concern because it is rarely hit? Or as was suggested back then, is it only seen when someone's program isn't quite correct?

@mike-dubman (Member Author)

I don't remember all the details :(
We used a private repo with an MPI_Barrier patch applied to overcome it.
It is a real issue for us.

@jsquyres (Member) commented Oct 6, 2015

And apparently I don't remember all the details, either (since I just suggested moving the RTE barrier up before del_procs, which is apparently bad!). Doh!

Trying to remember a bit more: I think that back then, it wasn't so much an issue of moving the RTE barrier up as it was of moving the del_procs down... wasn't it? I.e., wasn't it moving the location of del_procs that caused the problem (and that was then reverted)?

Armed with this (possibly incorrect) information...

  1. Can we just move the RTE barrier up earlier in ompi_mpi_finalize() (in v1.10) and not hurt anything else? (vs. moving the del_procs later in the function, which caused problems)
  2. Or, if that doesn't work, perhaps we can add a 2nd, conditional RTE barrier before del_procs (i.e., MXM can set a global flag that says "yes, I need this additional RTE barrier"); see the sketch below.

Thoughts?
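
A sketch of option 2. Everything except the del_procs idea is invented for illustration: rte_barrier() stands in for whatever RTE fence primitive v1.10 finalize already uses, and the flag matches the registration sketch earlier in this thread:

```c
#include <stdbool.h>

extern bool ompi_need_barrier_before_del_procs;  /* set by MXM at init */
extern void rte_barrier(void);    /* stand-in for the existing RTE fence */
extern void del_procs_all(void);  /* stand-in for the del_procs step */

static void finalize_del_procs_step(void)
{
    /* The second, opt-in RTE barrier of option 2: only jobs whose
     * transport asked for it pay the synchronization cost. */
    if (ompi_need_barrier_before_del_procs) {
        rte_barrier();
    }
    del_procs_all();
}
```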

@gpaulsen (Member) commented Oct 6, 2015

I'd love to see this fixed upstream on master.

@jsquyres (Member) commented Oct 6, 2015

@gpaulsen What's broken on master? (note that the code is quite different over there because master has PMIx)

@mike-dubman (Member Author)

We reviewed the v1.10 finalize code, and we can probably add a barrier call into del_procs() for yalla and mtl/mxm; see the sketch below.
Will push a PR shortly.
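
A rough sketch of that shape. The function body and signature are illustrative, not the actual yalla / mtl-mxm code; the point is that the barrier moves inside the component's own del_procs hook, so only the transports that need it pay for it:

```c
#include <mpi.h>
#include <stddef.h>

/* Hypothetical signature, declared so the sketch is self-contained. */
extern void mxm_ep_disconnect(void *conn);

static int example_del_procs(void **peer_conns, size_t nprocs)
{
    /* All ranks synchronize inside del_procs itself, so no peer is
     * still communicating when endpoints start to disconnect. */
    MPI_Barrier(MPI_COMM_WORLD);

    for (size_t i = 0; i < nprocs; ++i) {
        mxm_ep_disconnect(peer_conns[i]);
    }
    return 0;   /* OMPI_SUCCESS in the real code */
}
```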

@jsquyres (Member) commented Oct 6, 2015

@miked-mellanox Ok, that seems like a non-controversial fix. Thanks!

mike-dubman added a commit to mike-dubman/ompi-release that referenced this pull request Oct 6, 2015
Based on open-mpi#632

* limitation: does not support MPI_Spawn
@mike-dubman (Member Author)

Closing this PR; replacing it with #644.

@mike-dubman mike-dubman closed this Oct 6, 2015
@mike-dubman mike-dubman deleted the topic/v1.10_patches_002_upstream branch October 6, 2015 15:33
alex-mikheev pushed a commit to alex-mikheev/ompi-release that referenced this pull request Nov 11, 2015
Initialize convertor in pml-cm-send/recv