
Noncollective Communicator Creation #286

Closed
mpiforumbot opened this issue Jul 24, 2016 · 17 comments

Comments

@mpiforumbot
Collaborator

mpiforumbot commented Jul 24, 2016

Originally by jdinan on 2011-07-20 18:01:39 -0500


This ticket proposes the addition of a communicator creation routine that is not collective over a parent communicator. This is important for fault tolerance when some processes in the parent communicator may have failed; for performance when a small subset of a large communicator wishes to form a new group; and for load balancing where worker groups can be dynamically resized to match the workload.

The proposed new function is:

int MPI_Group_comm_create(MPI_Comm in, MPI_Group grp, int tag, MPI_Comm *out)

This routine is collective only over processes in grp. Multiple threads are permitted to call this routine concurrently and the tag argument is used to distinguish calls.
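To make the two rules above concrete (who must call, and how concurrent calls are told apart), here is a small pure-C model. It is not MPI code: a "group" is just an array of parent-communicator ranks, and `create_kind`, `must_call`, `call_key` and `same_call` are hypothetical names for illustration. The model assumes that a call is identified by its parent communicator plus its tag, which is what the tag argument is meant to provide when several threads call concurrently. (The routine was eventually standardized in MPI 3.0 as `MPI_Comm_create_group(comm, group, tag, newcomm)`.)

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not MPI code: a group is an array of ranks
   in the parent communicator. */
typedef enum { VIA_COMM_CREATE, VIA_CREATE_GROUP } create_kind;

/* Must the process with this parent rank call the creation routine? */
bool must_call(create_kind kind, const int *group, int group_size, int parent_rank)
{
    assert(group_size >= 0);
    if (kind == VIA_COMM_CREATE)
        return true;                 /* collective over the whole parent communicator */
    for (int i = 0; i < group_size; i++)
        if (group[i] == parent_rank)
            return true;             /* collective only over members of grp */
    return false;
}

/* Assumed matching rule: internal messages belong to the same creation
   call only if they carry the same parent communicator and the same tag. */
typedef struct {
    int parent_id;   /* hypothetical identifier of the parent communicator */
    int tag;         /* user-supplied tag argument */
} call_key;

bool same_call(call_key a, call_key b)
{
    return a.parent_id == b.parent_id && a.tag == b.tag;
}
```

With this model, two threads that create groups from the same parent communicator at the same time only need to pass different tags for their internal traffic to stay separate.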

@mpiforumbot
Collaborator Author

Originally by jdinan on 2011-07-20 18:02:35 -0500


Attachment added: dinan_imudi11.pdf (210.0 KiB)
EuroMPI paper that describes an approach to implementing this functionality on top of MPI and provides application use cases.

@mpiforumbot
Collaborator Author

Originally by jdinan on 2011-07-20 18:04:13 -0500


Pavan presented this proposal to the Forum on 7/20/2011 and there is strong support for it. The feedback we received was to look into whether the tag argument is required for correctness or if it can be removed to simplify the interface.

@mpiforumbot
Collaborator Author

Originally by moody20 on 2011-10-23 09:43:11 -0500


p223, 4-10

This list is not exhaustive (e.g., the dynamic process routines are missing), so change "The MPI interface provides five communicator construction routines" to "The following communicator construction routines are described in this section."

Add MPI_INTERCOMM_MERGE to this list.

Order the list by order of appearance, as far as possible.

p227, 43

In advice to users, highlight that comm_create is collective over the group of the input communicator, while comm_create_group is collective only over the input group, which is a subset of the group of comm.

Also stress that comm_create is much faster if all procs in comm invoke the routine.

p227, 46

Add rationale that this can be done with intercomm_create and intercomm_merge by building up from comm_self, but calling comm_create_group is more efficient.
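The build-up from comm_self mentioned above can be sketched as a recursive-doubling schedule: every member starts with a singleton group, and in each round adjacent blocks are joined by calling MPI_Intercomm_create (through the parent communicator, using the tag) and then MPI_Intercomm_merge. The pure-C sketch below only computes that schedule; it is one plausible schedule under these assumptions, not necessarily the exact one in the EuroMPI paper, and `merge_rounds` and `partner_leader` are hypothetical helpers. It illustrates why the rationale favors comm_create_group: the user-level construction needs about log2(n) rounds of intercommunicator creation and merging.

```c
#include <assert.h>

/* Number of pairwise merge rounds needed to grow n singleton groups
   (each starting from MPI_COMM_SELF) into one group of n processes. */
int merge_rounds(int n)
{
    assert(n >= 1);
    int rounds = 0;
    for (int block = 1; block < n; block *= 2)
        rounds++;                        /* each round doubles the block size */
    return rounds;
}

/* In round r, blocks of size 2^r pair up: block 2k merges with block 2k+1.
   Block 2k would pass high = 0 and block 2k+1 high = 1 to
   MPI_Intercomm_merge so that group order is preserved.
   Returns the group rank of the partner block's leader (its lowest rank)
   for the process with group rank `rank`, or -1 if its block has no
   partner in this round. */
int partner_leader(int rank, int round, int n)
{
    assert(rank >= 0 && rank < n && round >= 0);
    int block = 1 << round;              /* current block size */
    int idx = rank / block;              /* index of this process's block */
    int partner = (idx % 2 == 0) ? idx + 1 : idx - 1;
    int leader = partner * block;        /* lowest rank in the partner block */
    return leader < n ? leader : -1;
}
```

For example, with n = 5 the schedule takes three rounds, and rank 4 sits out round 0 because its block has no partner yet.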

p227, 47

"on communicator comm, and tag tag between processes in group" --> "on communicator comm having the tag tag between processes in group"

p229, 5-7

Remove these lines, and change "MPI_COMM_CREATE is useful ..." to "MPI_COMM_CREATE or MPI_COMM_CREATE_GROUP are useful". This section is just to differentiate comm_split from comm_create.

General comments / questions:

It's clear that all procs in the group must call the routine, but it seems OK for procs not in the group to call the routine. This means that some procs not in the group may call the routine while other procs also not in the group may not. Does this complicate life for MPI, e.g., context allocation, etc.?

If processes not in the group also call the function, can pt2pt communication be done between those procs? I'm guessing not, since you don't know for sure which procs will also call the function (some might, others might not).

For what duration must the user ensure that the tag is not used? I guess such a problem also exists with intercomm_create, but I think it's much more limited there. Maybe add a statement that all procs in comm (or group) must synchronize to know when the tag can and can't be used again.

My preference would be to specify that only procs in a non-empty group make the call.

@mpiforumbot
Collaborator Author

Originally by jdinan on 2011-10-24 22:50:49 -0500


Attachment added: ticket_286_oct_2011.pdf (2290.4 KiB)
Updated draft of ticket #286 for the formal reading scheduled for Oct. 2011.

@mpiforumbot
Collaborator Author

Originally by jdinan on 2011-10-24 22:54:49 -0500


Thanks for the feedback. I made these changes and attached an updated document to the ticket.

@mpiforumbot
Collaborator Author

Originally by jdinan on 2011-12-08 11:00:38 -0600


Attachment added: ticket_286_jan_2012.pdf (2290.0 KiB)
Updated draft of ticket #286 for the formal reading scheduled for Jan. 2012.

@mpiforumbot
Collaborator Author

Originally by jdinan on 2011-12-16 13:13:21 -0600


Attachment added: ticket_286_jan_2012.2.pdf (2290.4 KiB)
Updated draft of ticket #286 and #305 for the formal reading scheduled for Jan. 2012.

@mpiforumbot
Collaborator Author

Originally by jsquyres on 2012-01-06 16:35:23 -0600


Minor questions:

  • "No cached information propagates from comm to newcomm" -- why?
  • "If the calling process is a member of the group given as the group argument..." -- is there any other way besides using MPI_GROUP_EMPTY for the calling process to not be a member of the group? Or are you just allowing non-group-members to make the call, which would then be a no-op?
  • Did you implement the separate space for tags as effectively removing an upper-end bit from your TAG_UB?

@mpiforumbot
Collaborator Author

Originally by jdinan on 2012-01-08 18:31:11 -0600


Hi Jeff,

Thanks for the thorough reading and comments. Responses are inline below:

Replying to jsquyres:

Minor questions:

  • "No cached information propagates from comm to newcomm" -- why?

This is the same behavior as MPI_Comm_create.

  • "If the calling process is a member of the group given as the group argument..." -- is there any other way besides using MPI_GROUP_EMPTY for the calling process to not be a member of the group? Or are you just allowing non-group-members to make the call, which would then be a no-op?

We're allowing non-group members to make the call and defining it to be a no-op.

The motivation was to make MPI_Comm_create_group compatible with MPI_Comm_create. If the user calls MPI_Comm_create_group collectively from all processes in the parent communicator - even if some of those processes aren't in the new group - the result is the same as you would get when calling MPI_Comm_create. Processes in the group get a communicator and all others get COMM_NULL.
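The compatibility argument above can be modeled in a few lines of pure C. The sketch assumes a group is an array of parent-communicator ranks; `new_rank` and `MODEL_COMM_NULL` are hypothetical stand-ins, not MPI names. A member's rank in the new communicator is its position in the group (as with MPI_Comm_create), and a non-member gets the null result without doing any work, which is what makes its call a no-op.

```c
#include <assert.h>

#define MODEL_COMM_NULL (-1)   /* hypothetical stand-in for MPI_COMM_NULL */

/* Rank the caller would have in newcomm, or MODEL_COMM_NULL if the caller
   is not in the group. Non-members return immediately, so if every process
   in the parent communicator calls, the outcome matches MPI_Comm_create. */
int new_rank(const int *group, int group_size, int parent_rank)
{
    assert(group_size >= 0);
    for (int i = 0; i < group_size; i++)
        if (group[i] == parent_rank)
            return i;          /* ranks in newcomm follow the order of the group */
    return MODEL_COMM_NULL;
}
```

For a group listing parent ranks {4, 1, 7}, parent rank 7 would become rank 2 in the new communicator, and parent rank 3 would receive the null communicator.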

  • Did you implement the separate space for tags as effectively removing an upper-end bit from your TAG_UB?

That's the implementation we have in mind. We have an implementation that uses the point-to-point tag space, but haven't yet completed one with an independent tag space - we will have this ready before the vote (hopefully at the next meeting).
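The "remove an upper-end bit from TAG_UB" approach can be sketched as follows. Assumptions, all hypothetical: the implementation carries tags in a 31-bit field, it reserves the top bit of that field for MPI_Comm_create_group traffic, and it reports the remaining range as MPI_TAG_UB. The macro and function names are illustrative only.

```c
#include <assert.h>

/* Assumption: tags travel in a 31-bit field whose top bit is reserved
   for MPI_Comm_create_group traffic. */
#define RAW_TAG_BITS     31
#define CREATE_GROUP_BIT (1 << (RAW_TAG_BITS - 1))
#define USER_TAG_UB      (CREATE_GROUP_BIT - 1)   /* value reported as MPI_TAG_UB */

/* Point-to-point messages keep the reserved bit clear. */
int pt2pt_wire_tag(int tag)
{
    assert(tag >= 0 && tag <= USER_TAG_UB);
    return tag;
}

/* Internal comm_create_group messages set the reserved bit, so they can
   never match a user point-to-point receive with the same tag value. */
int create_group_wire_tag(int tag)
{
    assert(tag >= 0 && tag <= USER_TAG_UB);
    return tag | CREATE_GROUP_BIT;
}

int is_create_group_tag(int wire) { return (wire & CREATE_GROUP_BIT) != 0; }
int user_tag(int wire)            { return wire & ~CREATE_GROUP_BIT; }
```

The cost of this design is that the user-visible tag range is halved, which is harmless as long as the reported upper bound stays well above the minimum of 32767 required for MPI_TAG_UB.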

Thanks again,
~Jim.

@mpiforumbot
Collaborator Author

Originally by jsquyres on 2012-01-09 07:03:06 -0600


ACK on all points -- all sounds very reasonable. Thanks!

I'm changing the implementation status to "Waiting" since the independent tag space stuff isn't done yet (which is kind of an important point of this one).

@mpiforumbot
Collaborator Author

Originally by jdinan on 2012-03-02 16:51:47 -0600


Attachment added: ticket_286_305.patch (41.5 KiB)
MPICH2 Implementation of tickets 286 and 305 -- patch should be applied to trunk r9475

@mpiforumbot
Collaborator Author

Originally by jdinan on 2012-03-06 12:09:20 -0600


Attachment added: ticket_286_305.pdf (2291.8 KiB)
Updated draft of tickets #286 and #305 for second vote in May, 2012.

@mpiforumbot
Collaborator Author

Originally by jdinan on 2012-06-27 12:34:21 -0500


Attachment added: ticket_286_305_final.pdf (429.2 KiB)
Final draft of tickets #286 and #305.

@mpiforumbot
Collaborator Author

Originally by gropp on 2012-07-18 14:21:24 -0500


Applied. However, shouldn't MPI_COMM_CREATE_GROUP accept an info argument, following the form of the other new communicator creation routines?

@mpiforumbot
Collaborator Author

Originally by jdinan on 2012-07-18 14:43:50 -0500


I don't think we ever had a use case for the info argument, so we never added it. The info argument in MPI_Comm_split_type was added to allow the user to specify machine topology hints.

@mpiforumbot
Collaborator Author

Originally by smithbe on 2012-07-18 16:39:00 -0500


Verified the text in HEAD (r1428) matches the ticket text here.

@mpiforumbot
Collaborator Author

Originally by jdinan on 2012-09-24 04:17:46 -0500


This change was included in MPI 3.0.
