MPI_Comm_split_type functionality #287

Closed
mpiforumbot opened this issue Jul 24, 2016 · 29 comments

@mpiforumbot
Collaborator

mpiforumbot commented Jul 24, 2016

Originally by balaji on 2011-08-25 19:25:55 -0500


Authors: MPI-3 Hybrid working group

Description

Creating communicators based on platform-specific information, such as shared memory capabilities, can provide several benefits, especially in multi-core environments. We propose to add a new function that splits a communicator based on a user-provided split type.

History

The original proposal for this functionality came from Ron Brightwell and combined communicator creation with shared memory creation. This ticket deals with the communicator creation functionality; the shared memory creation functionality has been spun off into a separate ticket (ticket #284).

Proposed Solution

Define a new call MPI_Comm_split_type that splits a parent communicator based on a split_type argument. See the attached proposal for details.
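
For context, a minimal usage sketch follows, assuming the C binding and the MPI_COMM_TYPE_SHARED constant as they appear in the attached drafts; treat it as an illustration rather than normative text:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm node_comm;
    int world_rank, node_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Split MPI_COMM_WORLD into subcommunicators whose members can
       create shared memory with each other. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED,
                        0 /* key: keep relative rank order */,
                        MPI_INFO_NULL, &node_comm);

    MPI_Comm_rank(node_comm, &node_rank);
    printf("world rank %d has node-local rank %d\n", world_rank, node_rank);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```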

Impact on Implementations

An implementation that does not expose any shared memory capabilities is trivial. An implementation that does expose a shared-memory communicator is relatively simple as well, since this functionality is already used internally in most MPI implementations. An implementation within MPICH2 is available for reference.
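
To make the "trivial" case concrete, here is a hedged sketch (not the MPICH2 code referenced above; trivial_comm_split_type is a name chosen for illustration) that layers the new call on MPI_Comm_split and exposes no sharing beyond the calling process itself:

```c
#include <mpi.h>

/* Sketch of a minimal, sharing-free implementation: every process is
   placed in its own single-process communicator, which is always a
   correct (if unhelpful) answer for MPI_COMM_TYPE_SHARED. */
static int trivial_comm_split_type(MPI_Comm comm, int split_type, int key,
                                   MPI_Info info, MPI_Comm *newcomm)
{
    int rank;
    (void)info;                      /* info hints are ignored in this sketch */
    MPI_Comm_rank(comm, &rank);

    if (split_type == MPI_COMM_TYPE_SHARED) {
        /* Unique color per rank => one-process communicators. */
        return MPI_Comm_split(comm, rank, key, newcomm);
    }
    /* MPI_UNDEFINED as the color makes MPI_Comm_split return
       MPI_COMM_NULL, covering split_type == MPI_UNDEFINED as well. */
    return MPI_Comm_split(comm, MPI_UNDEFINED, key, newcomm);
}
```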

Impact on Applications / Users

None for current users. The ticket adds a new function in a fully backward-compatible manner.

Alternative Solutions

See past discussions in the Hybrid WG.

Entry for the Change Log

Added MPI_Comm_split_type.

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-08-25 19:26:47 -0500


Attachment added: context-v0.2.pdf (534.9 KiB)
Context chapter draft v0.2

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-08-25 19:29:03 -0500


The current draft of the context chapter, which contains a new routine MPI_Comm_split_type, has been uploaded. This routine provides a mechanism to split a communicator to form subcommunicators on which shared memory can be created.
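
To show how this routine is meant to combine with the shared memory creation functionality in ticket #284, here is a hedged sketch; MPI_Win_allocate_shared comes from that companion ticket, and the helper name allocate_node_shared is purely illustrative:

```c
#include <mpi.h>

/* Sketch: form a subcommunicator of processes that can share memory,
   then allocate a shared window on it (the window call belongs to the
   companion ticket #284 and is shown only to illustrate the pairing). */
void allocate_node_shared(MPI_Comm comm, MPI_Aint bytes,
                          double **local_base, MPI_Win *win,
                          MPI_Comm *node_comm)
{
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, node_comm);

    /* Every member of *node_comm can create shared memory with the others. */
    MPI_Win_allocate_shared(bytes, (int)sizeof(double), MPI_INFO_NULL,
                            *node_comm, local_base, win);
}
```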

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-08-26 02:32:51 -0500


Attachment added: context-v0.3.pdf (537.9 KiB)

@mpiforumbot
Collaborator Author

Originally by dougmill on 2011-08-26 09:03:00 -0500


Some comments on v0.3; we can discuss them at the meeting.

The two predefined types seem a little disjoint. Maybe that is OK, but one relates directly to some concept of "shared" memory, while the other has an implied concept of memory accessible to all ranks (endpoints) in the subcommunicator, and perhaps other implications as well. I think the terms are not quite consistent: it's not really ranks/endpoints that have shared memory, it is the processes/threads that share various types of memory. I guess the endpoints proposal, and threads themselves, confuse the "hidden" relationship between processes/threads and ranks/endpoints.

I'm also thinking that these predefined types may not support a clean extension into the various memory hierarchies that exist in NUMA-like systems, where various groups of processing elements share optimal access to certain regions/types of memory. Of course, NUMA placement is a matter of optimization, not necessity, while the traditional concept of shared memory simply will not work with processes/threads that are outside of the domain (e.g., node). Even on systems that support global shared memory, though, there are performance reasons to keep the concept of a "node-local domain". I'll spend some time trying to think of an alternative set of types.

@mpiforumbot
Collaborator Author

Originally by dougmill on 2011-08-29 07:50:59 -0500


As I thought more about this, I came up with the following domains, where each domain is a subset of the previous one:

GLOBAL:: May not exist in all implementations, unless the platform supports global shared memory.

NODE:: Is this term universal? This would be like POSIX or SYSV "shared memory".

PROCESS:: All threads in a process "share" this memory. This might just be "normal heap".

PROCESSOR_GROUP*:: (better name needed) Subset(s) of threads in a process that share "optimal" access to some memory. This potentially represents multiple nested domains, depending on the platform's NUMA characteristics.

An implementation might add more domains, possibly between GLOBAL and NODE or more likely below PROCESS.

I think the other key point is that the calling thread of MPI_Comm_split_type drives how the domain (type) is processed. I guess that is already said, but the clarification is that it is the thread+endpoint that drives the selection (a thread must be attached to an endpoint to make an MPI call), not just the endpoint (rank). This is a bit unusual in MPI, I think. It becomes most significant when dealing with sub-process memory domains, where clusters of cores have "near memory" and "far memory". Also, a thread might attach to an endpoint that is implicitly associated with a different memory domain, as there is no mechanism for a user to determine which threads share memory domains with which endpoints. Perhaps this points out the need for another function in the endpoints proposal?
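
Purely to illustrate how such a nested hierarchy might be consumed, here is a hypothetical sketch; the MPIX_COMM_TYPE_* constants are placeholders for the GLOBAL/NODE/PROCESS domains above (they are not proposed names), and error handling for domains that do not exist on a given platform is omitted:

```c
#include <mpi.h>

/* Placeholder values, for illustration only; an implementation defining
   these domains would supply its own constants. */
#define MPIX_COMM_TYPE_GLOBAL  1001
#define MPIX_COMM_TYPE_NODE    1002
#define MPIX_COMM_TYPE_PROCESS 1003

/* Hypothetical: split the parent into successively narrower memory
   domains, each a subset of the previous one, as described above. */
void split_memory_domains(MPI_Comm *global_comm, MPI_Comm *node_comm,
                          MPI_Comm *process_comm)
{
    MPI_Comm_split_type(MPI_COMM_WORLD, MPIX_COMM_TYPE_GLOBAL, 0,
                        MPI_INFO_NULL, global_comm);  /* global shared memory, if any */
    MPI_Comm_split_type(*global_comm, MPIX_COMM_TYPE_NODE, 0,
                        MPI_INFO_NULL, node_comm);    /* POSIX/SYSV-style node scope */
    MPI_Comm_split_type(*node_comm, MPIX_COMM_TYPE_PROCESS, 0,
                        MPI_INFO_NULL, process_comm); /* threads/endpoints of the calling process */
}
```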

@mpiforumbot
Collaborator Author

Originally by jhammond on 2011-08-29 14:04:42 -0500


Hi Doug,

Anything besides MPI_COMM_TYPE_SHM and MPI_COMM_TYPE_PROCESS seems to be what is referred to by this:

"Advice to implementors. Implementations can define their own types, or use the info argument, to assist in creating communicators that help expose platform-specific information to the application."

It will be impossible to clearly define all possible flavors of MPI_COMM_TYPE_NUMA_DOMAIN one can imagine. We can create even more mud by trying to define MPI_COMM_TYPE_ACCELERATOR (Note: I think this is a terrible idea).

It seems that MPI_COMM_TYPE_PROCESS might be slightly harder to define if MPI ranks are threads, not processes. Do we really mean MPI_COMM_TYPE_PROCESS or do we mean MPI_COMM_TYPE_RANK? Isn't the goal to have a communicator for endpoints associated with the same rank?

Best,

Jeff

Replying to dougmill:

As I thought more about this, I came up with the following domains, where each domain is a subset of the previous one:

GLOBAL:: May not exist in all implementations, unless the platform supports global shared memory.

NODE:: Is this term universal? This would be like POSIX or SYSV "shared memory".

PROCESS:: All threads in a process "share" this memory. This might just be "normal heap".

PROCESSOR_GROUP*:: (better name needed) Subset(s) of threads in a process that share "optimal" access to some memory. This potentially represents multiple nested domains, depending on the platform's NUMA characteristics.

An implementation might add more domains, possibly between GLOBAL and NODE or more likely below PROCESS.

I think the other key point is that the calling thread of MPI_Comm_split_type drives how the domain (type) is processed. I guess that is already said, but the clarification is that it is the thread+endpoint that drives the selection (a thread must be attached to an endpoint to make an MPI call), not just the endpoint (rank). This is a bit unusual in MPI, I think. It becomes most significant when dealing with sub-process memory domains, where clusters of cores have "near memory" and "far memory". Also, a thread might attach to an endpoint that is implicitly associated with a different memory domain, as there is no mechanism for a user to determine which threads share memory domains with which endpoints. Perhaps this points out the need for another function in the endpoints proposal?

@mpiforumbot
Collaborator Author

Originally by dougmill on 2011-08-29 14:23:43 -0500


I was not suggesting we define all types here, just that we keep the names consistent and think enough about how implementers might extend the mechanism to be sure that it works.

I do not like "SHM" and "PROCESS", because they do not seem consistent. "NODE" and "PROCESS" seemed more consistent to me. Maybe I just need to know what these names refer to (what are the "units"). I was thinking "scope" of memory, so node-scoped and process-scoped seemed natural.

@mpiforumbot
Collaborator Author

Originally by jhammond on 2011-08-29 14:36:42 -0500


NODE is problematic in the case where we can do load-store across the network, à la SGI or Cray. Does it make sense to say that such a machine has only one node if one can use load-store across the entire machine?

Does it make sense to say MPI_COMM_TYPE_PHYS_ADDR and MPI_COMM_TYPE_VIRT_ADDR, meaning the sets of ranks that can directly access the same physical and virtual address spaces, respectively? Ron B. is probably going to pwn this noob now :-)

@mpiforumbot
Collaborator Author

Originally by dougmill on 2011-08-29 14:40:16 -0500


I was thinking that the SGI/Cray situation was more of a GLOBAL-scoped memory, i.e., yet another type that is implementation defined. Don't these systems still have NODE-scoped memory, which I'd imagine is more efficient than GLOBAL?

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-08-30 13:39:22 -0500


Attachment added: context-v0.4.pdf (534.9 KiB)

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-08-30 13:40:57 -0500


I've attached a new version of the proposal with the changes discussed during the telecon today. Please read through the proposal to make sure everything looks OK.

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-10-05 09:57:21 -0500


Attachment added: context-v0.5.pdf (534.9 KiB)

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-10-05 10:03:16 -0500


I've attached a new version of the proposal based on the Forum feedback in Santorini.

@mpiforumbot
Collaborator Author

Originally by jdinan on 2011-10-11 11:25:32 -0500


Attachment added: fulldoc-v0.5.pdf (2287.3 KiB)
This draft includes the full spec and adds the constants definition that was omitted in the chapter-only version.

@mpiforumbot
Collaborator Author

Originally by jsquyres on 2011-10-27 09:03:00 -0500


Pavan -- what's the implementation status of this proposal?

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-10-27 10:22:16 -0500


We are working on the implementation and will have it ready before the second vote.

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-10-27 16:57:35 -0500


Attachment added: fulldoc-v0.6.pdf (3654.6 KiB)

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-10-27 17:02:54 -0500


Attached a new draft with a few ticket 0 changes.

These two were suggested at the Forum meeting.

  1. Fixed a typo in the word "assigment" --> "assignment".
  2. Changed "Communicator type constants" --> "Communicator split type constants".

The following change still needs to be discussed in the working group.

  1. Removed the reference to MPI_WIN_ALLOCATE_SHARED since that is a separate ticket.

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-10-27 17:04:46 -0500


In the above changes suggested at the Forum, I forgot to mention that I included the ChangeLog entry as well.

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-10-27 17:07:09 -0500


An implementation of the MPI_Comm_split_type function is now publicly available in MPICH2 here: http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/trunk/ (please use r9071 or higher).
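
For anyone trying that build, a small sanity test along the following lines (this program is an illustration, not part of the MPICH2 distribution) prints the node-local rank and size next to the processor name, so co-located ranks can be checked by eye:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char name[MPI_MAX_PROCESSOR_NAME];
    int len, world_rank, node_rank, node_size;
    MPI_Comm node_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(name, &len);

    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    /* Ranks reporting the same processor name should also report the
       same node_size if the split matches the hardware layout. */
    printf("%s: world %d -> node-local %d of %d\n",
           name, world_rank, node_rank, node_size);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```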

@mpiforumbot
Collaborator Author

Originally by balaji on 2011-10-30 15:39:15 -0500


Updated the ticket description based on Rolf's suggestions at the Forum.

@mpiforumbot
Collaborator Author

Originally by gropp on 2012-05-29 00:25:03 -0500


I assume that this does not go into the context chapter. If it does, I need a diff of the context chapter LaTeX file; it isn't feasible to work with the nearly 700-page full draft to extract the update.

@mpiforumbot
Collaborator Author

Originally by balaji on 2012-05-30 02:15:22 -0500


This is supposed to go into the context chapter. I was asked to commit the changes into the approved branch. I'll do that soon and send out a note.

@mpiforumbot
Collaborator Author

Originally by RolfRabenseifner on 2012-06-28 15:12:03 -0500


appLang committed (SVN 1206)

@mpiforumbot
Collaborator Author

Originally by jsquyres on 2012-07-03 09:09:48 -0500


Rolf: the change log currently reads:

Added MPI_COMM_SPLIT_TYPE function and the communicator split type constand MPI_COMM_TYPE_SHARED.

But should read:

Added MPI_COMM_SPLIT_TYPE function and the communicator split type constan**t** MPI_COMM_TYPE_SHARED.

I committed the fix.

@mpiforumbot
Collaborator Author

Originally by RolfRabenseifner on 2012-07-14 01:18:14 -0500


Bill already committed text to context.tex (before svn r1280).
I committed the new Fortran binding for MPI_COMM_SPLIT_TYPE in context.tex in svn r1281.

@mpiforumbot
Collaborator Author

Originally by gropp on 2012-07-18 14:23:54 -0500


As noted, already committed to context chapter.

@mpiforumbot
Collaborator Author

Originally by buntinas on 2012-07-18 16:28:07 -0500


Reviewed PDF.
-d

@mpiforumbot
Collaborator Author

Originally by RolfRabenseifner on 2013-01-07 11:42:51 -0600


Since Sep. 21, 2012, this ticket has been included in MPI-3.0, and the PDF has been checked according to https://svn.mpi-forum.org/svn/mpi-forum-docs/trunk/meetings/2012-07-jul/mpi3-tickets.xlsx

Therefore, I set the priority to "Ticket complete".
