I propose the following idea for a future MPI standard: persistent point-to-point collectives. The goal is to provide a flexible interface for pre-defining nearest-neighbor-like communication that allows an MPI implementation to pay most setup costs at request-creation time and to perform the communication pattern more efficiently.
Here is a straw-man API.
MPI_SEND_ADD(buf, count, datatype, dest, tag, request)
Adds a non-blocking send operation to a persistent request. Multiple operations can be added to the same request. This call would be local.
MPI_RECV_ADD(buf, count, datatype, source, tag, request)
Adds a non-blocking recv operation to a persistent request. Multiple operations can be added to the same request. This call would be local.
MPI_REQUEST_INIT(comm, request)
Makes a persistent point-to-point collective request available for use with MPI_START and MPI_WAIT. The resulting request would function like a persistent collective request. This call should come after all the ADD calls. It would be collective across the communicator.
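To make the straw-man concrete, here is a hypothetical usage sketch in C for a 1D halo exchange. The MPI_Send_add, MPI_Recv_add, and MPI_Request_init calls are my C-binding rendering of the proposed functions, not existing MPI; I also assume the request handle starts as MPI_REQUEST_NULL and is allocated by the first ADD call, which the straw-man leaves unspecified. Error handling is omitted.

```c
/* Hypothetical sketch of the proposed API: a 1D halo exchange with the
 * left and right neighbors.  MPI_Send_add, MPI_Recv_add, and
 * MPI_Request_init are the straw-man calls, not existing MPI.  Assumed:
 * the request starts as MPI_REQUEST_NULL and the first ADD allocates it. */
#include <mpi.h>

void halo_setup(double *sendl, double *sendr, double *recvl, double *recvr,
                int n, int left, int right, MPI_Comm comm, MPI_Request *req)
{
    *req = MPI_REQUEST_NULL;

    /* Local calls: record the pattern without communicating. */
    MPI_Send_add(sendl, n, MPI_DOUBLE, left,  0, req);
    MPI_Send_add(sendr, n, MPI_DOUBLE, right, 1, req);
    MPI_Recv_add(recvr, n, MPI_DOUBLE, right, 0, req);
    MPI_Recv_add(recvl, n, MPI_DOUBLE, left,  1, req);

    /* Collective over comm: match sends and receives globally, register
     * buffers, and allocate transfer resources up front. */
    MPI_Request_init(comm, req);
}

/* Each iteration then reuses the request:
 *   MPI_Start(req);  ...overlap computation...  MPI_Wait(req, MPI_STATUS_IGNORE);
 */
```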
A single persistent point-to-point collective request with MPI_START and MPI_WAIT would behave like the analogous array of persistent point-to-point requests with MPI_STARTALL and MPI_WAITALL, but with the following restrictions.
The destinations, sources, and tags of the sends and receives would all be required to match globally at INIT time.
The MPI_STATUS returned by MPI_WAIT would carry only the fields that other persistent collective requests provide.
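For reference, here is a minimal sketch of the "analogous array" written with today's persistent point-to-point API; this is standard MPI, shown only to pin down the semantics the single collective request would inherit. The halo-exchange buffer names mirror the sketch above.

```c
/* The same halo pattern with today's persistent point-to-point
 * requests.  The proposed single request would behave like
 * MPI_Startall/MPI_Waitall over this array, with the match fixed
 * globally at setup time and a reduced status. */
#include <mpi.h>

void halo_setup_classic(double *sendl, double *sendr,
                        double *recvl, double *recvr, int n,
                        int left, int right, MPI_Comm comm,
                        MPI_Request reqs[4])
{
    MPI_Send_init(sendl, n, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Send_init(sendr, n, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    MPI_Recv_init(recvr, n, MPI_DOUBLE, right, 0, comm, &reqs[2]);
    MPI_Recv_init(recvl, n, MPI_DOUBLE, left,  1, comm, &reqs[3]);
}

/* Per iteration:
 *   MPI_Startall(4, reqs);
 *   MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
 */
```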
Then the following optimizations could all happen at INIT time.
Global matching of sends and receives.
Registration of buffers for RDMA.
Allocation of resources for efficient synchronization and data transfers.
This API could have the following advantages over existing non-blocking and persistent point-to-point communication.
Better communication performance.
The potential to check for deadlock or mismatched messages at INIT time.
This API could have the following advantages over persistent neighborhood collectives, while maintaining similar opportunity for performance.
Simpler-to-understand construction of requests, particularly when refactoring existing point-to-point code. It would support building a request out of familiar sends and receives instead of topology constructors.
The flexibility to use multiple buffers, instead of requiring single send and receive buffers.
No need to create a new communicator, avoiding the limited resources that a separate communicator might consume.
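For contrast with the first two points, here is the neighborhood-collective route to the same exchange using existing MPI-3 API (MPI_Dist_graph_create_adjacent plus MPI_Neighbor_alltoall). The halo layout in the trailing comment is an assumption for illustration; the key difference is the topology constructor, the extra communicator, and the single packed send/receive buffers.

```c
/* The same exchange via neighborhood collectives: a graph-topology
 * communicator must be created and kept alive, and the halo data must
 * be packed into single contiguous send/recv buffers. */
#include <mpi.h>

void halo_setup_neighborhood(int left, int right, MPI_Comm comm,
                             MPI_Comm *ncomm)
{
    int nbrs[2] = { left, right };

    /* Collective: builds a separate communicator with the topology. */
    MPI_Dist_graph_create_adjacent(comm,
                                   2, nbrs, MPI_UNWEIGHTED,  /* sources      */
                                   2, nbrs, MPI_UNWEIGHTED,  /* destinations */
                                   MPI_INFO_NULL, 0 /* no reorder */, ncomm);
}

/* Per iteration, with sendbuf/recvbuf packed as [left | right] blocks
 * of n doubles each (assumed layout):
 *   MPI_Neighbor_alltoall(sendbuf, n, MPI_DOUBLE,
 *                         recvbuf, n, MPI_DOUBLE, *ncomm);
 */
```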
Please forgive me if the MPI Forum has already investigated similar ideas.
Trey, how would this compare to MPI neighbor collectives, particularly if creating new topologies on which to communicate didn't require communicator creation?
If persistent neighborhood collectives could take a topology instead of a communicator, I think that two of the three advantages of this proposal still remain.
Simpler-to-understand construction of requests, particularly when refactoring existing point-to-point code. It would support building a request out of familiar sends and receives instead of topology constructors.
The flexibility to use multiple buffers, instead of requiring single send and receive buffers.