Allocating receive and freeing send #32

jeffhammond · 2016-01-21T16:59:21Z

This was https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/464

Motivation

See http://meetings.mpi-forum.org/secretary/2013/09/slides/jsquyres-arecv.pdf for now.

Prior Art

Ownership Passing (OP)

Andrew Friedley, Torsten Hoefler, Greg Bronevetsky, Andrew Lumsdaine, and Ching-Chen Ma. 2013. Ownership passing: efficient distributed memory programming on multi-core systems. SIGPLAN Not. 48, 8 (February 2013), 177-186. http://doi.acm.org/10.1145/2517327.2442534
Andrew Friedley, Torsten Hoefler, Greg Bronevetsky, Andrew Lumsdaine, and Ching-Chen Ma. 2013. Ownership passing: efficient distributed memory programming on multi-core systems. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '13). ACM, New York, NY, USA, 177-186. http://doi.acm.org/10.1145/2442516.2442534
The full gamut of Andrew Friedley's work on OP is hosted on https://code.google.com/p/hmpi/ (the mutable fork is https://github.com/jeffhammond/hmpi).
LINX, according to one of the Intel MPI team members.
Multicore Communication API, according to @bosilca.

Fine-Grain MPI

FG-MPI supports MPIX_Z(send,recv) and MPIX_Iz(send,recv), which perform zero-copy message passing between processes located in the same address space (as determined by MPIX_Get_collocated_(size,startrank)). http://www.cs.ubc.ca/~humaira/docs/fgmpi_userguide.pdf

Functions

int MPI_Arecv(MPI_Datatype datatype, int source, int tag, MPI_Comm comm, 
              void* outbuf, MPI_Status * status)
int MPI_Iarecv(MPI_Datatype datatype, int source, int tag, MPI_Comm comm, 
               void* outbuf, MPI_Request * request)

Note that, like MPI_Alloc_mem, outbuf is actually a void**, not a void*.

There is no count argument for MPI_(I)ARECV. One uses MPI_GET_COUNT to obtain that information. Unlike the previous proposal (by Squyres and Goodell), the function signatures here include the buffer output argument, which obviates the need for a new function for this purpose (such as MPI_STATUS_GET_BUFFER).
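To make the out-parameter convention and the count lookup concrete, here is a small in-process sketch in plain C that does not call MPI. The names `toy_arecv` and `toy_get_count` are hypothetical: `toy_get_count` returns -1 where MPI_GET_COUNT would return MPI_UNDEFINED, and a `size_t` byte count stands in for the status object. The point it illustrates is that `outbuf` is declared `void*` but must be written through as a `void**`, just as with MPI_Alloc_mem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MPI_GET_COUNT: number of whole elements of
 * size `typesize` in a message of `bytes` bytes, or -1 (playing the role
 * of MPI_UNDEFINED) if the bytes do not form a whole number of elements. */
static int toy_get_count(size_t bytes, size_t typesize)
{
    if (typesize == 0 || bytes % typesize != 0)
        return -1;
    return (int)(bytes / typesize);
}

/* Hypothetical allocating receive: copies an incoming payload into freshly
 * allocated memory and returns it through `outbuf`, which (as with
 * MPI_Alloc_mem) is declared void* but must point to a void*. */
static int toy_arecv(const void *payload, size_t bytes, void *outbuf,
                     size_t *received)
{
    assert(outbuf != NULL && received != NULL);
    void *buffer = malloc(bytes > 0 ? bytes : 1);
    if (buffer == NULL)
        return -1;
    memcpy(buffer, payload, bytes);
    *(void **)outbuf = buffer;  /* write through the caller's pointer */
    *received = bytes;          /* stands in for the status object */
    return 0;
}
```

Under the proposal, the caller side would presumably be `MPI_Arecv(MPI_INT, src, tag, comm, &buf, &status)` followed by `MPI_Get_count(&status, MPI_INT, &count)` and eventually `MPI_Free_mem(buf)`; the pairing with MPI_Free_mem is an assumption based on the naive implementation's use of MPI_Alloc_mem.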

int MPI_Fsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm)
int MPI_Ifsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm, MPI_Request * request)

There are equivalent ready functions.

int MPI_Rfsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm)
int MPI_Irfsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm, MPI_Request * request)

There are equivalent synchronous functions.

int MPI_Sfsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm)
int MPI_Isfsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm, MPI_Request * request)
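The motivation for pairing a freeing send with an allocating receive is ownership passing: when sender and receiver share an address space (as with FG-MPI's collocated processes), the library can hand over the pointer instead of copying the data. The following is a minimal single-threaded sketch of that idea, assuming one shared single-slot mailbox; `mailbox_t`, `mailbox_fsend`, and `mailbox_arecv` are hypothetical names and are not part of MPI or of any of the cited systems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical single-slot mailbox shared by a sender and a receiver in
 * the same address space. It stores a pointer, never a copy of the data. */
typedef struct {
    void  *buf;    /* owned by the mailbox while a message is in flight */
    size_t bytes;
    int    full;
} mailbox_t;

/* "Freeing send": ownership of *inbuf moves to the mailbox, and the
 * sender's pointer is cleared so it cannot be used or freed again. */
static int mailbox_fsend(mailbox_t *mb, void **inbuf, size_t bytes)
{
    if (mb->full)
        return -1;             /* slot busy; a real library would queue */
    mb->buf = *inbuf;
    mb->bytes = bytes;
    mb->full = 1;
    *inbuf = NULL;
    return 0;
}

/* "Allocating receive": the receiver gets the very same allocation and
 * becomes responsible for freeing it. No bytes are copied. */
static int mailbox_arecv(mailbox_t *mb, void **outbuf, size_t *bytes)
{
    if (!mb->full)
        return -1;             /* nothing to receive */
    *outbuf = mb->buf;
    *bytes = mb->bytes;
    mb->buf = NULL;
    mb->full = 0;
    return 0;
}
```

A real implementation would also need queuing, matching on source and tag, and synchronization between threads or processes; the sketch only shows the transfer of ownership itself.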

Semantics

The following code approximates what a naive implementation of MPI_ARECV might look like.

int MPI_Arecv(MPI_Datatype datatype, int source, int tag, MPI_Comm comm, 
              void* outbuf, MPI_Status * status)
{
  /* To be thread-safe, we would need to use Mprobe and Mrecv. */
  MPI_Status probe_status;
  MPI_Probe(source, tag, comm, &probe_status);

  int count;
  MPI_Get_count(&probe_status, datatype, &count);

  /* To be fully general, we would need to use Type_get_extent. */
  int typesize;
  MPI_Type_size(datatype, &typesize);

  MPI_Aint bytes = (MPI_Aint)typesize * (MPI_Aint)count;

  void * buffer;
  MPI_Alloc_mem(bytes, MPI_INFO_NULL, &buffer);

  /* Receive the probed message itself, even if source or tag was a
     wildcard (MPI_ANY_SOURCE or MPI_ANY_TAG). */
  MPI_Recv(buffer, count, datatype, probe_status.MPI_SOURCE,
           probe_status.MPI_TAG, comm, status);

  /* outbuf is really a void**, as with MPI_Alloc_mem. */
  *(void **)outbuf = buffer;

  return MPI_SUCCESS;
}
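The naive MPI_ARECV above sizes its allocation with MPI_Type_size, which counts only the data bytes of a datatype and ignores the gaps a non-contiguous type leaves in memory. Below is a sketch of the extent-based size, written as plain arithmetic on the values MPI_Type_get_extent and MPI_Type_get_true_extent would report, assuming a non-negative extent; `arecv_span_bytes` is a hypothetical helper. The buffer address handed to MPI_Recv would also need to be offset by the datatype's true lower bound, which this sketch does not show.

```c
#include <assert.h>

/* Bytes spanned by `count` consecutive elements of a datatype with the
 * given extent and true extent. Assumes count, extent, and true_extent
 * are all non-negative. */
static long arecv_span_bytes(long count, long extent, long true_extent)
{
    assert(count >= 0 && extent >= 0 && true_extent >= 0);
    if (count == 0)
        return 0;
    /* Element i starts at i * extent; the last one ends true_extent later. */
    return (count - 1) * extent + true_extent;
}
```

For example, `MPI_Type_vector(3, 1, 2, MPI_INT)` with 4-byte ints has a size of 12 bytes but an extent and true extent of 20 bytes, so receiving two elements needs 40 bytes, while the naive `typesize * count` allocates only 24.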

The following code approximates what a naive implementation of MPI_FSEND might look like.

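int MPI_Fsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm)
{
  /* inbuf must have been allocated with MPI_Alloc_mem; the caller gives
     up ownership of it and must not touch it afterwards. */
  MPI_Send(inbuf, count, datatype, dest, tag, comm);
  MPI_Free_mem(inbuf);
  return MPI_SUCCESS;
}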


bosilca · 2016-01-21T17:27:06Z

For more prior work, see the Multicore Communication API.

tonyskjellum · 2016-01-21T17:49:16Z

Jeff, we were doing this in message-passing systems in the 1980s on the Sequent Symmetry, using the message-passing model from the Reactive Kernel and the Cosmic Environment. When we proposed these semantics in MPI-1, they were rejected as troublesome for Fortran, one of the standard disqualifiers :-)

Zipcode had these semantics too.

I know that is much earlier work, but it should be clear that this is 30-year-old practice that we are belatedly considering again for MPI.

jeffhammond · 2016-01-21T21:29:42Z

@tonyskjellum MPI-1 lacks MPI_Alloc_mem and MPI_Free_mem, which are necessary for this feature to be both valuable and portable under conservative assumptions. A portable implementation that delivers no benefit is easy, of course, but that's not the point.

Are there Fortran issues that are not addressed by MPI_Alloc_mem and MPI_Free_mem and ASYNCHRONOUS?

wgropp · 2016-01-21T21:53:45Z

When MPI-1 was developed, Fortran 90 was too new (there were few good compilers) and its POINTER feature was too limited. With Fortran 2008, it's possible to define standard-conforming routines for these operations, so the Fortran issue is no longer present.

dholmes-epcc-ed-ac-uk · 2017-09-21T19:50:35Z

Feedback from Sept 2017 face-to-face meeting:

Torsten: the referenced papers discuss irregular applications that pack/unpack data and pass ownership of the packed buffer, not of the non-contiguous memory used by the computation code.
Torsten: MPI datatypes break the clean ownership-passing semantics and disrupt the orthogonality of the MPI Standard.
George: this works well at the OS level, where the smallest unit of ownership is a whole memory page and there are no non-contiguous memory sections.
Rich: can the ARecv just do what the user asks for, even if that is bad?
Hubert: requiring arbitrary allocation at the receiver breaks the fixed resource usage per message restriction and is therefore an "unsafe" programming style.
George: could require that user allocates and attaches a buffer for Arecv, in a similar manner to Bsend.

Dan: technically possible to define various options, but is there a use-case that we can use to drive/justify the design choices?
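George's suggestion, combined with Hubert's concern about unbounded allocation at the receiver, could be sketched as a user-attached pool: the application hands the library a fixed block of memory up front, in the spirit of MPI_Buffer_attach for MPI_Bsend, and allocating receives are carved from it. The sketch below is a hypothetical bump allocator (`arecv_pool_t`, `pool_attach`, `pool_alloc`, and `pool_reset` are illustrative names, not proposed API) that fails rather than grows when the pool is exhausted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receive pool: fixed user-supplied memory from which
 * allocating receives are carved, so resource use stays bounded. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} arecv_pool_t;

static void pool_attach(arecv_pool_t *p, void *mem, size_t capacity)
{
    p->base = mem;
    p->capacity = capacity;
    p->used = 0;
}

/* Bump allocation with offsets rounded up to a multiple of 16 bytes.
 * Returns NULL when the attached memory is exhausted instead of
 * allocating more behind the user's back. */
static void *pool_alloc(arecv_pool_t *p, size_t bytes)
{
    size_t start = (p->used + 15) & ~(size_t)15;
    if (start > p->capacity || bytes > p->capacity - start)
        return NULL;
    p->used = start + bytes;
    return p->base + start;
}

/* Release everything at once, e.g. after the application has consumed
 * all messages received into the pool. */
static void pool_reset(arecv_pool_t *p)
{
    p->used = 0;
}
```

A bump allocator with bulk reset is the simplest possible choice; a design closer to the MPI Standard's model of buffered mode, which manages the attached memory as a circular queue of pending messages, would let individual messages be released in order.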

jeffhammond · 2017-09-22T23:58:21Z

It is my understanding that the Forum decided to cease working on this, which I think is the right decision given that it is impractical/impossible to support noncontiguous datatypes in a manner consistent with the existing features and MPI-3 shared memory provides an acceptable alternative to OP for zero-copy interprocess communication.

jeffhammond added the "not ready" and "wg-p2p" (Point-to-Point Working Group) labels Jan 21, 2016

jeffhammond self-assigned this Jan 21, 2016

jeffhammond mentioned this issue Feb 22, 2016 in ParRes/Kernels#83, "use zero-copy send-recv in FG-MPI" (Closed)

jeffhammond closed this as completed Sep 22, 2017

dholmes-epcc-ed-ac-uk removed the "not ready" label Sep 23, 2017
