Allocating receive and freeing send #32

Closed
jeffhammond opened this issue Jan 21, 2016 · 6 comments
Labels
wg-p2p Point-to-Point Working Group

Comments

@jeffhammond
Member

jeffhammond commented Jan 21, 2016

This was https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/464

Motivation

See http://meetings.mpi-forum.org/secretary/2013/09/slides/jsquyres-arecv.pdf for now.

Prior Art

Ownership Passing (OP)

Fine-Grain MPI

Functions

int MPI_Arecv(MPI_Datatype datatype, int source, int tag, MPI_Comm comm, 
              void* outbuf, MPI_Status * status)
int MPI_Iarecv(MPI_Datatype datatype, int source, int tag, MPI_Comm comm, 
               void* outbuf, MPI_Request * request)

Note that, like MPI_Alloc_mem, outbuf is actually a void**, not a void*.

There is no count argument for MPI_(I)ARECV; one uses MPI_GET_COUNT to obtain that information. Unlike the previous proposal (by Squyres and Goodell), the function signatures here include the buffer output argument, which obviates the need for a new function (such as MPI_STATUS_GET_BUFFER) for this purpose.

int MPI_Fsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm)
int MPI_Ifsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm, MPI_Request * request)

There are equivalent ready functions.

int MPI_Rfsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm)
int MPI_Irfsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm, MPI_Request * request)

There are equivalent synchronous functions.

int MPI_Sfsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm)
int MPI_Isfsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm, MPI_Request * request)

Semantics

The following code approximates what a naive implementation of MPI_ARECV might look like.

int MPI_Arecv(MPI_Datatype datatype, int source, int tag, MPI_Comm comm, 
              void* outbuf, MPI_Status * status)
{
  /* To be thread-safe, we would need to use Mprobe. */
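  MPI_Status probe_status; /* local status; must not shadow the status argument */
  MPI_Probe(source, tag, comm, &probe_status);

  int count;
  MPI_Get_count(&probe_status, datatype, &count);

  /* To be fully general, we would need to use Type_get_extent. */
  int typesize;
  MPI_Type_size(datatype, &typesize);

  MPI_Aint bytes = (MPI_Aint)typesize * (MPI_Aint)count;

  void * buffer;
  MPI_Alloc_mem(bytes, MPI_INFO_NULL, &buffer);

  /* Receive the probed message, even if source or tag was a wildcard. */
  MPI_Recv(buffer, count, datatype, probe_status.MPI_SOURCE,
           probe_status.MPI_TAG, comm, status);

  /* outbuf is really a void**, as with MPI_Alloc_mem. */
  *(void**)outbuf = buffer;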

  return MPI_SUCCESS;
}

The following code approximates what a naive implementation of MPI_FSEND might look like.

int MPI_Fsend(void* inbuf, int count, MPI_Datatype datatype, int dest, int tag, 
              MPI_Comm comm)
{
  MPI_Send(inbuf, count, datatype, dest, tag, comm);
  MPI_Free_mem(inbuf);
  return MPI_SUCCESS;
}

jeffhammond added the "not ready" and "wg-p2p Point-to-Point Working Group" labels on Jan 21, 2016
jeffhammond self-assigned this on Jan 21, 2016

@bosilca
Member

bosilca commented Jan 21, 2016

For more prior work, see the Multicore Communication API.

@tonyskjellum

tonyskjellum commented Jan 21, 2016

Jeff, we were doing this in message-passing systems in the 1980s on the Sequent Symmetry, using the message-passing model from the Reactive Kernel and Cosmic Environment. When we proposed these semantics in MPI-1, they were rejected as troublesome for Fortran -- one of the standard disqualifiers :-)

Zipcode did these semantics too...

I know that is much earlier stuff, but it should be clear this is 30-year-old practice that we are belatedly considering again for MPI.

@jeffhammond
Member Author

@tonyskjellum MPI-1 lacks MPI_Alloc_mem and MPI_Free_mem, which are necessary for this feature to be both valuable and portable under conservative assumptions. A portable implementation that delivers no benefit is easy, of course, but that's not the point.

Are there Fortran issues that are not addressed by MPI_Alloc_mem and MPI_Free_mem and ASYNCHRONOUS?

@wgropp

wgropp commented Jan 21, 2016

When MPI-1 was developed, Fortran 90 was too new (few good compilers) and the POINTER feature was too limited at the time. With Fortran 2008, it's possible to define standard-conforming routines for these operations, so the Fortran issue is no longer present.

@dholmes-epcc-ed-ac-uk
Member

Feedback from Sept 2017 face-to-face meeting:

Torsten: referenced papers talk about irregular applications that pack/unpack data and pass ownership of the packed buffer, not the non-contiguous memory used by the calculation code.
Torsten: MPI datatypes break the clean ownership-passing semantics and disrupt the orthogonality of the MPI Standard.
George: this works well at the OS level, where the smallest unit of ownership is a whole memory page and there are no non-contiguous memory sections.
Rich: can the Arecv just do what the user asks for, even if that is bad?
Hubert: requiring arbitrary allocation at the receiver breaks the fixed-resource-usage-per-message restriction and is therefore an "unsafe" programming style.
George: could require that user allocates and attaches a buffer for Arecv, in a similar manner to Bsend.

Dan: technically possible to define various options, but is there a use-case that we can use to drive/justify the design choices?

@jeffhammond
Member Author

It is my understanding that the Forum decided to cease working on this, which I think is the right decision: it is impractical or impossible to support noncontiguous datatypes in a manner consistent with the existing features, and MPI-3 shared memory provides an acceptable alternative to OP for zero-copy interprocess communication.
