RMA Notification #59

Open
jdinan opened this Issue Sep 3, 2016 · 2 comments

jdinan commented Sep 3, 2016

Problem Statement

In passive target mode, notifying the target that data has been transmitted is currently inefficient: the origin must send one or more additional messages after the operations to be notified have completed remotely (e.g., after an MPI_Win_flush).

Window Counter Solution 1: Sync-and-Notify

Addition of new "synchronize-and-notify" routines:

int MPI_Win_flush_notify(int rank, MPI_Win win);
int MPI_Win_unlock_notify(int rank, MPI_Win win);
int MPI_Win_flush_all_notify(MPI_Win win);

int MPI_Win_get_notify(MPI_Win win, long *count);
int MPI_Win_set_notify(MPI_Win win, long count);
int MPI_Win_wait_notify(MPI_Win win, long geq_value);

A notification counter is associated with the window and is incremented at the target after the given passive target epoch has completed at the target (i.e., the data is visible to the target process). Get, set, and wait functions let a process query how many notifications it has received, overwrite that count (e.g., reset it to zero), and block until it reaches a given value.

Criticism: Because the notification is separate from the communication operations (unlike a combined put-and-notify), notifying the target can require two separate operations, which will not improve performance.

Window Counter Solution 2: Op-and-Notify

Addition of new "communicate-and-notify" routines:

int MPI_Put_notify(..., MPI_Win win); /* Same arguments as MPI_Put */
int MPI_Get_notify(..., MPI_Win win);
int MPI_Accumulate_notify(..., MPI_Win win);

int MPI_Win_get_notify(MPI_Win win, long *count);
int MPI_Win_set_notify(MPI_Win win, long count);
int MPI_Win_wait_notify(MPI_Win win, long geq_value);

A notification counter is associated with the window and is incremented at the target after the given RMA operation has completed at the target (i.e., the data is visible to the target process). Get, set, and wait functions let a process query how many notifications it has received, overwrite that count (e.g., reset it to zero), and block until it reaches a given value.

Criticism: Only one counter per window.

Matched Notifications

This adds a "tag" to RMA operations and introduces target-side synchronization operations that query for operations matching a particular tag. Communication routines look as follows:

int MPI_Put_notify(const void *origin_addr, int origin_count,
        MPI_Datatype origin_type, int target_rank,
        MPI_Aint target_disp, int target_count,
        MPI_Datatype target_type, MPI_Win win, int tag);
int MPI_Get_notify(void *origin_addr, int origin_count,
        MPI_Datatype origin_type, int target_rank,
        MPI_Aint target_disp, int target_count,
        MPI_Datatype target_type, MPI_Win win, int tag);

Synchronization APIs are as follows:

int MPI_Notify_init(MPI_Win win, int src_rank, int tag,
        int expected_count, MPI_Request *request);
/* Functions already available in MPI */
int MPI_Start(MPI_Request *request);
int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
int MPI_Wait(MPI_Request *request, MPI_Status *status);

Positives: Most general proposal, enables arbitrary synchronization DAGs.
Negatives: Introduces tag matching to RMA and requires handling unexpected synchronization events (notifications that arrive before a matching request has been started). For past discussion, see: 09-2015 -- RMA Notified Access Implementation Discussion.pdf

Memory Synchronization: Put-and-Notify (OpenSHMEM Style)

Put and notify operations have been supported for a while in Cray SHMEM.
Recently they have been proposed for OpenSHMEM 1.5 (openshmem-org/specification#206, openshmem-org/specification#218, openshmem-org/specification#244). The API signature is as follows:

void shmem_put_signal[_nb](void *target, const void *source, size_t len, uint64_t *sig_target, uint64_t sig_val, int pe)

The sig_target location is updated after the update to target is visible. The sig_target location is checked locally using a shmem_wait_until operation or remotely using a shmem_atomic_fetch operation.

Positives: This is the only proposal that directly supports third-party producer-consumer relationships.
Negatives: Significantly expands the scope of the memory model and requires test/wait routines to be introduced.

jdinan commented Sep 3, 2016

Comments copied from Trac:

  • An alternative RMA notification interface was presented by Torsten Hoefler at the September 2014 meeting. Slides are posted here: http://meetings.mpi-forum.org/secretary/2014/09/slides/hoefler-blitz.pdf
  • The WG finds these two ideas interesting and would like to see more on this topic. There is, however, still skepticism about the value of this idea. Specific issues include:
    • (1) Compelling use cases
    • (2) Existing practice, for example in OpenSHMEM, including alternatives
    • (3) Scalability analysis, particularly of flush_all_notify
    • (4) Implementation and performance issues on unordered networks
    • (5) Existing and expected support in high performance networks
  • Torsten has addressed some of these issues in http://spcl.inf.ethz.ch/Publications/.pdf/notified-access-extending-rma.pdf. Although his proposed API differs, several of these issues are API-agnostic. In particular, the paper discusses compelling use cases (1), alternative implementations in MPI (2), and the implementation on the Cray XC30 (5), whose network favors dynamic routing (4).
  • From the June 2015 Forum meeting: a clear use case is needed, along with a discussion of why send/recv is not sufficient.
  • From the Sept 2015 Forum meeting: we discussed the implementation issues with Torsten's proposal. Many of them relate to the processing needed for the event queue. Multiple counters might be a sufficient alternative for examples such as the Cholesky tasking implementation, and would be easier to implement with network hardware offload. This leaves open how many counters are needed and when that number is determined. Some applications may need a number of counters (or other notification objects) that is not known when the window is created.

jdinan commented Sep 17, 2018

Status update: Roughly the same as at the June 2015 meeting. A strong driver is needed to introduce this new feature, along with a performance comparison showing that notified RMA outperforms other approaches (e.g., send/recv, active target).
