Consistent use: Section 3.8.4 and 3.9. - p. 79 - Rationale for Start and Cancel #270

wesbland · 2020-02-19T22:30:06Z

Problem

There is a comment in MPI_Cancel about why a pointer to MPI_Request is passed. This same logic applies to MPI_Start, but is not mentioned there. Should it be for consistency (and they could cross-reference each other)?

Edit:
During the virtual meeting of 4th Nov 2020, @RolfRabenseifner correctly pointed out that the request parameter for the MPI_Start procedure is described as INOUT in the language-independent specification (LIS), which means that the rationale that applies to MPI_Cancel cannot be applied verbatim to MPI_Start.

It is, arguably, an error that the request parameter for the MPI_Start procedure is described as INOUT in the language-independent specification.

Suggested Fix

First, change the LIS description to specify the request as IN instead of INOUT.

Then, copy the rationale from MPI_Cancel to the description of MPI_Start.

References

First attempted fix: https://github.com/mpi-forum/mpi-standard/pull/301 (closed)
Latest attempted fix: https://github.com/mpi-forum/mpi-standard/pull/619

wesbland · 2020-04-08T14:45:14Z

Probably an erratum. Confirmation needed:

Editor - 0
Chapter committee - 3
Errata - 3
Full Proposal - 1

abouteiller · 2021-09-10T10:33:43Z

After reviewing the code, we have found that Open MPI actually uses request substitution during MPI_Start, and has been doing so for many years without any user complaining. See https://github.com/open-mpi/ompi/blob/master/ompi/mca/pml/ob1/pml_ob1_start.c

The scenario where this is used is the following:

When starting a send, buffered-mode sends are immediately complete from the MPI layer's perspective, while they remain active from the MPI engine's perspective.
This is achieved by substituting the original send request with a new (inactive) copy of the request, so that it can report MPI completion in wait and be started again immediately if persistent.
Meanwhile, the original send request continues to be used internally to track completion of message fragments by the engine.

Obviously this could be implemented some other way, but this is beside the point. The cat is out of the bag, and has been for many years. This code has been present in Open MPI for at least 5 years (possibly many more, but I'm not going to use SVN).

Thus, the proposed change is incompatible with the existing state of practice. It would be a very bad idea to make it an erratum, and we should maybe even reconsider altogether, given the new information that one of the major MPI implementations has been using that feature for years.

Wee-Free-Scot · 2021-09-10T12:59:38Z

Oh.

Wee-Free-Scot · 2021-09-10T13:02:45Z

@abouteiller thanks for taking the time to find this implementation example. I agree that this code in Open MPI changes our perspective and restricts the scope of appropriate responses to the discrepancy between the standardised definitions of MPI_Start and MPI_Cancel.

tonyskjellum · 2021-09-12T16:25:53Z

All: I find the situation a tiny bit upsetting, so let me offer my best, calmer perspective after a few days.

We have to be careful not to discover, by error, new features that were never intended in the standard, or else we must fully understand what the error's correction now does. It seems that not everyone thinks this is even an error, which is legitimate, but an implementation need never do things this way. Marc and Bill et al. would certainly have wanted to warn users about aliasing from the beginning if this were intended. They certainly spent a lot of time debating handles in MPI-1.

0) Users alias requests and have no warning of this malleability of handles. Are user codes failing nowadays because some MPIs do this?

1) I have not yet found a thread-safety case that Dan could not defeat :-) to invalidate changing the handle. MPI doesn't let one touch a request in two threads without user-level mutual exclusion; at least it always looks illegal.

2) I think malleability applies also to MPI_Test etc. and to non-persistent requests. Hence, it's everywhere! This indeed, if I am right, needs a careful study.

So, if allowed, users evidently must not alias requests under the given rules. No such warning is in the standard that I could find. Are there any?

Multithreaded applications must critically lock requests between them to ensure the last returned call produced the latest value. That's probably implied by the illegality of simultaneity, but we have to double-check all concurrent uses to be sure; probably OK here.

What is the impact on tools, PMPI, etc.? Continuous translation/mapping between calls must be done from the time of first return of the handle. Any notion of serialization to help fault tolerance is potentially impacted; still thinking on that. Any notion of checkpoint/restart of MPI state has to be checked :-)

So, let's consider the opposite solution: it's always legal and users must not alias request handles ever. Sounds like it could be a good rule too! Will it break real application code? Not sure. If so, we are at a stalemate.

I think we need to put a rule in through erratum for partitioned and persistent collective requests for MPI-4: either to state that it is legal to update and there is no user aliasing of handles, or the opposite, that users may alias as has been done before in practice and without worry. For point-to-point persistent requests, we can make the change, if agreed, for MPI-4.1.

Either way, something may break :-(

Am I missing any standard warning against request aliasing?

Regards,
Tony

Wee-Free-Scot · 2022-07-05T11:14:21Z

@jdinan I'd like to review this issue in the HACC WG before the next voting meeting.

I just asked myself the question "will a replacement request handle need to be re-exported for use on a device?"

MPI_Request req;
MPI_Psend_init(..., &req); // MPI creates request A and returns a handle to it
MPI_Prequest_create(req, &preq); // MPI exports request A for use on a device
loop {
    MPI_Start(&req); // MPI creates request B and returns a handle to it (request A is consumed by MPI, preq is stale)
    :
    <<<kernel code>>>{ MPI_Pready(..., preq); } // error or no-op or UB? use of stale handle to request A
    :
    MPI_Wait(&req, MPI_STATUS_IGNORE); // deadlock: the user waits for request B to complete, but no call to MPI_Pready has reached it
}
MPI_Request_free(&req); // only reached after the loop exits

I'm thinking the HACC WG needs to get behind the side of this argument that prevents replacement of the request during MPI_Start or needs to point out that the request must be re-exported after every call to MPI_Start.

wesbland added editor pass wg-p2p Point-to-Point Working Group and removed mpi-4.x labels Feb 19, 2020

wesbland added this to To Be Classified in Editor Pass Status Apr 13, 2020

wesbland assigned dholmes-epcc-ed-ac-uk Apr 22, 2020

wesbland added the Chapter Committee Change Changes to be made by the respective chapter committee(s) label Apr 22, 2020

wesbland moved this from To Be Classified to Awaiting Implementation in Editor Pass Status Apr 22, 2020

wesbland added this to To Do in MPI 4.0 Ratification Oct 21, 2020

dholmes-epcc-ed-ac-uk moved this from Awaiting Implementation to Awaiting Informal Reading in Editor Pass Status Oct 21, 2020

dholmes-epcc-ed-ac-uk moved this from To Do to In Progress (Chapter Committee Changes) in MPI 4.0 Ratification Oct 21, 2020

wesbland added mpi-4.1 For inclusion in the MPI 4.1 standard and removed mpi-4.0 labels Nov 4, 2020

wesbland removed this from In Progress (Chapter Committee Changes) in MPI 4.0 Ratification Nov 4, 2020

dholmes-epcc-ed-ac-uk added errata Errata items for the previous MPI Standard and removed Chapter Committee Change Changes to be made by the respective chapter committee(s) editor pass labels Nov 5, 2020

dholmes-epcc-ed-ac-uk removed this from Awaiting Informal Reading in Editor Pass Status Nov 5, 2020

wesbland added this to To Do in MPI 4.1 Jun 9, 2021

wesbland moved this from To Do to In Progress in MPI 4.1 Jul 21, 2021

wesbland assigned Wee-Free-Scot and unassigned dholmes-epcc-ed-ac-uk Jul 21, 2021

Wee-Free-Scot added this to the September 2021 milestone Aug 18, 2021

wesbland removed this from the September 2021 milestone Sep 13, 2021

wesbland added this to the December 2021 milestone Sep 13, 2021

Wee-Free-Scot mentioned this issue Jul 5, 2022

Address Section 2.3, MPI_Start() semantics of the INOUT and Language Bindings #439

Open

Wee-Free-Scot added mpi-5 For inclusion in the MPI 5.0 standard and removed mpi-4.1 For inclusion in the MPI 4.1 standard labels Jul 20, 2022

wesbland removed this from In Progress in MPI 4.1 Oct 19, 2022

wesbland added this to To Do in MPI 5.0 via automation Oct 19, 2022

Wee-Free-Scot mentioned this issue Jun 14, 2023

MPI_Cancel has non-const IN argument pointer #707

Open
