Consistent use: Sections 3.8.4 and 3.9 - p. 79 - Rationale for Start and Cancel #270
Probably an erratum. Confirmation needed.
After reviewing the code, we have found that Open MPI actually uses request substitution during start, and has been doing so for many years without any user complaining. See https://github.com/open-mpi/ompi/blob/master/ompi/mca/pml/ob1/pml_ob1_start.c The scenario where this is used is the following:
Obviously this could be implemented some other way, but that is beside the point. The cat is out of the bag, and has been for many years. This code has been present in Open MPI for at least 5 years (possibly many more, but I'm not going to check the SVN history). Thus, the proposed change is incompatible with the existing state of practice. It would be a very bad idea to make it an erratum, and we should perhaps even reconsider the change altogether in light of the new information that one of the major MPI implementations has relied on this feature for years.
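To make "request substitution during start" concrete from the caller's side, here is a minimal sketch using a hypothetical mock (`mock_request`, `mock_start`, `alias_goes_stale` are illustrative names, not MPI and not Open MPI's code). The trigger condition, an old request object still busy internally, is an assumption chosen for illustration; the only point is that an INOUT handle lets start overwrite the caller's handle, so any copy of the handle taken earlier goes stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mock, not MPI: a request handle is a pointer to a request object. */
typedef struct mock_request {
    bool internal_busy;      /* object still in use internally after completion */
    bool free_on_completion; /* retire the object once internal use ends */
} mock_request;

typedef mock_request *mock_handle;

static mock_handle mock_request_create(void) {
    mock_handle r = calloc(1, sizeof *r);
    assert(r != NULL);
    return r;
}

/* INOUT handle: start may substitute a fresh request object for a busy one. */
static void mock_start(mock_handle *request) {
    if ((*request)->internal_busy) {
        (*request)->free_on_completion = true; /* old object is retired later */
        *request = mock_request_create();      /* caller's handle now names a new object */
    }
    /* ... the operation would be started on *request here ... */
}

/* Returns true if a copy of the handle taken before mock_start no longer
   refers to the request the caller now holds. */
static bool alias_goes_stale(bool busy) {
    mock_handle req = mock_request_create();
    mock_handle alias = req;   /* user keeps a second copy of the handle */
    req->internal_busy = busy; /* was the previous operation fully drained? */
    mock_start(&req);
    bool stale = (alias != req);
    if (stale)
        free(alias);           /* the mock frees the retired object here */
    free(req);
    return stale;
}
```

With `alias_goes_stale(true)` the alias no longer matches the live handle; with `alias_goes_stale(false)` it still does. Code that never copies request handles cannot observe the difference.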
Oh.
@abouteiller thanks for taking the time to find this implementation example. I agree that this code in Open MPI changes our perspective and restricts the scope of appropriate responses to the discrepancy between the standardised definitions of MPI_Start and MPI_Cancel.
All: I find the situation a tiny bit upsetting, so let me offer my best, calmer perspective after a few days. We have to be careful not to discover, by error, new features never intended in the standard, or else we must fully understand what the erratum correction now does. It seems that not everyone thinks this is even an error. That is legitimate, but an implementation need never do things this way. Marc and Bill et al. would certainly have wanted to warn users about aliasing from the beginning if this were intended. They certainly spent a lot of time debating handles in MPI-1.
0) Users alias requests and have no warning of this malleability of handles: are user codes failing nowadays because some MPIs do this?
1) I have not yet found a thread-safety case invalidating handle replacement that Dan could not defeat :-). MPI doesn’t let one touch a request in two threads without user-level mutual exclusion; at least, doing so always looks illegal.
2) I think malleability also applies to MPI_Test etc. and to non-persistent requests. Hence, it’s everywhere!
If I am right, this indeed needs careful study.
So, if allowed, users evidently must not alias requests under the given rules. I could find no such warning in the standard. Are there any?
Multithreaded applications must protect requests with a lock shared between threads, to ensure that the last call to return produced the latest handle value. That is probably implied by the illegality of simultaneous access, but we have to double-check all concurrent uses to be sure; it is probably OK here.
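A sketch of that user-level locking, assuming POSIX threads and a hypothetical `mock_start` (not MPI) that, as a worst case, substitutes a new handle value on every call. Each thread performs a whole start/complete cycle while holding the mutex, so no thread acts on a handle value that another thread's call has since replaced.

```c
#include <assert.h>
#include <pthread.h>

typedef int mock_handle;        /* hypothetical: a handle is a plain integer */
static mock_handle next_handle; /* only touched while holding the lock */

/* Hypothetical worst case: every start substitutes a new handle value. */
static void mock_start(mock_handle *request) { *request = next_handle++; }
/* Hypothetical completion; a real wait would block here. */
static void mock_wait(mock_handle *request) { assert(*request > 0); }

typedef struct {
    pthread_mutex_t lock;
    mock_handle request;        /* the single shared copy of the handle */
} shared_request;

/* Each cycle reads, starts, and completes the request inside one critical
   section, so the next thread always sees the handle the last call wrote. */
static void *worker(void *arg) {
    shared_request *s = arg;
    for (int i = 0; i < 1000; ++i) {
        pthread_mutex_lock(&s->lock);
        mock_start(&s->request);
        mock_wait(&s->request);
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Two threads each run 1000 start/complete cycles; returns the final handle. */
static mock_handle run_two_workers(void) {
    shared_request s = { PTHREAD_MUTEX_INITIALIZER, 0 };
    pthread_t a, b;
    next_handle = 1;
    pthread_create(&a, NULL, worker, &s);
    pthread_create(&b, NULL, worker, &s);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&s.lock);
    return s.request;
}
```

Because every read and write of the shared handle happens under the lock, the final value is the one written by the 2000th start, regardless of how the threads interleave.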
What is the impact on tools, PMPI, etc.? Continuous translation/mapping between calls must be done from the time the handle is first returned.
Any notion of serialization to help fault tolerance is potentially impacted; I am still thinking about that. Any notion of checkpoint/restart of MPI state has to be checked :-)
So, let’s consider the opposite solution: it’s always legal, and users must never alias request handles. That sounds like it could be a good rule too! Will it break real application code? Not sure. If so, we are at a stalemate.
I think we need to put a rule in place through an erratum for partitioned and persistent collective requests for MPI-4: either state that updating the handle is legal and users must not alias handles, or the opposite, that users may alias handles, as has been done in practice without worry. For point-to-point persistent requests, we can make the change, if agreed, for MPI-4.1. Either way, something may break :-(
Am I missing any warning in the standard against request aliasing?
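One way a tool could keep its translation current, sketched with hypothetical names (`tool_track`, `tool_lookup`, `tool_start`, `mock_start`; none of these are part of MPI or PMPI): an interposed start records the handle before the call and rekeys its metadata if the call returned a different handle. The mock start always substitutes, to exercise the rekeying path.

```c
#include <assert.h>
#include <stddef.h>

typedef int mock_handle;              /* hypothetical handle type */
static mock_handle next_handle = 100;

/* Hypothetical start that always substitutes a new handle value. */
static void mock_start(mock_handle *request) { *request = next_handle++; }

#define MAX_TRACKED 16
typedef struct { mock_handle handle; const char *label; } tool_entry;
static tool_entry table[MAX_TRACKED]; /* tool metadata keyed by handle value */
static size_t ntracked = 0;

static void tool_track(mock_handle h, const char *label) {
    assert(ntracked < MAX_TRACKED);
    table[ntracked++] = (tool_entry){ h, label };
}

static const char *tool_lookup(mock_handle h) {
    for (size_t i = 0; i < ntracked; ++i)
        if (table[i].handle == h)
            return table[i].label;
    return NULL;
}

/* Interposed start: if the handle changed, move the metadata to the new key. */
static void tool_start(mock_handle *request) {
    mock_handle before = *request;
    mock_start(request);
    if (*request != before)
        for (size_t i = 0; i < ntracked; ++i)
            if (table[i].handle == before)
                table[i].handle = *request;
}
```

Without the rekeying loop, a later lookup by the new handle would find nothing and the tool would lose track of the request from the first substitution on.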
Regards,
Tony
Anthony Skjellum, PhD
On Sep 10, 2021, at 8:59 AM, Dan Holmes wrote:
> Oh.
@jdinan I'd like to review this issue in the HACC WG before the next voting meeting. I just asked myself the question "will a replacement request handle need to be re-exported for use on a device?"

```c
MPI_Request req;
MPI_Psend_init(..., &req);       // MPI creates request A and returns a handle to it
MPI_Prequest_create(req, &preq); // MPI exports request A for use on a device
loop {
    MPI_Start(&req); // MPI creates request B and returns a handle to it (request A is consumed by MPI, preq is stale)
    // ...
    <<<kernel code>>>{ MPI_Pready(..., preq); } // error or no-op or UB? use of stale handle to request A
    // ...
    MPI_Wait(&req, MPI_STATUS_IGNORE); // deadlock: the user is waiting for request B to complete, but there are no calls to MPI_Pready for it yet
}
MPI_Request_free(&req); // only executes when loop doesn't
```

I'm thinking the HACC WG needs to get behind the side of this argument that prevents replacement of the request during
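The stale-export hazard in this scenario can be modeled with a small mock. Everything here is hypothetical: `mock_psend_init`, `mock_prequest_create` and `mock_pready` stand in for the calls in the snippet and are not MPI. The mock chooses to report a stale exported handle as an error, which is only one of the three options (error, no-op, UB) raised above, and the `substitute` flag models whether the implementation replaces the request on start.

```c
#include <assert.h>
#include <stdbool.h>

enum { MOCK_OK = 0, MOCK_ERR_STALE = 1 };

/* Hypothetical request objects; a handle is an index into this pool. */
typedef struct { bool retired; int partitions_ready; } mock_request_obj;
static mock_request_obj pool[8];
static int pool_used = 0;
typedef int mock_handle;

static mock_handle mock_psend_init(void) {
    assert(pool_used < 8);
    pool[pool_used] = (mock_request_obj){ false, 0 };
    return pool_used++;
}

/* Start with an INOUT handle; `substitute` models an implementation choice. */
static void mock_start(mock_handle *req, bool substitute) {
    if (substitute) {
        pool[*req].retired = true; /* request A is consumed */
        *req = mock_psend_init();  /* caller now holds request B */
    }
}

/* Hypothetical exported (device-side) handle: remembers the object it came from. */
typedef struct { mock_handle target; } mock_prequest;

static mock_prequest mock_prequest_create(mock_handle req) {
    return (mock_prequest){ req };
}

/* Marks a partition ready, or reports that the exported handle is stale. */
static int mock_pready(mock_prequest preq) {
    if (pool[preq.target].retired)
        return MOCK_ERR_STALE;
    pool[preq.target].partitions_ready++;
    return MOCK_OK;
}
```

After a substituting start, the exported handle still points at request A, so request B never sees a ready partition. That is the mock's analogue of the deadlock at `MPI_Wait` in the loop above; re-exporting after every start, or forbidding substitution, would both avoid it.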
Problem
There is a comment in MPI_Cancel about why a pointer to MPI_Request is passed. The same logic applies to MPI_Start, but is not mentioned there. Should it be mentioned there too, for consistency (the two could cross-reference each other)?
Edit:
During the virtual meeting of 4th Nov 2020, @RolfRabenseifner correctly pointed out that the request parameter for the MPI_Start procedure is described as INOUT in the language-independent specification, which means that the rationale that applies to MPI_Cancel cannot be applied verbatim to MPI_Start.
It is, arguably, an error that the request parameter for the MPI_Start procedure is described as INOUT in the language-independent specification.
Suggested Fix
First, change the LIS description to specify the request as IN instead of INOUT.
Then, copy the rationale from MPI_Cancel to the description of MPI_Start.
References
First attempted fix: https://github.com/mpi-forum/mpi-standard/pull/301 (closed)
Latest attempted fix: https://github.com/mpi-forum/mpi-standard/pull/619