'Logically concurrent' isn't #117
Comments
This issue has a few layers of principles and definitions; without explicit consensus on them, discussions can easily devolve into cross-talk. May I suggest the following points for potential consensus?
Apologies that I mixed my own opinions into each point. Also, I edited heavily as I re-read my own comments. |
Defining that MPI operations separated by an infinite amount of wall clock time are "logically concurrent" (and therefore can be arbitrarily ordered in reality) adds no clarity to the standard, serves no practical purpose, and creates confusion about how app developers are supposed to write coherent multi-threaded MPI applications. |
There is no "infinite" amount of wall clock time. "Infinite" exists only logically, and the sentence "operations separated by an infinite time" is the same as "logically non-concurrent", which requires definition -- back to square one. Do you want to refine your "infinite" to a finite one: 1 second, or 1 minute? PS: of course the question is a trap. There is no logical distinction between 1 second, 1 minute, or 1 femtosecond. Logical concurrency, as its name says, is logical; physical time separation is irrelevant. We can't tie logical concurrency to physical concurrency -- that is ultimately impossible -- which leaves only the choice of completely separating "logical concurrency" from the physical one. |
A different and more relevant question is: Can we come up with a realistic example that the order of In the first example, since the code is willing to call |
If the parallel region actually happens, then |
@mhoemmen I agree that PS: on further thought, |
@mhoemmen
```c
#pragma omp parallel num_threads(1000)
{
    #pragma omp critical
    {
        printf("you are wrong\n");
    }
}
```
A more realistic example might involve num_threads(2) and a send-recv pair of 100MB.
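For concreteness, a minimal sketch of what that might look like (ranks, tags, and buffer handling are illustrative assumptions, not from the original thread):

```c
#include <mpi.h>
#include <stdlib.h>

#define MSG_BYTES (100 * 1024 * 1024)   /* ~100MB per message */

/* Rank 0: two threads each send one large message with identical
 * envelopes (same destination, tag, and communicator). */
void sender(MPI_Comm comm)
{
    #pragma omp parallel num_threads(2)
    {
        char *buf = calloc(MSG_BYTES, 1);
        MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, comm);
        free(buf);
    }
}

/* Rank 1: which 100MB message matches the first receive? */
void receiver(MPI_Comm comm)
{
    char *buf = malloc(MSG_BYTES);
    MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, comm, MPI_STATUS_IGNORE);
    MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, comm, MPI_STATUS_IGNORE);
    free(buf);
}
```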
|
Sorry @mhoemmen, I think I was agreeing with you in regards to your disagreement with the previous post. I'm not very good at parsing GitHub comments using the email interface.
|
@jeffhammond It's cool :D |
@hzhou The current MPI specification text allows an unbounded / infinite amount of time to pass between MPI operations in different threads, and yet allows the implementation to re-order them. That is flat-out terrible. It does not matter whether you agree with the style of the two examples that were provided. They are valid MPI applications, and they demonstrate the issue clearly. You can continue to argue that the text's current definition of "logically concurrent" is fine/good, but that's just academic. The definition in the standard adds no clarity, serves no practical purpose, and creates confusion about how app developers are supposed to write coherent multi-threaded MPI applications. |
If the two threads were assigned completely independent communication resources (HW & SW) and their receivers were also independent, then enforcing any ordering that spans both threads adds overhead that could, in theory, be avoided. The messages from really/properly concurrent threads would need sequence numbers to record the order at the sender MPI process; then, after taking independent non-deterministic routes (permitted in either case), the receiver must enforce the original ordering using the sequence numbers to re-create the sender MPI process order in the matching queue.

The current text permits that "re-create ordering" overhead to be avoided by allowing logically concurrent messages to be delivered in any order. It has the side-effect of removing any guarantee of ordering, even in pathological cases, such as the examples given. My understanding is that this was known and intended (both the potential optimisation and the side-effect) by the original authors of this text (but I might be wrong). Congestion control could add an arbitrary holding delay to one network route - why should other (concurrent/independent) messages be delayed, even if they left the sender later than the held message?

The API design decision here is a trade-off between easy-to-use but lower performance (enforce the order) and hard-to-use but better performance (don't enforce the order). Easy/hard-to-use is subjective and based on a clear explanation of the intended/expected behaviour. Higher/lower performance ought to be objectively measurable.
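A rough sketch of that "record, then re-create" machinery (hypothetical names; an illustration of the overhead, not any real implementation):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sender side: one atomic increment per send records the process-wide
 * injection order as a sequence number carried with the message. */
static _Atomic uint64_t next_seq = 0;

static uint64_t stamp_send(void)
{
    return atomic_fetch_add(&next_seq, 1);
}

/* Receiver side: messages may arrive out of order over independent
 * routes; a message may only enter the matching queue once all of its
 * predecessors from the same sender process have done so. */
typedef struct message { uint64_t seq; /* ...envelope, payload... */ } message_t;

static uint64_t next_expected = 0;

static int can_match(const message_t *m)
{
    return m->seq == next_expected;  /* otherwise hold it back */
}
```
|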
In the face-to-face meeting, @schulzm and I decided that:
Thus, I agree with two thirds of @jsquyres' summary:
|
Despite completely different wording, I believe @dholmes-epcc-ed-ac-uk and I have the same understanding (of the problem). Given that understanding, my personal opinion leans heavily toward his first option, but my earlier post was mostly trying to clarify that understanding. |
No, it is not a correct interpretation. (Perhaps it is and I have simply misinterpreted your question.)
|
@dholmes-epcc-ed-ac-uk I meant the same thing -- your sentence may be clearer. I meant that the option for the MPI standard text is to specify an implementation detail rather than a concept description (with logical concurrency or infinite wall time). By "enqueue" I meant to record the order at send time. It could be a literal queue, a sequence number, or some other mechanism that ensures the ability to restore the order -- if there is ordering, there is a meta-queue. However, I don't see any way to get away from the synchronization. The best we can do is to make the global sequence number or queue atomic, which is stricter than simple thread safety. This is the first time I have learned that all MPI function calls are "atomic". I guess I have to accept it, or it is another discussion. EDIT: I referenced the text and it says thread-safe, not atomic. Correct me if I am wrong: thread-safe calls can be interleaved; atomic calls strictly cannot. This matters when we talk about ordering: with atomic calls the ordering is well-defined, while in an interleaved situation the ordering still needs to be defined. Whether we need to define such an ordering, and if so how, is the current point of discussion.
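For concreteness, a sketch of "record the order at send invocation time" with a C11 atomic counter (a hypothetical wrapper, not a proposal for standard text): the atomic increment at entry is the single well-defined ordering point, while everything after it may interleave freely with other threads' calls (thread-safe, not atomic).

```c
#include <mpi.h>
#include <stdatomic.h>

static _Atomic long send_order = 0;

int ordered_send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
{
    /* The ordering point: threads are ordered by who increments first. */
    long my_order = atomic_fetch_add(&send_order, 1);

    /* Hypothetically, my_order would travel with the message so the
     * receiver can restore this order; the MPI_Send below is merely
     * thread-safe and may interleave with other threads' sends. */
    (void)my_order;
    return MPI_Send(buf, count, type, dest, tag, comm);
}
```
|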
This seems intuitively false: consider a multi-core socket, a multi-socket node, a multi-rail NIC, a multi-route fabric - it's turtles all the way down. Also, even if the two sender threads are multiplexed through a single bottleneck at some point during transmission, this misses the point of the ordering rule, which talks about message matching order rather than injection, transmission, or delivery order. |
Personally, I think the ordering rule leads inexorably towards channels or streams, which are unastonishingly ordered by definition. However, that is another story. |
"Turtles all the way down" leads to "ultimately there is never physical concurrency" right? If we discuss on the abstract level -- such as turtles all the way down, then the discussion will never end or agree -- depending on which turtle we take the pause. If we abort the philosophical discussion at all and simply define our terms on a technical level, it is will be definite. Defining "concurrency" based on distinct threads or not is one such approaches. Defining the behavior by requiring MPI record the ordering at send invocation time -- the starting point of the function call is another option -- still has ambiguity of which point but enough for our example cases. |
At the moment, point-to-point send and receive in MPI are half-channel operations. A channel is a FIFO queue, so send and receive operations contribute to meta-half-queues. Each {send-thread, receive-thread} pairing constitutes a different meta-queue. For each thread, all of its meta-half-queues (its contributions to all meta-queues formed with all other threads) when taken together form a meta-queue. |
Quite the opposite? Two different processes on two different hardware threads, on two different cores, in two different sockets, on two different nodes, in two different cities, ... eventually you must admit that these could actually execute at the same time, i.e. physically concurrently. |
Honestly, any app that relies on the ordering of messages sent/received from different threads without explicitly enforcing the order is erroneous and should be rewritten. So, I would go with option 1. But then, I am explicitly against the current non-overtaking semantics in MPI anyway. |
Because you will never reach the "eventual" or the "last turtle", you will never reach the ultimate "concurrency" -- which equally equates to ultimate non-concurrency. ... But this is a recurring philosophical discussion. Shall we agree to recognize its never-ending, pointless nature and avoid such philosophical discussions? |
@hjelmn the problem is that there is no way to explicitly enforce the order - as shown by the examples. The only option the user has is to marshal all point-to-point calls onto a single thread and rely on the ordering rule, or to use different tags/ranks/communicators. |
@dholmes-epcc-ed-ac-uk True. Though they could force the order using out-of-band methods (which would be extremely ugly). |
@hzhou we have different meanings for the word concurrency. I'm using this one: "at the same time". What do you mean by the word? |
How does that work? I'm not sure I agree - please provide an example. |
To my knowledge, programmers need to mark the data as atomic or use explicit atomic instructions to tell the compiler. The memory model is all about educating programmers so they can tell the compiler precisely what they intend. I think we should do the same in MPI.
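As an analogy, a minimal C11 sketch of "telling the compiler": the release/acquire pair on `flag` is the programmer stating the intended ordering explicitly; without it, the compiler and hardware are free to reorder.

```c
#include <stdatomic.h>

static int data = 0;          /* plain shared data           */
static _Atomic int flag = 0;  /* explicitly marked as atomic */

void producer(void)
{
    data = 42;
    atomic_store_explicit(&flag, 1, memory_order_release);
}

int consumer(void)
{
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                     /* spin until producer publishes */
    return data;              /* guaranteed to read 42         */
}
```
|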
That is what |
That's what this thread is about -- we are not agreeing on it and are debating about it. To me, |
This whole discussion is about performance, and if you care about performance, then things are not sequential. So, @jeffhammond , I do not think I buy your arguments. |
My argument is that people need to use proper computer science terminology for discussing shared-memory concurrency. Do you object to that? Once we start using C11 memory model language to discuss things, then we can talk about the consequences of them on MPI. |
And no, performance is irrelevant at this point. You cannot just casually break the MPI-1 semantics for message ordering just because you want MPI_THREAD_MULTIPLE to go faster in some specific use case. If you want to change the semantics of Send-Recv, please submit a ticket to do that. |
There is nothing wrong with the following text. The problem exists only in how people are reading it, because we only address the unordered multithread case.
This is the logical complement of the above text:
I am not adding anything here. The standard currently describes unordered multithreaded execution. We didn't bother defining logically ordered multithreaded execution, probably because it was obvious that it was degenerate with the single-threaded case, but it seems that was a bad assumption. |
The problem lies precisely in the statement that you want to dismiss -- the order in a multithreaded case is an intention, not a result. Without being told, and unless it assumes so, MPI cannot tell whether two calls from two threads are ordered, even when the two calls are seconds apart in real time. To tell an order, MPI needs to actively order between threads, and as a result all calls will be ordered -- not as a result, but as an assumed intention. Note that in my experience, when programmers write parallel code, they usually intend calls by default to be concurrent rather than ordered.
But you are adding something in your previous comments. You are asserting that all MPI calls from multiple threads are logically ordered. Again, "logical" here is an intention. The standard has no text on how to tell this "logical" order, so the text about its consequences is ambiguous at best and misleading at worst. |
I'm not saying that but it's clear at this point that you have no intention to engage in good faith reading so I'm going to ignore you from now on. |
What I am saying is what Dan said 4 years ago (#117 (comment)), which is that calls to MPI behave as if they begin with an atomic operation. That's what it means to have some order: atomicity. They are not logically concurrent, any more than The camp B people are welcome to propose modifications to 11.6, but they cannot ignore it. For example, they can propose I suppose this debate was already had in the 331 comments I haven't read yet, but since there has not been a pull request to amend the following...
...we can just stop talking about how the calls themselves are logically concurrent. The standard already says otherwise. It is true that in a multithreaded program where threads are not synchronized, the programmer must reason about MPI calls as if they are logically concurrent but that does not mean they are actually logically concurrent in the implementation. What programmers can infer from code that uses MPI is not the same thing as what implementations are allowed to do. |
Do you mean the non-overtaking rule ?
Because to me, the sentence "The operations are logically concurrent, even if one physically precedes the other. In such a case, the two messages sent can be received in any order." states that the non-overtaking rule is not applicable for MPI_Sends which were issued in different threads.
Julien J.
|
This statement is more restrictive than what the current standard text defines. The interesting part of "no ordering is given if the calls are concurrent" is that it does not matter at which point during the execution of the MPI function this atomic operation is placed. If the calls are logically ordered, the placement does not matter - it will reflect the ordering in any case. If the calls are not logically ordered, the placement does not matter - no ordering is given anyway. Following up on the "enabled" discussion yesterday, I think that the enabling of the operation should actually define the ordering. I think this would make things clearer for the case of non-blocking communication.
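A sketch of why "the enabling of the operation defines the ordering" is attractive for non-blocking communication (hypothetical fragment): the two `MPI_Isend` calls below are enabled in program order, even though their requests may complete in either order at `MPI_Waitall`.

```c
#include <mpi.h>

void two_isends(const char *a, const char *b, int n, MPI_Comm comm)
{
    MPI_Request req[2];

    MPI_Isend(a, n, MPI_CHAR, 1, 0, comm, &req[0]);  /* enabled first  */
    MPI_Isend(b, n, MPI_CHAR, 1, 0, comm, &req[1]);  /* enabled second */

    /* If enabling defines the ordering, message a is matched before
     * message b regardless of which request completes first here. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
}
```
|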
The first sentence in your quote just states that logically concurrent is more relaxed than physically concurrent. For people with a multi-threading background this should be an axiomatic statement. The second sentence then becomes "If two send operations are logically concurrent, the two messages sent can be received in any order." This does not say that ordering between threads is given away. It is only given away if the operations are logically concurrent. |
I believe https://github.com/mpi-forum/mpi-standard/pull/777 wasn't meant to close this issue, reopening. |
Oops. Sorry I missed that comment. You're right. |
This passed a first vote on 2023-02-08.
|
This passed a 2nd vote.
|
@dholmes-epcc-ed-ac-uk and I were talking about an issue the other night at dinner, and I wanted to record it because it's a serious issue that needs to get fixed in MPI next.
MPI-3.1 section 3.5 p41:10-17 states:
> If a process has a single thread of execution, then any two communications executed by this process are ordered. On the other hand, if the process is multithreaded, then the semantics of thread execution may not define a relative order between two send operations executed by two distinct threads. The operations are logically concurrent, even if one physically precedes the other. In such a case, the two messages sent can be received in any order. Similarly, if two receive operations that are logically concurrent receive two successively sent messages, then the two messages can match the two receives in either order.
The problematic text states that any operations on different threads are "logically concurrent." Sometimes that is because the thread execution does not define an order. But even if there is a guaranteed order (which is perhaps what the phrase "physically precedes" means?), MPI still considers them to be "logically concurrent". For example, even if there is a thread synchronization between the operations, or an extremely long wall-clock time between the operations, MPI is still permitted to consider those operations "logically concurrent." This is bad because MPI is permitted to deliver "logically concurrent" messages in any order, which is going to astonish users (and implementors).
Here's an example:
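A sketch of such an example (the barrier, ranks, and buffer names are assumptions consistent with the description below):

```c
#include <mpi.h>
#include <pthread.h>
#include <unistd.h>

pthread_barrier_t barrier;   /* assumed initialized for 2 threads */
char a[] = "a", b[] = "b";

/* Thread 1: sends message "a", then enters the barrier. */
void *thread1(void *arg)
{
    MPI_Send(a, sizeof a, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    pthread_barrier_wait(&barrier);
    return NULL;
}

/* Thread 2: waits until thread 1's send has returned, sleeps a full
 * minute, and only then sends message "b". */
void *thread2(void *arg)
{
    pthread_barrier_wait(&barrier);
    sleep(60);
    MPI_Send(b, sizeof b, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    return NULL;
}
```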
According to MPI-3.1 3.5, these two sends are logically concurrent, and it is permitted for the `b` message to be received at the receiver before the `a` message.

Note: the `sleep(60)` is actually unnecessary in this example -- it's just insult-added-to-injury to drive home the point.

Here's another example (that was sent across the point-to-point working group list):
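A sketch matching the question below (thread numbering, ranks, and the barrier are assumptions; rank 0 sends 0 and then 1 from two ordered threads, and rank 1 prints the two values in matching order):

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Rank 0: thread 0's send is ordered before thread 1's via the barrier. */
void rank0(void)
{
    int zero = 0, one = 1;
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0)
            MPI_Send(&zero, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        #pragma omp barrier
        if (omp_get_thread_num() == 1)
            MPI_Send(&one, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }
}

/* Rank 1: prints the two values in the order they were matched. */
void rank1(void)
{
    int x, y;
    MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("%d, %d\n", x, y);
}
```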
What does this print: 0, 1, or 1, 0?
According to MPI-3.1 section 3.5, both are possible. 😲
Resources
PR: https://github.com/mpi-forum/mpi-standard/pull/748