-
Couldn't load subscription status.
- Fork 928
btl/tcp: MPI_Put/Accumulate(): Fix pre-mature completeion callback. #8700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can you give a little more details on what exactly this PR fixes ? My understanding is that for MPI_Put and MPI_Accumulate as we don't expect an answer from the target, the operations are not synchronizing, which means we only need to know about the local completion. |
|
@bosilca that is right. The issue this addresses is osc/rdma relies on the 'completion' callback from btl_put() to know when it was complete for the sync (MPI_Win_fence, ex). Since this callback was only detecting local completion in the tcp/btl layer, osc/rdma thought it was completed remotely when it in fact wasn't, so the fence would return before the data had actually arrived. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few nitpicks. There are also some pretty long lines but I guess clang-format will massage that code anyway...
opal/mca/btl/tcp/btl_tcp_hdr.h
Outdated
| #define MCA_BTL_TCP_HDR_TYPE_PUT 2 | ||
| #define MCA_BTL_TCP_HDR_TYPE_GET 3 | ||
| #define MCA_BTL_TCP_HDR_TYPE_FIN 4 | ||
| #define MCA_BTL_TCP_HDR_TYPE_PUT_ACK 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While at it, can we make this an enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
But then the solution proposed here is very expensive, because it requires a sync for every single put and accumulate. This also requires progress, or the ack will not be extracted from the network. A more efficient approach would be to count the ops on the origin and on the target, and to have an explicit handshake for the match. This would allow all put/accumulate to remain as they are today (aka partially or locally completed) and force a single RTT on the synchronization. |
|
@bosilca I agree your proposal would be better. This solution just makes the put/accumulate via btl/tcp 'correct', which it currently is not. TCP isn't expected to be performant anyway, but it should be correct. I can iterate on improving performance, but this change at least makes the rdma/tcp path viable for osc with Put/Accumulate. |
|
We can always go with this solution so it at least works for v5.0.x, and I can investigate implementing the better performant path, but I suspect it will additional code changes in osc/rdma. |
196ae07 to
254a215
Compare
|
@devreal changes addressed, thanks! |
|
I need to look more at the code to refresh my memory on why we do not have support for AM with acknowledgement. What I had in mind was something similar to the handling of synchronous sends in pt2pt. |
254a215 to
3d3727c
Compare
- Send an ack back to the origin letting it know when it is safe to invoke the completion callback. Otherwise it can be called before the data has arrived. Found with test accfence1.c. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
3d3727c to
cc449df
Compare
Send an ack back to the origin letting it know when it is
safe to invoke the completion callback. Otherwise it
can be called before the data has arrived.
Found with test accfence1.c.
Signed-off-by: Austen Lauria awlauria@us.ibm.com