v4.0.x: Cherry pick ob1 fixes from master #6634
When ob1 uses btl_put, the handle of the locally registered memory is sent with a PUT control message. In the current master code, the handle sent is necessarily the one in the frag. However, if the memory has been successfully registered in the request, the frag structure does not hold any valid handle and all fragments use the request's handle instead. I suggest checking whether the handle in the fragment is valid and, if it is not, sending the handle from the request.

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
(cherry picked from commit e630046)
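The handle selection described in this commit can be sketched as below. This is a minimal illustration, not the ob1 code: the types `reg_handle_t`, `frag_t`, `request_t` and the function `select_put_handle` are hypothetical, and "valid" is assumed to mean "non-NULL".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a BTL memory-registration handle. */
typedef struct reg_handle {
    int id;
} reg_handle_t;

/* Hypothetical RDMA fragment: local_handle is NULL when the fragment's
 * memory was not registered on its own. */
typedef struct frag {
    reg_handle_t *local_handle;
} frag_t;

/* Hypothetical request: local_handle is set when the whole buffer was
 * registered once for the request. */
typedef struct request {
    reg_handle_t *local_handle;
} request_t;

/* Choose the handle to advertise in the PUT control message: prefer the
 * fragment's own registration, otherwise fall back to the request's. */
reg_handle_t *select_put_handle(const frag_t *frag, const request_t *req)
{
    if (NULL != frag->local_handle) {
        return frag->local_handle;
    }
    /* Sketch assumption: if the fragment has no handle, the request must. */
    assert(NULL != req->local_handle);
    return req->local_handle;
}
```

For example, a fragment whose `local_handle` is NULL yields the request's handle, while a fragment with its own registration keeps using it, so every fragment advertises a valid handle.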
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/90c02b30b0acdf60d49f6eff5f77e769

The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/e21e4f2797e937a188a1f6df3cd01bed
The rdma_frag attached to the send request was not correctly released upon request completion, leaking until MPI_Finalize. A quick solution would have been to add RDMA_FRAG_RETURN at several places along the send request completion path, but that would have made the sendreq completion path unnecessarily complex. Instead, I added the length to the RDMA fragment so that it can be completed during the remote ack.

Be more explicit in the comment. The rdma_frag can be released directly only when the peer forced a protocol change (from RDMA GET to send/recv). Otherwise, the fragment is returned once all data pertaining to it has been transferred.

NOTE: Had to add a typedef for "opal_atomic_size_t" from master into opal/threads/thread_usage.h in this cherry pick (it is in opal/include/opal_stdatomic.h on master, but that file does not exist on the v4.0.x branch).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit a16cf0e)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
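One way to read "added the length to the RDMA fragment so that it can be completed during the remote ack" is sketched below. This is a hedged illustration under stated assumptions, not the actual patch: all names (`rdma_frag_t`, `rdma_frag_ack`, `rdma_frag_protocol_fallback`, the `returned` flag) are hypothetical, C11 `_Atomic size_t` stands in for `opal_atomic_size_t`, and we assume acks are tallied in an atomic byte counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RDMA fragment that carries its own total length. */
typedef struct rdma_frag {
    size_t length;              /* total bytes this fragment describes */
    _Atomic size_t bytes_acked; /* bytes acknowledged by the peer so far */
    bool returned;              /* stand-in for "returned to the free list" */
} rdma_frag_t;

void rdma_frag_init(rdma_frag_t *frag, size_t length)
{
    frag->length = length;
    atomic_init(&frag->bytes_acked, 0);
    frag->returned = false;
}

/* Stand-in for returning the fragment to its free list. */
void rdma_frag_return(rdma_frag_t *frag)
{
    assert(!frag->returned); /* must happen exactly once */
    frag->returned = true;
}

/* Called on each remote ack; returns true if this ack completed the
 * fragment. atomic_fetch_add returns the previous value, so exactly one
 * caller sees the counter reach 'length' and returns the fragment. */
bool rdma_frag_ack(rdma_frag_t *frag, size_t bytes)
{
    size_t prev = atomic_fetch_add(&frag->bytes_acked, bytes);
    assert(prev + bytes <= frag->length);
    if (prev + bytes == frag->length) {
        rdma_frag_return(frag);
        return true;
    }
    return false;
}

/* Peer forced a protocol change (RDMA GET -> send/recv): no RDMA acks
 * will arrive for this fragment, so release it right away. */
void rdma_frag_protocol_fallback(rdma_frag_t *frag)
{
    rdma_frag_return(frag);
}
```

The design point this tries to capture is that completion is decided where the data is acknowledged, so the send request completion path does not need extra return calls.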
97bb75c to 48f8243
Let's discuss on Tuesday. Perhaps we should pull this PR, and then do a separate PR for the #6633 content later.

@hppritcha We didn't get to this PR in today's WebEx. Let's bump the discussion to next week.

At the 6/25/19 devel meeting, we decided this can be merged, but it does not fix the issues with PUT protocol pipelining (it fixes a different OB1 PUT protocol issue).
See #6633 for a full explanation.
NOTE: The cherry-pick of a16cf0e had to include an extra `typedef` in `opal/threads/thread_usage.h` from elsewhere in the tree (not part of a16cf0e), clearly noted in the commit.

FYI @EmmanuelBRELLE