hkuno (Contributor) commented May 12, 2021

The upper 2 bits of an ompi tag encode the synchronous-send flag and the
synchronous-send ack flag.
Because the mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag
functions both use ompi_mtl_ofi.sync_proto_mask instead of
ompi_mtl_ofi.sync_send when generating their "ignore" masks, they ignore
the ack bit as well. The receive then matches regardless of the ack bit,
like an "any tag receive", so it can also match internal synchronous-send
ack messages.
This is an issue because ssend is implemented internally as a send
followed by a receive for the ack. So if a matching receive is already
outstanding when the ssend starts, that earlier receive consumes the
internal message intended for the ssend's internal receive.

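Updating the mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag functions
to use ompi_mtl_ofi.sync_send fixes this: only the sync-send bit is ignored,
so the ack bit must match and user receives can no longer consume the ack.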

Authored-by: John L. Byrne <john.l.byrne@hpe.com>

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit 18baa5e)

jsquyres added this to the v4.0.6 milestone May 12, 2021

jsquyres (Member) commented

@hkuno Don't forget that you need to get someone to review this before it will be accepted by the RMs.

hkuno requested a review from hppritcha May 12, 2021 21:11

hkuno (Contributor, Author) commented May 12, 2021

@jsquyres
Thank you, Jeff. (I'm new to this.) I will ask Howard and Brian to review it since they both reviewed the original PR for the cherry-picked commit.
@hppritcha, @bwbarrett -- the original commit was first submitted here: #8052

hkuno requested a review from bwbarrett May 12, 2021 22:10
hppritcha merged commit 40df636 into open-mpi:v4.0.x May 17, 2021