Does using EP_RDM imply in order arrival of messages? #1217

Closed
bturrubiates opened this issue Aug 14, 2015 · 16 comments

Comments

@bturrubiates
Member

The documentation doesn't specify whether EP_RDM guarantees in-order arrival of messages. Tagged operations seem to be commonly used on EP_RDM, which suggests that the endpoint should guarantee ordering. If that's the case, the man pages should be updated. I can update them based on the outcome of this question.

@shefty @goodell @jsquyres

@jsquyres
Member

@shefty This has caused a fair amount of discussion here in Cisco. :-) Some of us thought/assumed that EP_RDM was supposed to be in order, while others of us thought/assumed that the "datagram" part of "RDM" implied that it shared the same ordering characteristics as regular datagrams (i.e., it would be RIDM for "reliable, in-order datagrams" if it was supposed to be in-order).

That being said, both the MPICH and Open MPI tagged OFI implementations are using EP_RDM, and that doesn't make sense unless the messages are guaranteed to be in order.

Which way is intended?

@shefty
Member

shefty commented Aug 14, 2015

FI_EP_RDM by itself does not guarantee ordering. None of the endpoint types do. Ordering is defined using the ordering-related attributes in fi_info.

@jsquyres
Member

Are you referring to the FI_ORDER_* values in fi_endpoint(3)? FWIW:

  1. Neither Open MPI nor MPICH specify an FI_ORDER_* attribute, as far as I can tell.
  2. Since we're only talking about sends / receives for the MPI tagged interfaces (i.e., not atomics), is FI_ORDER_SAS likely the value that the MPI implementations should be using?

@yburette Is this a bug? There is just enough time left to fix this before Open MPI v1.10...

@shefty
Member

shefty commented Aug 14, 2015

Yes - FI_ORDER_XXX, along with the max_order_xxx_size fields (I think that's the right name). SAS sounds right. @sayantansur - can you look at this from MPI's perspective?

@jsquyres
Member

From MPI's perspective, tagged messages don't make much sense without ordering.

@sayantansur

We just had a hallway discussion. Yes, I was thinking the same thing: FI_ORDER_SAS is required by MPI, and for tagged messages it only means something if it refers to the matching order. The tag match API doesn't care when messages actually arrive; they can only be acted upon when they are matched.
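
The distinction between arrival order and matching order can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct demo_msg`, its `seq` field, and `demo_match_in_send_order` are hypothetical names, not part of libfabric or any provider, and the sketch assumes each message carries the sender's sequence number. It shows how messages may arrive in any order while still being presented for matching in send order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wire message: seq is the sender's send position (0, 1, 2, ...). */
struct demo_msg {
    unsigned seq;
    int payload;
};

/*
 * Place each arrived payload into the slot of its send position, so that
 * matched[i] is what the i-th posted receive would see, regardless of the
 * order in which the messages arrived.
 * Assumption: arrivals[] holds each seq in 0..n-1 exactly once.
 */
static void demo_match_in_send_order(const struct demo_msg *arrivals, size_t n,
                                     int *matched)
{
    for (size_t i = 0; i < n; i++) {
        assert(arrivals[i].seq < n);   /* precondition: seq is in range */
        matched[arrivals[i].seq] = arrivals[i].payload;
    }
}
```

Calling this with arrivals in the order seq 2, 0, 1 still fills `matched` in send order, which is the sense in which ordering applies to matching rather than to arrival.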

@jsquyres
Member

Ok, we agree: if the tagged messages aren't ordered, then if sender A sends N messages with the same tag to receiver B, they can arrive in a different order -- which would violate MPI ordering guarantees.

@yburette please submit a PR for Open MPI master/v1.10 ASAP. @rhc54 @hppritcha this is a blocker for v1.10.
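
The failure case agreed on above (N messages with the same tag completing in a different order than they were sent) can be reproduced with a small simulation. This is a sketch, not provider code: `struct demo_tagged_msg`, `demo_match_by_arrival`, and `demo_is_send_order` are hypothetical names, and the payload is assumed to be the position in which the sender sent the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged message; payload is the sender's send position. */
struct demo_tagged_msg {
    int tag;
    int payload;
};

/*
 * Complete receives posted for `tag` strictly in arrival order, with no
 * resequencing. Returns the number of receives completed.
 */
static size_t demo_match_by_arrival(const struct demo_tagged_msg *arrivals,
                                    size_t n_arrivals, int tag,
                                    int *recv_bufs, size_t n_recvs)
{
    size_t done = 0;

    assert(arrivals != NULL && recv_bufs != NULL);
    for (size_t i = 0; i < n_arrivals && done < n_recvs; i++) {
        if (arrivals[i].tag == tag)
            recv_bufs[done++] = arrivals[i].payload;
    }
    return done;
}

/* Returns 1 if the receives completed in the sender's order (0, 1, 2, ...). */
static int demo_is_send_order(const int *recv_bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (recv_bufs[i] != (int)i)
            return 0;
    }
    return 1;
}
```

With arrivals in send order, `demo_is_send_order` reports 1. If the first two messages swap places on the way, the first posted receive completes with the second message and the check reports 0.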

@shefty
Member

shefty commented Aug 14, 2015

Sayantan mentioned that it may even be a good idea to print a warning for apps that request tagged messages without ordering, or that request FI_MSG or FI_RDM without any ordering.

@yburette
Contributor

@jsquyres OK. Just to make sure, we are only interested in the tx_attr->msg_order, right?

@shefty
Member

shefty commented Aug 14, 2015

You need ordering on both the transmit and receive side if you want true ordering.

@yburette
Contributor

That makes more sense. Thanks.

@yburette
Contributor

Is it just me, or does the PSM provider not currently check for ordering?

@jsquyres
Member

@yburette I don't know anything about PSM, but it's always ordered, isn't it? (or, perhaps more specifically: PSM connections are created in the PSM provider in a way that -- perhaps indirectly -- effects ordering). That's the only way I can assume that the tagged support has worked so far.

@jsquyres
Member

I'm wary about printing a warning for apps that use tagged messages but don't ask for ordering.

Either:

  • The app intended that, and it would be wrong to print out a warning because the application is correct (e.g., a BGP-style application), or
  • The app did not intend that, in which case, why isn't it an error (vs. a warning)?

Perhaps there should be an FI_ORDER_NONE, and we require all endpoints to request some FI_ORDER_* value. This would work best if none of the FI_ORDER_* values were 0.
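
The reason non-zero values matter for this proposal can be shown with a short sketch. Everything here is hypothetical: `DEMO_ORDER_NONE`, `DEMO_ORDER_SAS`, and `demo_check_order` are illustrative names with made-up bit values, and no such FI_ORDER_NONE flag is confirmed by this thread. If every ordering flag, including "none", is a non-zero bit, a field left at 0 can be told apart from an explicit request for no ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; none of them is 0. */
#define DEMO_ORDER_NONE (UINT64_C(1) << 0)
#define DEMO_ORDER_SAS  (UINT64_C(1) << 1)

/* The scheme only works if "no ordering" is itself a non-zero value. */
static_assert(DEMO_ORDER_NONE != 0, "DEMO_ORDER_NONE must be non-zero");

enum demo_order_status {
    DEMO_ORDER_OK = 0,        /* an explicit, consistent choice was made */
    DEMO_ORDER_UNSPECIFIED,   /* field left at 0: the app never chose */
    DEMO_ORDER_CONFLICT       /* NONE combined with a real ordering bit */
};

/* Classify a msg_order-style bitmask under the hypothetical scheme. */
static enum demo_order_status demo_check_order(uint64_t msg_order)
{
    if (msg_order == 0)
        return DEMO_ORDER_UNSPECIFIED;
    if ((msg_order & DEMO_ORDER_NONE) && (msg_order & ~DEMO_ORDER_NONE))
        return DEMO_ORDER_CONFLICT;
    return DEMO_ORDER_OK;
}
```

Under this scheme an application that wants unordered tagged messages passes `DEMO_ORDER_NONE` and gets no diagnostic, while a field left at 0 can be rejected as an error rather than guessed at.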

@yburette
Contributor

@jsquyres Right, PSM can be considered as a FIFO.

@bturrubiates
Member Author

I think this issue is fully discussed.
