libfabric-2.0: Ordering semantics #9012

Open
shefty opened this issue Jun 9, 2023 · 2 comments

@shefty
Member

shefty commented Jun 9, 2023

For data ordering:

  • Remove FI_ORDER_NONE (because it's a flag = 0)
  • Only use separate atomic and RMA ordering flags (e.g. RAR, RAW, WAR, WAW)
  • This may remove FI_ORDER_RAR, etc., or define them as applying only to RMA, not atomics
  • Need to update ep attribute values max_order_rar/raw/war/waw_size to separate atomic from RMA
  • May define current max_order_xxx_size for RMA only and add new atomic sizes
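
To illustrate the first bullet: a flag whose value is 0 cannot be tested with a bitwise AND the way the other ordering bits can. A minimal sketch follows; the `DEMO_` constants, the bit position, and the function names are stand-ins invented for illustration, not the definitions from the libfabric headers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in constants for illustration only (not the libfabric header values). */
#define DEMO_ORDER_NONE    0ULL         /* "no ordering" expressed as zero */
#define DEMO_ORDER_RMA_WAW (1ULL << 0)  /* hypothetical bit position */

/* Usual flag test: nonzero when any bit of `flag` is set in `order`. */
int demo_flag_set(uint64_t order, uint64_t flag)
{
    return (order & flag) != 0;
}

/* The only way to detect "no ordering" is to compare the whole mask to zero. */
int demo_no_ordering(uint64_t order)
{
    return order == 0;
}

/* Small self-check showing the asymmetry between the two kinds of value. */
void demo_flag_self_check(void)
{
    assert(demo_flag_set(DEMO_ORDER_RMA_WAW, DEMO_ORDER_RMA_WAW));
    /* A zero-valued "flag" never tests as set, even on an empty mask. */
    assert(!demo_flag_set(DEMO_ORDER_NONE, DEMO_ORDER_NONE));
    assert(demo_no_ordering(DEMO_ORDER_NONE));
}
```

Because the zero value carries no bit, "none" is already expressed by an empty mask, which is presumably why a separate named flag adds little.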

For completion ordering:

  • Remove FI_ORDER_STRICT
  • Completion ordering will not be guaranteed
@bcernohous
Contributor

What does this mean for remote memory ordering if we remove the completion ordering guarantees and FI_ORDER_STRICT? For reference, the current RX comp_order documentation reads:

comp_order - Completion Ordering
For a description of completion ordering, see the comp_order field in the Transmit Context Attribute section.

FI_ORDER_DATA
When set, this bit indicates that received data is written into memory in order. Data ordering applies to memory accessed as part of a single operation and between operations if message ordering is guaranteed.

FI_ORDER_NONE
No ordering is defined for completed operations. Receive operations may complete in any order, regardless of their submission order.

FI_ORDER_STRICT
Receive operations complete in the order in which they are processed by the receive context, based on the receive side msg_order attribute.

Does libfabric 2.0 then support WAW ordering (without fences) well enough to implement shmem_put_signal_nbi as two (WAW) writes? Is comp_order FI_ORDER_DATA together with msg_order FI_ORDER_RMA_WAW sufficient? Is msg_order FI_ORDER_RMA_WAW alone sufficient? How the four fields (RX/TX and msg_order/comp_order) combine into guarantees isn't 100% clear to me.

Someone linked this issue to a UCX issue stating that UCX always requires a put/fence/put sequence, and that triggered my question.

openucx/ucx#9361

@shefty
Member Author

shefty commented Sep 27, 2023

FI_ORDER_DATA requires that data within a single message is committed to memory in order. AFAIK, no transport (including InfiniBand) defines support for this. Trying to support this places significant limits on how data may flow through the fabric, or the target NIC may need to buffer significant amounts of data in order to ensure that it writes the data into memory in order. I'm not aware of any apps that need this.

FI_ORDER_STRICT only makes sense for connected endpoints. It says that if the app posts receive buffers 1, 2, 3, 4, ... then the completions read from the CQ will report the receive completions referencing buffers 1, 2, 3, 4, ... in that same order. For unconnected endpoints, this option isn't usable.
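
As a rough sketch of what FI_ORDER_STRICT promises on a connected endpoint, the check below compares the order in which receive buffers were posted with the order in which their completions were read. The function name and the integer buffer IDs are hypothetical (an app might derive such IDs from its per-operation context); nothing here calls libfabric.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if completions were reported in exactly the order the receive
 * buffers were posted, 0 otherwise. `posted` and `completed` hold
 * app-chosen buffer IDs (hypothetical labels, not libfabric values).
 */
int demo_completions_in_post_order(const int *posted, const int *completed,
                                   size_t n)
{
    assert(posted != NULL && completed != NULL);
    for (size_t i = 0; i < n; i++) {
        if (posted[i] != completed[i])
            return 0;   /* completion i refers to a different buffer */
    }
    return 1;
}
```

With buffers posted as 1, 2, 3, 4, a completion sequence of 1, 2, 3, 4 passes this check, while 1, 3, 2, 4 does not. With FI_ORDER_STRICT removed, an app could no longer rely on the check passing.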

FI_ORDER_RMA_WAW requires that RMA writes complete at the target in order. This means that the message related to the first write must be handled prior to handling the message of the second write. (Here, think of the message as the first packet of the write request.) However, that doesn't mean that all data from the first write must be written before any data from the second write is written. There's a second attribute for that: fi_ep_attr::max_order_waw_size.

If both writes are smaller than max_order_waw_size and FI_ORDER_RMA_WAW is set, then all data from the first write will be placed at the target before any data from the second write.
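
The rule above can be written as a small decision function. This is only a model of the explanation in this comment: the `DEMO_` flag value, the enum, and the function name are invented for illustration, and the strict `<` comparison follows the "smaller than" wording here (check the fi_endpoint man page for the exact boundary and for any special size values).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real flag; the bit position is hypothetical. */
#define DEMO_ORDER_RMA_WAW (1ULL << 0)

enum demo_waw_guarantee {
    DEMO_WAW_NONE,        /* no write-after-write ordering at all */
    DEMO_WAW_START_ORDER, /* writes are handled in order, data may interleave */
    DEMO_WAW_DATA_ORDER   /* all data of write 1 lands before any of write 2 */
};

/* Classify what two back-to-back RMA writes of len1 and len2 bytes can expect. */
enum demo_waw_guarantee demo_waw_guarantee_for(uint64_t msg_order,
                                               size_t max_order_waw_size,
                                               size_t len1, size_t len2)
{
    if ((msg_order & DEMO_ORDER_RMA_WAW) == 0)
        return DEMO_WAW_NONE;
    /* Strict '<' mirrors "smaller than" in the comment above (assumption). */
    if (len1 < max_order_waw_size && len2 < max_order_waw_size)
        return DEMO_WAW_DATA_ORDER;
    return DEMO_WAW_START_ORDER;
}

/* Example: a 64-byte payload followed by an 8-byte signal, 4096-byte limit. */
void demo_waw_self_check(void)
{
    assert(demo_waw_guarantee_for(DEMO_ORDER_RMA_WAW, 4096, 64, 8)
           == DEMO_WAW_DATA_ORDER);
}
```

Under this model, a put followed by a small signal write gets full data ordering only when both the flag is set and both sizes fall under the limit; otherwise only the handling order of the two requests is guaranteed.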

I believe UCX can use multiple NICs/ports/QPs between peers. That could be why they require a fence operation, in case the writes take separate paths through the network and arrive at the target out of order.

@shefty shefty self-assigned this Sep 27, 2023