-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fabric: Introduce new mode bit FI_BUFFERED_RECV #4108
Conversation
This bit separates out part of the FI_VARIABLE_MSG feature. As a mode bit, providers that must provide receive side buffering can report receive completions to applications by referencing their buffers directly. This avoids a data on the receive side. The rxm and rxd providers both can use this function with applications that support the new mode bit. Most of the description for FI_BUFFERED_RECV is taken from the FI_VARIABLE_MSG definition. However, some modifications are made to that definition based on mapping to implementation details. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
man/fi_msg.3.md
Outdated
a result, received messages must be copied from the network buffers | ||
into application buffers for processing. However, applications can | ||
avoid this copy if they are able to process the message in place | ||
(directly from the networking buffers). Buffered receives are often |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest removing the reference to utility providers since it is transparent from users and providers other than utility ones can also expose this via the mode bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reference removed
This is very similar to the MPI_Arecv proposal that @jsquyres made at the MPI Forum years ago. One of the discussion items at the forum was that how to prevent applications from shooting themselves in the foot by not returning (or discarding) network buffers that were reported up. Some providers might respond to lack of network buffers by registering even more buffer, leading to high memory usage. While I'm sure this can be cast as a "bad application", the thresholds at which the application must react to avoid leaving pinned pages behind can be different on providers. e.g. TCP might just expose virtual memory, vs. IB that might give up pinned pages. Some ideas ... can the provider deliver an out-of-buffer (or out of header space) type event to the receiving app? Can the provider deliver an event to the sender saying that the receiving provider is out of space (since FI_VARIABLE_MSG doesn't require the app to allocate buffer)? |
I think the solution is for the provider to stop allocating buffers. :) Apps that leak buffers paired with providers that continuously allocate them are two layers of stupid that I'm not sure we can fix. I can add notes to the man pages that apps that do not claim or discard messages in a timely manner may stall their network traffic. I don't like the idea of forcing the provider to generate events, since that may require converting to a software based CQ. |
Bleh, yes, no need to force people to software CQ. If you could make that language strong enough, that'd be great. The fear at the Forum was that this will encourage bad network programming. |
The added text is:
|
Variable message support is an application requested capability. Rework the man page description based on the requirements that are defined for FI_BUFFERED_RECV. Variable messages adds support for large message transfers and message notifications, the latter of which is identical for FI_BUFFERED_RECV. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Added 2nd patch to have the FI_VARIABLE_MSG definition refer to the FI_BUFFERED_RECV definition, with duplicated documentation removed. |
107e5c7
to
41dd7de
Compare
This call handles but op flags and capability bits. However, new op flags will overlap with capabilities. Create separate functions to avoid mis-interpretting bits. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Also add FI_CLAIM, FI_DISCARD as CQ event flags. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Can providers which already have a software CQ write an error entry to CQ, in case it runs out of buffers? This can be an opt-in feature. |
ooh, how about an error entry? Error queues are almost all software? Although having the app read an entry sounds blah to me now. Apps are already getting EAGAIN and are supposed to interpret it as network is blocked. |
I think an error entry could be more informative since EAGAIN could have many causes. But perhaps both the ideas should be combined since apps try to progress whenever they encounter EAGAIN. Also if the app never posts a send or recv, error entry is the only way to convey the problem. |
I'm fairly strongly of the opinion that we don't do anything. We don't generate events when an app doesn't post any receive buffers, so why should this be any different? Conceptually, the app isn't reposting the receive buffers. The transport kicks in flow control, the endpoint eventually times out and goes into some sort of error state. If the apps wants to receive messages, hey, return the buffers. Generating run time errors for what is a coding error doesn't help. MPI may not trust their end-users to write proper code, but I trust the libfabric users. :) |
I also believe that it is the right thing to do. Providers are free to write info, debug or other logs to help out applications that otherwise "mysteriously" grind to a halt. |
Sounds good 👍 |
treated as conceptually occurring out of band. No ordering within or | ||
between the data of variable messages is implied. | ||
may indicate the order in which received messages arrived at the | ||
receiver based on the endpoint attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to note that this eliminates the need to register receives that are smaller than the threshold size (when claiming)
This bit separates out part of the FI_VARIABLE_MSG feature.
As a mode bit, providers that must provide receive side
buffering can report receive completions to applications
by referencing their buffers directly. This avoids a
data on the receive side. The rxm and rxd providers both
can use this function with applications that support the
new mode bit.
Most of the description for FI_BUFFERED_RECV is taken
from the FI_VARIABLE_MSG definition. However, some
modifications are made to that definition based on
mapping to implementation details.
Signed-off-by: Sean Hefty sean.hefty@intel.com