fabric: Introduce new mode bit FI_BUFFERED_RECV #4108

shefty · 2018-05-17T23:39:34Z

This bit separates out part of the FI_VARIABLE_MSG feature.
As a mode bit, providers that must provide receive side
buffering can report receive completions to applications
by referencing their buffers directly. This avoids a
data copy on the receive side. The rxm and rxd providers
can both use this functionality with applications that support the
new mode bit.

Most of the description for FI_BUFFERED_RECV is taken
from the FI_VARIABLE_MSG definition. However, some
modifications are made to that definition based on
mapping to implementation details.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
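As a rough illustration of the commit message's idea, the self-contained C model below contrasts a completion that refers to a copy in an application-posted buffer with one that references the provider's own receive buffer directly. It is a sketch under stated assumptions, not libfabric code: `model_cq_entry`, `provider_buf`, `provider_receive`, `complete_with_copy`, and `complete_buffered` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical completion entry: where the received data lives and its size. */
struct model_cq_entry {
    const char *buf;
    size_t len;
};

/* Hypothetical provider-owned buffer that the network writes into. */
static char provider_buf[64];

/* The network delivers a message into the provider's own buffer. */
static size_t provider_receive(const char *wire, size_t len)
{
    assert(len <= sizeof provider_buf);
    memcpy(provider_buf, wire, len);
    return len;
}

/* Without buffered receives: the data is copied into the buffer the
 * application posted, and the completion refers to that copy. */
static struct model_cq_entry complete_with_copy(char *app_buf, size_t len)
{
    memcpy(app_buf, provider_buf, len);   /* the extra receive-side copy */
    struct model_cq_entry e = { app_buf, len };
    return e;
}

/* With buffered receives: the completion references the provider's
 * buffer directly, so the application processes the data in place. */
static struct model_cq_entry complete_buffered(size_t len)
{
    struct model_cq_entry e = { provider_buf, len };
    return e;
}
```

The only difference between the two paths is where `buf` points. The cost of the buffered path is that the application now holds provider memory and must give it back, which is what most of the review discussion is about.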

sayantansur · 2018-05-18T15:06:46Z

man/fi_msg.3.md

+a result, received messages must be copied from the network buffers
+into application buffers for processing.  However, applications can
+avoid this copy if they are able to process the message in place
+(directly from the networking buffers).  Buffered receives are often


I would suggest removing the reference to utility providers, since it is transparent to users and providers other than the utility ones can also expose this via the mode bit.

reference removed

sayantansur · 2018-05-18T15:47:12Z

This is very similar to the MPI_Arecv proposal that @jsquyres made at the MPI Forum years ago. One of the discussion items at the Forum was how to prevent applications from shooting themselves in the foot by not returning (or discarding) network buffers that were reported up. Some providers might respond to a lack of network buffers by registering even more buffers, leading to high memory usage. While I'm sure this can be cast as a "bad application" problem, the thresholds at which the application must react to avoid leaving pinned pages behind can differ between providers, e.g., TCP might only consume virtual memory, whereas IB might give up pinned pages.

Some ideas: can the provider deliver an out-of-buffer (or out-of-header-space) event to the receiving app? Can the provider deliver an event to the sender saying that the receiving provider is out of space (since FI_VARIABLE_MSG doesn't require the app to allocate buffers)?

shefty · 2018-05-18T17:09:50Z

I think the solution is for the provider to stop allocating buffers. :) Apps that leak buffers paired with providers that continuously allocate them are two layers of stupid that I'm not sure we can fix. I can add notes to the man pages that apps that do not claim or discard messages in a timely manner may stall their network traffic. I don't like the idea of forcing the provider to generate events, since that may require converting to a software based CQ.
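A sketch of the behavior proposed here, assuming the provider draws buffered receives from a fixed-size pool and never grows it: once the application holds every buffer, new messages are refused until one is discarded. Returning `-EAGAIN` is only a stand-in for whatever back-pressure the transport applies; `rx_pool`, `POOL_SLOTS`, `pool_deliver`, and `pool_discard` are hypothetical names, not libfabric API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define POOL_SLOTS 4   /* hypothetical fixed number of provider buffers */
#define SLOT_SIZE  32  /* hypothetical size of each buffer */

/* Fixed pool of provider-owned receive buffers; it never grows. */
struct rx_pool {
    char data[POOL_SLOTS][SLOT_SIZE];
    bool in_use[POOL_SLOTS];   /* true while the application holds the slot */
};

/* Deliver an incoming message into a free slot and return its index.
 * If the application still holds every slot, return -EAGAIN instead of
 * allocating more memory, so the sender is stalled rather than the
 * receiver's memory usage growing. */
static int pool_deliver(struct rx_pool *p, const char *msg)
{
    assert(strlen(msg) < SLOT_SIZE);
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            strcpy(p->data[i], msg);
            return i;
        }
    }
    return -EAGAIN;
}

/* The application is done with a buffered receive: hand the slot back. */
static void pool_discard(struct rx_pool *p, int slot)
{
    assert(slot >= 0 && slot < POOL_SLOTS && p->in_use[slot]);
    p->in_use[slot] = false;
}
```

An application that never calls `pool_discard` simply stops receiving, which matches the "communication stalls" outcome described in the man page text added below, without the provider having to raise a new kind of event.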

sayantansur · 2018-05-18T17:24:21Z

Bleh, yes, no need to force people to software CQ. If you could make that language strong enough, that'd be great. The fear at the Forum was that this will encourage bad network programming.

shefty · 2018-05-18T17:27:13Z

The added text is:

IMPORTANT: Buffered receives must be claimed or discarded in a timely manner.  Failure to do so may result in increased memory usage for network buffering or communication stalls.
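To make the claim/discard lifecycle concrete, here is a self-contained model; it is not libfabric code. It assumes that small messages are fully buffered and processed in place, while larger ones expose only a prefix plus a "more" indication and are then claimed into an application buffer. `BUFFERED_LIMIT`, `buffered_note`, `notify`, `claim`, and `discard` are hypothetical names; they loosely correspond to the FI_CLAIM and FI_DISCARD flags mentioned in this PR, but the real signatures are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUFFERED_LIMIT 8   /* hypothetical: bytes buffered per message */

/* Hypothetical notification for one buffered receive. */
struct buffered_note {
    const char *msg;              /* full message; stands in for data the provider still holds */
    size_t total_len;             /* total message size */
    char head[BUFFERED_LIMIT];    /* buffered prefix the app may inspect in place */
    size_t head_len;              /* bytes available in head */
    bool more;                    /* true when head_len < total_len */
    bool live;                    /* false once claimed or discarded */
};

/* Provider side: report a new message, buffering at most BUFFERED_LIMIT bytes. */
static struct buffered_note notify(const char *msg, size_t len)
{
    struct buffered_note n = { .msg = msg, .total_len = len, .live = true };
    n.head_len = len < BUFFERED_LIMIT ? len : BUFFERED_LIMIT;
    memcpy(n.head, msg, n.head_len);
    n.more = n.head_len < len;
    return n;
}

/* Claim: move the whole message into an application buffer. */
static size_t claim(struct buffered_note *n, char *dst, size_t dst_len)
{
    assert(n->live && dst_len >= n->total_len);
    memcpy(dst, n->msg, n->total_len);
    n->live = false;   /* the notification's data must not be used afterwards */
    return n->total_len;
}

/* Discard: release the buffered data without copying it anywhere. */
static void discard(struct buffered_note *n)
{
    assert(n->live);
    n->live = false;
}
```

Either way, every notification ends with exactly one `claim` or `discard`; the timeliness requirement above is about how long a notification may stay `live`.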

Variable message support is an application-requested capability. Rework the man page description based on the requirements that are defined for FI_BUFFERED_RECV. Variable messages add support for large message transfers and message notifications, the latter of which is identical to FI_BUFFERED_RECV. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

shefty · 2018-05-18T19:13:10Z

Added 2nd patch to have the FI_VARIABLE_MSG definition refer to the FI_BUFFERED_RECV definition, with duplicated documentation removed.

arn314 · 2018-05-18T20:35:29Z

I don't like the idea of forcing the provider to generate events, since that may require converting to a software based CQ

Can providers which already have a software CQ write an error entry to CQ, in case it runs out of buffers? This can be an opt-in feature.

sayantansur · 2018-05-18T20:39:06Z

ooh, how about an error entry? Error queues are almost all software? Although having the app read an entry sounds blah to me now. Apps are already getting EAGAIN and are supposed to interpret it as network is blocked.

arn314 · 2018-05-18T20:52:20Z

Apps are already getting EAGAIN and are supposed to interpret it as network is blocked.

I think an error entry could be more informative since EAGAIN could have many causes. But perhaps both the ideas should be combined since apps try to progress whenever they encounter EAGAIN.

Also if the app never posts a send or recv, error entry is the only way to convey the problem.

shefty · 2018-05-18T21:02:50Z

I'm fairly strongly of the opinion that we don't do anything. We don't generate events when an app doesn't post any receive buffers, so why should this be any different? Conceptually, the app isn't reposting the receive buffers. The transport kicks in flow control, and the endpoint eventually times out and goes into some sort of error state. If the app wants to receive messages, hey, return the buffers. Generating run-time errors for what is a coding error doesn't help.

MPI may not trust their end-users to write proper code, but I trust the libfabric users. :)

sayantansur · 2018-05-18T21:07:31Z

I also believe that it is the right thing to do. Providers are free to write info, debug or other logs to help out applications that otherwise "mysteriously" grind to a halt.

arn314 · 2018-05-18T21:17:22Z

Sounds good 👍

shefty · 2018-05-21T22:53:08Z

man/fi_msg.3.md

-treated as conceptually occurring out of band.  No ordering within or
-between the data of variable messages is implied.
+may indicate the order in which received messages arrived at the
+receiver based on the endpoint attributes.


Need to note that this eliminates the need to register receives that are smaller than the threshold size (when claiming)

shefty force-pushed the master branch from d27ef4e to 1d64c83 May 17, 2018 23:40

shefty changed the title ~~fabric: Introduce new dual cap/mode bit FI_BUFFERED_RECV~~ fabric: Introduce new mode bit FI_BUFFERED_RECV May 17, 2018

sayantansur reviewed May 18, 2018

shefty force-pushed the master branch from 1d64c83 to 160b6f4 May 18, 2018 17:22

shefty force-pushed the master branch from 160b6f4 to 12be00f May 18, 2018 17:27

shefty force-pushed the master branch 2 times, most recently from 107e5c7 to 41dd7de May 18, 2018 19:22

shefty added 2 commits May 18, 2018 12:35

core/tostr: Restructure fi_tostr_flags()

c031870

This call handles both op flags and capability bits. However, new op flags will overlap with capabilities. Create separate functions to avoid misinterpreting bits. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core/tostr: Add FI_BUFFERED_RECV and FI_VARIABLE_MSG to fi_tostr

996d965

Also add FI_CLAIM, FI_DISCARD as CQ event flags. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
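The rationale of the first commit, core/tostr: Restructure fi_tostr_flags(), is that an op flag and a capability can occupy the same bit, so a single decoder cannot know which name to print. The sketch below illustrates why separate functions fix this. The bit values and `EX_*` names are invented for illustration and are not libfabric's definitions; `caps_tostr` and `opflags_tostr` are hypothetical, not the functions added by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit assignments: a capability and an op flag share bit 3. */
#define EX_CAP_EXAMPLE (1ULL << 3)
#define EX_OP_CLAIM    (1ULL << 3)
#define EX_OP_MORE     (1ULL << 4)

struct flag_name { uint64_t bit; const char *name; };

static const struct flag_name cap_names[] = {
    { EX_CAP_EXAMPLE, "EX_CAP_EXAMPLE" },
};
static const struct flag_name opflag_names[] = {
    { EX_OP_CLAIM, "EX_OP_CLAIM" },
    { EX_OP_MORE,  "EX_OP_MORE" },
};

/* Write the names of all set bits found in one table, joined by " | ". */
static void append_names(char *out, size_t outlen, uint64_t value,
                         const struct flag_name *tbl, size_t n)
{
    assert(outlen > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(value & tbl[i].bit))
            continue;
        if (out[0] != '\0')
            strncat(out, " | ", outlen - strlen(out) - 1);
        strncat(out, tbl[i].name, outlen - strlen(out) - 1);
    }
}

/* Separate entry points: the caller states which namespace the bits belong to. */
static void caps_tostr(char *out, size_t outlen, uint64_t caps)
{
    append_names(out, outlen, caps, cap_names,
                 sizeof cap_names / sizeof cap_names[0]);
}

static void opflags_tostr(char *out, size_t outlen, uint64_t flags)
{
    append_names(out, outlen, flags, opflag_names,
                 sizeof opflag_names / sizeof opflag_names[0]);
}
```

The same value, `1ULL << 3`, decodes as `EX_CAP_EXAMPLE` through `caps_tostr` and as `EX_OP_CLAIM` through `opflags_tostr`; a combined decoder would have had to print both names.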

shefty commented May 21, 2018

shefty merged commit a75f3f1 into ofiwg:master May 22, 2018
