
fabric: Introduce new mode bit FI_BUFFERED_RECV #4108

Merged: 4 commits merged into ofiwg:master on May 22, 2018

Conversation

shefty
Member

@shefty shefty commented May 17, 2018

This bit separates out part of the FI_VARIABLE_MSG feature.
As a mode bit, providers that must provide receive-side
buffering can report receive completions to applications
by referencing their buffers directly. This avoids a
data copy on the receive side. The rxm and rxd providers
can both use this functionality with applications that
support the new mode bit.

Most of the description for FI_BUFFERED_RECV is taken
from the FI_VARIABLE_MSG definition. However, some
modifications are made to that definition based on
mapping to implementation details.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

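To make the in-place processing concrete, here is a minimal sketch, not code from this PR: the application advertises that it can handle the new mode bit through the fi_getinfo hints, and a receive completion's buf field then references the provider's network buffer, which can be inspected without copying. process_message() is a placeholder, the FI_VERSION value is only an assumption for illustration, and endpoint setup and error handling are omitted.

```c
#include <stdio.h>
#include <rdma/fabric.h>
#include <rdma/fi_eq.h>

/* Placeholder for application-specific message handling. */
static void process_message(const void *buf, size_t len)
{
	printf("received %zu bytes at %p\n", len, buf);
}

static struct fi_info *get_buffered_recv_info(void)
{
	struct fi_info *hints, *info = NULL;

	hints = fi_allocinfo();
	hints->caps = FI_MSG;
	/* Tell providers we can handle completions that reference
	 * provider-owned receive buffers. */
	hints->mode |= FI_BUFFERED_RECV;
	fi_getinfo(FI_VERSION(1, 6), NULL, NULL, 0, hints, &info);
	fi_freeinfo(hints);
	return info;
}

static void poll_one(struct fid_cq *cq)
{
	struct fi_cq_data_entry comp;

	if (fi_cq_read(cq, &comp, 1) == 1 && (comp.flags & FI_RECV)) {
		/* comp.buf points into the provider's buffer, so the
		 * message can be examined without an extra copy. */
		process_message(comp.buf, comp.len);
	}
}
```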
@shefty changed the title from "fabric: Introduce new dual cap/mode bit FI_BUFFERED_RECV" to "fabric: Introduce new mode bit FI_BUFFERED_RECV" on May 17, 2018
man/fi_msg.3.md Outdated
a result, received messages must be copied from the network buffers
into application buffers for processing. However, applications can
avoid this copy if they are able to process the message in place
(directly from the networking buffers). Buffered receives are often


I would suggest removing the reference to utility providers, since it is transparent to users, and providers other than utility ones can also expose this via the mode bit.

Member Author


reference removed

@sayantansur

This is very similar to the MPI_Arecv proposal that @jsquyres made at the MPI Forum years ago. One of the discussion items at the Forum was how to prevent applications from shooting themselves in the foot by never returning (or discarding) the network buffers that were reported up. Some providers might respond to a lack of network buffers by registering even more buffers, leading to high memory usage. While I'm sure this can be cast as a "bad application", the thresholds at which the application must react to avoid leaving pinned pages behind can differ between providers; e.g., TCP might just expose virtual memory, whereas IB might give up pinned pages.

Some ideas ... can the provider deliver an out-of-buffer (or out-of-header-space) type event to the receiving app? Can the provider deliver an event to the sender saying that the receiving provider is out of space (since FI_VARIABLE_MSG doesn't require the app to allocate buffers)?

@shefty
Member Author

shefty commented May 18, 2018

I think the solution is for the provider to stop allocating buffers. :) Apps that leak buffers paired with providers that continuously allocate them are two layers of stupid that I'm not sure we can fix. I can add a note to the man pages stating that apps which do not claim or discard messages in a timely manner may stall their network traffic. I don't like the idea of forcing the provider to generate events, since that may require converting to a software-based CQ.

@sayantansur

Bleh, yes, no need to force people to a software CQ. If you could make that language strong enough, that'd be great. The fear at the Forum was that this would encourage bad network programming.

@shefty
Member Author

shefty commented May 18, 2018

The added text is:

IMPORTANT: Buffered receives must be claimed or discarded in a timely manner.  Failure to do so may result in increased memory usage for network buffering or communication stalls. 
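As a hedged illustration of that requirement, the sketch below claims or discards a buffered receive as soon as it is reported, assuming the mechanics in the updated fi_msg(3) text: the completion's op_context points to a struct fi_recv_context, and fi_recvmsg with FI_CLAIM or FI_DISCARD releases the provider buffer. app_wants() is a hypothetical application policy, and error handling is omitted.

```c
#include <sys/uio.h>
#include <rdma/fabric.h>
#include <rdma/fi_eq.h>
#include <rdma/fi_endpoint.h>

/* Hypothetical application policy: decide whether the message is needed. */
static int app_wants(const void *buf, size_t len)
{
	return len > 0;
}

static void claim_or_discard(struct fi_cq_data_entry *comp, void *app_buf)
{
	struct fi_recv_context *rctx = comp->op_context;
	struct iovec iov = { .iov_base = app_buf, .iov_len = comp->len };
	struct fi_msg msg = {
		.msg_iov = &iov,
		.iov_count = 1,
		.addr = FI_ADDR_UNSPEC,
		.context = rctx,	/* ties this operation to the buffered receive */
	};

	if (app_wants(comp->buf, comp->len))
		/* Copy the data into app_buf and release the provider buffer. */
		fi_recvmsg(rctx->ep, &msg, FI_CLAIM);
	else
		/* Drop the message and release the provider buffer. */
		fi_recvmsg(rctx->ep, &msg, FI_DISCARD);
}
```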

Variable message support is an application-requested capability.
Rework the man page description based on the requirements that
are defined for FI_BUFFERED_RECV.  Variable messages add
support for large message transfers and message notifications,
the latter of which is identical to that of FI_BUFFERED_RECV.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
@shefty
Member Author

shefty commented May 18, 2018

Added 2nd patch to have the FI_VARIABLE_MSG definition refer to the FI_BUFFERED_RECV definition, with duplicated documentation removed.

@shefty force-pushed the master branch 2 times, most recently from 107e5c7 to 41dd7de on May 18, 2018 at 19:22
This call handles both op flags and capability bits.  However,
new op flags will overlap with capabilities.  Create separate
functions to avoid misinterpreting bits.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
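A purely illustrative sketch of the hazard this commit avoids, using hypothetical names rather than the functions actually touched: when an op flag reuses the same bit position as a capability, one combined check cannot tell which namespace the caller meant, so each namespace gets its own checker.

```c
#include <stdint.h>
#include <stdbool.h>

#define EX_CAP_FOO	(1ULL << 40)	/* hypothetical capability bit       */
#define EX_OPFLAG_BAR	(1ULL << 40)	/* hypothetical op flag: same bit!   */

/* Interpret bits strictly as capabilities. */
static bool ex_check_caps(uint64_t requested_caps, uint64_t supported_caps)
{
	return (requested_caps & ~supported_caps) == 0;
}

/* Interpret bits strictly as operation flags. */
static bool ex_check_op_flags(uint64_t op_flags, uint64_t supported_flags)
{
	return (op_flags & ~supported_flags) == 0;
}
```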
Also add FI_CLAIM, FI_DISCARD as CQ event flags.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
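A small hedged sketch of how an application might use these new CQ event flags to tell the two outcomes apart; the interpretation of the flags here is an assumption based on the commit message above, not text from the PR.

```c
#include <rdma/fabric.h>
#include <rdma/fi_eq.h>

/* Classify a buffered-receive follow-up completion by its event flags. */
static const char *buffered_recv_outcome(const struct fi_cq_data_entry *comp)
{
	if (comp->flags & FI_CLAIM)
		return "claimed";	/* data delivered into the buffer posted with FI_CLAIM */
	if (comp->flags & FI_DISCARD)
		return "discarded";	/* message dropped, provider buffer released */
	return "other";
}
```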
@arn314
Contributor

arn314 commented May 18, 2018

I don't like the idea of forcing the provider to generate events, since that may require converting to a software based CQ

Can providers that already have a software CQ write an error entry to the CQ in case they run out of buffers? This could be an opt-in feature.

@sayantansur

Ooh, how about an error entry? Error queues are almost all software, right? Although having the app read an entry sounds blah to me now. Apps are already getting EAGAIN and are supposed to interpret it as the network being blocked.

@arn314
Contributor

arn314 commented May 18, 2018

Apps are already getting EAGAIN and are supposed to interpret it as the network being blocked.

I think an error entry could be more informative, since EAGAIN could have many causes. But perhaps both ideas should be combined, since apps try to progress whenever they encounter EAGAIN.

Also, if the app never posts a send or recv, an error entry is the only way to convey the problem.

@shefty
Member Author

shefty commented May 18, 2018

I'm fairly strongly of the opinion that we should do nothing. We don't generate events when an app doesn't post any receive buffers, so why should this be any different? Conceptually, the app isn't reposting the receive buffers. The transport kicks in flow control, and the endpoint eventually times out and goes into some sort of error state. If the app wants to receive messages, hey, return the buffers. Generating run-time errors for what is a coding error doesn't help.

MPI may not trust their end-users to write proper code, but I trust the libfabric users. :)

@sayantansur

I also believe that it is the right thing to do. Providers are free to write info, debug or other logs to help out applications that otherwise "mysteriously" grind to a halt.

@arn314
Contributor

arn314 commented May 18, 2018

Sounds good 👍

treated as conceptually occurring out of band. No ordering within or
between the data of variable messages is implied.
may indicate the order in which received messages arrived at the
receiver based on the endpoint attributes.
Member Author


Need to note that this eliminates the need to register receives that are smaller than the threshold size (when claiming)

@shefty shefty merged commit a75f3f1 into ofiwg:master May 22, 2018