Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/net: TX operations now support io_uring #1

Closed
wants to merge 11 commits into from

Conversation

sydidelot
Copy link
Owner

No description provided.

The patch adds the argument --with-uring[=DIR] to
detect liburing support in libfabric.

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
The patch implements the utility functions ofi_{sendv,recv}_socket()
that can be used to send/recv data to/from a socket using an
iovec as input argument. The bsock interface was also updated to
make use of these new functions.

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
This patch moves all the socket ops to a common API. In a later
patch, the bsock interface will allow the provider to specify
a custom API for the socket ops (required for io_uring support).

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
We need to differentiate io_uring error codes from other types of
error codes, including:
- FI_EIOURING_PREP: "io_uring operation prepared, waiting for submission"
- FI_EIOURING_FULL: "io_uring full"

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
The patch adds the file src/iouring.c where are listed all the
definitions related to io_uring.

In addition, configure.ac was modified to export the autoconf
variable HAVE_LIBURING and which is used to determine whether
src/iouring.c must be compiled or not.

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
io_uring operations in the sockapi interface would return -FI_EIOURING_PREP
if the operation was prepared in the ring and is waiting for submission.

recv() and recvv() require some changes to the bsock API to return how
many bytes were read from the byteq to the caller (similar to what was done
for zerocopy send). I added a couple of assertion failures to detect
unimplemented features related to io_uring.

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
FI_NET_IO_URING=[1|0] can be used to switch on/off io_uring support
from the net provider if compiled with HAVE_LIBURING.

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
This patch adds the tx/rx io_urings in the progress instance.
They will be used in a later patch.

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
io_uring will need to complete cur_tx on its own where a CQE
is received for a given EP. Let's move the completion of ep->cur_tx
to its own function, so we can call it from multiple places.

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
Copy link

@shefty shefty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly quick look over these changes. See comments. Thanks! I think this is getting close to having io_uring support, which I'm hoping will be significant.

@@ -45,6 +45,57 @@

static ssize_t (*xnet_start_op[ofi_op_write + 1])(struct xnet_ep *ep);

#ifdef HAVE_LIBURING
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'll need these ifdef's in the net provider. They already occur in the io_uring code, so all of the calls should be abstracted. Except I guess the calls below are calling io_uring directly? I thought you defined abstractions for these?

Copy link
Owner Author

@sydidelot sydidelot Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you wanted the net provider to manage the io_uring directly (no abstraction).
I think it makes sense to initialize the io_uring in the net provider as it may request specific flags during initialization. I'm not sure we should defined abstractions for that in the common code.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was comparing this to the ofi_epoll calls. We make those directly from the providers, but the abstraction hides the implementation, or if it's available. I thought I saw that you had abstractions -- something like ofi_uring_init() -- that the provider could just call.

Copy link
Owner Author

@sydidelot sydidelot Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I initially misunderstood your statement:

We obviously need an OS abstraction for an io_uring. I would actually introduce a typedef for this, similar to epoll. E.g. struct io_uring ofi_io_uring_t. This abstraction would be mostly a wrapper, with more control given to the provider

By OS abstraction you mean an abstraction in src/iouring.c. My bad, will fix it.


/* io_uring rounds up the number of entries to the next power of 2,
* so we could get more entries than initially
*/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd drop this comment. It's sufficient to note that the return value may be higher then requested. I dno't see where 'entries' is defined or used.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

#define xnet_init_io_uring(io_uring, entries) -FI_ENOSYS
#define xnet_destroy_io_uring(io_uring) do {} while(0)
#define xnet_io_uring_fd(io_uring) INVALID_SOCKET
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you move the io_uring abstraction that you had here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only abstraction I had was for send/recv syscalls (in iouring.c).
I just realized that you wanted to abstract all the calls of io_uring in src/iouring.c. My bad, I will fix that.

*/
entries = io_uring_sq_space_left(&io_uring->ring);

io_uring->fid.fclass = XNET_CLASS_IO_URING;
io_uring->credits = io_uring_sq_space_left(&io_uring->ring);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Last patch is modifying a previous patch to fix-up errors. Some of the changes should move into the earlier patch.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix. Thanks.

static bool xnet_get_io_uring_credit(struct xnet_io_uring *io_uring)
{
if (!xnet_io_uring)
return true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks odd. I'd either expect to fail here or assert that we're using io_uring.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If xnet_io_uring is false, io_uring support is disabled and the number of credits is initially set to 0.
The function always returns true to bypass the credit system when io_uring is disabled.

assert(nready <= XNET_MAX_EVENTS);
for (i = 0; i < nready; i++) {
xnet_progress_cqe(io_uring, cqes[i]);
io_uring->credits++;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a potential problem here with this ordering? We have a completion for a send, but the send may not be done. That falls through to xnet_progress_tx(), which may need to post another transfer, which in turns need another credit. Could we temporarily block the send because we haven't updated the credits?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that! You are right and the credit must be released before executing xnet_progress_cqe(). I will update the code.


if (ret == -FI_EINPROGRESS) {
}
else if (ret == -FI_EINPROGRESS) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so you need to distinguish between 2 async 'error' codes based on whether using io_uring or async sockets.

/* We cannot enter blocking if io_uring has entries
* that need submission. */
assert(xnet_io_uring_needs_submit(&progress->tx_io_uring));
assert(xnet_io_uring_needs_submit(&progress->rx_io_uring));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert and comment look like opposites

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aargh, the assert is wrong. Thanks for catching that.

@@ -876,6 +994,8 @@ xnet_handle_events(struct xnet_progress *progress,
}

xnet_handle_event_list(progress);
xnet_submit_io_uring(&progress->tx_io_uring);
xnet_submit_io_uring(&progress->rx_io_uring);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are acting somewhat like a deferred doorbell notification. I would move check for using io_uring here, and wrap both calls.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

@@ -237,12 +345,17 @@ static void xnet_progress_tx(struct xnet_ep *ep)
ssize_t ret;

assert(xnet_progress_locked(xnet_ep2_progress(ep)));
while (ep->cur_tx.entry) {
while (ep->cur_tx.entry && !ep->cur_tx.io_uring_busy) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look at how io_uring_busy is used. I mostly follow what you're doing here. But I want to understand where the flow differs between using io_uring and async sends and why we have those differences.

@sydidelot sydidelot force-pushed the io_uring branch 9 times, most recently from ffa47f0 to 46e49b3 Compare October 27, 2022 09:11
@sydidelot sydidelot force-pushed the io_uring branch 3 times, most recently from 9418fd8 to a287217 Compare November 4, 2022 14:55
@sydidelot sydidelot force-pushed the io_uring branch 3 times, most recently from 16335dc to 9922e47 Compare November 10, 2022 16:01
@sydidelot sydidelot closed this Nov 15, 2022
sydidelot pushed a commit that referenced this pull request Feb 19, 2023
If a posted receive matches with a saved receive, we may need to
increment the rx counter.  Set the rx counter increment callback
to match that of the posted receive.  This fixes an assert in
xnet_cntr_inc() accessing a NULL cntr_inc function pointer.

Program received signal SIGABRT, Aborted.
0x0000155552d4d37f in raise () from /lib64/libc.so.6
#0  0x0000155552d4d37f in raise () from /lib64/libc.so.6
#1  0x0000155552d37db5 in abort () from /lib64/libc.so.6
ofiwg#2  0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
ofiwg#3  0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6
ofiwg#4  0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347
ofiwg#5  0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354
ofiwg#6  0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153
ofiwg#7  0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188
ofiwg#8  0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445
ofiwg#9  0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558
ofiwg#10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91
ofiwg#11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants