New kernel polling interface for Linux 4.18 (io_uring)? #1947
For me, the central parts of this interface are io_submit and io_getevents, whose mechanism is exactly what libuv already covers with the epoll_wait primitive and the algorithms around it. So this interface is probably best leveraged by other platforms to implement non-blocking I/O in a more systematic way, and is not of much use for libuv and its consumers. However, if the kernel offers performance benefits for this interface over the design around the epoll_wait primitive, then it makes sense to review it - I don't know whether that is the case.
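For readers who haven't used the interface being discussed, this is roughly what the io_submit/io_getevents flow looks like through libaio; a minimal illustrative sketch, not code from this thread.

```c
/* Minimal Linux AIO sketch using libaio (link with -laio).
 * Illustrative only: error handling is mostly omitted. */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    io_context_t ctx = 0;
    if (io_setup(8, &ctx) < 0)                /* create the AIO context */
        return 1;

    int fd = open("data.bin", O_RDONLY);
    char *buf = malloc(4096);

    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    io_prep_pread(&cb, fd, buf, 4096, 0);     /* read 4 KiB at offset 0 */
    io_submit(ctx, 1, cbs);                   /* hand the request to the kernel */

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);       /* block until the read completes */
    printf("read %ld bytes\n", (long)ev.res);

    io_destroy(ctx);
    return 0;
}
```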
I sent patches to teach aio poll about more file descriptor types. But then aio poll got temporarily reverted wholesale, and I don't think my patches were reapplied when it was merged back in. It's probably too limited right now to be useful for libuv. I don't have time to work on it, but for anyone interested, the ability to embed an epoll fd in an aio pollset is the key; that means you can always use epoll as a fallback.
@gireeshpunathil The big benefit is that aio lets you use a ring buffer. That means you don't have to make a (slow) system call to check for pending events, you can just pull them from the ring buffer.
https://lwn.net/ml/linux-fsdevel/20190121201456.28338-1-rpenyaev@suse.de/ - there seems to be some movement on adding a ring buffer to epoll. That would make life a little easier for libuv because it means we won't have to support two completely disparate AIO mechanisms.
Background: https://lwn.net/Articles/776703/; the discussion on that article may have led to the above.
I think there are two different interfaces being discussed -- io_uring is in Linux 5.1. I have a proof-of-concept Node.js addon that implements read and write using liburing here, using libuv's idle checks for polling (not sure if that's the best way). 5.1-rc3 is sufficient to test it. From limited benchmarking:
There's more advanced stuff I haven't gotten into, like registering fds to reduce overhead, which would be useful for Node.js' streams (see io_uring_register.2). Windows' overlapped I/O could be used equivalently, but I don't know about any of the other platforms. Would libuv ever use a mix of techniques for disk I/O (async I/O where it's usable, threadpool on platforms where it's not)?
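For reference, registering file descriptors to cut per-request overhead looks roughly like this with liburing; a sketch only, not code from the add-on mentioned above.

```c
#include <liburing.h>
#include <sys/uio.h>

/* Sketch: register a set of fds once, then refer to them by index in SQEs
 * (IOSQE_FIXED_FILE), avoiding per-request fd lookups. */
int setup_fixed_files(struct io_uring *ring, int *fds, unsigned nfds) {
    /* kernels without support return a negative errno, e.g. -EOPNOTSUPP */
    return io_uring_register_files(ring, fds, nfds);
}

/* Queue a read against registered file slot `idx` instead of a raw fd.
 * (NULL check on the SQE is omitted for brevity.) */
void queue_fixed_read(struct io_uring *ring, int idx, struct iovec *iov) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_readv(sqe, idx, iov, 1, 0);
    sqe->flags |= IOSQE_FIXED_FILE;   /* fd field is an index, not an fd */
}
```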
Note the post date however. :)
Yes, provided it's reliable. (That was always the issue with Linux AIO: it wasn't.)
Oh good grief 😣. Based on the replies at least I wasn't the only one fooled! |
Linus is famous for a firm commitment to not break user-space; ripping an entire API set out is unlikely, so it should set your spidey senses tingling!
Some more docs for io_uring have just been released: http://kernel.dk/io_uring.pdf |
Does anyone familiar with libuv have a minute to review the approach in the repo I linked to previously, just to make sure it's reasonable so that I can keep evaluating it and maybe move toward a PR, please? Specifically: I'm polling and draining the completion queue in a uv_idle_t callback.
I think the way to go is to have AIO events reported to an eventfd that's watched by the event loop. You check the ring buffer and only enter epoll_wait() or io_submit() if it's empty. That should be safe because even if events arrive between the check and the system call, the fact that they signal the eventfd means you won't lose them, you'll return from the system call straight away.
Thanks. It sounds like you're talking about AIO though. io_uring has no eventfd notification mechanism (at least not yet).
Isn't the event loop generally blocked on epoll, and won't it wake up (i.e. return from epoll) and go check the uring only if something causes it to wake up, like a notification on an eventfd? If you see the loop running continuously, it's probably because you have a uv_idle_t, so you have forced it to busy-loop.
I'm thinking in the context of Node.js where it's almost invariably running. |
So am I! :-) Why would it be constantly running? I guess if node always has outstanding I/O and never quite catches up, it would be always running.
Maybe I'm wrong there :-) I assumed a busy server always has pending network I/O and timers at least. Edit: Jens has kindly sent me a patch to try out that adds notifications to io_uring. I'll try it out shortly.
I did send Zach a patch, it's also in my io_uring-next repository. BTW, for poll(2) type checking, it's also very possible to do that on the ring fd. That'll work as well, without having to add support for eventfd.
(Edited after the prototype code was fixed to reduce epoll_ctl calls.) My test repo is updated to use an eventfd. The benchmark measured how long each read took for each of one thousand 1024-byte files, read 250 times. I used small files to try to measure the interface overhead rather than I/O throughput. The two variants compared were:
io_uring_submit on each read/write()
io_uring_submit in uv_prepare_t
I haven't finished the Windows overlapped+IOCP version for comparison yet.
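For readers trying to picture the second variant above (one io_uring_submit per loop iteration), here is a rough sketch assuming liburing; it is not the code from the test repo, and `ring` and `pending_sqes` are hypothetical globals used only for illustration.

```c
#include <liburing.h>
#include <stdint.h>
#include <uv.h>

/* Sketch of the "submit once per loop iteration" variant: reads only fill in
 * SQEs; a uv_prepare_t flushes them with a single io_uring_submit() right
 * before the loop blocks for I/O. */
static struct io_uring ring;
static unsigned pending_sqes;

static void queue_read(int fd, struct iovec *iov, uint64_t off, void *req) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_readv(sqe, fd, iov, 1, off);
    io_uring_sqe_set_data(sqe, req);   /* comes back in cqe->user_data */
    pending_sqes++;
}

static void on_prepare(uv_prepare_t *handle) {
    (void)handle;
    if (pending_sqes > 0) {
        io_uring_submit(&ring);        /* one syscall for the whole batch */
        pending_sqes = 0;
    }
}

/* Setup (error handling elided):
 *   io_uring_queue_init(256, &ring, 0);
 *   uv_prepare_init(loop, &prep); uv_prepare_start(&prep, on_prepare); */
```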
@axboe Is polling the ring fd with poll(2) currently possible, or do you mean that would be an alternative interface to the eventfd? (Thanks again for the speedy patch!)
torvalds/linux@9b40284 ("io_uring: add support for eventfd notifications") appears to be on track for Linux 5.2. I'm posting this mostly as a follow-up to the discussion above, because I assume libuv would poll the io_uring fd itself, something that's supported in 5.1.

@zbjornson W.r.t. libuv integration, I expect you want to feature-detect in … You can't change … ¹

¹ Currently only bit 0 is in use for …
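A sketch of how the eventfd notifications could be wired into a loop once that lands, assuming liburing on a 5.2+ kernel. It is written against libuv's public uv_poll_t API purely for illustration; an in-tree implementation would use libuv's internal watchers instead, and error handling is elided.

```c
#include <liburing.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <uv.h>

/* Completions signal the eventfd, the loop's poller watches it, and the
 * callback drains the CQ ring without further syscalls. */
static struct io_uring ring;
static int efd;

static void on_ring_readable(uv_poll_t *handle, int status, int events) {
    (void)handle; (void)status; (void)events;
    uint64_t n;
    ssize_t r = read(efd, &n, sizeof(n));   /* clear the eventfd counter */
    (void)r;

    struct io_uring_cqe *cqe;
    while (io_uring_peek_cqe(&ring, &cqe) == 0) {
        /* ... complete the request stored in cqe->user_data ... */
        io_uring_cqe_seen(&ring, cqe);
    }
}

int setup(uv_loop_t *loop, uv_poll_t *poll) {
    io_uring_queue_init(64, &ring, 0);
    efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    io_uring_register_eventfd(&ring, efd);   /* IORING_REGISTER_EVENTFD */
    uv_poll_init(loop, poll, efd);
    return uv_poll_start(poll, UV_READABLE, on_ring_readable);
}
```

Polling the ring fd itself (as mentioned above) would look the same, just with the ring fd handed to uv_poll_init instead of an eventfd.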
Currently trying to fix a new test failure that arose from only the preparations for io_uring. Ref libuv#1947
@bnoordhuis thanks for the pointers. Is this what you had in mind?
I haven't figured out how to make this work. The ring fd added with …
Refactoring works. Need to actually use io_uring now. ref libuv#1947
You can ignore the first part of the previous comment; I opened #2322 for easier discussion.
I have read through it, and the io_uring parts look fine to me. It's possible to remove some boilerplate code by adding a liburing dependency (like the setup code, or the ring fill-in), but in all fairness, that code is probably never going to get touched once it's merged. So I don't think that is a big deal, and there's nothing wrong with using the raw interface vs. going with liburing. The only real upside I can think of is getting rid of the memory barriers. I'm not familiar with the libuv code base, so I focused on the specific io_uring bits.
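For context on the memory-barrier point: with the raw interface the application has to pair acquire/release ordering with the kernel's updates of the ring indices, which liburing handles internally. A rough sketch of reading the completion ring directly, using C11 atomics; the struct and its fields are illustrative stand-ins for the pointers obtained from the mmap'd ring.

```c
#include <stdatomic.h>
#include <linux/io_uring.h>
#include <stddef.h>

/* Field names mirror the mmap'd CQ ring (offsets come from io_uring_params). */
struct cq_ring {
    unsigned *khead;              /* consumer index, written by us            */
    unsigned *ktail;              /* producer index, written by the kernel    */
    unsigned ring_mask;
    struct io_uring_cqe *cqes;
};

/* The acquire load on the tail pairs with the kernel's release store, so the
 * CQE contents are visible before we read them; liburing hides this detail. */
static struct io_uring_cqe *peek_cqe_raw(struct cq_ring *cq) {
    unsigned head = *cq->khead;
    unsigned tail = atomic_load_explicit((_Atomic unsigned *)cq->ktail,
                                         memory_order_acquire);
    if (head == tail)
        return NULL;                          /* ring is empty */
    return &cq->cqes[head & cq->ring_mask];
}

/* Publish that we consumed one CQE so the kernel may reuse the slot. */
static void advance_cqe_raw(struct cq_ring *cq) {
    atomic_store_explicit((_Atomic unsigned *)cq->khead, *cq->khead + 1,
                          memory_order_release);
}
```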
@axboe It may be worth considering having applications go through liburing. It provides a convenient boundary for hooking io_uring emulation in userspace, for either pre-io_uring kernel support or development/debugging purposes. liburing itself could implement this, enabled via some environment variable, or it could just get replaced via LD_PRELOAD.
@vcaputo I don't think you can replace io_uring with a userspace emulation without a huge amount of impedance mismatch. I also don't think it's necessary to pull in liburing as a dependency.
I think liburing makes sense for apps like QEMU or nginx. This way, you have an abstraction layer between the app and io_uring. In our case, I think libuv should deal directly with io_uring because libuv is already an abstraction layer by itself. And it's one less dependency at compile time and runtime :) EDIT: After reading @axboe's comment below, I changed my mind; using liburing is the de facto way to interact with io_uring. It keeps libuv's code simple. And I would trust @axboe's liburing more than any other io_uring code. Thank you :)
Have you looked at the liburing code? It's just a thin veneer over the syscall interface; it's not even 1000 lines of .c files. The value IMHO of having most stuff go through it is that it's a single point for implementing things like an emulation for compatibility or debugging, and I strongly disagree with the claim of a "huge amount of impedance mismatch" pertaining to emulation. For ages there was protest against adding such asynchronous syscall interfaces to the Linux kernel, because they never did anything that couldn't be done perfectly well from userspace after NPTL landed. Sure, now that syscalls are more expensive it won't be as fast when emulated, but there are no major barriers or contortions necessary to make it work at all. The submission queue maps to threads, the threads perform the syscalls instead of the kernel, and the results get serialized and pumped out the completion queue. No. Big. Deal. Now applications written for just liburing can work on any kernel, and libuv targeting liburing could be both developed on and tested against either pre-io_uring kernels or, hell, even non-Linux systems like OSX. That sort of layer belongs in liburing, and if it doesn't hurt libuv significantly to go through liburing, my vote is to do the same, since many applications use libuv. Just my $.02, I don't really have a dog in this race otherwise.
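To make the "threads do the syscalls" idea above concrete, here is a purely conceptual sketch; every name in it is made up for illustration, and it is nobody's actual implementation plan.

```c
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* A worker pops submissions from a queue, performs the blocking syscall,
 * and pushes the result to a completion queue. A submitter would append to
 * sq[] and pthread_cond_signal(&cv); a consumer would drain cq[]. */
struct fake_sqe { int fd; void *buf; size_t len; off_t off; void *user_data; };
struct fake_cqe { ssize_t res; void *user_data; };

struct fake_ring {
    struct fake_sqe sq[64]; unsigned sq_head, sq_tail;
    struct fake_cqe cq[64]; unsigned cq_head, cq_tail;
    pthread_mutex_t mu; pthread_cond_t cv;
};

static void *worker(void *arg) {
    struct fake_ring *r = arg;
    for (;;) {
        pthread_mutex_lock(&r->mu);
        while (r->sq_head == r->sq_tail)
            pthread_cond_wait(&r->cv, &r->mu);
        struct fake_sqe s = r->sq[r->sq_head++ % 64];
        pthread_mutex_unlock(&r->mu);

        ssize_t res = pread(s.fd, s.buf, s.len, s.off);  /* the blocking call */

        pthread_mutex_lock(&r->mu);
        r->cq[r->cq_tail++ % 64] = (struct fake_cqe){ res, s.user_data };
        pthread_mutex_unlock(&r->mu);
    }
    return NULL;
}
```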
I would say: if you can use liburing, then go for it. liburing has solved some tricky bugs in the past, and there are a few surprising edge cases you need to get right if you want to do it yourself. liburing is essentially the test suite for the kernel, so it's a pretty good reference implementation.
What happened here now? nginx got io_uring support already. |
liburing should just be considered a reference implementation. I've written a few things that use the raw interface, so that's quite possible too. Or even a mix - using liburing for the ring setup to avoid a bunch of boilerplate code, then using the raw interface after that. It's possible to be slightly faster with the raw interface. As with any kind of API, liburing will add some fat to the middle and have some indirection, as well as catering to cases that any one specific user may not care about. In terms of API stability, both the kernel and liburing won't change (on purpose...). That said, unless there's a good reason not to, I'd probably just stick with liburing.
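For reference, the full liburing round trip really is short; a minimal single-read sketch (error handling elided, illustrative only).

```c
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>

/* Minimal liburing round trip: setup, one read, teardown. */
int main(void) {
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);           /* mmaps the SQ/CQ rings for us */

    int fd = open("data.bin", O_RDONLY);
    char buf[4096];
    struct iovec iov = { buf, sizeof(buf) };

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_readv(sqe, fd, &iov, 1, 0);
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);             /* blocks until completion */
    printf("read %d bytes\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    return 0;
}
```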
Wouldn't this be a big step for libuv? |
libuv needs more maintainers, and maybe those maintainers should be getting paid pretty well given how much money people are making with libuv. The PR could probably use rebasing at this point too, but it can be used as-is by anyone brave enough, so give it a shot. :)
I’ve recently stumbled upon https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023, and my employer (ISC) would be willing to sponsor io_uring support in libuv. We are not a big tech company, so reasonable proposals only ;). I think you might know how to reach me via email.
@oerdnj I'll be in touch. It's something I've been chipping away at for some time, if ever so slowly. |
There’s #3979 too. As for network sockets, we are talking about that, but the benefit seems to be smaller since epoll is already very effective. @bnoordhuis is willing to do the work, but as far as I understand it, it would require new APIs. Anybody is welcome to chime in with sponsorship for that work too, though.
New APIs - that's right. To work well with io_uring, reading data should be request-based (like writing, connecting, etc. already are), because the "firehose" approach isn't a good fit. Libuv users will be able to opt in to the new behavior. I have a pretty good idea of what needs to change where, but it's going to be a fair amount of work. Sponsorship welcome.
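To illustrate the distinction, here is a purely hypothetical sketch of what a request-based read could look like next to today's firehose model (uv_read_start keeps delivering data until stopped); none of these names are real or proposed libuv APIs.

```c
#include <uv.h>

/* Hypothetical, for illustration only: one buffer, one read, one completion,
 * which maps naturally onto one SQE and one CQE instead of re-arming a
 * readiness watcher. */
typedef struct hypothetical_read_s hypothetical_read_t;
typedef void (*hypothetical_read_cb)(hypothetical_read_t *req, ssize_t nread);

struct hypothetical_read_s {
    uv_stream_t *handle;
    uv_buf_t buf;                 /* caller owns the buffer for this one read */
    hypothetical_read_cb cb;
};

int hypothetical_stream_read(uv_stream_t *handle, hypothetical_read_t *req,
                             uv_buf_t buf, hypothetical_read_cb cb);
```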
@bnoordhuis is there any other async operation in libuv that could benefit from migrating to io_uring?
Would libuv be able to benefit from leveraging the new kernel polling interface that was just released with Linux kernel version 4.18?
I'm curious if I should target the kernel API directly or if it would be better to abstract it as part of libuv.