Draft - io_uring - GSoC 2020 #10356

Closed
wants to merge 106 commits
Conversation

@1Jo1 (Member) commented Jun 19, 2020

This draft can be built on Linux kernel 5.9-rc1 and 5.8.3.
I came up with some ideas on how to implement it, but I'm not sure what the right way is, so feel free to comment.

I created an event HashMap to keep track of what kind of events are coming off the completion queue. That means you have to save the eventId into the submission queue entry's user_data field to identify events, roughly like the sketch below.
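
A minimal sketch of the bookkeeping I have in mind (Event, EventType and EventMap are just placeholder names for this draft, not final API):

import java.util.HashMap;
import java.util.Map;

import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;

// Placeholder event bookkeeping: the eventId that is written into sqe->user_data is the map key.
final class Event {
    EventType type;   // ACCEPT, READ, WRITE, ...
    Channel channel;  // channel the event belongs to
    ByteBuf buffer;   // kept referenced until the completion arrives (see Write/Read below)
}

enum EventType { ACCEPT, READ, WRITE }

final class EventMap {
    private final Map<Long, Event> events = new HashMap<>();
    private long eventIdCounter;

    // called before an sqe is submitted; the returned id goes into user_data
    long addEvent(Event event) {
        long id = ++eventIdCounter;
        events.put(id, event);
        return id;
    }

    // called for every cqe taken off the completion queue
    Event removeEvent(long userData) {
        return events.remove(userData);
    }
}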

Write

  • io_uring events are not in order by default, but write events in netty should be in order; there is a flag for that, IOSQE_IO_LINK (related to one channel)
  • The doWrite(ChannelOutboundBuffer in) function writes until readableBytes is 0, so that means you have to store the ByteBuf somewhere (rough sketch after this list)
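
A rough sketch of the doWrite idea under those assumptions (submissionQueue.addWrite and the fd handling are hypothetical, and Event/EventMap are the placeholders from above):

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelOutboundBuffer;

// Inside the (hypothetical) channel class: retain each ByteBuf until its write completion arrives
// and submit the writes with IOSQE_IO_LINK so the kernel keeps them in order per channel.
protected void doWrite(ChannelOutboundBuffer in) throws Exception {
    Object msg;
    while ((msg = in.current()) != null) {
        ByteBuf buf = (ByteBuf) msg;
        if (!buf.isReadable()) {
            in.remove();
            continue;
        }
        Event event = new Event();
        event.type = EventType.WRITE;
        event.channel = this;            // route the completion back to this channel
        event.buffer = buf.retain();     // keep the ByteBuf alive until the cqe arrives
        long eventId = eventMap.addEvent(event);

        // hypothetical: fd, address, length, user_data; IOSQE_IO_LINK is set on the sqe
        submissionQueue.addWrite(fd, buf.memoryAddress() + buf.readerIndex(),
                buf.readableBytes(), eventId);
        in.remove();
    }
}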

Accept

  • you need the address of the peer socket to create a new child Channel. One solution would be to save the file descriptor in the event together with the ServerChannel, because the acceptedAddress argument is saved in AbstractIOUringServerChannel so that serverChannel.createNewChildChannel can be called
  • My idea is that whenever an accept event completes, a new accept event is started in the EventLoop (rough sketch after this list)
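
Roughly the accept flow I am picturing (addAccept, newAcceptEvent and the exact createNewChildChannel signature are placeholders):

import io.netty.channel.Channel;

// Called when an ACCEPT cqe comes off the completion queue: build the child channel from the
// accepted fd, hand it to the server channel's pipeline, then queue the next accept right away.
void handleAcceptCompletion(AbstractIOUringServerChannel serverChannel, int serverFd, int acceptedFd) {
    Channel child = serverChannel.createNewChildChannel(acceptedFd);
    serverChannel.pipeline().fireChannelRead(child);
    serverChannel.pipeline().fireChannelReadComplete();

    // re-arm: submit a new accept sqe so the listen socket keeps accepting in the EventLoop
    long eventId = eventMap.addEvent(newAcceptEvent(serverChannel));
    submissionQueue.addAccept(serverFd, eventId);
}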

Read

  • I'm wondering how to get the pipeline instance to call fireChannelRead(ByteBuf) in the EventLoop, and whether I have to save the ByteBuf (as mentioned above) or whether it is possible to get the same ByteBuf reference from the ByteAllocatorHandle?
  • as discussed above, save the file descriptor in the Event and then invoke pipeline.fireChannelRead (rough sketch after this list), WDYT?
  • How often are Channel.read and doBeginRead called?
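
And what I currently have in mind for read completions, assuming the ByteBuf is stored in the event as mentioned above (names are placeholders again):

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelPipeline;

// Called when a READ cqe comes off the completion queue: res is the number of bytes read.
void handleReadCompletion(long userData, int res) {
    Event event = eventMap.removeEvent(userData);
    ByteBuf byteBuf = event.buffer;                    // same ByteBuf that was submitted for the read
    ChannelPipeline pipeline = event.channel.pipeline();

    if (res <= 0) {
        byteBuf.release();
        // 0 means EOF, negative is an errno; close the channel or fire an exception accordingly
        return;
    }
    byteBuf.writerIndex(byteBuf.writerIndex() + res);  // the kernel wrote directly into the buffer
    pipeline.fireChannelRead(byteBuf);
    pipeline.fireChannelReadComplete();
}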

What's the difference between ByteBufUtil.threadLocalDirectBuffer and isDirectBufferPooled?

What about naming: IOUringSocketChannel, IO_UringSocketChannel or UringSocketChannel? WDYT?

#10142

Motivation:
to get a better feeling of how it could be implemented

Result:
- can't be built yet -> still work in progress
@netty-bot

Can one of the admins verify this patch?

@hyperxpro (Contributor) left a comment

Copyright year should be 2020, not 2014.

@hyperxpro (Contributor) left a comment

Copyright year should be 2020, not 2014, and the license header is missing.

@normanmaurer (Member)

I will have a look tomorrow at the latest...

Motivation:
licenses and netty dependencies are missing

Modification:
added licenses and netty dependencies

Result:
netty io_uring is buildable
Motivation:

prototype is not buildable and JNI io_uring implementation is missing

Modifications:

- added io_uring implementation (source from https://github.com/axboe/liburing)
- eventloop stores a netty io_uring pointer which is used for the two ring buffers to execute events like read and write operations in JNI
- memory barriers already included in JNI (will be changed in the future)
- pom file adapted from native epoll

Result:

prototype can finally be built
@1Jo1 (Member, Author) commented Jun 29, 2020

You can already build it since I added the JNI C part, but this implementation is inflexible as most of the io_uring ring logic is in C. One solution would be to create IoUringSubmissionQueue and IoUringCompletionQueue Java class models which contain some information from the ring buffers.

It looks something like this:

public class IoUringSubmissionQueue {
  // addresses of the mmap'ed kernel ring fields (k -> kernel)
  private long kHeadAddress;
  private long kTailAddress;
  private long kRingMaskAddress;
  private long kRingEntriesAddress;
  private long kFlagsAddress;
  private long kDroppedAddress;
  private long arrayAddress;

  // address of the submission queue entry (sqe) array
  private long sqeAddress;

  // local head/tail of the sqe entries we have filled in
  private long sqeHead;
  private long sqeTail;
}

With the help of memory barriers like loadFence or storeFence and Unsafe you can access these addresses directly without a JNI call, which would also be much more efficient; there would be only two JNI calls left, for the io_uring_setup and io_uring_enter syscalls. Roughly like the sketch below:
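
A minimal sketch of that idea, using sun.misc.Unsafe directly (field names follow the class above; the exact fence placement is just how I currently picture it):

import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class SubmissionQueueAccess {
    private static final Unsafe UNSAFE = unsafe();

    private final long kHeadAddress;   // kernel consumer head of the submission ring
    private final long kTailAddress;   // producer tail we publish to the kernel

    SubmissionQueueAccess(long kHeadAddress, long kTailAddress) {
        this.kHeadAddress = kHeadAddress;
        this.kTailAddress = kTailAddress;
    }

    // after an sqe has been filled in, publish the new tail so the kernel can see it
    void publishTail(int newTail) {
        UNSAFE.storeFence();                   // sqe contents must be visible before the tail update
        UNSAFE.putInt(kTailAddress, newTail);
    }

    // read how far the kernel has consumed the queue (to know how many slots are free)
    int kernelHead() {
        int head = UNSAFE.getInt(kHeadAddress);
        UNSAFE.loadFence();                    // don't let later reads be reordered before the head load
        return head;
    }

    private static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}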

@normanmaurer (Member)

@chrisvest may be interesting to you as well

@chrisvest (Contributor) left a comment

Hi Josef, very exciting to see io_uring support being worked on!

I had a bunch of random comments, please take a look. The highest thing on the list for me is that I would like to see more of the io_uring integration code implemented in Java. It's easier to debug the Java code, even if working with the native memory via buffers or Unsafe is awkward compared to C.

(PS. If you don't know what's going on in the pom.xml files, then that makes two of us, so no worries.)

Inline review threads (resolved) on transport-native-epoll/src/main/c/netty_epoll_native.c (outdated) and transport-native-io_uring/pom.xml. Diff context from the pom.xml:
</build>
</profile>
<profile>
<id>linux</id>
Contributor comment:

Why do we have two profiles? I guess they serve different purposes, but it's not clear to me what those are.

(or 3 when counting linux-aarch64 as well)

Member Author reply:

Yeah, we only need 2 profiles, for aarch64 and x86, but we still need the maven-enforcer-plugin, so we have to move it somewhere.

@normanmaurer linked an issue Jul 2, 2020 that may be closed by this pull request
@normanmaurer (Member)

@1Jo1 is there any update? Did you see the comments from @chrisvest?

Motivation:
JNI ring buffer implementation is inflexible and not really efficient

Modifications:

- RingBuffer instance is created in JNI and contains the io_uring ring buffer information
- move the JNI ring buffer logic to Java
- added TODOs
- using the Unsafe memory barriers loadFence and storeFence
- revert epoll file

Result:

this Java ring buffer implementation is more flexible and efficient
@hyperxpro (Contributor)

IMHO, the documentation can be improved for some parts of the code.

@1Jo1 requested a review from chrisvest on July 7, 2020 11:31
@1Jo1 (Member, Author) commented Jul 7, 2020

@normanmaurer thanks @chrisvest 👍 yeah I saw his comments :)

  • the RingBuffer instance (which contains io_uring information) is created in JNI
  • created a small ring buffer integration test (read operation is still missing)
  • accessing attributes of io_uring_sqe and io_uring_cqe via Unsafe (small sketch below)
  • moved the JNI ring buffer logic to Java

The next goal will be to implement the IOUringEventLoop & Channels this week (besides the TODOs).
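
For the cqe side, accessing the fields via Unsafe looks roughly like this (the offsets follow struct io_uring_cqe { __u64 user_data; __s32 res; __u32 flags; }; getting hold of UNSAFE works as in the earlier sketch):

// Field offsets inside struct io_uring_cqe (16 bytes total).
private static final int CQE_USER_DATA_FIELD = 0;  // __u64 user_data
private static final int CQE_RES_FIELD = 8;        // __s32 res
private static final int CQE_FLAGS_FIELD = 12;     // __u32 flags

long userData(long cqeAddress) {
    return UNSAFE.getLong(cqeAddress + CQE_USER_DATA_FIELD);
}

int res(long cqeAddress) {
    return UNSAFE.getInt(cqeAddress + CQE_RES_FIELD);
}

int flags(long cqeAddress) {
    return UNSAFE.getInt(cqeAddress + CQE_FLAGS_FIELD);
}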

@chrisvest (Contributor) left a comment

I took a quick look (not a full review) at this again, and had some comments.

Motivation:

missing documentation and read event test

Modifications:

add documentation and read event test

Result:

better documentation and tests
@chrisvest (Contributor) left a comment

Hi Josef. I took a closer look at your recent changes (nice work) and had some more comments.

@1Jo1 (Member, Author) commented Jul 15, 2020

@normanmaurer @franz1981 @chrisvest I am trying out an io_uring echo example. Netty uses reflection to instantiate the ServerSocketChannel without constructor arguments (https://github.com/netty/netty/blob/4.1/example/src/main/java/io/netty/example/echo/EchoServer.java#L58), but the problem is that the ServerSocketChannel and the EventLoop need access to the same RingBuffer. Is it possible to pass arguments to the ServerSocketChannel constructor somehow?

@normanmaurer (Member)

@1Jo1 why not just "cast" the EventLoop and so get access to it? We do something similar for the IovArray:

https://github.com/netty/netty/blob/4.1/transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java#L504
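
Something like this, mirroring the epoll/IovArray pattern (IOUringEventLoop and ringBuffer() are the draft's names here, not final API):

// Inside the channel: it is always registered to an IOUringEventLoop, so a plain cast
// gives access to the RingBuffer that the event loop polls.
@Override
protected void doRegister() throws Exception {
    IOUringEventLoop ioUringEventLoop = (IOUringEventLoop) eventLoop();
    RingBuffer ringBuffer = ioUringEventLoop.ringBuffer();
    // the channel can now add its sqes to the same submission queue
}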

@1Jo1 (Member, Author) commented Jul 15, 2020

oh yeah thanks a lot 👍

@normanmaurer (Member)

Alternatively, you could also just store it in a FastThreadLocal and access it when needed.
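
A small sketch of the FastThreadLocal alternative (RingBuffer is the draft's class; Native.createRingBuffer() is a hypothetical factory):

import io.netty.util.concurrent.FastThreadLocal;

// One RingBuffer per event-loop thread, created lazily the first time that thread asks for it.
private static final FastThreadLocal<RingBuffer> RING_BUFFER = new FastThreadLocal<RingBuffer>() {
    @Override
    protected RingBuffer initialValue() {
        return Native.createRingBuffer();   // hypothetical factory
    }
};

// anywhere on the event-loop thread:
RingBuffer ringBuffer = RING_BUFFER.get();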

1Jo1 and others added 10 commits September 3, 2020 07:48
Motivation:

when the sqe ring is full -> no timeout is added to the sqe

Modifications:

submit() is called to flush the sqe entries so that the timeout can be added

Result:

timeouts are added even when the sqe ring is full
Motivation:

Due to a bug, SO_BACKLOG was not supported via ChannelOption when using io_uring.

Modification:

- Add SO_BACKLOG to the supported ChannelOptions.
- Merge IOUringServerChannelConfig into IOUringServerSocketChannelConfig

Result:

SO_BACKLOG is supported
Do support SO_BACKLOG in io_uring
Call IOUringEventLoopGroup.shutdownGracefully() after test is done.
Motivation:

We should only reset the RecvByteBufAllocator.Handle when a new "read
loop" starts. Otherwise the handle will not be able to correctly limit
reads.

Modifications:

- Move reset(...) call into pollIn(...)
- Remove all @ignore

Result:

The whole testsuite passes
Fix bug related to reset the RecvByteBufAllocator.Handle on each read
Call handle.readComplete() before fireChannelReadComplete() and also c…
try {
    if (curDeadlineNanos != prevDeadlineNanos) {
        prevDeadlineNanos = curDeadlineNanos;
        submissionQueue.addTimeout(deadlineToDelayNanos(curDeadlineNanos));
Member comment on the snippet above:

From what I can see in the doc, the default behaviour of IORING_OP_TIMEOUT is to also complete if any other event completes first, i.e. it won't persist across calls to io_uring_enter with GET_EVENTS.

So something might need to be tweaked, maybe just to pass a very large value for the len field which should make timeouts stick until expired or cancelled.

normanmaurer and others added 10 commits September 4, 2020 17:09
Motivation:

We should submit multiple IO ops at once to reduce the syscall overhead.

Modifications:

- Submit multiple IO ops in batches
- Adjust default ringsize

Result:

Much better performance
Submit IO in batches to reduce overhead
Motivation:

At least in the throughput benchmarks it has been shown that IOSQE_ASYNC
gives a lot of performance improvements. Let's enable it by default for
now and maybe make it configurable in the future.

Modifications:

Use IOSQE_ASYNC

Result:

Better performance
Use IOSQE_ASYNC flag when submitting
Motivation:

We should better use JNI to look up constants so we are sure we do not mess
things up

Modifications:

Use JNI calls to lookup constants once

Result:

Less error prone code
Motivation:

There is currently a bug in the kernel that lets WRITEV sometimes fail
when IOSQE_ASYNC is enabled

Modifications:

Don't use IOSQE_ASYNC for WRITEV for now

Result:

Tests pass
Add workaround for current kernel bug related to WRITEV and IOSQE_ASYNC
@normanmaurer (Member)

Just a heads-up: we will move the io-uring branch and development to the netty repository starting next week and will close this PR then. The development will continue in the main repo from then on.

Thanks again to everyone involved so far and especially to @1Jo1 who did an awesome job as part of GSOC 2020

@hyperxpro (Contributor)

Can we expect IO_URING in 4.1.52.Final?

@normanmaurer (Member)

@hyperxpro clearly no ... it's still under heavy development. That said, if people are brave enough they can build it on their own for now and deploy ;) I wouldn't call it prod ready for sure, though.

@1Jo1 (Member, Author) commented Sep 5, 2020

@chrisvest @njhill @normanmaurer @franz1981 thank you for your valuable feedback :) I really appreciate that

@normanmaurer (Member)

Let me close this one now as GSoC is officially over. Let me just highlight what an amazing job @1Jo1 did here... Because of him an io_uring based transport will soon be a reality in netty.

The development of io_uring will continue within the netty project from now on at https://github.com/netty/netty/tree/io-uring, and all new PRs should be created against that branch.

@njhill (Member) commented Sep 7, 2020

I'll echo that and say congrats and thanks @1Jo1 for the super impressive work. The last bit is easy after everything you've done :)

@axboe commented Sep 7, 2020

> The last bit is easy after everything you've done :)

And tons more fun ;-)

@1Jo1 (Member, Author) commented Sep 7, 2020

> The last bit is easy after everything you've done :)
> And tons more fun ;-)

oh yeah that's true, I’ve never learned as much in such a short amount of time

Successfully merging this pull request may close these issues.

IO_URING as epoll alternative