Optimize writeAndFlush for lots of small messages (do smart autoflush) #1759
Comments
You can use `Channel.write(...)` if you don't want to flush and trigger a syscall. I wonder why you don't just do that. Can you give more details?
@normanmaurer I cannot avoid flushing, because otherwise the server may get stuck. Typical request-response server code looks like this:
If I avoid the flush in this code, the response may never be sent to the network. And if I do flush, it is expensive. I could flush the queue on a timer, but that is bad for latency. What I want is a smart flush: if there is a flush operation already queued and I call writeAndFlush, Netty should insert the message before that queued flush. Suppose the outgoing queue contains the commands `write m1, flush`. If we call `writeAndFlush(m2)`, today the queue becomes `write m1, flush, write m2, flush`; it should instead become `write m1, write m2, flush`, so that a single flush covers both writes.
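The queue-merging behavior described above can be modeled in a few lines. This is a minimal sketch (plain Java, not Netty code; the class and method names are invented) of an outbound queue where a `writeAndFlush` slips its write in before an already-queued flush instead of enqueueing a second flush:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the outbound operation queue: if a flush is already pending,
// a new writeAndFlush inserts its write *before* that flush, so one flush
// covers all writes queued so far.
class SmartFlushQueue {
    private final Deque<String> ops = new ArrayDeque<>();

    void writeAndFlush(String msg) {
        if ("flush".equals(ops.peekLast())) {
            // Merge: remove the pending flush, append the write, re-add the flush.
            ops.removeLast();
            ops.addLast("write " + msg);
            ops.addLast("flush");
        } else {
            ops.addLast("write " + msg);
            ops.addLast("flush");
        }
    }

    String queueState() {
        return String.join(", ", ops);
    }

    public static void main(String[] args) {
        SmartFlushQueue q = new SmartFlushQueue();
        q.writeAndFlush("m1");
        q.writeAndFlush("m2");
        // One flush serves both writes:
        System.out.println(q.queueState()); // write m1, write m2, flush
    }
}
```

A real implementation has to do this race-free on the event loop rather than by inspecting the queue tail, but the invariant is the same: at most one pending flush, always last.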
@stepancheg so basically what you want is to "merge" all flushes as long as they happen within the same method?
@normanmaurer no, I want to "merge" all pending flushes in the channel.
Why don't you just call
@trustin because of two reasons:
@stepancheg sorry, I still don't get it... could you give more details?
Consider a simple chat where the server has to broadcast each received message to all users. In a naive implementation we have to write-and-flush on every channel for every message typed. Using BetterWriter.write, these messages are still flushed as soon as possible, but the flushes get merged. Under high traffic this approach eliminates most of the flushes without introducing noticeable latency.
Now I see your point. Need to think about a reasonable solution though. |
I have the same flush issue in my apns-netty implementation. I send a flush after every write, hoping that this results in true streaming writes, but it does not: an unwanted flush syscall is issued after every 256 bytes, which wastes the buffer in the network interface. Could we do something like a predicted/forced flush: assume the next byte I am going to write would trigger a flush, and insert a flush call just before that one byte? I suspect that in the async I/O case the number of such buffers will be high and difficult to deal with :)
I am also contending with the flushing issues. My proxy server decodes, handles the event, and encodes, on both the client channel and the server channel. Both sides speak at the same time, so both are flushing. Now multiply that by 30-40 clients, all for small 3-byte payloads every second in both directions (not my choice; the game I am proxying is meh). EDIT: I will experiment with channel-group flushing and flush all channels at once every second...
Perhaps an optional periodic flush would satisfy this problem? But please enlighten me: why is it necessary to flush the channel at all? Does the internal socket implementation rely on external flushing to pass data to the transport layer? Most (if not all) network stacks buffer internally and use algorithms such as Nagle's to batch sends (unless TCP_NODELAY is set). Coming from C++ with Boost's Asio, for example, no flushing is required.
@Climax777 BTW, a better name for
Thanks for clarifying, @stepancheg. I would suggest a periodic flusher with a selectable period.
@Climax777 a periodic flusher is unsuitable for most tasks: it either flushes too rarely or consumes too much CPU (or both).
We could implement a Nagle-like algorithm in Netty.
@etaty you seem not to understand the issue. The problem is that if the user quickly calls `writeAndFlush` several times, each call issues its own syscall. It is unrelated to Nagle. It is about sending the whole queue in a single batch instead of issuing multiple `send` syscalls.
Netty already knows when a socket is ready for new data; that is the essence of async I/O.
If not a periodic flushing task, what about defining thresholds?
The problem with a threshold is that if your network traffic is not constant, some messages may hang in the queue indefinitely.
Please explain the drawbacks of the solution I proposed.
@stepancheg I've been looking at your repository, and it seems you're onto something here. Your solution guarantees that all writes issued before the scheduler starts running the flush task get flushed with one flush syscall. Does that include calls from different concurrent threads? Also a question: the listener you add to the write future, is it called after flushing or after queuing in Netty? (I know this is a Netty internal thing, but I'm not sure now.)
Yes, with one flush syscall, including concurrent threads. A better implementation could also flush once the pending chunks reach around 64k, to avoid excessive memory allocation when the network stalls.
Sorry, I don't remember, and I don't have an IDE at hand right now to check.
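The size cap mentioned above is easy to sketch. This is a hedged, self-contained model (not Netty code; names invented): writes accumulate into one batch, but once the pending bytes reach the threshold the batch is flushed immediately instead of growing without bound while the network is stalled:

```java
import java.util.ArrayList;
import java.util.List;

// Coalesce writes into one batch, but cap how much memory the batch may hold:
// once pendingBytes crosses the threshold, flush immediately.
class ThresholdFlusher {
    private final int flushThresholdBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    int flushCount = 0; // how many "syscalls" we issued

    ThresholdFlusher(int flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    void write(byte[] msg) {
        pending.add(msg);
        pendingBytes += msg.length;
        if (pendingBytes >= flushThresholdBytes) {
            flush(); // cap memory held by the batch
        }
    }

    void flush() {
        if (pending.isEmpty()) return;
        // In a real transport this would be a single writev/send syscall.
        pending.clear();
        pendingBytes = 0;
        flushCount++;
    }

    public static void main(String[] args) {
        ThresholdFlusher f = new ThresholdFlusher(64 * 1024);
        for (int i = 0; i < 100; i++) f.write(new byte[1000]);
        System.out.println("flushes so far: " + f.flushCount); // 1 (at ~66k)
        f.flush(); // final explicit flush for the remaining tail
    }
}
```

With a 64k threshold, 100 writes of 1000 bytes produce one threshold-triggered flush plus one final flush, instead of 100 syscalls.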
Another suggestion: if you want to batch multiple pending writes into a single flush, it would be good to have a config option controlling the maximum number of writes per batch.
Maybe it would be good, but it does not solve the problem. |
I am focused on this, too.
How about flushing only when the writer is idle, using IdleStateHandler? E.g. the application only writes to the channel, and after, say, 500ms (or whatever latency is preferred) IdleStateHandler fires a writer-idle event, at which point the flush happens.
Not sure if there is overhead involved with this approach or whether it affects performance negatively... Feedback is appreciated.
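A rough sketch of that writer-idle idea, using plain java.util.concurrent rather than the actual IdleStateHandler API (names invented; this only models the timing behavior): every write re-arms a timer, and the flush fires once no write has arrived for the idle period.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Flush only when the writer goes idle: each write cancels the previously
// armed flush and schedules a new one idleMillis in the future.
class IdleFlusher {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long idleMillis;
    private ScheduledFuture<?> pendingFlush;
    final AtomicInteger flushCount = new AtomicInteger();

    IdleFlusher(long idleMillis) {
        this.idleMillis = idleMillis;
    }

    synchronized void write(String msg) {
        // enqueue msg somewhere... then re-arm the idle timer.
        if (pendingFlush != null) pendingFlush.cancel(false);
        pendingFlush = timer.schedule(this::flush, idleMillis, TimeUnit.MILLISECONDS);
    }

    private void flush() {
        flushCount.incrementAndGet(); // one syscall for the whole burst
    }

    void shutdown() {
        timer.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        IdleFlusher f = new IdleFlusher(100);
        f.write("a");
        f.write("b");
        f.write("c");
        Thread.sleep(300);
        System.out.println("flushes: " + f.flushCount.get()); // 1
        f.shutdown();
    }
}
```

Note the trade-off raised earlier in the thread: this puts a latency floor of the idle period under every burst, and a continuously busy writer keeps postponing the flush, so it is not a general answer on its own.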
The easiest way to implement batching is when the reading and writing channels are registered on the same Netty thread. Then you can catch the point when read operations have completed on all channels, and then flush the resulting dirty channels.
This works for proxy-like applications with a flow of decodeOnChannel1 -> lightBusinessLogic -> encodeOnChannel2, all on the same Netty thread, and should give a noticeable improvement in CPU load average.
I built a message counter and a timer to do the flush.
I have written here a ChannelOutboundHandler that implements a queuing system to avoid flooding the socket and to optimize its usage. In theory this should never exhaust the socket no matter what you do, as writes are only issued after previous writes have completed. To be clear, this is not an implementation of batching as proposed in the comment above. Batching could be integrated with this handler as follows: when polling from the queue, poll up to X messages or N bytes and write them together. This should reduce syscalls and might further improve throughput. I haven't tried it, though... PS: when writing (from outside this handler), make sure to use
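The "poll up to X messages or N bytes" extension described above can be sketched independently of the handler. This is a hedged, self-contained model (not the linked handler's actual code; names invented): on each write-completion, drain a bounded batch from the queue and write it with a single flush.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Drain the pending-message queue in bounded batches: each batch would be
// written as one group of writes followed by a single flush.
class BatchingQueue {
    private final Deque<byte[]> queue = new ArrayDeque<>();
    private final int maxBatch;

    BatchingQueue(int maxBatch) {
        this.maxBatch = maxBatch;
    }

    void offer(byte[] msg) {
        queue.addLast(msg);
    }

    // Called when the previous write has completed.
    List<byte[]> nextBatch() {
        List<byte[]> batch = new ArrayList<>();
        while (!queue.isEmpty() && batch.size() < maxBatch) {
            batch.add(queue.removeFirst());
        }
        return batch; // write all of these, then flush once
    }

    public static void main(String[] args) {
        BatchingQueue q = new BatchingQueue(4);
        for (int i = 0; i < 10; i++) q.offer(new byte[]{1});
        System.out.println(q.nextBatch().size()); // 4
        System.out.println(q.nextBatch().size()); // 4
        System.out.println(q.nextBatch().size()); // 2
    }
}
```

A byte-count cap (as in the 64k suggestion earlier in the thread) could be added alongside `maxBatch` with one extra accumulator in the drain loop.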
I have the same problem.
Nobody has mentioned FlushConsolidationHandler yet. It looks like it's designed to solve exactly this problem, @stepancheg?
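For reference, FlushConsolidationHandler (package `io.netty.handler.flush`, Netty 4.1+) is installed at the front of the pipeline, ahead of the handlers whose flushes it should coalesce. A pipeline-setup fragment (not runnable standalone; `MyRequestHandler` is a hypothetical application handler):

```java
// Requires Netty 4.1+ on the classpath. FlushConsolidationHandler intercepts
// flush() calls from handlers behind it and merges them.
ch.pipeline().addLast(new FlushConsolidationHandler(
        256,     // explicitFlushAfterFlushes: force a real flush after this many
        true));  // consolidateWhenNoReadInProgress: also consolidate flushes
                 // issued outside a read loop (e.g. from async responses)
ch.pipeline().addLast(new MyRequestHandler()); // hypothetical application handler
```

With `consolidateWhenNoReadInProgress` set to true, flushes from asynchronous writes are batched too, which matches the request-response scenario in this issue.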
LGTM, thanks |
Yeah, does the job for me, too. jchambers/pushy#657 has some before-and-after benchmarks, if anybody's curious.
Let me close this |
@stepancheg, does FlushConsolidationHandler solve the issue for you? |
@m4ce it looks so, though it's hard to say for sure; I haven't used Netty in a long time.
Typical server code looks like this:
This code is very inefficient when the responses are small and the number of requests/responses is huge, because it issues a `send` syscall after each `writeAndFlush` operation.

There is an easy way to implement `writeAndFlush` efficiently. I implemented it on top of Netty, but I think it should be implemented inside Netty. The algorithm in pseudocode is this:
I have sample project:
https://github.com/stepancheg/netty-td/
It has an implementation of this algorithm on top of Netty: the BetterWriter.write method.
The Tasks class is a lock-free helper that ensures no more than one write-queue-and-schedule task is scheduled at a time.
The LockFreeQueue implementation (it is actually a lock-free stack; a lock-free stack is faster, and the difference does not matter for this issue).
Edit: a simpler implementation using atomics.
I ran a simple test: a client sending 4-byte messages and a server replying with 4-byte messages.
With the default client and server using writeAndFlush, the result is about 30k rps (on my notebook); with the BetterWriter.write implementation, it is about 200k rps. Roughly a 6x speedup!
writeAndFlush is a very convenient operation when the size of the queue is unknown (for example, in server code I don't know whether the current request is the last one, and hence whether I can omit the flush). So IMHO it should work well by default, and such smart buffering should not be delegated to the user of Netty.