[FIX INCLUDED] Improper use of the "write" flag causes flush() to be ignored, increasing used resources #4205
Comments
I confirmed on production that this brings the number of $Entrys back to ~1 per thread instead of thousands, millions, etc.
@ninja- actually the fix is not correct, as this will cause unnecessary CPU usage. That said, there is a bug and I'm working on a fix as we speak.
💛 remember to bump the 4.1 beta release after you're done
@ninja- - Thanks for reporting and pointing us in the right direction!
@Scottmitch :-) punching every bug on my way, I hope it's going to be the last one for some time!
@normanmaurer note that I wasn't able to get flush() to break outside the first few seconds after a player connects. Maybe that will give you a hint.
@ninja- - @normanmaurer has a PR pending approval. It will be pushed upstream soon. When it is upstream we will ask you to verify the fix :)
@Scottmitch 😍 it takes me 1 minute to verify the fix after it's committed somewhere.
Honestly I would just wrap doWriteSingle and doWriteMultiple with a while(...). It's not like there is a better way - we can't just return from the function before everything is written. No idea what Norman did, because I guess you guys are handling this PR on a private repo or something?
@ninja- just have a bit of patience... a PR with a fix will be upstream soon.
Does it also affect 4.0? I have too high CPU usage with 4.0 :(
@WhiteTrashLord yeah, it's a very old bug actually. I tried fixing it myself but I increased CPU usage 2x (while fixing the outbound entry count), so I quickly reverted. But I doubt this bug increases CPU usage, so it's probably not your main problem. @normanmaurer any news, or are you going to finish this after the weekend?
@WhiteTrashLord - Yes, it impacts 4.0. This bug is about making sure Netty registers for a channel writable event when the channel is unable to accept all the data we try to write (one case was missed). This bug would likely manifest itself first as memory issues rather than CPU. If you are able to diagnose unexpected Netty behavior, feel free to open another issue.
@Scottmitch so you guys are working on it outside of the public repo or something?
@normanmaurer @Scottmitch any update after the weekend?
…to write everything

Motivation: writeBytes(...) missed setting the EPOLLOUT flag when not all bytes were written. This could lead to the EpollEventLoop not trying to flush the remaining bytes once the socket becomes writable again.

Modifications:
- Move the EPOLLOUT flag logic to one point so we are sure we always set it.
- Move the OP_WRITE flag logic to one point as well.

Result: Correctly try to write pending data if the socket becomes writable again.
@normanmaurer I'll deploy it tomorrow and check. I tried something similar and CPU usage rocketed... is your fix correct in this regard? Is it sane to expect a MAJOR drop in $Entry count from 5M?
@ninja- - I think the expectation is that we should only be registering for the write event from epoll when necessary. I wouldn't expect CPU to spike. Please try it and report back.
…fied

Motivation: We need to ensure the actual close of the transport takes place before the promise of the write that triggered it is notified. This is needed because otherwise Channel.isActive(), isOpen() and isWritable() may return true even if the Channel should already be closed.

Modifications:
- Ensure the close takes place first.

Result: ChannelFutureListener will see the correct state of the Channel.
OK, so in the end... in all probability I can now say goodbye to millions and get back to thousands, because the new version deployed on production already shows a good change in behaviour :)
@ninja- SGTM. Thanks again for reporting. Please be sure to report if you see more bad behavior.
@Scottmitch in the past few days I audited all of the epoll transport, the Recycler, and the outbound buffer. They are 100% fine in their logic. But here is what I found at this point - a few things, maybe you will find them interesting:

a) A batch of tiny packets (in one read, from ByteToMessageDecoder until channelReadComplete) is a killer for a proxy as far as the number of outbound entries is concerned (eventually growing to millions).

b) I tried to limit the number of packets read by the frame decoder in a single batch to, for example, 20/100, then write them to the proxied connection and wait until they are fully written to the other side before reading the next batch. That was a disaster, as it increased the time to fully log in and receive the world data from roughly instant to ~15s.

c) The part where the proxy application waits with read() until the write to the downstream has finished is, in my opinion, adding a bit of visible latency for the user. Is that supposed to happen, or is it the best that can be done to keep the two sides relatively in sync? Should the autoRead rule from the Netty examples apply to all proxy applications, even where latency is quite important? (See the sketch after this comment.)

d) Since I found that small packets are the killer, I am thinking of creating a cumulator for them. That is, instead of writing 1000/2000/7000 small entries to the outbound buffer, it would copy the data into a single buffer and then write that. The downside is that it then couldn't use IovArray to write them in a batch, so it may or may not be a small performance regression. The buffers used are PooledUnsafeDirectByteBuf, so that gives me a better idea of how to cumulate them without copying - create an object with PoolChunk[] + long[] + ByteBuffer[] and so on, recycle the original buffers without deallocating, keep a holder with their memory handles, and free them after the write - but it would be harder to implement, so maybe removing the copy() is not worth it. That aside, I think it would be simpler to hijack the latest entry in the outbound buffer and write some data into it, so that's what I am going to try now.
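For reference, the read-only-after-the-write-completes behaviour from point (c) is roughly the autoRead backpressure pattern used in Netty's proxy examples. A minimal sketch, assuming a frontend handler that holds a reference to the channel towards the proxied server (the class and field names are illustrative, not from this thread):

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Illustrative backpressure handler for a proxy: stop reading from this side while a
// write to the other (proxied) side is still in flight, and resume once it completes.
public class FrontendBackpressureHandler extends ChannelInboundHandlerAdapter {
    private final Channel outbound; // connection to the proxied server (assumed to exist)

    public FrontendBackpressureHandler(Channel outbound) {
        this.outbound = outbound;
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        // Suspend reads on the inbound side until the downstream write has finished.
        ctx.channel().config().setAutoRead(false);
        outbound.writeAndFlush(msg).addListener((ChannelFutureListener) future -> {
            if (future.isSuccess()) {
                ctx.channel().config().setAutoRead(true); // resume reading
            } else {
                future.channel().close();
            }
        });
    }
}
```

This is the trade-off point (c) describes: reads stall for a full write round trip, which keeps the outbound buffer small but adds user-visible latency.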
@ninja- - Would you mind opening a new issue so it is easy to keep things separated?
@Scottmitch actually, I finally solved the problem 100%, so I am "out of Netty bugs". https://gist.github.com/ninja-/91a13b2630210891d429 - MessageSquasher.java

I am telling you this because over the past few months I got the Netty team involved in the fight against outbound entry spam, and among LOTS of other issues and bugs, the small packets were the last silent killer. So I think this battle is finally over, and I don't currently see any performance or latency regressions. As always, time will tell... (A rough sketch of the squashing idea follows below.)

You may think this finding is a good opportunity to improve Netty itself :) In that case, I would recommend that the Netty team change the lifecycle of objects between the moment they are added to the outbound buffer and the moment they are flushed. For example, there could be a window (actually, the window is already there, in the part where the invoker's write() calls outboundBuffer.addEntry, so that could be overridden) for per-transport implementations to deal with messages that have been added but not yet flushed. Such an implementation could choose to merge them (before a new entry is created), or override entry.msg with its own object containing just the "memory addresses" of the merged buffers, etc. It would leave a nice window for improvements that could benefit everyone. (What's harder to spot is that the entries take the buffers with them... that means the number of buffers also keeps increasing and new ones need to be allocated, so that's another way this could help "traditional" apps, in my opinion.)

It's up to you guys whether you think that's worth the time, but it's a way of doing this that should benefit more types of Netty apps, and as always, microbenchmarks on this would be interesting :) @normanmaurer @trustin @Scottmitch @nmittler If you think this improvement would be worth the time to code, maybe I could open a new issue with an "improvement" tag and we could have some fun implementing some per-transport stuff for it, or maybe make it a ChannelOption so a use case can be chosen?
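The linked MessageSquasher.java is not reproduced in this thread; as a rough, hypothetical sketch of the squashing idea (copy small outgoing ByteBufs into one cumulation buffer so that flush() produces a single ChannelOutboundBuffer$Entry instead of thousands), assuming a plain ChannelOutboundHandler:

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;

// Hypothetical "squashing" outbound handler: aggregates small ByteBuf writes into one
// buffer and writes that single buffer on flush(). This trades an extra copy for far
// fewer outbound-buffer entries.
public class SquashingOutboundHandler extends ChannelOutboundHandlerAdapter {
    private ByteBuf cumulation;

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
        if (msg instanceof ByteBuf) {
            ByteBuf buf = (ByteBuf) msg;
            if (cumulation == null) {
                cumulation = ctx.alloc().directBuffer();
            }
            cumulation.writeBytes(buf); // copy the small packet into the cumulation buffer
            buf.release();
            promise.setSuccess();       // simplification: real code should tie promises to the actual socket write
        } else {
            ctx.write(msg, promise);    // pass through anything that is not a ByteBuf
        }
    }

    @Override
    public void flush(ChannelHandlerContext ctx) {
        if (cumulation != null && cumulation.isReadable()) {
            ctx.write(cumulation);      // one outbound entry for the whole batch
            cumulation = null;
        }
        ctx.flush();
    }

    @Override
    public void handlerRemoved(ChannelHandlerContext ctx) {
        if (cumulation != null) {
            cumulation.release();       // don't leak the cumulation buffer
            cumulation = null;
        }
    }
}
```

The design choice is the one discussed in point (d) above: one memcpy per packet in exchange for a bounded number of entries and buffers in the outbound buffer.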
@ninja- - What was your write/flush strategy before the MessageSquasher?
@Scottmitch it was flushing in a batch equal to what came from the ByteToMessageDecoder. That is, if it received 1000 packets (from the frame decoder), it then wrote those 1000 packets to the downstream, and at channelReadComplete it would flush them. Getting rid of batching wasn't any help against the $Entry spam. When using MessageSquasher the number of $Entries stopped growing under high traffic and it can't go higher than 200 in TOTAL; before that, it would quickly get out of control to 200,000 easily and then to maybe millions.

Another change I made was to ByteToMessageDecoder: instead of the original behaviour of firing channelRead once for the full list of decoded messages, I fire channelRead for each message right after it is decoded. I don't think that calling them instantly after decoding, instead of calling channelRead for the full list, would be any degradation to performance, and the buffers could be reused more quickly, because when channelRead is over the message is usually released.
@ninja- - Any way you can provide a reproducer for the original problem?
What are you referring to when you say "calling them"?
What was the mechanism you used to do this?
"calling them" to channelRead() in this meaning. allow the buffers to be reused more quickly AS it's usually released afer handling. But that's something that completes the way MessageSquasher work otherwise there would be no improvement Well the reproducer....I wouldn't call that a bug. Just a simple proxy + a frame decoder that decodes frames into buffers + a spam of small Packers in one read |
@Scottmitch hm, I don't think it's related; @normanmaurer found the #4275 problem somewhere else. But if you decided to change ByteToMessageDecoder so it would fire channelRead(msg) one by one instead of doing it in a batch, maybe it would also help with #4275.
@ninja- - What I was wondering was how you made that channelRead call for each message.
@Scottmitch I copied the original ByteToMessageDecoder into the io.netty.channel package under a new name, while keeping the package inside the application. So it should be more or less after each callDecode(...). @normanmaurer is evaluating whether this could (negatively) affect performance for some people.
I'm a bit slow on this one... the line you reference is called after each
@Scottmitch no problem, I will check and come back to you.
@Scottmitch yeah, actually I pointed at the wrong place. At the moment I am using a hack that fires it inside decode() (roughly along the lines of the sketch below),
so I am not changing the original class.
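A hypothetical sketch of what such a hack could look like; the class name and readOneFrame are illustrative placeholders, not from the thread or the Netty API:

```java
import java.util.List;

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;

// Hypothetical decoder that fires each decoded frame immediately from decode() instead
// of adding it to the out list (which would be fired in a batch after callDecode(...)).
public class EagerFrameDecoder extends ByteToMessageDecoder {
    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        ByteBuf frame = readOneFrame(in); // illustrative framing logic
        if (frame != null) {
            // Propagate right away so the buffer can be handled (and usually released,
            // hence returned to the pool) before the next frame is decoded.
            ctx.fireChannelRead(frame);
        }
    }

    private ByteBuf readOneFrame(ByteBuf in) {
        // Placeholder: a real implementation would parse a length prefix, check the
        // readable bytes, and retain/slice a frame; return null when incomplete.
        return null;
    }
}
```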
@ninja- - Thanks for clarifying. Please open a new issue so we can track it.
@ninja- @normanmaurer - FYI #4284
Scenario:
netty 4.1
user calls writeAndFlush(packet)
Expected behaviour:
the packet is ALWAYS written and flushed...
(want a unit test in the future? stress writeAndFlush() + check that outbound buffer is empty)
What can happen:
the explicit flush can be ignored because of improper use of the OP_WRITE/EPOLLOUT flag
(it only gets called later on by the event loop)
it's a pretty serious bug because it increases used resources (5 million ChannelOutboundBuffer$Entrys, anyone?)
please note that channel.isWritable() etc. is NOT involved in this
the flag is not always turned on when there's work to do; it's the same problem on the epoll and NIO backends.
example (confirmed working) patch:
NIO has the same behaviour, and this line in NioSocketChannel can be blamed:
https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/socket/nio/NioSocketChannel.java#L271
or rather, https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/socket/nio/NioSocketChannel.java#L283 should also set OP_WRITE
I hope this gives you an idea of what is wrong, and that you guys can think of a "better" way to fix it (but I will also look for such a way in the meantime).
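As a simplified, self-contained sketch of the underlying idea (plain NIO, not the actual Netty transport code or the patch referenced above): after a partial write the channel must register interest in OP_WRITE (EPOLLOUT on the epoll transport) so the event loop comes back to finish the flush, and must clear it again once everything is written.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// Illustrative helper (hypothetical, not Netty API): write as much as the socket accepts
// and toggle OP_WRITE interest depending on whether data is still pending.
final class PartialWriteHelper {
    static void writeAndUpdateInterest(SocketChannel ch, SelectionKey key, ByteBuffer data)
            throws IOException {
        ch.write(data);
        if (data.hasRemaining()) {
            // Not everything was written: register for OP_WRITE so the selector wakes us
            // up to retry once the socket becomes writable again (the case missed by the bug).
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        } else {
            // Everything written: clear OP_WRITE, otherwise the event loop keeps getting
            // writable events and burns CPU.
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }
}
```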