Strange network lag #1720

TsFreddie · 2015-02-01T12:24:37Z

Some players have connection problems on my server. The problem is strange because those players have no problem sending data.

We can see those players move and build on time, but they receive every effect (moving, building, activating TNT, chatting) with a huge lag, as if they live in the past but affect the present.

NiLSPACE · 2015-02-01T12:30:21Z

We just had a major network rewrite, so it might have something to do with that. Still it's weird, because when I tested it I didn't have any problems.

TsFreddie · 2015-02-01T12:35:18Z

I'll try to provide a video.

Howaner · 2015-02-01T12:39:09Z

I had this problem with the old network system.
Maybe the same bug exists in the new network system.

TsFreddie · 2015-02-01T12:58:18Z

Video is up http://youtu.be/TPs1RUYN-uw

TsFreddie · 2015-02-01T13:00:24Z

sorry, video link fixed.

LO1ZB · 2015-02-01T17:27:32Z

If I set up a server on my machine, could you test whether this "lag" happens there, too?

TsFreddie · 2015-02-01T22:52:37Z

We already tried another server; it happened to the same person, and he was still out of sync. The data he sends to the server seems completely fine.

LO1ZB · 2015-02-01T23:05:16Z

Which server did you try?

TsFreddie · 2015-02-01T23:09:14Z

minecraft.planetx.com

tigerw · 2015-02-01T23:18:14Z

I expect everything works fine with Vanilla server software?

TsFreddie · 2015-02-01T23:39:48Z

Yes, we played on some Spigot and vanilla servers, and everything is fine.

tigerw · 2015-02-03T17:42:15Z

Are they connecting through IPv4 or IPv6?

TsFreddie · 2015-02-03T17:46:09Z

IPv4 I believe.

madmaxoft · 2015-02-04T07:36:03Z

Are you capable of running the server under your platform's debugger (MSVC / gdb)? If so, please PM me on the forum ( http://forum.mc-server.org/private.php?action=send&uid=956 ) and I'll guide you to the things that could help us hunt this issue down. Specifically, I'm interested in the m_State for each connected client, the state of the chunks they're waiting to send and possibly others.

Another way to tackle this could be to let the server log all the communication to the clients; this is simple to set up (just pass the /logcomm parameter on the command line when starting the server) but will be more difficult to process.

TsFreddie · 2015-02-05T04:43:22Z

I don't know if we can reproduce this issue now; it seems those people were experiencing extreme packet loss. But the only one of those players I can contact has already fixed his router. It looks as if clients with extremely weak connections weren't kicked and were allowed to stay on the server?

I can try to use gdb, by the way, but I think I'll need some help with that.

madmaxoft · 2015-02-06T20:40:00Z

Unless we get someone else who sees this behavior again, there's not much we could do about it.

Closing; feel free to reopen if there's any new info.

Howaner · 2015-02-06T20:59:32Z

I have the same bug with AntherusCraft.
The network buffer gets overloaded with chunks. This causes a big desync (often more than one minute).
If he sets a view distance of 4 chunks, we can play normally.

If AntherusCraft is on my Bukkit server, he receives the chunks more slowly than in MCServer.
But he can play without desyncs.

A second bug:

AntherusCraft has a big desync
/kick AntherusCraft
AntherusCraft has left the server
One minute later -> Masy receives the kick
The server doesn't close the connection. Why?

worktycho · 2015-02-06T22:20:24Z

I think the problem is that the only buffering the server has to prevent overloading happens at the point of initially deciding to send chunks. There are several queues between that and the actual sending, allowing plenty of bunching if the threads involved aren't getting CPU time frequently enough. What we need is for the protocol to be able to either buffer chunk sending or refuse to send chunks based on how much is buffered at the network layer.

madmaxoft · 2015-02-07T09:00:18Z

I guess we could add a sort of "priority" to packets. All normal packets are queued with "normal", and the chunk data packets are queued with "low". Then in the client tick function, we check the amount of data that has been sent lately and decide whether to send a normal priority or low priority packet next. There will have to be some intelligence in this, since there's not much info we can get about the data that has actually been received.
I'm just afraid that we might run into trouble with priorities, same as Vanilla - due to the priority classes it uses, it is possible to actually crash the client if you send a combination of two packets too quickly upon login: https://bugs.mojang.com/browse/MC-42286
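
To make the two-priority idea concrete, here is a minimal sketch of an outgoing queue with a "normal" and a "low" class. All names (`PacketPriority`, `OutgoingPacketQueue`, `PopNext`) are hypothetical and not taken from the server's code; packets are represented as already-serialized strings, and the decision whether low-priority data may go out is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical names throughout; this is not the server's actual API.
enum class PacketPriority { Normal, Low };

class OutgoingPacketQueue
{
public:
    // Queue an already-serialized packet under the given priority.
    void Push(PacketPriority a_Priority, std::string a_Data)
    {
        auto & queue = (a_Priority == PacketPriority::Normal) ? m_Normal : m_Low;
        queue.push_back(std::move(a_Data));
    }

    // Return the next packet to send. Normal packets always go first; a
    // low-priority packet (chunk data) is released only if a_AllowLow is true.
    std::optional<std::string> PopNext(bool a_AllowLow)
    {
        if (!m_Normal.empty())
        {
            return PopFront(m_Normal);
        }
        if (a_AllowLow && !m_Low.empty())
        {
            return PopFront(m_Low);
        }
        return std::nullopt;
    }

    // Number of low-priority packets still waiting.
    std::size_t PendingLow() const { return m_Low.size(); }

private:
    static std::string PopFront(std::deque<std::string> & a_Queue)
    {
        assert(!a_Queue.empty());
        std::string res = std::move(a_Queue.front());
        a_Queue.pop_front();
        return res;
    }

    std::deque<std::string> m_Normal;
    std::deque<std::string> m_Low;
};
```

A tick function would call `PopNext(allowLow)` in a loop, with `allowLow` derived from however much data was sent recently. Note that this lets a normal packet overtake a chunk packet queued earlier, which is the kind of reordering the MC-42286 concern is about.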

worktycho · 2015-02-07T12:47:10Z

Client tick is too high level. It needs to be at protocol or lower, or we need protocol feedback. The problem with client tick is we end up sending a fixed amount of data per tick which is stupid as it will still fail on slow connections and be inefficient on fast ones.

What we need is to be able to send packets based on how big the network buffer is, so we can adapt to different network speeds. We're sending the data over TCP, so the size of the network buffer is an indication of how much data has been received. The OS only removes data from its buffer when it receives an ACK for the data. So libevent can't write again until data has been received.

madmaxoft · 2015-02-07T13:09:04Z

I don't think it's that bad. A tick means 20 times a second. As far as I know, the official server sends all of its data in ticks as well; we will be sending only the low priority data in ticks, the normal-priority data can be sent immediately. This means that each tick the ClientHandle will decide how many chunks to send, not much else. Whether that decision is based on the buffer sizes or something else is up to the implementation.

We may need to rate-limit the entity movement related packets, too. I have a feeling that the current AI generates a packet for each entity on each tick; this might be another good place to improve network performance.
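
One possible shape for such a rate limit, as a sketch only: remember the last position sent for each entity and suppress an update unless the entity has moved and a minimum number of ticks has passed. `MovementThrottle`, `Position` and the tick counter are illustrative assumptions, not existing server types.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical types for illustration only.
struct Position { double x, y, z; };

class MovementThrottle
{
public:
    explicit MovementThrottle(std::uint32_t a_MinTicksBetweenUpdates):
        m_MinTicks(a_MinTicksBetweenUpdates)
    {
        assert(a_MinTicksBetweenUpdates > 0);
    }

    // Returns true if a movement packet for this entity should be sent on
    // this tick, and records the position as the last one sent.
    bool ShouldSend(std::uint32_t a_EntityID, const Position & a_Pos, std::uint64_t a_Tick)
    {
        auto itr = m_LastSent.find(a_EntityID);
        if (itr == m_LastSent.end())
        {
            // First time we see this entity: always send.
            m_LastSent.emplace(a_EntityID, Sent{a_Pos, a_Tick});
            return true;
        }
        Sent & last = itr->second;
        const bool hasMoved =
            (last.pos.x != a_Pos.x) || (last.pos.y != a_Pos.y) || (last.pos.z != a_Pos.z);
        if (!hasMoved || (a_Tick - last.tick < m_MinTicks))
        {
            return false;  // Nothing new to report, or too soon after the last update
        }
        last = Sent{a_Pos, a_Tick};
        return true;
    }

private:
    struct Sent { Position pos; std::uint64_t tick; };

    std::uint32_t m_MinTicks;
    std::unordered_map<std::uint32_t, Sent> m_LastSent;
};
```

Because the comparison is against the last position actually sent, a suppressed movement is not lost; it goes out with the next allowed update.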

worktycho · 2015-02-07T13:54:30Z

But how do you decide how many? The ClientHandle doesn't know how big the buffer sizes are, and that information would have to cross three abstraction layers to reach it.

madmaxoft · 2015-02-07T16:30:34Z

No-one knows how big the buffers are. The OS may support unlimited buffers; LibEvent uses unlimited buffers. But we can be pretty sure that if there is any leftover data in the LibEvent outgoing buffer, then we're sending too fast and should postpone some chunk packets.
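
The "leftover data means we're sending too fast" rule can be sketched as a small gate that holds chunk packets back while the network layer still has unsent bytes. This is an illustration under assumptions: `ChunkSendGate` is a hypothetical class, and the queued-byte count is supplied through a callback so the sketch stays independent of any networking library. With libevent the number could plausibly come from `evbuffer_get_length(bufferevent_get_output(bev))`, but whether the server's link class exposes that is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: holds back chunk packets while the network layer
// still has unsent data.
class ChunkSendGate
{
public:
    using QueuedBytesFn = std::function<std::size_t()>;

    // a_QueuedBytes reports how many bytes are still waiting in the outgoing
    // buffer; a_MaxQueuedBytes = 0 means "send chunks only when it is empty".
    ChunkSendGate(QueuedBytesFn a_QueuedBytes, std::size_t a_MaxQueuedBytes):
        m_QueuedBytes(std::move(a_QueuedBytes)),
        m_MaxQueuedBytes(a_MaxQueuedBytes)
    {
        assert(m_QueuedBytes != nullptr);
    }

    void QueueChunk(std::string a_ChunkData)
    {
        m_Pending.push_back(std::move(a_ChunkData));
    }

    // Called once per tick; returns the chunk packets that may be written now.
    // Bytes released during this call count against the limit as well.
    std::vector<std::string> Tick()
    {
        std::vector<std::string> released;
        std::size_t queued = m_QueuedBytes();
        while (!m_Pending.empty() && (queued <= m_MaxQueuedBytes))
        {
            queued += m_Pending.front().size();
            released.push_back(std::move(m_Pending.front()));
            m_Pending.pop_front();
        }
        return released;
    }

    std::size_t PendingCount() const { return m_Pending.size(); }

private:
    QueuedBytesFn m_QueuedBytes;
    std::size_t m_MaxQueuedBytes;
    std::deque<std::string> m_Pending;
};
```

With a limit of 0 this releases at most one chunk per tick, and only when nothing is queued; a larger limit trades some latency headroom for throughput on fast links.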

worktycho · 2015-02-07T18:57:13Z

OK, so if we can't use OS buffer sizes, what can we use? Unless we can get some information out of the OS scheduler about ACKs and the TCP transfer window, any solution we come up with will have problems with head-of-line blocking on some combination of line bandwidth/latency/drop rate.

The real solution is for Mojang to take a real look at performance and adjust the protocol to something looking more like a low-latency event transfer protocol, but that's not something we can control.

madmaxoft · 2015-02-07T23:17:54Z

I think the very simple decision "if there's anything still queued in LibEvent, then don't send any chunks" is something that might help.

@Howaner can you please try compiling and running the OutgoingDataLog branch in debug mode? It logs sizes of all outgoing data, including the LibEvent outgoing queue size. This should provide some insight into whether this can be used at all or not.
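
For anyone processing such a log, a small parser along these lines could pull the numbers out. It is only a sketch: the line format is assumed from the excerpt quoted further down in this thread ("Writing N bytes of data, there are already M bytes queued"), and `ParseOutgoingLogLine` and `OutgoingLogEntry` are made-up names.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical record for one "Writing ... bytes" log line.
struct OutgoingLogEntry
{
    unsigned long long written;        // Bytes written by this call
    unsigned long long alreadyQueued;  // Bytes that were queued before it
};

// Returns the parsed entry, or std::nullopt for any other kind of log line.
std::optional<OutgoingLogEntry> ParseOutgoingLogLine(const std::string & a_Line)
{
    static const std::regex re(
        R"(Writing (\d+) bytes of data, there are already (\d+) bytes queued)"
    );
    std::smatch match;
    if (!std::regex_search(a_Line, match, re))
    {
        return std::nullopt;
    }
    assert(match.size() == 3);  // Whole match plus the two captured numbers
    return OutgoingLogEntry{std::stoull(match[1].str()), std::stoull(match[2].str())};
}
```

Feeding every line of a log through this and tracking `alreadyQueued` over time would show whether the queue ever drains for the affected client.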

Howaner · 2015-02-07T23:43:09Z

http://hastebin.com/fatobilote.1c

And this is a big security hole:

     [de0ce2ec8c5f0ee|00:41:15] Writing 15660 bytes of data, there are already 8746276 bytes queued
     [de0ce2ec8c5f0ee|00:41:15] Writing 2901 bytes of data, there are already 8745552 bytes queued
     [de0ce2ec8c5f0ee|00:41:15] Writing 22200 bytes of data, there are already 8748453 bytes queued
     [c54a7827711cee28|00:41:15] Executing console command: "kick Newskater"
info [c54a7827711cee28|00:41:15] Kicking player Newskater for "You have been kicked."
     [c54a7827711cee28|00:41:15] Sending a DC: "You have been kicked."
     [de0ce2ec8c5f0ee|00:41:15] Writing 3615 bytes of data, there are already 8770653 bytes queued
     [de0ce2ec8c5f0ee|00:41:17] Destroying entity #485 (cPickup)
     [de0ce2ec8c5f0ee|00:41:17] Destroying entity #486 (cPickup)
     [de0ce2ec8c5f0ee|00:41:31] Destroying entity #217 (cPickup)
     [de0ce2ec8c5f0ee|00:41:31] Destroying entity #218 (cPickup)
info [de0ce2ec8c5f0ee|00:41:34] [Newskater]: blabla .
     [de0ce2ec8c5f0ee|00:41:52] Destroying entity #721 (cChicken)

I kicked Newskater. The server should clear the queue and send the disconnect packet.
It also shouldn't accept any new packets from a kicked client.
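
The behaviour asked for here could look roughly like the following sketch. `ClientConnection` is a hypothetical stand-in, packets are plain strings, and the `"DISCONNECT:"` prefix is a placeholder for whatever the real disconnect packet is.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Hypothetical connection object; not the server's real client class.
class ClientConnection
{
public:
    // Queue data for sending; ignored once the client has been kicked.
    void QueueOutgoing(std::string a_Data)
    {
        if (m_IsKicked)
        {
            return;
        }
        m_Outgoing.push_back(std::move(a_Data));
    }

    // Drop the backlog so the disconnect is not stuck behind megabytes of
    // chunk data, then queue the disconnect as the only remaining packet.
    void Kick(const std::string & a_Reason)
    {
        if (m_IsKicked)
        {
            return;
        }
        m_IsKicked = true;
        m_Outgoing.clear();
        m_Outgoing.push_back("DISCONNECT:" + a_Reason);  // Placeholder packet
        assert(m_Outgoing.size() == 1);
    }

    // Returns true if the packet was accepted for processing.
    bool HandleIncoming(const std::string & a_Data)
    {
        if (m_IsKicked)
        {
            return false;  // Kicked clients can no longer chat or act
        }
        m_Incoming.push_back(a_Data);
        return true;
    }

    const std::deque<std::string> & Outgoing() const { return m_Outgoing; }

private:
    bool m_IsKicked = false;
    std::deque<std::string> m_Outgoing;
    std::deque<std::string> m_Incoming;
};
```

One caveat: only whole packets that have not started going out can be dropped safely. Discarding the tail of a partially written packet would break the stream framing, so a real implementation would need to know where packet boundaries fall in whatever the network layer has already buffered.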

worktycho · 2015-02-07T23:52:00Z

@xoft, I think we've been suggesting the same thing, but I've been using the wrong terminology because that is what I think.

planetx · 2015-02-08T22:03:14Z

How are you hosting your server?

minecraft.planetx.com and mc.planetx.com are hosted on high-speed networks with an MTU of 9000. Maybe that is the problem?

minecraft.planetx.com is on FIOS with 50 Mbps up and down, and mc.planetx.com is in the Amazon Cloud -- not able to measure the speed. ;)

yangm97 · 2016-11-08T22:56:20Z

Having this too; it seems like a very intermittent issue. It happens on my server, but if I try running on my machine with the same environment, the problem doesn't happen.
Never mind, there's a difference between my machine and the server: bungeecord.

tigerw · 2020-08-23T19:08:12Z

Network connection not actually being closed on kick was fixed in #3999, tracked in #1895 and in a bunch of others.

About this issue, unsure if this is related but we have a forced 1 tick / 50ms delay for every batch of packets sent, since ProcessProtocolInOut is only called per tick for some reason. Most visible in the multiplayer server list's ping: note how Java servers have much lower ping than even a local Cuberite instance.

Fix probably involves #4744 (a dedicated async network loop from ASIO) and even better with @worktycho 's serverTick branch so packets are first digested asynchronously by the network thread, and then the parsed action is queued onto the World's tick thread.
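
The hand-off described here, where the network thread parses packets and the tick thread applies them, could be sketched as a mutex-protected action queue. This is not taken from #4744 or the serverTick branch; `TickActionQueue` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical hand-off queue between a network thread and a tick thread.
class TickActionQueue
{
public:
    using Action = std::function<void()>;

    // Called from the network thread once a packet has been parsed.
    void Post(Action a_Action)
    {
        assert(a_Action != nullptr);
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Pending.push_back(std::move(a_Action));
    }

    // Called from the world tick thread. Runs everything queued so far and
    // returns the number of actions executed.
    std::size_t RunPending()
    {
        std::vector<Action> actions;
        {
            // Swap under the lock so actions run without holding it.
            std::lock_guard<std::mutex> lock(m_Mutex);
            actions.swap(m_Pending);
        }
        for (auto & action : actions)
        {
            action();
        }
        return actions.size();
    }

private:
    std::mutex m_Mutex;
    std::vector<Action> m_Pending;
};
```

In this arrangement only the world-changing part of a packet waits for the next tick; reading, decrypting and parsing happen as soon as the bytes arrive, and replies that don't touch world state (such as a status ping) would not need to go through the queue at all.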

TsFreddie closed this as completed Feb 1, 2015

TsFreddie reopened this Feb 1, 2015

madmaxoft closed this as completed Feb 6, 2015

Howaner reopened this Feb 6, 2015

madmaxoft added a commit that referenced this issue Feb 7, 2015

cTCPLink: Added outgoing data size logging.

3ffa2fc

This is only a test to try find the cause of #1720.

worktycho closed this as completed Feb 7, 2015

worktycho reopened this Feb 7, 2015
