Strange network lag #1720

Open
TsFreddie opened this issue Feb 1, 2015 · 30 comments

Comments

@TsFreddie
Contributor

Some players have connection problems on my server. The problem seems strange because those players have no trouble sending data.

We can see those players move and build in real time, but they receive every effect (movement, building, TNT activation, chat) with a huge lag, as if they live in the past while still affecting the present.

@NiLSPACE
Member

NiLSPACE commented Feb 1, 2015

We just had a major network rewrite, so it might have something to do with that. Still it's weird, because when I tested it I didn't have any problems.

@TsFreddie
Contributor Author

I'll try to provide a video.

@Howaner
Contributor

Howaner commented Feb 1, 2015

I had this problem with the old network system.
Maybe the same bug exists in the new network system.

@TsFreddie
Contributor Author

Video is up http://youtu.be/TPs1RUYN-uw

@TsFreddie TsFreddie reopened this Feb 1, 2015
@TsFreddie
Contributor Author

sorry, video link fixed.

@LO1ZB
Contributor

LO1ZB commented Feb 1, 2015

If I set up a server on my machine, could you test whether this "lag" happens there, too?

@TsFreddie
Contributor Author

We already tried another server and it happened to the same person: he was still out of sync. The data he sends to the server seems completely fine.

@LO1ZB
Contributor

LO1ZB commented Feb 1, 2015

Which server did you try?

@TsFreddie
Contributor Author

minecraft.planetx.com

@tigerw
Member

tigerw commented Feb 1, 2015

I expect everything works fine with Vanilla server software?

@TsFreddie
Contributor Author

Yes, we played on some Spigot and vanilla servers and everything was fine.

@tigerw
Member

tigerw commented Feb 3, 2015

Are they connecting through IPv4 or 6?

@TsFreddie
Contributor Author

IPv4 I believe.

@madmaxoft
Member

Are you capable of running the server under your platform's debugger (MSVC / gdb)? If so, please PM me on the forum ( http://forum.mc-server.org/private.php?action=send&uid=956 ) and I'll guide you to the things that could help us hunt this issue down. Specifically, I'm interested in the m_State for each connected client, the state of the chunks they're waiting to send and possibly others.

Another way to tackle this could be to let the server log all the communication to the clients; this is simple to set up (just pass the /logcomm parameter on the command line when starting the server), but the output will be more difficult to process.

@TsFreddie
Contributor Author

I don't know if we can reproduce this issue now. It seems like those people were experiencing extreme packet loss, but the only one of those players I can contact has already fixed his router. Is it that clients with an extremely weak connection aren't kicked and are allowed to stay on the server?

I can try to use gdb, btw, but I think I need some help on that.

@madmaxoft
Member

Unless we get someone else who sees this behavior again, there's not much we could do about it.

Closing; feel free to reopen if there's any new info.

@Howaner
Contributor

Howaner commented Feb 6, 2015

I have the same bug with AntherusCraft.
The network buffer gets overloaded with chunks. This causes a big desync (often more than one minute).
If he sets a view distance of 4 chunks, we can play normally.

If AntherusCraft is on my Bukkit server, he receives the chunks more slowly than on MCServer.
But he can play without desyncs.

A second bug:

  1. AntherusCraft has a big desync
  2. /kick AntherusCraft
  3. AntherusCraft has left the server
  4. One minute later -> Masy receives the kick
    The server doesn't close the connection. Why?

@Howaner Howaner reopened this Feb 6, 2015
@worktycho
Member

I think the problem is that the only buffering the server has to prevent overloading happens at the point where it initially decides to send chunks. There are several queues between that point and the actual sending, which allows plenty of bunching if the threads involved aren't getting CPU time frequently enough. What we need is for the protocol to be able to either buffer chunk sending or refuse to send chunks based on how much is buffered at the network layer.

@madmaxoft
Member

I guess we could add a sort of "priority" to packets. All normal packets are queued with "normal", and the chunk data packets are queued with "low". Then in the client tick function, we check the amount of data that has been sent lately and decide whether to send a normal priority or low priority packet next. There will have to be some intelligence in this, since there's not much info we can get about the data that has actually been received.
I'm just afraid that we might run into trouble with priorities, same as Vanilla - due to the priority classes it uses, it is possible to actually crash the client if you send a combination of two packets too quickly upon login: https://bugs.mojang.com/browse/MC-42286
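
The two-priority idea above could be sketched roughly as follows. This is only an illustration under assumed names: `PrioritySendQueue`, `Packet`, `DrainForTick` and the byte budget are hypothetical and are not taken from the MCServer code. Normal packets always go out; low-priority (chunk) packets go out only while a per-tick byte budget lasts.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not MCServer's actual classes.
enum class Priority { Normal, Low };

struct Packet
{
    std::string payload;
    Priority priority;
};

class PrioritySendQueue
{
public:
    void Enqueue(Packet a_Packet)
    {
        auto & queue = (a_Packet.priority == Priority::Normal) ? m_Normal : m_Low;
        queue.push_back(std::move(a_Packet));
    }

    // Called once per tick. Returns every queued normal packet, followed by
    // low-priority packets until the byte budget is used up. The budget may be
    // overshot by one packet so that a large chunk can never starve.
    std::vector<Packet> DrainForTick(std::size_t a_LowPriorityByteBudget)
    {
        std::vector<Packet> out;
        while (!m_Normal.empty())
        {
            out.push_back(std::move(m_Normal.front()));
            m_Normal.pop_front();
        }
        std::size_t used = 0;
        while (!m_Low.empty() && (used < a_LowPriorityByteBudget))
        {
            used += m_Low.front().payload.size();
            out.push_back(std::move(m_Low.front()));
            m_Low.pop_front();
        }
        return out;
    }

    std::size_t LowPending() const { return m_Low.size(); }

private:
    std::deque<Packet> m_Normal;
    std::deque<Packet> m_Low;
};
```

Note that a normal packet queued after a chunk packet overtakes it here, which is exactly the kind of reordering between priority classes that the Vanilla bug linked above warns about.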

@worktycho
Member

Client tick is too high level. It needs to be at protocol or lower, or we need protocol feedback. The problem with client tick is we end up sending a fixed amount of data per tick which is stupid as it will still fail on slow connections and be inefficient on fast ones.

What we need is to be able to send packets based on how big the network buffer is, so we can adapt to different network speeds. We're sending the data over TCP, so the size of the network buffer is an indication of how much data has been received. The OS only removes data from its buffer when it receives an ACK for the data. So libevent can't write again until data has been received.
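
As a rough sketch of sending "based on how big the network buffer is": the sender tops the outgoing buffer up to a fixed mark and never beyond it, so the amount written per round follows whatever the link actually drains. `SendBudget`, `MaxBacklog` and the high-water mark are hypothetical names and tuning values for illustration, and the simulated link is a simplification of real TCP behaviour.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: how many more bytes may be handed to the network layer,
// given how many bytes are still unsent in the outgoing buffer.
constexpr std::size_t SendBudget(std::size_t a_QueuedBytes, std::size_t a_HighWaterMark)
{
    return (a_QueuedBytes >= a_HighWaterMark) ? 0 : (a_HighWaterMark - a_QueuedBytes);
}

// Simulates a_Rounds send rounds against a link that delivers a_DrainPerRound
// bytes per round, and returns the largest backlog that was ever queued.
std::size_t MaxBacklog(std::size_t a_Rounds, std::size_t a_DrainPerRound, std::size_t a_HighWaterMark)
{
    std::size_t queued = 0;
    std::size_t maxQueued = 0;
    for (std::size_t i = 0; i < a_Rounds; ++i)
    {
        queued += SendBudget(queued, a_HighWaterMark);  // top up to the mark, never beyond
        maxQueued = std::max(maxQueued, queued);
        queued -= std::min(queued, a_DrainPerRound);    // the link delivers what it can
    }
    return maxQueued;
}
```

On a slow link the budget shrinks to the few bytes that were drained since the last round; on a fast link it is the full mark every time. Either way the backlog stays bounded by the mark.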

@madmaxoft
Member

I don't think it's that bad. A tick means 20 times a second. As far as I know, the official server sends all of its data in ticks as well; we will be sending only the low priority data in ticks, the normal-priority data can be sent immediately. This means that each tick the ClientHandle will decide how many chunks to send, not much else. Whether that decision is based on the buffer sizes or something else is up to the implementation.

We may need to rate-limit the entity movement related packets, too. I have a feeling that the current AI generates a packet for each entity on each tick; this might be another good place to improve network performance.
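
One possible way to rate-limit movement packets is to coalesce them: keep only the newest position per entity and flush every few ticks. The sketch below assumes hypothetical names (`MovementCoalescer`, `MoveUpdate`, `a_TicksPerFlush`) and does not reflect how MCServer's entity code actually works.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Position { double x, y, z; };

struct MoveUpdate
{
    std::uint32_t entityId;
    Position pos;
};

// Hypothetical coalescer: remembers only the latest position per entity.
class MovementCoalescer
{
public:
    explicit MovementCoalescer(unsigned a_TicksPerFlush) : m_TicksPerFlush(a_TicksPerFlush) {}

    // An older, not yet sent position of the same entity is overwritten.
    void OnEntityMoved(std::uint32_t a_EntityId, Position a_Pos)
    {
        m_Latest[a_EntityId] = a_Pos;
    }

    // Call once per tick. Returns the updates to send; empty on most ticks.
    std::vector<MoveUpdate> Tick()
    {
        std::vector<MoveUpdate> out;
        if (++m_TickCounter < m_TicksPerFlush)
        {
            return out;
        }
        m_TickCounter = 0;
        for (const auto & entry : m_Latest)
        {
            out.push_back({entry.first, entry.second});
        }
        m_Latest.clear();
        return out;
    }

private:
    unsigned m_TicksPerFlush;
    unsigned m_TickCounter = 0;
    std::map<std::uint32_t, Position> m_Latest;
};
```

With a flush interval of 4 ticks, an entity that moves on every tick produces one packet per 200 ms instead of four, at the cost of slightly choppier movement on the client.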

@worktycho
Member

But how do you decide how many? The clientHandle doesn't know how big the buffer sizes are and that information would have to cross three abstraction layers to reach it.

@madmaxoft
Member

No-one knows how big the buffers are. The OS may support unlimited buffers; LibEvent uses unlimited buffers. But we can be pretty sure that if there is any leftover data in the LibEvent outgoing buffer, then we're sending too fast and should postpone some chunk packets.

@worktycho
Member

Ok, so if we can't use OS buffer sizes what can we use? Unless we can get some information out of the OS scheduler about ACKs and the TCP transfer Window any solution we come up with will have problems with head of line blocking on some combination of line bandwidth/latency/drop rate.

The real solution is for Mojang to take a real look at performance and adjust the protocol to something looking more like a low-latency event transfer protocol, but that's not something we can control.

madmaxoft added a commit that referenced this issue Feb 7, 2015
This is only a test to try find the cause of #1720.
@madmaxoft
Member

I think the very simple decision "if there's anything still queued in LibEvent, then don't send any chunks", is something that might help.
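
A minimal sketch of that rule, assuming hypothetical names (`ChunkSender`, `ChunkCoord`, `a_MaxChunksPerTick`). The backlog size is passed in as a plain number so the example stays self-contained; with libevent it could presumably be read as `evbuffer_get_length(bufferevent_get_output(bev))`.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical chunk coordinate type for illustration.
struct ChunkCoord { int x, z; };

class ChunkSender
{
public:
    void QueueChunk(ChunkCoord a_Coord) { m_Pending.push_back(a_Coord); }

    // Call once per tick. a_OutgoingQueuedBytes is the number of bytes still
    // unsent at the network layer. If anything is left there, no chunks are
    // released this tick; otherwise up to a_MaxChunksPerTick are released.
    std::vector<ChunkCoord> Tick(std::size_t a_OutgoingQueuedBytes, std::size_t a_MaxChunksPerTick)
    {
        std::vector<ChunkCoord> toSend;
        if (a_OutgoingQueuedBytes > 0)
        {
            return toSend;  // still busy with earlier data, postpone chunks
        }
        while (!m_Pending.empty() && (toSend.size() < a_MaxChunksPerTick))
        {
            toSend.push_back(m_Pending.front());
            m_Pending.pop_front();
        }
        return toSend;
    }

    std::size_t PendingCount() const { return m_Pending.size(); }

private:
    std::deque<ChunkCoord> m_Pending;
};
```

Postponed chunks stay in `m_Pending`, so nothing is lost; a slow client simply receives its chunks later while chat and entity packets are not stuck behind them.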

@Howaner can you please try compiling and running the OutgoingDataLog branch in debug mode? It logs sizes of all outgoing data, including the LibEvent outgoing queue size. This should provide some insight into whether this can be used at all or not.

@Howaner
Contributor

Howaner commented Feb 7, 2015

http://hastebin.com/fatobilote.1c

And this is a big security hole:

     [de0ce2ec8c5f0ee|00:41:15] Writing 15660 bytes of data, there are already 8746276 bytes queued
     [de0ce2ec8c5f0ee|00:41:15] Writing 2901 bytes of data, there are already 8745552 bytes queued
     [de0ce2ec8c5f0ee|00:41:15] Writing 22200 bytes of data, there are already 8748453 bytes queued
     [c54a7827711cee28|00:41:15] Executing console command: "kick Newskater"
info [c54a7827711cee28|00:41:15] Kicking player Newskater for "You have been kicked."
     [c54a7827711cee28|00:41:15] Sending a DC: "You have been kicked."
     [de0ce2ec8c5f0ee|00:41:15] Writing 3615 bytes of data, there are already 8770653 bytes queued
     [de0ce2ec8c5f0ee|00:41:17] Destroying entity #485 (cPickup)
     [de0ce2ec8c5f0ee|00:41:17] Destroying entity #486 (cPickup)
     [de0ce2ec8c5f0ee|00:41:31] Destroying entity #217 (cPickup)
     [de0ce2ec8c5f0ee|00:41:31] Destroying entity #218 (cPickup)
info [de0ce2ec8c5f0ee|00:41:34] [Newskater]: blabla .
     [de0ce2ec8c5f0ee|00:41:52] Destroying entity #721 (cChicken)

I kicked Newskater. The server should clear the queue and send the disconnect packet, and it shouldn't accept any new packets from a kicked client.
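
The expected kick behaviour could look roughly like this. It is a sketch under assumptions: `ClientConnection`, its states and the string "packets" are hypothetical stand-ins, and it only covers a packet-level queue inside the server, not bytes that were already handed to the network library or the OS.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical connection object; packets are plain strings for illustration.
class ClientConnection
{
public:
    void QueueOutgoing(const std::string & a_Packet)
    {
        if (m_State == State::Connected)
        {
            m_Outgoing.push_back(a_Packet);
        }
    }

    // Drops everything still waiting to be sent and queues only the disconnect.
    void Kick(const std::string & a_Reason)
    {
        if (m_State != State::Connected)
        {
            return;
        }
        m_Outgoing.clear();
        m_Outgoing.push_back("DISCONNECT:" + a_Reason);
        m_State = State::Kicked;
    }

    // Returns true if the packet was accepted; packets from a kicked client are ignored.
    bool HandleIncoming(const std::string & a_Packet)
    {
        if (m_State != State::Connected)
        {
            return false;
        }
        m_Processed.push_back(a_Packet);
        return true;
    }

    std::size_t OutgoingCount() const { return m_Outgoing.size(); }
    std::size_t ProcessedCount() const { return m_Processed.size(); }
    const std::string & NextOutgoing() const { return m_Outgoing.front(); }

private:
    enum class State { Connected, Kicked };

    State m_State = State::Connected;
    std::deque<std::string> m_Outgoing;
    std::deque<std::string> m_Processed;
};
```

Clearing has to happen on whole packets. Cutting a byte stream in the middle of a partially sent packet would corrupt the protocol stream, and the client might then fail to parse the disconnect message.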

@worktycho worktycho reopened this Feb 7, 2015
@worktycho
Member

@xoft, I think we've been suggesting the same thing, but I've been using the wrong terminology because that is what I think.

@planetx

planetx commented Feb 8, 2015

How are you hosting your server?

minecraft.planetx.com and mc.planetx.com are hosted on high speed networks with an MTU of 9000. Maybe that is the problem?

minecraft.planetx.com is on FIOS with 50 Mbps up and down, and mc.planetx.com is in the Amazon Cloud -- not able to measure the speed. ;)

@yangm97

yangm97 commented Nov 8, 2016

Having this too; it seems like a very intermittent issue. It happens on my server, but if I try running on my machine with the same environment, the problem doesn't happen.
Never mind, there's a difference between my machine and the server: BungeeCord.

@tigerw
Member

tigerw commented Aug 23, 2020

The network connection not actually being closed on kick was fixed in #3999; it was tracked in #1895 and in a bunch of others.

About this issue, unsure if this is related but we have a forced 1 tick / 50ms delay for every batch of packets sent, since ProcessProtocolInOut is only called per tick for some reason. Most visible in the multiplayer server list's ping: note how Java servers have much lower ping than even a local Cuberite instance.

Fix probably involves #4744 (a dedicated async network loop from ASIO) and even better with @worktycho 's serverTick branch so packets are first digested asynchronously by the network thread, and then the parsed action is queued onto the World's tick thread.
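
The hand-off described above (the network thread parses packets, the tick thread applies the result) is commonly built around a small thread-safe action queue. The sketch below shows that general pattern only; `TickActionQueue` is a hypothetical name, and this is not the design of #4744 or of the serverTick branch.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical queue of parsed actions waiting for the world's tick thread.
class TickActionQueue
{
public:
    using Action = std::function<void()>;

    // Called from the network thread once a packet has been parsed.
    void Push(Action a_Action)
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Actions.push_back(std::move(a_Action));
    }

    // Called from the tick thread once per tick. Runs the queued actions in
    // arrival order and returns how many were run. The lock is held only for
    // the swap, so the network thread is never blocked by game logic.
    std::size_t RunAll()
    {
        std::vector<Action> batch;
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            batch.swap(m_Actions);
        }
        for (auto & action : batch)
        {
            action();
        }
        return batch.size();
    }

private:
    std::mutex m_Mutex;
    std::vector<Action> m_Actions;
};
```

In such a setup the network thread could answer packets that need no world state, such as the server list ping, right away, while everything that touches the world waits for the next tick.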


9 participants