Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sapier's fix for the RESEND RELIABLE problem #4170

Merged
merged 1 commit into from
Jun 3, 2016
Merged

Sapier's fix for the RESEND RELIABLE problem #4170

merged 1 commit into from
Jun 3, 2016

Conversation

OldCoder
Copy link
Contributor

This is Sapier's fix for the RESEND RELIABLE problem, a network issue that has affected MT worlds for at least 1.5 years. The fix is relatively simple and is believed to be correct.

I've used the fix myself for over a year to keep a dozen worlds up and running; for me, MT wouldn't be possible without it. The fix has been tested, recently, in other worlds and seems to improve performance even in problematic cases such as residential hosting.

@kwolekr
Copy link
Contributor

kwolekr commented May 30, 2016

👎
This 'fix' turns supposedly reliable packets into unreliable packets. With TCP, if a packet cannot be acked after trying to update the routes on multiple occasions, it won't simply give up on that packet and move along, but it would reset the entire connection.

@est31
Copy link
Contributor

est31 commented May 30, 2016

it won't simply give up on that packet and move along, but it would reset the entire connection.

That's precisely what's happening here, read the code.

@OldCoder
Copy link
Contributor Author

And, unless you do exactly that, the log fills up with gigabytes, literally, of RESEND RELIABLE messages and worlds go down. Without this patch, some server owners are locked out of Minetest entirely. One of them, in late 2014, was me. The project would have lost a dozen worlds and other contributions without this patch.

@OldCoder
Copy link
Contributor Author

OldCoder commented May 30, 2016

Putting the word "fix" into quotes, @kwolekr, so as to deprecate its status as such, does not change any of what I've said.

@kwolekr
Copy link
Contributor

kwolekr commented May 30, 2016

That's precisely what's happening here, read the code.

D'oh, you're right. Okay then.

@est31
Copy link
Contributor

est31 commented May 30, 2016

Generally I like the change as it makes the server more robust to rogue client behaviour.

But I do think that the underlying issue deserves more attention. It is not just simple network weirdness which would cause this, it can be either rogue or buggy clients, or a bug in the server somewhere.

Look at #4138 for example, it has caused this very RESEND_RELIABLE itself as well (see commit msg of 423d8c1 for more info).

@kwolekr
Copy link
Contributor

kwolekr commented May 30, 2016

I'm okay with this solution as a hacky patchy workaroud because let's face it, connection.cpp is a mess. I think our manhours are best spent making something that's unit-tested for edge cases, better documented, and easier to maintain. If anybody wants to try tracking down the root cause, that's cool too, so we've decided to wait 2 weeks before merging this. But I personally 👍 this, provided the obnoxious style errors are fixed before merging.

@est31
Copy link
Contributor

est31 commented May 30, 2016

It would be cool for some contributor to track it down.

@tenplus1
Copy link
Contributor

We NEED this 'fix' as many server owners are frustrated with their worlds getting stuck and no way to reset server unless they are physically there to do so.

@Fixer-007
Copy link
Contributor

Fixer-007 commented May 30, 2016

You can't believe how this issue is annoying to admins and players, it destroys gameplay, totally.
ESM server admin used build with this patch included, and it does not stall, few curious things I've noticed (not sure it is related to this patch but I'm posting it anyway just to be sure):

  1. From time to time F5 debug graphs shows strange "network packets received" spikes of 50-100-200-500 (!!!) packets at once, it does not affect gameplay it seems, but it is strange, average network packed received number is below 10 usually. Don't know if it is caused by this patch.

  2. After server worked for more than a day I've noticed that some (?) people walking is not smooth but more like fast step-like jiggle of some kind, tried to capture it here, but recording is not that good :( https://i.imgur.com/EYwbkkF.gif I never seen them walking with such jiggle, I remember there is step-like walk when lag is large, player moves slowly and laggy, on the other hand this walking I observed is different, player moves like there is no lag, but has this strange jiggle, like "one step forward, very slight step back and forward to original positional).

I'm not sure if this is related to this patch, but this can give some clues about what is broken in network code. Maybe it was related to ESM only, that has lots of mods.

@est31
Copy link
Contributor

est31 commented May 30, 2016

  1. may be related to the actual underlying bug.
  2. may be Frequent lag spikes on the server side #4022

@0-afflatus
Copy link

0-afflatus commented May 30, 2016

This is an important first step.
It doesn't fix the underlying reasons for the RESEND RELIABLE messages being generated.
I have a hunch that it may be related to area protection checking.

However, tested 👍

@est31
Copy link
Contributor

est31 commented Jun 1, 2016

Well as @tenplus1 is in desparate need for the fix and the chance that the original cause for this bug will be found by an external contributor is near zero, I now think that there should be no 2 week stall anymore.

Its a bit sad for the engine, as the original cause won't be found, but xanadu is a great server so why not support it.

@tenplus1
Copy link
Contributor

tenplus1 commented Jun 1, 2016

@est31 - I'm sure that through time the cause will be found, but for now stopping it from breaking servers is more important.

@est31
Copy link
Contributor

est31 commented Jun 1, 2016

👍

@Fixer-007
Copy link
Contributor

Fixer-007 commented Jun 1, 2016

We lose players because of this problem, two weeks delay is not good.

@sofar
Copy link
Contributor

sofar commented Jun 2, 2016

👍 as workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants