Sapier's fix for the RESEND RELIABLE problem #4170

OldCoder · 2016-05-30T03:17:57Z

This is Sapier's fix for the RESEND RELIABLE problem, a network issue that has affected MT worlds for at least 1.5 years. The fix is relatively simple and is believed to be correct.

I've used the fix myself for over a year to keep a dozen worlds up and running; for me, MT wouldn't be possible without it. The fix has been tested, recently, in other worlds and seems to improve performance even in problematic cases such as residential hosting.

kwolekr · 2016-05-30T03:40:28Z

👎
This 'fix' turns supposedly reliable packets into unreliable packets. With TCP, if a packet cannot be acked after trying to update the routes on multiple occasions, it won't simply give up on that packet and move along, but it would reset the entire connection.

est31 · 2016-05-30T03:49:16Z

> it won't simply give up on that packet and move along, but it would reset the entire connection.

That's precisely what's happening here, read the code.

OldCoder · 2016-05-30T03:51:58Z

And, unless you do exactly that, the log fills up with gigabytes, literally, of RESEND RELIABLE messages and worlds go down. Without this patch, some server owners are locked out of Minetest entirely. One of them, in late 2014, was me. The project would have lost a dozen worlds and other contributions without this patch.
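The behaviour debated above (cap the resends of a reliable packet, then drop the whole connection instead of either skipping the packet or retrying forever) can be sketched as follows. This is only an illustration and not the code from the patch or from connection.cpp: `Peer`, `ReliablePacket`, `resendTimedOut` and the `MAX_RESENDS` value are all hypothetical names and numbers.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical types for illustration; not Minetest's actual classes.
struct ReliablePacket {
    std::uint16_t seqnum = 0;
    unsigned resend_count = 0; // how many times this packet was re-sent
};

enum class PeerState { Connected, TimedOut };

struct Peer {
    PeerState state = PeerState::Connected;
    std::deque<ReliablePacket> unacked; // reliable packets still awaiting an ACK
};

// Assumed cap; the real patch may use a different limit or criterion.
constexpr unsigned MAX_RESENDS = 10;

// Called each time the resend timer fires for a peer.
// Returns how many packets were re-sent on this tick.
unsigned resendTimedOut(Peer &peer)
{
    if (peer.state != PeerState::Connected)
        return 0; // nothing to do for a peer that was already dropped

    unsigned resent = 0;
    for (ReliablePacket &p : peer.unacked) {
        if (p.resend_count >= MAX_RESENDS) {
            // The packet is never skipped: the whole connection is
            // given up, so reliability is preserved while it lasts.
            peer.state = PeerState::TimedOut;
            peer.unacked.clear();
            return resent;
        }
        ++p.resend_count;
        ++resent; // a real implementation would send the packet here
    }
    return resent;
}
```

With a peer that never acknowledges anything, repeated calls re-send the packet `MAX_RESENDS` times, then mark the peer as timed out and stop, so the resend loop (and the log output it produces) is bounded.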

OldCoder · 2016-05-30T03:53:19Z

Putting the word "fix" into quotes, @kwolekr, so as to deprecate its status as such, does not change any of what I've said.

kwolekr · 2016-05-30T04:14:39Z

> That's precisely what's happening here, read the code.

D'oh, you're right. Okay then.

est31 · 2016-05-30T04:18:13Z

Generally I like the change as it makes the server more robust to rogue client behaviour.

But I do think that the underlying issue deserves more attention. It is not just simple network weirdness which would cause this, it can be either rogue or buggy clients, or a bug in the server somewhere.

Look at #4138 for example, it has caused this very RESEND_RELIABLE itself as well (see commit msg of 423d8c1 for more info).

kwolekr · 2016-05-30T04:35:35Z

I'm okay with this solution as a hacky patchy workaround because let's face it, connection.cpp is a mess. I think our man-hours are best spent making something that's unit-tested for edge cases, better documented, and easier to maintain. If anybody wants to try tracking down the root cause, that's cool too, so we've decided to wait 2 weeks before merging this. But I personally 👍 this, provided the obnoxious style errors are fixed before merging.

est31 · 2016-05-30T05:01:55Z

It would be cool for some contributor to track it down.

tenplus1 · 2016-05-30T08:49:17Z

We NEED this 'fix' as many server owners are frustrated with their worlds getting stuck and no way to reset server unless they are physically there to do so.

Fixer-007 · 2016-05-30T11:28:07Z

You can't believe how annoying this issue is to admins and players; it destroys gameplay, totally.
The ESM server admin used a build with this patch included, and it does not stall. A few curious things I've noticed (not sure they are related to this patch, but I'm posting them anyway just to be sure):

From time to time the F5 debug graphs show strange "network packets received" spikes of 50-100-200-500 (!!!) packets at once. It does not seem to affect gameplay, but it is strange; the average "network packets received" number is usually below 10. I don't know if it is caused by this patch.
After the server had been running for more than a day, I noticed that some (?) people's walking is not smooth but more like a fast, step-like jiggle of some kind. I tried to capture it here, but the recording is not that good :( https://i.imgur.com/EYwbkkF.gif I have never seen them walk with such a jiggle. I remember there is a step-like walk when lag is large, where the player moves slowly and laggily; the walking I observed is different: the player moves as if there is no lag, but has this strange jiggle, like "one step forward, a very slight step back, and forward to the original position".

I'm not sure if this is related to this patch, but this can give some clues about what is broken in network code. Maybe it was related to ESM only, that has lots of mods.

est31 · 2016-05-30T11:45:11Z

May be related to the actual underlying bug.
May be #4022 (Frequent lag spikes on the server side).

0-afflatus · 2016-05-30T11:49:04Z

This is an important first step.
It doesn't fix the underlying reasons for the RESEND RELIABLE messages being generated.
I have a hunch that it may be related to area protection checking.

However, tested 👍

est31 · 2016-06-01T04:12:05Z

Well, as @tenplus1 is in desperate need of the fix and the chance that the original cause of this bug will be found by an external contributor is near zero, I now think that there should be no 2 week stall anymore.

It's a bit sad for the engine, as the original cause won't be found, but xanadu is a great server so why not support it.

tenplus1 · 2016-06-01T07:19:16Z

@est31 - I'm sure that through time the cause will be found, but for now stopping it from breaking servers is more important.

est31 · 2016-06-01T09:36:22Z

👍

Fixer-007 · 2016-06-01T12:26:05Z

We lose players because of this problem, two weeks delay is not good.

sofar · 2016-06-02T21:43:36Z

👍 as workaround.

Sapier's fix for the RESEND RELIABLE problem

0fcd973

OldCoder mentioned this pull request May 30, 2016

Sapier's fix for the RESEND RELIABLE problem #4171

Closed

est31 added the @ Network, One approval ✅ ◻️, and Bugfix 🐛 (PRs that fix a bug) labels Jun 1, 2016

Zeno- merged commit 7ea4a03 into minetest:master Jun 3, 2016

Zeno- added >= Two approvals ✅ ✅ and removed One approval ✅ ◻️ labels Jun 3, 2016

This was referenced Apr 7, 2017

'RE-SENDING timed-out RELIABLE' causes server timeout/lockup #3307

Closed

Stall Glitch #4151

Closed
