Skip to content

Propagate peer ID change to outgoing reliables#15680

Merged
sfan5 merged 4 commits intoluanti-org:masterfrom
sfan5:blergh
Feb 1, 2025
Merged

Propagate peer ID change to outgoing reliables#15680
sfan5 merged 4 commits intoluanti-org:masterfrom
sfan5:blergh

Conversation

@sfan5
Copy link
Copy Markdown
Member

@sfan5 sfan5 commented Jan 14, 2025

I hope to never touch the networking code again.

To do

This PR is Ready for Review.

How to test

diff --git a/src/network/mtp/impl.cpp b/src/network/mtp/impl.cpp
--- a/src/network/mtp/impl.cpp
+++ b/src/network/mtp/impl.cpp
@@ -1661,6 +1661,14 @@ void Connection::sendAck(session_t peer_id, u8 channelnum, u16 seqnum)
                        " channel: " << (channelnum & 0xFF) <<
                        " seqnum: " << seqnum << std::endl);
 
+       static bool first = true;
+       if (peer_id != PEER_ID_SERVER && channelnum == 0 && seqnum == SEQNUM_INITIAL) {
+               if (first) {
+                       first = false;
+                       return;
+               }
+       }
+
        SharedBuffer<u8> ack(4);
        writeU8(&ack[0], PACKET_TYPE_CONTROL);
        writeU8(&ack[1], CONTROLTYPE_ACK);
  1. apply patch, reproduce issue on master
  2. switch to PR (keep patch)
  3. issue is gone

the patch makes the server not ack the "connection start" packet the first time its sent. this reliably produces the faulty situation.
with the PR applies you will see the second time the client sends the "connection start" packet it already has the new peer id assigned to by the server.

note: in case you want to test a more complicated setup: the fix (this PR) is client-side, the test repro patch is server-side

Otherwise a desync could ocurr since the server does strict checking.
fixes luanti-org#15627
@sfan5 sfan5 added @ Network Bugfix 🐛 PRs that fix a bug labels Jan 14, 2025
@sfan5 sfan5 added this to the 5.11.0 milestone Jan 14, 2025
@sfan5 sfan5 linked an issue Jan 14, 2025 that may be closed by this pull request
Copy link
Copy Markdown
Member

@SmallJoker SmallJoker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the patch makes the server not ack the "connection start" packet the first time its sent. this reliably produces the faulty situation.

I applied the diff to master and started a local server. Then I connected with another instance (same build) to the server. ACK is indeed skipped exactly once (first joined player). The client can join. I only get the following --trace log on client-side:

... (no earlier mention of "con(" or "Connection")
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=23 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/50068)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
INFO[Main]: SourceShaderCache::getOrLoad(): No path found for "nodes_shader/opengl_geometry.glsl"
...
... (no earlier mention of "con(" or "Connection")
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=23 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/50068)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
...
INFO[Main]: - Starting mesh update thread
TRACE[ConnectionSend]: con(18/50068)UDP processing reliable CONNCMD_SEND
TRACE[ConnectionSend]: con(18/50068) processing reliable command for peer id: 1 data size: 36
TRACE[ConnectionSend]: con(18/50068)	 channel: 1, peer quota:1024
TRACE[ConnectionSend]: 			reliables on wire: 0, waiting for ack for 0
TRACE[ConnectionSend]: 			incoming_reliables: 0, next reliable packet: 65500, next queued: 0
TRACE[ConnectionSend]: 			reliables queued : 1
TRACE[ConnectionSend]: 			queued commands  : 0
TRACE[ConnectionSend]: con(18/50068) INFO: sending a queued reliable packet  channel: 1, seqnum: 65503
TRACE[ConnectionReceive]: con(18/50068) [ CONTROLTYPE_ACK: channelnum=1, peer_id=1, seqnum=65503 ]
TRACE[ConnectionReceive]: con(18/50068) Queuing ACK command to peer_id: 1 channel: 0 seqnum: 238
TRACE[ConnectionReceive]: con(18/50068)RECURSIVE, TYPE_RELIABLE peer_id: 1, channel: 0, seqnum: 238
TRACE[ConnectionReceive]: con(18/50068)RETURNING TYPE_ORIGINAL to user

Is this what should be expected in master + diff?

For comparison, PR + diff:

INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=24 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/23091)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
TRACE[ConnectionReceive]: con(18/23091) [ CONTROLTYPE_ACK: channelnum=0, peer_id=1, seqnum=65500 ]
TRACE[ConnectionReceive]: con(18/23091) set resend timeout 1.024 (rtt=0.512) for peer id: 1
...
INFO[Main]: - Starting mesh update thread
TRACE[ConnectionSend]: con(18/23091)UDP processing reliable CONNCMD_SEND
TRACE[ConnectionSend]: con(18/23091) processing reliable command for peer id: 1 data size: 36
TRACE[ConnectionSend]: con(18/23091)	 channel: 1, peer quota:1024
TRACE[ConnectionSend]: 			reliables on wire: 0, waiting for ack for 0
TRACE[ConnectionSend]: 			incoming_reliables: 0, next reliable packet: 65500, next queued: 0
TRACE[ConnectionSend]: 			reliables queued : 1
TRACE[ConnectionSend]: 			queued commands  : 0
TRACE[ConnectionSend]: con(18/23091) INFO: sending a queued reliable packet  channel: 1, seqnum: 65503
TRACE[ConnectionReceive]: con(18/23091) [ CONTROLTYPE_ACK: channelnum=1, peer_id=1, seqnum=65503 ]
TRACE[ConnectionReceive]: con(18/23091) Queuing ACK command to peer_id: 1 channel: 0 seqnum: 238
TRACE[ConnectionReceive]: con(18/23091)RECURSIVE, TYPE_RELIABLE peer_id: 1, channel: 0, seqnum: 238
TRACE[ConnectionReceive]: con(18/23091)RETURNING TYPE_ORIGINAL to user

The changes make sense and do look correct. However, I am not sure how else this could be tested.

Comment thread src/network/mtp/threads.cpp Outdated
@sfan5
Copy link
Copy Markdown
Member Author

sfan5 commented Jan 19, 2025

Is this what should be expected in master + diff?

Not easy to tell from the logs. Your best bet bet is to use Wireshark to look at the packets.

Edit: also to be precise, in the buggy state the client will disconnect with an errors 30s after joining. you can check if that happens.

Comment thread src/network/mtp/threads.cpp Outdated
Comment thread src/network/mtp/threads.cpp Outdated
Comment thread src/network/mtp/threads.cpp Outdated
@SmallJoker
Copy link
Copy Markdown
Member

Edit: also to be precise, in the buggy state the client will disconnect with an errors 30s after joining. you can check if that happens.

Unfortunately, I could not reproduce this error. However, the code does look good, thus I could give my approval regardless.

@sfan5 sfan5 merged commit c0422b1 into luanti-org:master Feb 1, 2025
@sfan5 sfan5 deleted the blergh branch February 1, 2025 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Disconnected after 30 seconds on servers upgraded to 5.10 (or even 5.9)

2 participants