Propagate peer ID change to outgoing reliables#15680
Propagate peer ID change to outgoing reliables#15680sfan5 merged 4 commits intoluanti-org:masterfrom
Conversation
Otherwise a desync could ocurr since the server does strict checking. fixes luanti-org#15627
SmallJoker
left a comment
There was a problem hiding this comment.
the patch makes the server not ack the "connection start" packet the first time its sent. this reliably produces the faulty situation.
I applied the diff to master and started a local server. Then I connected with another instance (same build) to the server. ACK is indeed skipped exactly once (first joined player). The client can join. I only get the following --trace log on client-side:
... (no earlier mention of "con(" or "Connection")
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=23 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/50068)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
INFO[Main]: SourceShaderCache::getOrLoad(): No path found for "nodes_shader/opengl_geometry.glsl"
...
... (no earlier mention of "con(" or "Connection")
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=23 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/50068)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
...
INFO[Main]: - Starting mesh update thread
TRACE[ConnectionSend]: con(18/50068)UDP processing reliable CONNCMD_SEND
TRACE[ConnectionSend]: con(18/50068) processing reliable command for peer id: 1 data size: 36
TRACE[ConnectionSend]: con(18/50068) channel: 1, peer quota:1024
TRACE[ConnectionSend]: reliables on wire: 0, waiting for ack for 0
TRACE[ConnectionSend]: incoming_reliables: 0, next reliable packet: 65500, next queued: 0
TRACE[ConnectionSend]: reliables queued : 1
TRACE[ConnectionSend]: queued commands : 0
TRACE[ConnectionSend]: con(18/50068) INFO: sending a queued reliable packet channel: 1, seqnum: 65503
TRACE[ConnectionReceive]: con(18/50068) [ CONTROLTYPE_ACK: channelnum=1, peer_id=1, seqnum=65503 ]
TRACE[ConnectionReceive]: con(18/50068) Queuing ACK command to peer_id: 1 channel: 0 seqnum: 238
TRACE[ConnectionReceive]: con(18/50068)RECURSIVE, TYPE_RELIABLE peer_id: 1, channel: 0, seqnum: 238
TRACE[ConnectionReceive]: con(18/50068)RETURNING TYPE_ORIGINAL to user
Is this what should be expected in master + diff?
For comparison, PR + diff:
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=24 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/23091)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
TRACE[ConnectionReceive]: con(18/23091) [ CONTROLTYPE_ACK: channelnum=0, peer_id=1, seqnum=65500 ]
TRACE[ConnectionReceive]: con(18/23091) set resend timeout 1.024 (rtt=0.512) for peer id: 1
...
INFO[Main]: - Starting mesh update thread
TRACE[ConnectionSend]: con(18/23091)UDP processing reliable CONNCMD_SEND
TRACE[ConnectionSend]: con(18/23091) processing reliable command for peer id: 1 data size: 36
TRACE[ConnectionSend]: con(18/23091) channel: 1, peer quota:1024
TRACE[ConnectionSend]: reliables on wire: 0, waiting for ack for 0
TRACE[ConnectionSend]: incoming_reliables: 0, next reliable packet: 65500, next queued: 0
TRACE[ConnectionSend]: reliables queued : 1
TRACE[ConnectionSend]: queued commands : 0
TRACE[ConnectionSend]: con(18/23091) INFO: sending a queued reliable packet channel: 1, seqnum: 65503
TRACE[ConnectionReceive]: con(18/23091) [ CONTROLTYPE_ACK: channelnum=1, peer_id=1, seqnum=65503 ]
TRACE[ConnectionReceive]: con(18/23091) Queuing ACK command to peer_id: 1 channel: 0 seqnum: 238
TRACE[ConnectionReceive]: con(18/23091)RECURSIVE, TYPE_RELIABLE peer_id: 1, channel: 0, seqnum: 238
TRACE[ConnectionReceive]: con(18/23091)RETURNING TYPE_ORIGINAL to user
The changes make sense and do look correct. However, I am not sure how else this could be tested.
Not easy to tell from the logs. Your best bet bet is to use Wireshark to look at the packets. Edit: also to be precise, in the buggy state the client will disconnect with an errors 30s after joining. you can check if that happens. |
Unfortunately, I could not reproduce this error. However, the code does look good, thus I could give my approval regardless. |
I hope to never touch the networking code again.
To do
This PR is Ready for Review.
How to test
the patch makes the server not ack the "connection start" packet the first time its sent. this reliably produces the faulty situation.
with the PR applies you will see the second time the client sends the "connection start" packet it already has the new peer id assigned to by the server.
note: in case you want to test a more complicated setup: the fix (this PR) is client-side, the test repro patch is server-side