
fix(peer): reconnection ping-pongs #2841

Merged: 1 commit merged into fedimint:master from the timeouts-dkg branch on Jul 25, 2023
Conversation

@dpc (Contributor) commented Jul 25, 2023

The root of all evil is state, especially mutable and shared state. It seems to me that peer communication uses a connection where both sides think they are in control. With higher latencies it is possible to get into a sort of reconnect ping-pong: one side reconnects and starts re-sending all its messages, and during that time the other side reconnects and starts sending messages of its own, only to receive yet another new connection from the first side...

Typically the way I'd write this is with two connections (one for each direction), with the sending side responsible for (re-)connecting. A bit wasteful, but it makes the code easier.

Here, to avoid refactoring too much, I just make the peer with the lower PeerId responsible for reconnections.

Fix #2800
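
A minimal sketch of the tie-breaking rule described above (not the actual fedimint-server code; `PeerId` here is a stand-in for fedimint's peer id type, and all names are illustrative):

```rust
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct PeerId(u16);

/// For every pair of peers, exactly one side is the "dialer": the one with
/// the lower id. The other side only ever accepts incoming connections, so
/// after a drop the connection is re-established from a single direction
/// and the two peers can no longer race each other into a reconnect loop.
fn is_dialer(our_id: PeerId, peer_id: PeerId) -> bool {
    our_id < peer_id
}

fn main() {
    let (alice, bob) = (PeerId(0), PeerId(1));
    assert!(is_dialer(alice, bob)); // alice (lower id) is responsible for reconnecting
    assert!(!is_dialer(bob, alice)); // bob only waits for incoming connections
}
```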

@dpc requested a review from a team as a code owner, July 25, 2023 21:37
codecov bot commented Jul 25, 2023

Codecov Report

Patch coverage: 93.33% and project coverage change: -0.07% ⚠️

Comparison is base (0403bdb) 63.26% compared to head (474d288) 63.20%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2841      +/-   ##
==========================================
- Coverage   63.26%   63.20%   -0.07%     
==========================================
  Files         211      211              
  Lines       42214    42219       +5     
==========================================
- Hits        26708    26684      -24     
- Misses      15506    15535      +29     
Files Changed                      Coverage Δ
fedimint-server/src/net/peers.rs   91.26% <93.33%> (-2.11%) ⬇️

... and 11 files with indirect coverage changes


@douglaz (Contributor) left a comment


Tested and it works! Not sure it solves 100% of the problems, but it should at least fix 99%.

@dpc added this pull request to the merge queue Jul 25, 2023
Merged via the queue into fedimint:master with commit 4919189 Jul 25, 2023
18 checks passed
@dpc deleted the timeouts-dkg branch July 25, 2023 23:45
@elsirion (Contributor) commented

Interesting! I was aware of that problem but expected exponential backoff + jitter to fix it after a few iterations; apparently it doesn't, or some of the variables were chosen poorly :/ Anyway, your solution is quite elegant.
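
For context, this is roughly what exponential backoff with "full jitter" looks like (a generic sketch under assumed parameters, using the external `rand` crate; not fedimint's actual reconnect code):

```rust
use std::time::Duration;

use rand::Rng; // external `rand` crate

/// The retry window grows as base * 2^attempt up to a cap, and the actual
/// delay is a uniformly random point inside that window, so two peers
/// retrying in lockstep should quickly desynchronize.
fn backoff_with_jitter(attempt: u32, base: Duration, max: Duration) -> Duration {
    let window = base.saturating_mul(1u32 << attempt.min(16)).min(max);
    window.mul_f64(rand::thread_rng().gen_range(0.0..=1.0))
}

fn main() {
    for attempt in 0..5 {
        let delay = backoff_with_jitter(attempt, Duration::from_millis(100), Duration::from_secs(10));
        println!("attempt {attempt}: sleeping {delay:?}");
    }
}
```

As the exchange below suggests, jitter only helps if the retry window is wide relative to the link latency.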

@justinmoon (Contributor) commented

Fantastic!

@dpc (Contributor, Author) commented Jul 26, 2023

> I was aware of that problem but expected exponential backoff+jitter to fix this after a few iterations,

Right. But I think we tuned those values down at some point to improve test times or something, which becomes a problem on higher-latency links.
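
(For illustration, with hypothetical numbers: if the backoff cap were tuned down to something like 100 ms so that tests reconnect quickly, but the link's round trip is several hundred milliseconds, one side's reconnect attempt is still in flight when the other side's timer fires, so even jittered retries keep colliding.)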

Development

Successfully merging this pull request may close these issues.

Weird communication errors while setting up federation
4 participants