-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(peer): reconnection ping-pongs #2841
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #2841 +/- ##
==========================================
- Coverage 63.26% 63.20% -0.07%
==========================================
Files 211 211
Lines 42214 42219 +5
==========================================
- Hits 26708 26684 -24
- Misses 15506 15535 +29
☔ View full report in Codecov by Sentry. |
The root of all evil is state. Especially mutable and shared. Seems to me like peer communication is using a connection where both sides think they are in control of the connection. So with higher latencies it is possible to get into a sort of a reconnect ping-pong, when one sides reconnects, starts re-sending all messages, and during that time the other side reconnects, and starts sending messages themselves... just to get new connection from the other side... Fix fedimint#2800
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested and it works! Not sure it solves 100% of the problems, but it should at least fix 99%
Interesting! I was aware of that problem but expected exponential backoff+jitter to fix this after a few iterations, apparently not or some variables were chosen poorly :/ Anyway, your solution is quite elegant. |
Fantastic! |
Right. But I think we've optimized these at some point to improve test times or something, which becomes a problem on longer latency links. |
The root of all evil is state. Especially mutable and shared. Seems to me like peer communication is using a connection where both sides think they are in control. So with higher latencies it is possible to get into a sort of a reconnect ping-pong, when one sides reconnects, starts re-sending all messages, and during that time the other side reconnects, and starts sending messages themselves... just to get new connection from the other side...
Typically the way I'd write is to have two connections (for each side) and the sending side being responsible for (re-)connecting. A bit wasteful, but makes the code easier.
Here, to avoid refactoring too much, I just make the peer with a lower
PeerId
responsible for reconnections.Fix #2800