-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race between outbound messages and peer disconnection #2663
Fix race between outbound messages and peer disconnection #2663
Conversation
Codecov ReportAll modified lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #2663 +/- ##
==========================================
+ Coverage 89.04% 89.05% +0.01%
==========================================
Files 112 112
Lines 87419 87417 -2
Branches 87419 87417 -2
==========================================
+ Hits 77841 77850 +9
+ Misses 7336 7328 -8
+ Partials 2242 2239 -3
☔ View full report in Codecov by Sentry. |
f389626
to
f1e97f9
Compare
lightning/src/ln/peer_handler.rs
Outdated
|
||
#[test] | ||
#[cfg(feature = "std")] | ||
fn test_rapid_connect_events_order_multithreaded() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran this about 100x with your change reverted and wasn't able to get it to fail :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, yea, sorry, this test isnt even close to right. I'll spend some more time on this but the issue is pretty hard to repro.
f1e97f9
to
2c33d6d
Compare
For now simply removed the test. I think we should land this as-is because writing a generic stress test that isn't super targeted is not trivial and we need to ship this. |
lightning/src/ln/peer_handler.rs
Outdated
return Err(PeerHandleError { }.into()); | ||
} | ||
if let Err(()) = self.message_handler.custom_message_handler.peer_connected(&their_node_id, &msg, peer_lock.inbound_connection) { | ||
log_debug!(self.logger, "Custom Message Handler decided we couldn't communicate with peer {}", log_pubkey!(their_node_id)); | ||
self.message_handler.onion_message_handler.peer_disconnected(&their_node_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about route_handler
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't have a peer_disconnected.
lightning-custom-message/src/lib.rs
Outdated
)* | ||
if should_disconnect { | ||
$( | ||
self.$field.peer_disconnected(their_node_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok that peer_disconnected
may be called for some handlers that did not get a peer_connected
first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All handlers get a peer_connected here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is somewhat awkward that we have different semantics on the custom message handler from the other message handlers - the others dont get a peer_disconnected if they return an error from peer_connected, custom does (here). Let me fix that.
Previously, outbound messages held in `process_events` could race with peer disconnection, allowing a message intended for a peer before disconnection to be sent to the same peer after disconnection. The fix is simple - hold the peers read lock while we fetch pending messages from peers (as we disconnect with the write lock).
2c33d6d
to
4366369
Compare
Dropped all the commits but the real fix, since those commits were mostly just for the test that didnt work anyway. I'll get a PR up with the changes and a test at some point in a followup. |
Previously, outbound messages held in
process_events
could racewith peer disconnection, allowing a message intended for a peer
before disconnection to be sent to the same peer after
disconnection.
The fix is simple - hold the peers read lock while we fetch
pending messages from peers (as we disconnect with the write lock).