Delay broadcasting Channel Updates until connected to peers #2731
I took the following approach to tackling this issue:
Codecov Report

@@            Coverage Diff             @@
##             main    #2731      +/-   ##
==========================================
+ Coverage   89.14%   91.68%    +2.53%
==========================================
  Files         116      118        +2
  Lines       93205   111593    +18388
  Branches    93205   111593    +18388
==========================================
+ Hits        83089   102315    +19226
+ Misses       7583     7246      -337
+ Partials     2533     2032      -501
This could use a test. Also, I do think we should consider the close-then-shutdown case: how do we get these out if we were shutting down when we closed, or if we close on restart but don't keep the node online for long?
Certainly! I am on it 🧑💻
I have some thoughts on the scenario you brought up. Broadcasting is a challenge here: during shutdown we aren't connected to anyone who could relay the message, and as far as I know our node doesn't automatically rebroadcast the channel graph each time it restarts. One option is to persist the pending messages so they can be broadcast once the node comes back online. However, I'm not sure whether the channel update message is important enough to justify persisting it across node sessions. I'd love to hear your perspective on this.
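The persistence idea raised in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: messages are modeled as opaque byte vectors, and `encode_pending` / `decode_pending` are hypothetical helpers using a simple length-prefixed layout. It is not the project's actual serialization format.

```rust
/// Hypothetical helper: encode cached broadcast messages as
/// [count: u32 BE], then for each message [len: u32 BE][bytes].
fn encode_pending(msgs: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(msgs.len() as u32).to_be_bytes());
    for m in msgs {
        out.extend_from_slice(&(m.len() as u32).to_be_bytes());
        out.extend_from_slice(m);
    }
    out
}

/// Hypothetical helper: decode the layout above.
/// Returns `None` on truncated input or trailing bytes.
fn decode_pending(mut data: &[u8]) -> Option<Vec<Vec<u8>>> {
    // Read a big-endian u32 and advance the slice past it.
    fn read_u32(data: &mut &[u8]) -> Option<u32> {
        if data.len() < 4 {
            return None;
        }
        let (head, rest) = data.split_at(4);
        *data = rest;
        Some(u32::from_be_bytes([head[0], head[1], head[2], head[3]]))
    }

    let count = read_u32(&mut data)?;
    let mut msgs = Vec::new();
    for _ in 0..count {
        let len = read_u32(&mut data)? as usize;
        if data.len() < len {
            return None;
        }
        let (body, rest) = data.split_at(len);
        msgs.push(body.to_vec());
        data = rest;
    }
    if data.is_empty() { Some(msgs) } else { None }
}
```

On restart, the decoded messages would be placed back into the in-memory cache so they can go out once a peer connects. Whether that extra persistence is worth it is exactly the open question in this comment.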
Updated from pr2731.01 -> pr2731.02 (diff) Changes:
Updated from pr2731.02 to pr2731.03 (diff) with the following changes:
These adjustments make broadcasting pending `channel_update` messages more reliable in the close-then-shutdown case.
Updated from pr2731.03 to pr2731.04 (diff): Changes:
Updated from pr2731.05 to pr2731.06 (diff): Updates:
Updated from pr2731.06 to pr2731.07 (diff):
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (1)
- lightning/src/ln/channelmanager.rs (8 hunks)
Additional comments: 5
lightning/src/ln/channelmanager.rs (5)
- 1384-1386: The addition of `pending_broadcast_messages` is consistent with the PR's objective to cache unsent `channel_update` messages. Ensure that the Mutex is used correctly throughout the code to prevent data races.
- 2462-2462: Initialization of `pending_broadcast_messages` with an empty vector is correct and follows Rust's conventions for initializing state within a struct.
- 8207-8208: Appending `pending_broadcast_messages` to `pending_events` is a key part of the mechanism to ensure cached messages are sent. Ensure that this operation is thread-safe and that the lock is held for the minimum time necessary.
- 11105-11105: The initialization of `pending_broadcast_messages` is repeated here, which is consistent with the earlier initialization. This is standard practice and is approved.
- 11636-11662: The test case `test_channel_update_cached` correctly verifies the caching behavior of channel updates. It checks that the message is not immediately sent but is cached correctly, which aligns with the PR's objectives.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- lightning/src/ln/channelmanager.rs (7 hunks)
Additional comments: 7
lightning/src/ln/channelmanager.rs (7)
- 1384-1387: The addition of `pending_broadcast_messages` is consistent with the PR's objective to cache unsent `channel_update` messages. Ensure that the `Mutex` is used correctly throughout the code to prevent data races.
- 2462-2462: Initialization of `pending_broadcast_messages` with an empty vector is correct and matches the declaration in hunk 1.
- 2954-2959: The logic to cache the `channel_update` message when no peers are connected is implemented as described. However, ensure that the `log_trace!` macro is used consistently across the codebase for logging.
- 8177-8177: This comment block is a placeholder for the `get_and_clear_pending_msg_events` function. It's important to ensure that the documentation is updated to reflect any new logic related to the broadcasting of messages.
- 8206-8207: Appending `pending_broadcast_messages` to `pending_events` is in line with the intended functionality. However, consider the comment from wpaulino about ensuring that there are connected peers before attempting to broadcast.
- 11104-11104: The initialization of `pending_broadcast_messages` in a different constructor is consistent with the previous hunk and is necessary for proper functionality.
- 11635-11661: The test `test_channel_update_cached` verifies the caching behavior of `channel_update` messages. Ensure that the test covers all relevant scenarios, including node restarts and reconnections.
Updated from pr2731.08 to pr2731.09 (diff): Changes:
Review Status
Actionable comments generated: 4
Configuration used: CodeRabbit UI
Files selected for processing (2)
- lightning/src/ln/channelmanager.rs (10 hunks)
- lightning/src/ln/reorg_tests.rs (2 hunks)
Additional comments: 7
lightning/src/ln/channelmanager.rs (7)
- 1384-1386: The addition of `pending_broadcast_messages` to cache channel update messages not broadcasted due to lack of peer connections is a good approach to ensure reliability in message broadcasting. Ensure that the mutex is consistently unlocked to prevent deadlocks.
- 2954-2958: Caching the channel update message when an update is available but not immediately broadcastable is implemented correctly. However, ensure that the locking mechanism on `pending_broadcast_messages` does not introduce any performance bottlenecks or deadlocks, especially in high-concurrency scenarios.
- 8176-8176: The documentation update clarifying the placement of `BroadcastChannelAnnouncement` and `BroadcastChannelUpdate` among `MessageSendEvent`s is helpful for understanding the intended behavior. It's important that documentation keeps pace with code changes to aid future maintainability.
- 8202-8216:
  > 📝 NOTE
  > This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [8196-8213]

  The logic to check for connected peers before appending broadcast messages to the pending events list is sound. However, consider optimizing the iteration over `per_peer_state` to avoid potential performance issues in scenarios with a large number of peers.
- 11642-11689: The test `test_channel_update_cached` effectively verifies the caching and broadcasting behavior of channel update messages under various network conditions. Ensure that edge cases, such as rapid connect/disconnect scenarios, are also covered to prevent any unforeseen issues.
- 11714-11722: The test `test_drop_disconnected_peers_when_removing_channels` correctly asserts the behavior of peer state management upon disconnection and force closure of channels. It's crucial to also test the behavior when peers reconnect after being dropped to ensure the system's resilience.
- 12426-12430: The test `test_trigger_lnd_force_close` sets up a scenario to test force-closure of channels, which is essential for ensuring the robustness of channel management under adversarial conditions. Consider adding assertions to verify the state of the channel and the broadcast of channel update messages post-force-close.
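The connected-peer check mentioned in the 8202-8216 comment might look roughly like the sketch below. The struct layout is an assumption made for illustration: `PeerAwareManager` and `PeerState` are hypothetical, peers are keyed by a `u8`, and messages are plain strings. Only the names `per_peer_state`, `pending_broadcast_messages`, and `get_and_clear_pending_msg_events` appear in the review.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical per-peer record; only the connection flag matters here.
struct PeerState {
    is_connected: bool,
}

/// Hypothetical pared-down manager; messages are plain strings.
struct PeerAwareManager {
    per_peer_state: HashMap<u8, Mutex<PeerState>>,
    pending_broadcast_messages: Mutex<Vec<String>>,
}

impl PeerAwareManager {
    fn get_and_clear_pending_msg_events(&self) -> Vec<String> {
        let mut pending_events = Vec::new();
        // `any` short-circuits at the first connected peer, so the whole
        // map is only walked when nobody is connected.
        let has_connected_peer = self
            .per_peer_state
            .values()
            .any(|peer| peer.lock().unwrap().is_connected);
        if has_connected_peer {
            // Drain the cache only when there is someone to broadcast to.
            pending_events.append(&mut self.pending_broadcast_messages.lock().unwrap());
        }
        pending_events
    }
}
```

With no connected peer the cached updates stay in place, so a later call made after a peer connects still returns them.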
LGTM, this can be squashed now
Updated from pr2731.13 to pr2731.14 (diff): Changes:
Updated from pr2731.14 to pr2731.15 (diff): Updates:
Okay, LGTM with the feedback below addressed. Feel free to squash commits down into a single clean history when you next push.
lightning/src/ln/channelmanager.rs
Outdated
@@ -1988,7 +1992,7 @@ macro_rules! handle_error {

 $self.finish_close_channel(shutdown_res);
 if let Some(update) = update_option {
-    msg_events.push(events::MessageSendEvent::BroadcastChannelUpdate {
+    broadcast_event = Some(events::MessageSendEvent::BroadcastChannelUpdate {
I think we can do the `pending_broadcast_messages` lock and push inline here, no need to store it in a temp.
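The difference the reviewer is pointing at can be illustrated with a small sketch. Everything here is a simplified assumption: `Cache`, `close_with_temp`, and `close_inline` are hypothetical names, and the event is reduced to a `String`.

```rust
use std::sync::Mutex;

/// Hypothetical holder for the cache; messages are plain strings here.
struct Cache {
    pending_broadcast_messages: Mutex<Vec<String>>,
}

/// Variant with a temporary, in the spirit of the outdated hunk.
fn close_with_temp(cache: &Cache, update_option: Option<String>) {
    let mut broadcast_event = None;
    if let Some(update) = update_option {
        broadcast_event = Some(update);
    }
    if let Some(event) = broadcast_event {
        cache.pending_broadcast_messages.lock().unwrap().push(event);
    }
}

/// Inline variant: lock and push at the point where the update is known.
fn close_inline(cache: &Cache, update_option: Option<String>) {
    if let Some(update) = update_option {
        cache.pending_broadcast_messages.lock().unwrap().push(update);
    }
}
```

Both functions leave the cache in the same state; the inline form simply drops the intermediate `Option`.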
lightning/src/ln/channelmanager.rs
Outdated
@@ -4059,7 +4064,8 @@ where
 }
 if let ChannelPhase::Funded(channel) = channel_phase {
     if let Ok(msg) = self.get_channel_update_for_broadcast(channel) {
-        peer_state.pending_msg_events.push(events::MessageSendEvent::BroadcastChannelUpdate { msg });
+        let pending_broadcast_messages = &mut self.pending_broadcast_messages.lock().unwrap();
nit: everywhere you take the `pending_broadcast_messages` lock you don't need the `&mut` part.
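As a small illustration of this nit: a `MutexGuard` implements `DerefMut`, so binding the guard directly with `let mut` is enough to call `push` on the inner vector. The function name `push_update` and the `String` payload below are hypothetical.

```rust
use std::sync::Mutex;

/// Pushes `msg` into the cache and returns the new number of cached messages.
fn push_update(cache: &Mutex<Vec<String>>, msg: String) -> usize {
    // Bind the guard itself; no `&mut` in front of the `lock()` call is needed.
    let mut pending_broadcast_messages = cache.lock().unwrap();
    pending_broadcast_messages.push(msg);
    pending_broadcast_messages.len()
}
```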
}

pub fn disconnect_dummy_node<'a, 'b: 'a, 'c: 'b>(node: &Node<'a, 'b, 'c>) {
    node.node.peer_disconnected(&PublicKey::from_slice(&[2; 33]).unwrap());
We should be symmetric with `connect_dummy_node`: either we don't `peer_connected` on the `onion_messenger`, or we should `peer_disconnected` as well.
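The symmetry being asked for could be sketched as follows. This is a toy model, not the test utilities themselves: `PeerTracker`, the simplified `Node`, and `DUMMY_PEER` are hypothetical, and the real helpers take different arguments (as the hunk above shows). The dummy key bytes `[2; 33]` are taken from that hunk.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a component that tracks connected peers.
#[derive(Default)]
struct PeerTracker {
    connected: HashSet<[u8; 33]>,
}

impl PeerTracker {
    fn peer_connected(&mut self, id: [u8; 33]) {
        self.connected.insert(id);
    }
    fn peer_disconnected(&mut self, id: &[u8; 33]) {
        self.connected.remove(id);
    }
}

/// Hypothetical test node with a channel manager and an onion messenger.
#[derive(Default)]
struct Node {
    node: PeerTracker,
    onion_messenger: PeerTracker,
}

/// Same dummy key bytes as in the hunk above.
const DUMMY_PEER: [u8; 33] = [2; 33];

fn connect_dummy_node(n: &mut Node) {
    n.node.peer_connected(DUMMY_PEER);
    n.onion_messenger.peer_connected(DUMMY_PEER);
}

/// Mirrors `connect_dummy_node`: every component that was told about the
/// connection is also told about the disconnection.
fn disconnect_dummy_node(n: &mut Node) {
    n.node.peer_disconnected(&DUMMY_PEER);
    n.onion_messenger.peer_disconnected(&DUMMY_PEER);
}
```

The other option from the comment, never notifying the onion messenger in the first place, would keep the pair symmetric just as well.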
// Commenting the assignment to remove `unused_assignments` warning.
// dummy_connected = false;
Not sure why we need to keep this here.
lightning/src/ln/reorg_tests.rs
Outdated
@@ -763,21 +763,21 @@ fn test_htlc_preimage_claim_prev_counterparty_commitment_after_current_counterpa
 fn do_test_retries_own_commitment_broadcast_after_reorg(anchors: bool, revoked_counterparty_commitment: bool) {
     // Tests that a node will retry broadcasting its own commitment after seeing a confirmed
     // counterparty commitment be reorged out.
-    let mut chanmon_cfgs = create_chanmon_cfgs(2);
+    let mut chanmon_cfgs = create_chanmon_cfgs(3);
It seems all the changes in this file can be reverted.
#[test]
fn test_drop_disconnected_peers_when_removing_channels() {
    let chanmon_cfgs = create_chanmon_cfgs(2);
It seems all the test changes in this file from here down can be reverted.
Updated from pr2731.16 to pr2731.17 (diff): Updates:
LGTM. One small nit otherwise just needs another reviewer.
Updated from pr2731.17 to pr2731.18 (diff): Update:
LGTM, though let's perhaps have @TheBlueMatt take another look because he suggested the opposite change in tests from the one I suggested.
And thank you very much for the renaming!
I don't feel super strongly about the test changes, but we may have to drop some of the assertions when we eventually move the test out of
- We might generate channel updates to be broadcast when we are not connected to any peers to broadcast them to.
- This PR caches them and broadcasts them only once we are connected to at least one peer.

Other Changes:
1. Introduce a test.
2. Update the existing tests affected by this change.
3. Fix a typo.
4. Introduce two functions in functional_utils that optionally connect and disconnect a dummy node during broadcast testing.
Updated from pr2731.19 to pr2731.20 (diff): Update:
LGTM!
resolves #2711
We might generate channel updates to be broadcast when we are not connected to any peers to broadcast them to. This PR caches them and broadcasts them only once we are connected to at least one peer.