Ensure successful message propagation in case of disconnection mid-handshake #2725

Merged
merged 2 commits into from Feb 5, 2024

Conversation

shaavan
Contributor

@shaavan shaavan commented Nov 10, 2023

Resolves #2096

  • This PR ensures that we don't immediately force-close OutboundV1Channel in case of disconnection mid-handshake.
  • Instead, we rebroadcast the SendOpenChannel message if the peer reconnects within time.

@shaavan
Contributor Author

shaavan commented Nov 10, 2023

This PR makes the following interpretation of the issue and follows the solution accordingly:

  1. Don't allow channel creation if the peer is already disconnected.
  2. If the peer is connected when "create_channel" is called but disconnects at that exact moment, allow channel creation and resend the open_channel message when it reconnects.
  3. Fail the creation after a few timer ticks if the peer does not reconnect in time.

If my interpretation of the problem is mistaken, do let me know and I shall be glad to correct it! :)
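
For illustration only, the first two rules of this interpretation could be sketched as below. `PeerState`, `CreateError`, and the three free functions are hypothetical simplifications and do not mirror LDK's actual types or signatures; the timeout in rule 3 is left out of this sketch.

```rust
/// Hypothetical, simplified stand-in for per-peer state (not LDK's real type).
#[derive(Debug, Default)]
struct PeerState {
    is_connected: bool,
    /// Temporary IDs of outbound channels still waiting on the handshake.
    pending_outbound: Vec<u64>,
}

#[derive(Debug, PartialEq)]
enum CreateError {
    PeerDisconnected,
}

/// Rule 1: refuse to start a handshake with a peer that is already offline.
/// On success the channel is recorded so it can be re-announced later.
fn create_channel(peer: &mut PeerState, temp_id: u64) -> Result<(), CreateError> {
    if !peer.is_connected {
        return Err(CreateError::PeerDisconnected);
    }
    peer.pending_outbound.push(temp_id);
    Ok(())
}

/// Rule 2: a disconnect after creation keeps the pending channels around.
fn peer_disconnected(peer: &mut PeerState) {
    peer.is_connected = false;
}

/// Rule 2, continued: on reconnect, return the channels whose open_channel
/// message has to be sent again.
fn peer_connected(peer: &mut PeerState) -> Vec<u64> {
    peer.is_connected = true;
    peer.pending_outbound.clone()
}
```

A caller would send one open_channel message per ID returned by `peer_connected`.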

@shaavan
Contributor Author

shaavan commented Nov 10, 2023

Also, this PR has been set to draft because the tests are incomplete and only partially exercise the added code.
Any help or suggestions for making the tests work are very welcome!

Collaborator

@TheBlueMatt TheBlueMatt left a comment

What's the status here. You have it marked draft, do you want feedback? What kind of feedback/review is this ready for?

@shaavan
Contributor Author

shaavan commented Nov 14, 2023

What's the status here. You have it marked draft, do you want feedback? What kind of feedback/review is this ready for?

Hi, @TheBlueMatt
I have completed the implementation in the PR, but I am having trouble writing the test for it.

Specifically, I am struggling to prepare a test for the case where the peer reconnects in time and the open_channel message is therefore sent to it, because this conflicts with how the rest of the test codebase is set up.

So I wanted to get a general Approach ACK before I go about hacking on the tests to make them work.

@shaavan
Contributor Author

shaavan commented Nov 25, 2023

Updated from pr2725.01 -> pr2725.02 (diff)
Addressed @TheBlueMatt comment

Changes:

  1. Rebased on Main.
  2. Used data in Channel to reconstruct the SendOpenChannel message when rebroadcasting.
  3. Clean up the faulty test.

Note:

  1. The PR has been moved from "Draft" to "Ready for Review"
  2. The tests were becoming cumbersome and hacky, so I have temporarily cleaned them up.
  3. I am looking for a general Approach ACK on this PR before working on creating a test for this approach.

@shaavan shaavan marked this pull request as ready for review November 25, 2023 11:11
@shaavan
Contributor Author

shaavan commented Nov 25, 2023

Updated from pr2725.02 -> pr2725.03 (diff)

Changes:

  • Updated the peer_disconnected function to keep the unnotified channel around in case of sudden disconnection.
  • Also updated the peer_connected code to handle this change in behavior.

Thanks for the suggestion, @wpaulino!

Collaborator

@TheBlueMatt TheBlueMatt left a comment

Sorry for the delay here, busy with thanksgiving travel and other stuff.

@shaavan
Contributor Author

shaavan commented Dec 3, 2023

Updated from pr2725.03 -> pr2725.04 (diff)
Addressed @TheBlueMatt suggestion

Changes:

  1. Rebased on Main.
  2. Revamped the approach. Instead of sending the SendOpenChannel message later, we track if we have received an accepted channel message from them. This simplifies the approach and is in line with the current code architecture.
  3. Added test to verify this new behavior.

Thank you very much, @TheBlueMatt, for this new idea to solve this problem.

@wpaulino
Contributor

wpaulino commented Dec 4, 2023

Instead of sending the SendOpenChannel message later, we track if we have received an accepted channel message from them.

We still need to resend open_channel if they reconnect. Also, waiting for accept_channel doesn't seem enough, they can send it and immediately disconnect after, and we're left without a channel anyway.

@shaavan
Contributor Author

shaavan commented Dec 5, 2023

@wpaulino

We still need to resend open_channel if they reconnect.

You are right! That was an oversight from my side. Thank you for pointing it out.

Also, waiting for accept_channel doesn't seem enough, they can send it and immediately disconnect after, and we're left without a channel anyway.

You are right. However, the goal of the PR is to ensure proper execution of the create_channel function. Since the scope of create_channel ends at properly sending the SendOpenChannel message, its correct execution can be ensured by confirming that we received the AcceptChannel message.
If, by chance, the peer disconnects after sending AcceptChannel when we are creating the funding, the Outbound Channel will simply be removed because it was an unfunded channel.
Maybe, we will also need to handle the case of proper transmission of funding-created messages for a suddenly disconnected peer, but that seems to be outside the scope of this PR, and we can handle it later.

@wpaulino
Contributor

wpaulino commented Dec 5, 2023

However, the goal of the PR is to ensure proper execution of the create_channel function. Since the scope of create_channel ends at properly sending the SendOpenChannel message, its correct execution can be ensured by confirming that we received the AcceptChannel message.

I don't think we necessarily care about that. What we really want is to end up with a funded channel (within a reasonable timeout) if a user requests one while being able to handle the counterparty disconnecting mid-handshake.

Maybe, we will also need to handle the case of proper transmission of funding-created messages for a suddenly disconnected peer, but that seems to be outside the scope of this PR, and we can handle it later.

Typically nodes forget all about channels before sending/receiving funding_signed, so we'll need to retransmit all messages starting from open_channel after a reconnection.

@shaavan
Contributor Author

shaavan commented Dec 7, 2023

Thanks, @wpaulino, for the details about the message transmissions!

It seems worth extending the PR beyond the original issue, so that channel creation does not fail when the peer disconnects mid-handshake.

I am tinkering with an approach, and I shall update the PR very soon!

@shaavan
Contributor Author

shaavan commented Dec 7, 2023

Update:

Okay, so I have figured out an approach, but this depends on the behavior changes introduced in #2760.

We can track the list of msg_events we have sent during the handshake process, which can be used in case a peer disconnects midway.

Once the funding is signed, we graduate the channel from OutboundV1 to Channel. And so after that, we stop tracking the msg_events.

However, on main we currently graduate the channel as soon as we have created the funding.

let (chan, msg_opt) = match peer_state.channel_by_id.remove(temporary_channel_id) {

...

},

Since #2760 is already getting approval and will soon be merged, I shall build this new approach over the changes introduced there.

@TheBlueMatt
Collaborator

I don't think we need to explicitly track which message we're ready to send to our counterparty - if we are disconnected from a peer, then reconnect prior to funding, we have to restart from the open_channel step - the counterparty may have forgotten about the channel.
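
As a rough sketch of this point, the reconnect decision can depend only on whether the channel is funded, not on which handshake message was sent last. The `Step` and `Resend` enums and the `on_reconnect` function are hypothetical names for illustration and are not part of LDK.

```rust
/// Hypothetical handshake progress for an outbound channel.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Step {
    OpenChannelSent,
    AcceptChannelReceived,
    FundingCreatedSent,
    Funded,
}

/// What to send to the peer right after a reconnection.
#[derive(Debug, PartialEq)]
enum Resend {
    OpenChannel,
    ChannelReestablish,
}

/// Any pre-funding step collapses back to the start of the handshake, since
/// the counterparty may have forgotten about the channel entirely.
fn on_reconnect(step: Step) -> (Step, Resend) {
    match step {
        Step::Funded => (Step::Funded, Resend::ChannelReestablish),
        _ => (Step::OpenChannelSent, Resend::OpenChannel),
    }
}
```

Because every unfunded step maps to the same outcome, no per-message bookkeeping is needed in this model.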

@shaavan
Contributor Author

shaavan commented Dec 10, 2023

Updated from pr2725.04 -> pr2725.05 (diff)
Addressed @wpaulino and @TheBlueMatt comments

Updates:

  1. Rebased on main.
  2. Approach update

Logic:

-> Follow the standard handshake routine.

-> If we disconnect mid-handshake from our peer (that is, OutboundV1Channel is not resolved to a funded channel), we don't immediately close the OutboundV1Channel.

-> Instead, we track how long it has been since we disconnected from the peer.

-> If we reconnect in time, we rebroadcast the SendOpenChannel corresponding to the OutboundV1Channel to the peer.

-> If we do not reconnect within N (=2) timer ticks, we force-close and remove the channel.

Note:

  1. To also handle the case of further disconnection mid-handshake, the timer resets when the peer connects back.
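
The logic above can be pictured with a small, self-contained model. This is only a sketch: `PendingOutbound`, `Action`, and `DISCONNECT_TICK_LIMIT` are hypothetical names, and it assumes the channel is closed on the N-th tick after the disconnect.

```rust
/// Hypothetical limit; N = 2 as stated in the comment above.
const DISCONNECT_TICK_LIMIT: u32 = 2;

#[derive(Debug, PartialEq)]
enum Action {
    Keep,
    ResendOpenChannel,
    Close,
}

/// Simplified model of an unfunded outbound channel.
struct PendingOutbound {
    /// `None` while connected, `Some(ticks)` while the peer is disconnected.
    ticks_since_disconnect: Option<u32>,
}

impl PendingOutbound {
    fn new() -> Self {
        Self { ticks_since_disconnect: None }
    }

    /// Start counting instead of closing the channel right away.
    fn peer_disconnected(&mut self) {
        self.ticks_since_disconnect = Some(0);
    }

    /// Reconnecting resets the timer and restarts the handshake.
    fn peer_connected(&mut self) -> Action {
        self.ticks_since_disconnect = None;
        Action::ResendOpenChannel
    }

    /// Called once per timer tick; closes only after the limit is reached.
    fn timer_tick(&mut self) -> Action {
        match self.ticks_since_disconnect.as_mut() {
            Some(ticks) => {
                *ticks += 1;
                if *ticks >= DISCONNECT_TICK_LIMIT {
                    Action::Close
                } else {
                    Action::Keep
                }
            }
            None => Action::Keep,
        }
    }
}
```

Resetting the counter in `peer_connected` is what lets a second mid-handshake disconnection get a fresh window, as the note describes.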

@shaavan shaavan changed the title from "Don't Discard the create_channel for a suddenly disconnected peer" to "Ensure successfully message propagation in case of disconnection mid-handshake" on Dec 10, 2023
@shaavan shaavan changed the title from "Ensure successfully message propagation in case of disconnection mid-handshake" to "Ensure successful message propagation in case of disconnection mid-handshake" on Dec 10, 2023
@shaavan
Contributor Author

shaavan commented Dec 11, 2023

Updated from pr2725.05 -> pr2725.06 (diff)

Changes:

  1. Updated the ClosureReason from HolderForceClosed -> PeerDisconnected as it is more apt.
  2. Updated the test introduced in this PR accordingly.
  3. Updated other relevant tests to account for the behavior change introduced in this PR.

Collaborator

@TheBlueMatt TheBlueMatt left a comment

Basically LGTM. One comment. Note that the test fixes in the last commit will need to get squashed into the commit that broke tests. We require (but don't actually check in CI) that each individual commit builds and passes tests.

@shaavan
Contributor Author

shaavan commented Dec 12, 2023

Updated from pr2725.06 -> pr2725.07 (diff)
Addressed @TheBlueMatt comment

Update:

  1. Squashed the updates introduced in tests with the changes that broke them so that each commit now individually passes all the tests.

@codecov-commenter

codecov-commenter commented Dec 12, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (5bf58f0) 89.14% compared to head (0fd3d31) 89.14%.

❗ Current head 0fd3d31 differs from pull request most recent head a6a6b48. Consider uploading reports for the commit a6a6b48 to get more accurate results

Files                                   Patch %   Lines
lightning/src/ln/functional_tests.rs    97.95%    2 Missing ⚠️
lightning/src/ln/channelmanager.rs      94.73%    1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2725      +/-   ##
==========================================
- Coverage   89.14%   89.14%   -0.01%     
==========================================
  Files         116      116              
  Lines       93205    93186      -19     
  Branches    93205    93186      -19     
==========================================
- Hits        83089    83066      -23     
- Misses       7583     7587       +4     
  Partials     2533     2533              

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between fbeb7ac and 59e6af3.
Files selected for processing (2)
  • lightning/src/ln/channelmanager.rs (4 hunks)
  • lightning/src/ln/functional_tests.rs (4 hunks)
Additional comments: 8
lightning/src/ln/functional_tests.rs (4)
  • 3701-3704: The test test_peer_disconnected_before_funding_broadcasted has been correctly marked as ignored, which aligns with the PR's objective to change the behavior of channel closure upon disconnection.
  • 10514-10557: The new test test_channel_close_when_not_timely_accepted is added to simulate the scenario where a channel is not accepted in a timely manner. The test logic appears to follow the intended behavior of the PR, but it should be verified that the test covers all new edge cases introduced by the PR.
  • 10558-10596: The new test test_rebroadcast_open_channel_when_reconnect_mid_handshake is added to ensure that the SendOpenChannel message is rebroadcast upon reconnection. This aligns with the PR's objectives and should be verified for completeness.
Verification successful

The extended context now includes the assertions after the reconnection logic in the test_rebroadcast_open_channel_when_reconnect_mid_handshake test. It confirms that after the peers reconnect, the test checks that the SendOpenChannel message is indeed added to node_0's pending message events, with the assertion assert_eq!(events.len(), 1);. This assertion is critical as it verifies that the SendOpenChannel message is rebroadcast upon reconnection, which aligns with the PR's objectives.

Based on the provided context, the test test_rebroadcast_open_channel_when_reconnect_mid_handshake appears to be complete and covers the rebroadcast logic as intended by the PR.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the new test `test_rebroadcast_open_channel_when_reconnect_mid_handshake` is complete and covers the rebroadcast logic.
rg --type rust "test_rebroadcast_open_channel_when_reconnect_mid_handshake"

Length of output: 179



Script:

#!/bin/bash
# Print the lines of code around the test function to examine the test logic in more detail.
rg --type rust --context 10 "test_rebroadcast_open_channel_when_reconnect_mid_handshake" lightning/src/ln/functional_tests.rs

Length of output: 1061



Script:

#!/bin/bash
# Extend the range of context lines to capture more of the test function.
rg --type rust --context 30 "test_rebroadcast_open_channel_when_reconnect_mid_handshake" lightning/src/ln/functional_tests.rs

Length of output: 2994



Script:

#!/bin/bash
# Request more lines after the last shown to see the assertions related to the rebroadcast of the `SendOpenChannel` message.
rg --type rust --after-context 20 "peer_connected" lightning/src/ln/functional_tests.rs

Length of output: 6881

  • 10781-10785: The code block is part of a test that simulates the timeout behavior for channel closure. It should be verified that the timeout value UNFUNDED_CHANNEL_AGE_LIMIT_TICKS is consistent with the intended behavior and that the test accurately reflects the new logic.
Verification successful

The timeout value UNFUNDED_CHANNEL_AGE_LIMIT_TICKS is used consistently across the tests in functional_tests.rs to simulate the passage of time and to test the behavior of channel closure due to unfunded channels over time. The tests reflect the new logic introduced in the PR, where channels are expected to be closed if they remain unfunded past the age limit and the peer does not reconnect within the specified timeframe. The test setup and assertions are aligned with the intended behavior of handling disconnections and reconnections during the handshake process.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the timeout value used in the test is consistent with the intended behavior.
rg --type rust "UNFUNDED_CHANNEL_AGE_LIMIT_TICKS"

Length of output: 868



Script:

#!/bin/bash
# Let's examine the surrounding context of the test where `UNFUNDED_CHANNEL_AGE_LIMIT_TICKS` is used.
# This will help us understand how the test is set up and whether it accurately reflects the new logic.
rg --type rust -C 10 "UNFUNDED_CHANNEL_AGE_LIMIT_TICKS" lightning/src/ln/functional_tests.rs

Length of output: 5292

lightning/src/ln/channelmanager.rs (4)
  • 896-896: The addition of a condition to check for UnfundedOutboundV1 channels in the is_live function seems to align with the PR's objective to handle disconnections during the handshake. This change ensures that channels in the UnfundedOutboundV1 phase are considered not live, which is consistent with the intended logic.
  • 8873-8876: The comments added provide clarity on the handling of UnfundedOutboundV1 channels, indicating that these channels are retained for some time in case the peer reconnects. This is a logical change that supports the PR's goal of making the handshake process more robust against unexpected disconnections.
  • 9017-9032: The logic added to get_and_clear_pending_msg_events function to handle UnfundedOutboundV1 channels by pushing a SendOpenChannel event is in line with the PR's objectives. This ensures that if a peer reconnects, the open_channel message is resent, which is necessary since the peer may have forgotten about the channel. However, ensure that there is a mechanism to prevent an infinite loop of reconnection and message rebroadcast if the peer keeps disconnecting and reconnecting.
  • 11149-11149: The import changes in hunk 4 seem unrelated to the PR's objectives. Ensure that these imports are used in the code that follows, otherwise, they should be removed to keep the code clean and maintainable.
Verification successful

The imports PaymentPreimage, PaymentHash, and PaymentSecret are used in the codebase, as evidenced by the search results. Therefore, they are related to the PR's objectives and should not be removed.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the newly added imports are used in the code.
rg --type rust "PaymentPreimage|PaymentHash|PaymentSecret|create_recv_pending_htlc_info|HTLCForwardInfo|inbound_payment|PaymentId|PaymentSendFailure|RecipientOnionFields|InterceptId"

Length of output: 146768

@shaavan
Contributor Author

shaavan commented Jan 19, 2024

Updated from pr2725.10 -> pr2725.11 (diff)
Addressed @jkczyz and @coderabbitai comments

  • Update comments to fix grammatical mistakes and better express the intended behavior
  • Stop ignoring test_peer_disconnected_before_funding_broadcasted, and instead update it to account for behavior updates.
  • Expand the introduced test to check if node[1] has immediately dropped the Unfunded Inbound Channel


coderabbitai bot commented Jan 19, 2024

@shaavan, the updates you've made in response to the comments seem to be well-detailed and focused on improving the PR's clarity and functionality. It's good to see that you've expanded the tests to cover the new behavior thoroughly.

Regarding the mention of @coderabbitai, it seems there might be a mix-up as there are no previous comments from that handle in this thread. If there's anything specific you'd like me to review or if there are any other concerns you have, please let me know!

(\( ⁰⊖⁰)/)


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 4

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 59e6af3 and 15792af.
Files selected for processing (2)
  • lightning/src/ln/channelmanager.rs (3 hunks)
  • lightning/src/ln/functional_tests.rs (4 hunks)
Additional comments: 6
lightning/src/ln/functional_tests.rs (3)
  • 3727-3737: The logic for handling disconnection before funding transaction broadcast is clear. However, the test seems to be disabled. Confirm if this is intentional and if so, provide a reason or a TODO comment for future enablement.
  • 10513-10561: The new test test_channel_close_when_not_timely_accepted is well-structured and seems to cover the scenario it's designed for. However, ensure that the test is enabled and verify that it passes in the test suite.
  • 10563-10607: The test test_rebroadcast_open_channel_when_reconnect_mid_handshake appears to correctly simulate the scenario of a peer disconnecting and reconnecting mid-handshake. Verify that the test is enabled and that it passes in the test suite.
lightning/src/ln/channelmanager.rs (3)
  • 895-897: The logic here checks if any channel is in the Funded or UnfundedOutboundV1 phase. Ensure that this logic aligns with the intended behavior of the is_live function, especially considering the new UnfundedOutboundV1 state.
  • 8876-8877: The UnfundedOutboundV1 channel phase is set to always return true, which implies that these channels are considered live even if the peer is disconnected. Confirm that this behavior is consistent with the overall system logic and that it won't lead to any unexpected side effects.
  • 9028-9032: The addition of logic to push a SendOpenChannel message for UnfundedOutboundV1 channels is consistent with the PR's objective to allow rebroadcasting if the peer reconnects. Ensure that the get_open_channel function generates the correct message and that this behavior is tested.

Contributor

@jkczyz jkczyz left a comment

Looks good other than some comment re-phrasing.

@shaavan
Contributor Author

shaavan commented Jan 22, 2024

Updated from pr2725.11 -> pr2725.12 (diff)
Addressed @jkczyz comments

  • Improve the comment messages.


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 59e6af3 and f9fbdca.
Files selected for processing (2)
  • lightning/src/ln/channelmanager.rs (3 hunks)
  • lightning/src/ln/functional_tests.rs (4 hunks)
Files skipped from review as they are similar to previous changes (2)
  • lightning/src/ln/channelmanager.rs
  • lightning/src/ln/functional_tests.rs

jkczyz
jkczyz previously approved these changes Jan 22, 2024
@shaavan
Contributor Author

shaavan commented Jan 26, 2024

@TheBlueMatt
A gentle ping.
I think this PR is ready for a (potentially final) review :)

@shaavan
Contributor Author

shaavan commented Jan 27, 2024

Updated from pr2725.12 -> pr2725.13 (diff)
Addressed @TheBlueMatt comments

  1. Repurpose the test_disconnect_in_funding_batch to check if all the channels of the batch close if one of them is closed.
  2. Use nodes[0].node.list_channels() in introduced test for cleaner code.
  3. Made the added test more precise by adding a test for correct msg-type along with the number of msgs.
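
To show what checking the message type in addition to the count can look like, here is a minimal sketch. `MsgEvent` and `is_single_open_channel_for` are hypothetical stand-ins for illustration; they are not LDK's `MessageSendEvent` or any test helper from this PR.

```rust
/// Hypothetical, trimmed-down message event type.
#[derive(Debug)]
enum MsgEvent {
    SendOpenChannel { node_id: u8 },
    SendAcceptChannel { node_id: u8 },
}

/// True only if there is exactly one event, it is an open_channel message,
/// and it is addressed to the expected peer. Checking the length alone would
/// also accept a single event of the wrong type.
fn is_single_open_channel_for(events: &[MsgEvent], expected_node: u8) -> bool {
    match events {
        [MsgEvent::SendOpenChannel { node_id }] => *node_id == expected_node,
        _ => false,
    }
}
```

The slice pattern folds the length check and the variant check into one match arm.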


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 51d9ee3 and 0fd3d31.
Files selected for processing (2)
  • lightning/src/ln/channelmanager.rs (3 hunks)
  • lightning/src/ln/functional_tests.rs (6 hunks)
Files skipped from review as they are similar to previous changes (1)
  • lightning/src/ln/channelmanager.rs
Additional comments: 6
lightning/src/ln/functional_tests.rs (6)
  • 3727-3739: The logic for handling peer disconnection before funding is broadcasted seems to correctly simulate the disconnection and checks for the expected closure reasons. However, it's important to ensure that the UNFUNDED_CHANNEL_AGE_LIMIT_TICKS constant is appropriately defined and used across the test to simulate the timeout accurately.

Ensure UNFUNDED_CHANNEL_AGE_LIMIT_TICKS is defined with a value that accurately represents the intended timeout duration for the test scenario.

  • 10515-10559: The test test_channel_close_when_not_timely_accepted simulates a scenario where peers disconnect mid-handshake, and the channel is not timely accepted. The test setup and the disconnection simulation are correctly implemented. However, the assertion that checks the channel's state after disconnection (line 10534) and the assertion for the channel's closure (line 10550) are critical to validate the intended behavior. It's essential to ensure that these assertions accurately reflect the expected state changes in the system under test.

Verify that the assertions accurately reflect the expected outcomes and that the test covers all relevant scenarios for the feature being tested.

  • 10561-10604: The test test_rebroadcast_open_channel_when_reconnect_mid_handshake correctly simulates a peer disconnection and reconnection mid-handshake. The test ensures that the SendOpenChannel message is rebroadcast upon reconnection (lines 10598-10603). This behavior aligns with the PR's objective to improve the robustness of the channel handshake process. However, it's crucial to verify that the rebroadcast logic is implemented as intended in the actual system code and not just within the test environment.

Confirm that the rebroadcast logic for the SendOpenChannel message upon peer reconnection is correctly implemented in the system code and not solely within the test.

  • 10762-10764: The introduction of the test test_close_in_funding_batch aims to ensure that if one channel in a batch closes, the entire batch is closed. This test is crucial for validating the robustness of batch processing in channel funding. It's important to ensure that the test setup correctly simulates the batch funding scenario and that the logic for triggering a channel close within the batch is accurately implemented.

Ensure the test accurately simulates batch funding scenarios and correctly implements the logic for closing a channel within a batch.

  • 10788-10820: The logic within test_close_in_funding_batch for force-closing a channel and verifying the closure of all channels in the batch (lines 10794-10820) is critical for ensuring the intended behavior of batch processing. The assertions and checks (lines 10797-10803, 10805-10809, and 10811-10818) are essential for validating the state of the system after a force-close operation. It's important to verify that these checks accurately reflect the expected outcomes and that the test covers all relevant scenarios for batch processing in channel funding.

Verify that the assertions and checks within the test accurately reflect the expected outcomes for batch processing in channel funding and that all relevant scenarios are covered.

  • 10820-10820: The final assertion in test_close_in_funding_batch that checks for the immediate closure of all channels in the batch upon a single channel's force-close (line 10820) is a key part of validating the intended behavior. However, it's crucial to ensure that this behavior aligns with the system's design and that the test accurately reflects the real-world scenario it intends to simulate.

Confirm that the immediate closure of all channels in a batch upon a single channel's force-close aligns with the system's design and that the test accurately simulates this scenario.

- Do not remove the channel immediately on peer_disconnected; instead,
  remove it after some time if the peer doesn't reconnect soon (handled in
  the previous commit).
- Do not mark the peer ok_to_remove if it still has OutboundV1Channels.
- Rebroadcast SendOpenChannel for the OutboundV1Channel when the peer
  reconnects.
- Update the relevant tests to account for the behavior change.
- Repurpose test_disconnect_in_funding_batch to test that all
  channels in the batch close when one of them closes.
- The first test makes sure that the OutboundV1Channel is not
  immediately removed when peers disconnect, but is removed after N timer
  ticks.
- The second test makes sure that SendOpenChannel is rebroadcast
  for the OutboundV1Channel if the peer reconnects in time.
@shaavan
Contributor Author

shaavan commented Jan 31, 2024

Updated from pr2725.13 -> pr2725.14 (diff)
Addressed @TheBlueMatt comments

  • Use list_channels in the introduced test wherever necessary.
  • Made the check for msg_events more thorough by checking for the specific expected message.
  • Clean up commit structure.

Updated from pr2725.14 -> pr2725.15

Range-diff
git range-diff fbeb7ac9e1256f5d69b0cda580b56c5025ffa987..411462cce2c23d5a3faa72641107f4d2e43cf5a5 5bf58f0d33b13bd3d9f88c4f2021fbd77745274a..ddf75afd167ac4adb5824180b4125d87563fa31a
1:  411462cc ! 1:  ddf75afd Do not remove Outbound Channel immediately when peer disconnects
    @@ lightning/src/ln/functional_tests.rs: fn test_disconnect_in_funding_batch() {
     +  // Force-close the channel for which we've completed the initial monitor.
        let funding_txo_1 = OutPoint { txid: tx.txid(), index: 0 };
        let funding_txo_2 = OutPoint { txid: tx.txid(), index: 1 };
    -   let channel_id_1 = funding_txo_1.to_channel_id();
    -   let channel_id_2 = funding_txo_2.to_channel_id();
    +   let channel_id_1 = ChannelId::v1_from_funding_outpoint(funding_txo_1);
    +   let channel_id_2 = ChannelId::v1_from_funding_outpoint(funding_txo_2);
     +
     +  nodes[0].node.force_close_broadcasting_latest_txn(&channel_id_1, &nodes[1].node.get_our_node_id()).unwrap();
     +
  • Rebased over main


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 5bf58f0 and a6a6b48.
Files selected for processing (2)
  • lightning/src/ln/channelmanager.rs (3 hunks)
  • lightning/src/ln/functional_tests.rs (6 hunks)
Files skipped from review as they are similar to previous changes (1)
  • lightning/src/ln/channelmanager.rs
Additional comments: 7
lightning/src/ln/functional_tests.rs (7)
  • 3698-3698: The test description for test_peer_disconnected_before_funding_broadcasted is clear and sets the context well for what the test aims to achieve. However, ensure that the test implementation fully covers the scenario described, including both the disconnection and the failure to reconnect within the specified time.
  • 3727-3739: The logic to simulate peer disconnection before funding is broadcasted and to check the channel closure with the appropriate ClosureReason is implemented correctly. However, consider adding a comment explaining why UNFUNDED_CHANNEL_AGE_LIMIT_TICKS is used to simulate the passage of time and its significance in the context of this test.
  • 10517-10557: The test test_channel_close_when_not_timely_accepted correctly simulates a scenario where a peer disconnects mid-handshake and checks the state of channels and peer state after a specified time has passed. This test effectively covers the new behavior introduced in the PR. Ensure that the constants used, like UNFUNDED_CHANNEL_AGE_LIMIT_TICKS, are well-documented and their values are justified within the context of this test.
  • 10560-10598: The test test_rebroadcast_open_channel_when_reconnect_mid_handshake accurately simulates the scenario of peer disconnection and reconnection during the handshake process. It checks that the SendOpenChannel message is rebroadcast upon reconnection, aligning with the PR's objectives. This test is well-structured and covers the critical functionality introduced. Ensure that the test includes assertions for the state of both nodes after reconnection to fully validate the rebroadcast logic.
  • 10756-10756: The introduction of test_close_in_funding_batch aims to test the behavior when one of the channels in a batch closes. This is a good addition to ensure that batch processing of channel closures behaves as expected. However, the test description could be expanded to detail the expected behavior of the batch closure process for clarity.
  • 10782-10813: In test_close_in_funding_batch, the logic to force-close a channel and check the resulting state, including monitor updates and message events, is implemented correctly. This test effectively validates the behavior when a channel in a funding batch is closed. Ensure that the test also verifies the state of other channels in the batch to confirm that they are affected as expected by the batch closure process.
  • 10814-10814: The assertion that all channels in the batch should close immediately after one channel is force-closed is a critical part of test_close_in_funding_batch. This ensures that the batch processing logic is working as intended. Consider adding more detailed assertions to verify the closure reasons for each channel in the batch to ensure they align with the expected outcomes.
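
The batch behavior discussed in the last three comments (all channels in a funding batch close when one of them is force-closed) can be modeled with a minimal sketch. `ToyBatches` and its fields are hypothetical and do not reflect how the channel manager tracks batches.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical toy tracker for batch-funded channels.
#[derive(Default)]
struct ToyBatches {
    /// Maps each channel id to the id of the batch that funds it.
    batch_of: HashMap<u64, u32>,
    /// Channels that are currently open.
    open: HashSet<u64>,
}

impl ToyBatches {
    /// Register an open channel as part of `batch`.
    fn add(&mut self, batch: u32, channel_id: u64) {
        self.batch_of.insert(channel_id, batch);
        self.open.insert(channel_id);
    }

    /// Force-close one channel; every still-open channel in the same
    /// batch closes with it. Returns the closed ids in ascending order.
    fn force_close(&mut self, channel_id: u64) -> Vec<u64> {
        let Some(&batch) = self.batch_of.get(&channel_id) else {
            return Vec::new(); // unknown channel: nothing to close
        };
        let mut closed: Vec<u64> = self
            .open
            .iter()
            .copied()
            .filter(|id| self.batch_of.get(id) == Some(&batch))
            .collect();
        closed.sort_unstable();
        for id in &closed {
            self.open.remove(id);
        }
        closed
    }
}
```

A test in this style would assert on the full set of closed channels, and on channels in other batches staying open, rather than only on the channel that was explicitly closed.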

_ => panic!("Unexpected message."),
}

// We broadcast the commitment transaction as part of the force-close.
Collaborator

Heh, this is kinda dumb, maybe we should fix that, but it's not super critical and certainly unrelated to this PR.

Contributor Author

Super interested in understanding the issue here! And probably might give it a try if it's not a super biggie!

@TheBlueMatt TheBlueMatt merged commit 8d9d099 into lightningdevkit:main Feb 5, 2024
14 of 15 checks passed
@shaavan shaavan deleted the issue2096 branch February 6, 2024 15:02
Development

Successfully merging this pull request may close these issues.

Don't discard create_channel if a peer goes away
5 participants