Fix case where Opening Channels get stuck forever. #8406

Merged

Conversation

ziggie1984
Collaborator

@ziggie1984 ziggie1984 commented Jan 20, 2024

Fixes #8362 and also #8251.

The removePendingChannel channel was never initialized, which caused our reservationCoordinator to get stuck forever. After a single call to failFundingFlow we could no longer accept or open any new channels.

It did not panic because nil channels are actually quite useful in Go, for example when merging two channels. For more information, read:

https://medium.com/justforfunc/why-are-there-nil-channels-in-go-9877cc0b2308

I will add an itest to test the successful removal of a failed channel opening.

Summary by CodeRabbit

  • New Features

    • Introduced new configuration options for improved channel management.
    • Added the ability to set custom timeouts for reservations and intervals for zombie channel sweeping.
  • Bug Fixes

    • Fixed an issue with the removal of failed channels in the Lightning Network Daemon to enhance error handling.
  • Tests

    • Implemented new test cases to validate the failure of channel funding and the removal of pending channels.
  • Documentation

    • Updated release notes with details on the latest fixes and configuration options.


coderabbitai bot commented Jan 20, 2024

Important

Auto Review Skipped

Auto reviews are limited to the following labels: llm-review. Please add one of these labels to enable auto reviews.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The recent update to the Lightning Network Daemon (lnd) introduces new configuration parameters to improve channel management and error handling. Notably, ReservationTimeout and ZombieSweeperInterval have been added to enhance robustness in channel operations. The codebase also addresses an issue with hanging batchopenchannel commands and includes new test cases to ensure the stability of PSBT-based channel funding. Overall, these changes aim to bolster reliability and user experience in managing Lightning Network channels.

Changes

File(s) | Change Summary
config.go, lncfg/config.go, sample-lnd.conf | Introduced ReservationTimeout and ZombieSweeperInterval fields; moved BackupFilePath; reformatted field descriptions; added constants and config options.
docs/release-notes/release-notes-0.18.0.md | Documented the fix for the removal of failed channels.
funding/manager.go | Introduces new fields ReservationTimeout and ZombieSweeperInterval to the DevConfig struct; updates methods to use these new fields conditionally.
itest/.../list_on_test.go, itest/.../lnd_psbt_test.go | Added new test case and function to simulate and test PSBT funding flow failure.
lntest/harness_assertion.go | Added ReceiveOpenChannelError function for open channel error handling.
peer/brontide.go, peer/brontide_test.go | Modified NewBrontide and added test for removing pending channels.
server.go | Updated newServer function to use new config values instead of hardcoded ones.

Assessment against linked issues

Objective Addressed Explanation
Identify if the batchopenchannel hang is a bug or misconfiguration (#8362) The changes do not explicitly address the root cause of the batchopenchannel hang.
Ensure lncli batchopenchannel functions correctly (#8362) The changes do not explicitly state improvements or fixes related to the lncli batchopenchannel command.
Investigate the cause of the indefinite hang in batchopenchannel (#8362) The changes do not explicitly address the investigation of the indefinite hang in batchopenchannel.

Poem

When channels fail and funds do stick,
CodeRabbit hops in, with a fix real quick.
With tweaks in the code and tests anew,
Lightning strikes true, as channels renew. 🐰⚡️


@ziggie1984 ziggie1984 added this to the v0.17.4 milestone Jan 20, 2024
@ziggie1984 ziggie1984 self-assigned this Jan 20, 2024
@ziggie1984 ziggie1984 added the bug Unintended code behaviour label Jan 20, 2024
C-Otto added a commit to C-Otto/lnd that referenced this pull request Jan 20, 2024
@ziggie1984 ziggie1984 marked this pull request as ready for review January 21, 2024 16:34
sample-lnd.conf Outdated
@@ -293,6 +293,15 @@
; The maximum number of incoming pending channels permitted per peer.
; maxpendingchannels=1

; The maximum time until we delete a pending channel opening flow from memory.
Contributor

I wouldn't put test-only (or rather -mostly?) information in the public sample configuration. Assuming developers are clever enough to find those settings, I'd rather avoid confusing "plebs" with this. The defaults are fine, I guess?

Collaborator

I agree. Maybe we can make them a development only config by adding the flags to DevConfig in lncfg/dev_integration.go?

Collaborator Author

Let me know what you think of the current approach.

@saubyk saubyk added the channel management The management of the nodes channels label Jan 21, 2024
@saubyk saubyk linked an issue Jan 21, 2024 that may be closed by this pull request
Collaborator

@ellemouton ellemouton left a comment

Really great catch and great test additions 🔥 LGTM!

@ziggie1984
Collaborator Author

@coderabbitai review

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 0a29b37 and c61b346.
Files selected for processing (10)
  • config.go (2 hunks)
  • docs/release-notes/release-notes-0.18.0.md (1 hunks)
  • itest/list_on_test.go (1 hunks)
  • itest/lnd_psbt_test.go (2 hunks)
  • lncfg/config.go (2 hunks)
  • lntest/harness_assertion.go (1 hunks)
  • peer/brontide.go (1 hunks)
  • peer/brontide_test.go (3 hunks)
  • sample-lnd.conf (1 hunks)
  • server.go (1 hunks)
Additional comments: 16
lncfg/config.go (1)
  • 20-26: The addition of DefaultReservationTimeout and DefaultZombieSweeperInterval constants introduces new default durations for channel management. Ensure that these values are appropriate and consistent with the intended behavior described in the PR.
docs/release-notes/release-notes-0.18.0.md (1)
  • 68-70: The release notes have been updated to include a fix for the removal of failed channels. This entry should be verified to ensure it accurately reflects the changes made and the issues addressed in the PR.
Verification successful

Verifying the Release Notes Entry

The release notes entry for the fix related to the removal of failed channels in the Lightning Network Daemon (lnd) is as follows:

The comment requests verification to ensure that this entry accurately reflects the changes made and the issues addressed in the PR. To verify this, I retrieved the title and body of the PR linked in the release notes using the provided shell script.

The output of the script shows that the PR title is "Fix case where Opening Channels get stuck forever." The body of the PR explains the issue and the changes made to address it. It also mentions the addition of an integration test to ensure the successful removal of a failed channel opening.

The PR title and body align with the release notes entry, confirming that the release notes accurately reflect the changes made and the issues addressed in the PR.

Therefore, the release notes entry for the fix related to the removal of failed channels in the Lightning Network Daemon (lnd) is accurate.

Conclusion

The release notes entry for the fix related to the removal of failed channels in the Lightning Network Daemon (lnd) accurately reflects the changes made and the issues addressed in the PR.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the linked PR in the release notes correctly addresses the issue.
curl -s https://api.github.com/repos/lightningnetwork/lnd/pulls/8406 | jq '.title,.body'

Length of output: 1472

itest/list_on_test.go (1)
  • 565-568: A new test case named "fail funding flow psbt" has been added. Verify that this test case is included in the test suite and that it effectively tests the failure of the PSBT funding flow as intended.
itest/lnd_psbt_test.go (2)
  • 1352-1357: The arguments --reservationtimeout=1s and --zombiesweeperinterval=1s are set to ensure the remote peer fails the funding process quickly. This is a logical approach for testing failure scenarios.
  • 1386-1391: The error handling in ht.ReceiveOpenChannelError uses fmt.Errorf to create an expected error, but the fmt package is not used elsewhere in the test. This is the correct usage of fmt.Errorf for simulating an expected error.
peer/brontide_test.go (1)
  • 1457-1547: The new test function TestRemovePendingChannel has been added to ensure that pending channels can be removed successfully and that the removePendingChannel channel is initialized properly. This test function appears to be well-structured and covers the necessary assertions to validate the behavior of adding and then removing a pending channel. The use of wait.NoError from the lntest/wait package is appropriate for waiting on conditions within a timeout period.
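
The add-then-remove behaviour that this test exercises can be sketched with a small event loop. Everything below is an illustrative assumption rather than lnd's actual peer code: the tracker type, its fields, and its methods are hypothetical names. The point of the sketch is that the constructor must initialize every request channel the loop selects on, otherwise callers of the corresponding method block forever.

```go
package main

import "fmt"

// request asks the event loop to add or remove a pending channel ID.
// The done channel is closed once the loop has handled the request.
type request struct {
	id   string
	done chan struct{}
}

// tracker is a hypothetical stand-in for a peer tracking pending channels.
type tracker struct {
	add    chan request
	remove chan request
	count  chan chan int
	quit   chan struct{}
}

// newTracker initializes all request channels before starting the loop.
func newTracker() *tracker {
	tr := &tracker{
		add:    make(chan request),
		remove: make(chan request), // leaving this nil would hang Remove
		count:  make(chan chan int),
		quit:   make(chan struct{}),
	}
	go tr.loop()
	return tr
}

// loop owns the pending set, so no mutex is needed.
func (tr *tracker) loop() {
	pending := make(map[string]struct{})
	for {
		select {
		case r := <-tr.add:
			pending[r.id] = struct{}{}
			close(r.done)
		case r := <-tr.remove:
			delete(pending, r.id)
			close(r.done)
		case reply := <-tr.count:
			reply <- len(pending)
		case <-tr.quit:
			return
		}
	}
}

// Add registers a pending channel and waits until the loop has stored it.
func (tr *tracker) Add(id string) {
	r := request{id: id, done: make(chan struct{})}
	tr.add <- r
	<-r.done
}

// Remove deletes a pending channel and waits until the loop has done so.
func (tr *tracker) Remove(id string) {
	r := request{id: id, done: make(chan struct{})}
	tr.remove <- r
	<-r.done
}

// Pending returns the number of pending channels currently tracked.
func (tr *tracker) Pending() int {
	reply := make(chan int)
	tr.count <- reply
	return <-reply
}

// Stop terminates the event loop.
func (tr *tracker) Stop() { close(tr.quit) }

func main() {
	tr := newTracker()
	defer tr.Stop()

	tr.Add("chan-1")
	fmt.Println("pending after add:", tr.Pending())
	tr.Remove("chan-1")
	fmt.Println("pending after remove:", tr.Pending())
}
```

A unit test in this style adds an entry, removes it, and asserts that the count returns to zero, which would time out if the remove channel were left uninitialized.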
lntest/harness_assertion.go (1)
  • 292-300: The function ReceiveOpenChannelError correctly waits for an expected error during the open channel flow and uses an internal function receiveOpenChannelUpdate to receive updates. Ensure that the error message includes sufficient context for debugging.
config.go (7)
  • 349-354: The Config struct has been updated with new fields ReservationTimeout and ZombieSweeperInterval. Ensure that these new fields are properly documented, initialized, and used throughout the codebase.
  • 349-354: The BackupFilePath field has been moved within the Config struct. Ensure that this change does not affect the initialization or usage of this field, potentially causing file path issues.
  • 349-354: The descriptions for UnsafeDisconnect, UnsafeReplay, and MaxPendingChannels have been updated. Confirm that these changes are accurate and reflect the intended functionality.
  • 580-588: Default values for ReservationTimeout and ZombieSweeperInterval are set in the DefaultConfig function. Verify that these defaults are sensible and align with the system's expectations.
Verification successful

It's clear that the default values for ReservationTimeout and ZombieSweeperInterval are set in the lncfg/config.go file. The DefaultReservationTimeout is set to 10 minutes, and the DefaultZombieSweeperInterval is set to 1 minute. These values seem sensible for the system's expectations, as they provide reasonable timeouts for channel reservations and zombie sweeper intervals.

Given this information, the defaults appear to be aligned with the system's expectations. Therefore, the verification is successful.

Conclusion

The default values for ReservationTimeout and ZombieSweeperInterval are set to 10 minutes and 1 minute, respectively, in the lncfg/config.go file, and they align with the system's expectations. The verification is successful.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Check for the default values of ReservationTimeout and ZombieSweeperInterval.
rg --type go "DefaultReservationTimeout|DefaultZombieSweeperInterval"

Length of output: 513

  • 577-591: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [349-588]

The ValidateConfig function has been updated. Ensure that the new configuration fields are validated correctly and that any new constraints or checks are properly implemented.

  • 580-588: The LoadConfig function has been updated. Ensure that the new configuration fields are loaded correctly and that any new logic does not introduce errors or misconfigurations.
peer/brontide.go (1)
  • 509-511: Initialization of the removePendingChannel channel is correct and aligns with the PR objectives to fix hanging channels.
server.go (1)
  • 1439-1440: The newServer function now uses the ZombieSweeperInterval and ReservationTimeout values from the cfg parameter instead of hardcoding them. This change allows for more flexibility and customization of these parameters.

Add a unit test for the removal of a pending channel.
@ziggie1984 ziggie1984 force-pushed the fix-channel-opening-issue branch 3 times, most recently from dc7f070 to b67cb6f Compare January 22, 2024 13:04
@ziggie1984
Collaborator Author

ziggie1984 commented Jan 22, 2024

@coderabbitai review

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 4

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between e1259cd and b67cb6f.
Files selected for processing (11)
  • docs/release-notes/release-notes-0.17.4.md (1 hunks)
  • funding/manager.go (4 hunks)
  • itest/list_on_test.go (1 hunks)
  • itest/lnd_psbt_test.go (2 hunks)
  • lncfg/config.go (2 hunks)
  • lncfg/dev.go (2 hunks)
  • lncfg/dev_integration.go (2 hunks)
  • lntest/harness_assertion.go (1 hunks)
  • peer/brontide.go (1 hunks)
  • peer/brontide_test.go (3 hunks)
  • server.go (2 hunks)
Files skipped from review as they are similar to previous changes (7)
  • itest/list_on_test.go
  • itest/lnd_psbt_test.go
  • lncfg/config.go
  • lntest/harness_assertion.go
  • peer/brontide.go
  • peer/brontide_test.go
  • server.go
Additional comments: 8
lncfg/dev.go (2)
  • 5-7: The import statement is correct and follows Go conventions.
  • 27-30: The method GetReservationTimeout is correctly retrieving the default reservation timeout.
docs/release-notes/release-notes-0.17.4.md (1)
  • 22-26: The bug fix entry in the release notes is clear and informative, correctly summarizing the changes made in the PR.
lncfg/dev_integration.go (4)
  • 5-7: The import statement is correct and follows Go conventions.
  • 23-24: The new fields ReservationTimeout and ZombieSweeperInterval in the DevConfig struct are well-defined and include descriptive comments.
  • 32-38: The method GetReservationTimeout correctly provides a default value if the field is not set. However, ensure that the default value aligns with the expected behavior in the integration environment.
  • 41-47: There is a typo in the method name GeTZombieSweeperInterval; it should be GetZombieSweeperInterval. Additionally, the method correctly provides a default value if the field is not set. Verify that the default value is appropriate for the integration environment.
- func (d *DevConfig) GeTZombieSweeperInterval() time.Duration {
+ func (d *DevConfig) GetZombieSweeperInterval() time.Duration {
funding/manager.go (1)
  • 4851-4851: The check for PSBT funding reservations should be documented to explain why these are not pruned.

@ziggie1984 ziggie1984 force-pushed the fix-channel-opening-issue branch 3 times, most recently from ef06433 to 2cb21f0 Compare January 22, 2024 13:42
Collaborator

@guggero guggero left a comment

Great catch! Just one small detail, otherwise looks good to me 💯

@@ -4809,6 +4828,13 @@ func (f *Manager) handleErrorMsg(peer lnpeer.Peer, msg *lnwire.Error) {
func (f *Manager) pruneZombieReservations() {
	zombieReservations := make(pendingChannels)

	reservationTimeout := f.cfg.ReservationTimeout
	if f.cfg.Dev != nil {
Collaborator

Also make sure there is an actual value in the dev config?
E.g. if f.cfg.Dev != nil && f.cfg.Dev.ReservationTimeout != 0 {?
Same with the sweeper interval below.

Collaborator

Or use the f.cfg.Dev.GetReservationTimeout() method that always returns a non-zero value?

Collaborator Author

Good idea, I will restructure to use yy's approach.

// We received the AcceptChannel msg from our peer but we are not going
// to fund this channel but instead wait for our peer to fail the
// funding workflow with an internal error.
ht.ReceiveOpenChannelError(chanUpdates, chanfunding.ErrRemoteCanceled)
Collaborator

nice itest! I think there's no more cleanup needed here?

Collaborator Author

Yes, I think so. The nodes are shut down when the test case's context is canceled, and if the test fails, the harness's CleanUp shuts them down.

Collaborator

@yyforyongyu yyforyongyu left a comment

LGTM! thanks 🙏

This adds an itest for a failed funding flow by our peer.
Collaborator

@guggero guggero left a comment

Nice, LGTM 🎉

@saubyk
Collaborator

saubyk commented Jan 22, 2024

@coderabbitai resolve

@yyforyongyu yyforyongyu merged commit ec5b824 into lightningnetwork:master Jan 22, 2024
24 of 25 checks passed
michael1011 pushed a commit to michael1011/lnd that referenced this pull request Jan 22, 2024
…opening-issue

Fix case where Opening Channels get stuck forever.
Labels
bug: Unintended code behaviour
channel management: The management of the nodes channels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug?]: batchopenchannel hangs indefinitely
After opening the channel, the node balance is lost
6 participants