New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix case where Opening Channels get stuck forever. #8406
Fix case where Opening Channels get stuck forever. #8406
Conversation
Important Auto Review SkippedAuto reviews are limited to the following labels: llm-review. Please add one of these labels to enable auto reviews. Please check the settings in the CodeRabbit UI or the To trigger a single review, invoke the WalkthroughThe recent update to the Lightning Network Daemon (lnd) introduces new configuration parameters to improve channel management and error handling. Notably, Changes
Assessment against linked issues
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
e62f5f1
to
c61b346
Compare
sample-lnd.conf
Outdated
@@ -293,6 +293,15 @@ | |||
; The maximum number of incoming pending channels permitted per peer. | |||
; maxpendingchannels=1 | |||
|
|||
; The maximum time until we delete a pending channel opening flow from memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't put test-only (or rather -mostly?) information in the public sample configuration. Assuming developers are clever enough to find those settings, I'd rather avoid confusing "plebs" with this. The defaults are fine, I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Maybe we can make them a development only config by adding the flags to DevConfig
in lncfg/dev_integration.go
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lemmie know what you think with the current approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great catch and great test additions 🔥 LGTM!
@coderabbitai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 5
Configuration used: CodeRabbit UI
Files selected for processing (10)
- config.go (2 hunks)
- docs/release-notes/release-notes-0.18.0.md (1 hunks)
- itest/list_on_test.go (1 hunks)
- itest/lnd_psbt_test.go (2 hunks)
- lncfg/config.go (2 hunks)
- lntest/harness_assertion.go (1 hunks)
- peer/brontide.go (1 hunks)
- peer/brontide_test.go (3 hunks)
- sample-lnd.conf (1 hunks)
- server.go (1 hunks)
Additional comments: 16
lncfg/config.go (1)
- 20-26: The addition of
DefaultReservationTimeout
andDefaultZombieSweeperInterval
constants introduces new default durations for channel management. Ensure that these values are appropriate and consistent with the intended behavior described in the PR.docs/release-notes/release-notes-0.18.0.md (1)
- 68-70: The release notes have been updated to include a fix for the removal of failed channels. This entry should be verified to ensure it accurately reflects the changes made and the issues addressed in the PR.
Verification successful
Verifying the Release Notes Entry
The release notes entry for the fix related to the removal of failed channels in the Lightning Network Daemon (lnd) is as follows:
The comment requests verification to ensure that this entry accurately reflects the changes made and the issues addressed in the PR. To verify this, I retrieved the title and body of the PR linked in the release notes using the provided shell script.
The output of the script shows that the PR title is "Fix case where Opening Channels get stuck forever." The body of the PR explains the issue and the changes made to address it. It also mentions the addition of an integration test to ensure the successful removal of a failed channel opening.
The PR title and body align with the release notes entry, confirming that the release notes accurately reflect the changes made and the issues addressed in the PR.
Therefore, the release notes entry for the fix related to the removal of failed channels in the Lightning Network Daemon (lnd) is accurate.
Conclusion
The release notes entry for the fix related to the removal of failed channels in the Lightning Network Daemon (lnd) accurately reflects the changes made and the issues addressed in the PR.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the linked PR in the release notes correctly addresses the issue. curl -s https://api.github.com/repos/lightningnetwork/lnd/pulls/8406 | jq '.title,.body'Length of output: 1472
itest/list_on_test.go (1)
- 565-568: A new test case named "fail funding flow psbt" has been added. Verify that this test case is included in the test suite and that it effectively tests the failure of the PSBT funding flow as intended.
itest/lnd_psbt_test.go (2)
- 1352-1357: The arguments
--reservationtimeout=1s
and--zombiesweeperinterval=1s
are set to ensure the remote peer fails the funding process quickly. This is a logical approach for testing failure scenarios.- 1386-1391: The error handling in
ht.ReceiveOpenChannelError
usesfmt.Errorf
to create an expected error, but thefmt
package is not used elsewhere in the test. This is the correct usage offmt.Errorf
for simulating an expected error.peer/brontide_test.go (1)
- 1457-1547: The new test function
TestRemovePendingChannel
has been added to ensure that pending channels can be removed successfully and that theremovePendingChannel
channel is initialized properly. This test function appears to be well-structured and covers the necessary assertions to validate the behavior of adding and then removing a pending channel. The use ofwait.NoError
from thelntest/wait
package is appropriate for waiting on conditions within a timeout period.lntest/harness_assertion.go (1)
- 292-300: The function
ReceiveOpenChannelError
correctly waits for an expected error during the open channel flow and uses an internal functionreceiveOpenChannelUpdate
to receive updates. Ensure that the error message includes sufficient context for debugging.config.go (7)
- 349-354: The
Config
struct has been updated with new fieldsReservationTimeout
andZombieSweeperInterval
. Ensure that these new fields are properly documented, initialized, and used throughout the codebase.- 349-354: The
BackupFilePath
field has been moved within theConfig
struct. Ensure that this change does not affect the initialization or usage of this field, potentially causing file path issues.- 349-354: The descriptions for
UnsafeDisconnect
,UnsafeReplay
, andMaxPendingChannels
have been updated. Confirm that these changes are accurate and reflect the intended functionality.- 580-588: Default values for
ReservationTimeout
andZombieSweeperInterval
are set in theDefaultConfig
function. Verify that these defaults are sensible and align with the system's expectations.Verification successful
It's clear that the default values for
ReservationTimeout
andZombieSweeperInterval
are set in thelncfg/config.go
file. TheDefaultReservationTimeout
is set to 10 minutes, and theDefaultZombieSweeperInterval
is set to 1 minute. These values seem sensible for the system's expectations, as they provide reasonable timeouts for channel reservations and zombie sweeper intervals.Given this information, the defaults appear to be aligned with the system's expectations. Therefore, the verification is successful.
Conclusion
The default values for
ReservationTimeout
andZombieSweeperInterval
are set to 10 minutes and 1 minute, respectively, in thelncfg/config.go
file, and they align with the system's expectations. The verification is successful.* 577-591: > 📝 **NOTE** > This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [349-588]Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for the default values of ReservationTimeout and ZombieSweeperInterval. rg --type go "DefaultReservationTimeout|DefaultZombieSweeperInterval"Length of output: 513
The
ValidateConfig
function has been updated. Ensure that the new configuration fields are validated correctly and that any new constraints or checks are properly implemented.
- 580-588: The
LoadConfig
function has been updated. Ensure that the new configuration fields are loaded correctly and that any new logic does not introduce errors or misconfigurations.- 580-588: The
ValidateConfig
function has been updated. Ensure that the new configuration fields are validated correctly and that any new constraints or checks are properly implemented.peer/brontide.go (1)
- 509-511: Initialization of the
removePendingChannel
channel is correct and aligns with the PR objectives to fix hanging channels.server.go (1)
- 1439-1440: The
newServer
function now uses theZombieSweeperInterval
andReservationTimeout
values from thecfg
parameter instead of hardcoding them. This change allows for more flexibility and customization of these parameters.
Add a unit test for the removal of a pending channel.
dc7f070
to
b67cb6f
Compare
@coderabbitai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 4
Configuration used: CodeRabbit UI
Files selected for processing (11)
- docs/release-notes/release-notes-0.17.4.md (1 hunks)
- funding/manager.go (4 hunks)
- itest/list_on_test.go (1 hunks)
- itest/lnd_psbt_test.go (2 hunks)
- lncfg/config.go (2 hunks)
- lncfg/dev.go (2 hunks)
- lncfg/dev_integration.go (2 hunks)
- lntest/harness_assertion.go (1 hunks)
- peer/brontide.go (1 hunks)
- peer/brontide_test.go (3 hunks)
- server.go (2 hunks)
Files skipped from review as they are similar to previous changes (7)
- itest/list_on_test.go
- itest/lnd_psbt_test.go
- lncfg/config.go
- lntest/harness_assertion.go
- peer/brontide.go
- peer/brontide_test.go
- server.go
Additional comments: 8
lncfg/dev.go (2)
- 5-7: The import statement is correct and follows Go conventions.
- 27-30: The method
GetReservationTimeout
is correctly retrieving the default reservation timeout.docs/release-notes/release-notes-0.17.4.md (1)
- 22-26: The bug fix entry in the release notes is clear and informative, correctly summarizing the changes made in the PR.
lncfg/dev_integration.go (4)
- 5-7: The import statement is correct and follows Go conventions.
- 23-24: The new fields
ReservationTimeout
andZombieSweeperInterval
in theDevConfig
struct are well-defined and include descriptive comments.- 32-38: The method
GetReservationTimeout
correctly provides a default value if the field is not set. However, ensure that the default value aligns with the expected behavior in the integration environment.- 41-47: There is a typo in the method name
GeTZombieSweeperInterval
; it should beGetZombieSweeperInterval
. Additionally, the method correctly provides a default value if the field is not set. Verify that the default value is appropriate for the integration environment.- func (d *DevConfig) GeTZombieSweeperInterval() time.Duration { + func (d *DevConfig) GetZombieSweeperInterval() time.Duration {funding/manager.go (1)
- 4851-4851: The check for PSBT funding reservations should be documented to explain why these are not pruned.
ef06433
to
2cb21f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! Just one small detail, otherwise looks good to me 💯
funding/manager.go
Outdated
@@ -4809,6 +4828,13 @@ func (f *Manager) handleErrorMsg(peer lnpeer.Peer, msg *lnwire.Error) { | |||
func (f *Manager) pruneZombieReservations() { | |||
zombieReservations := make(pendingChannels) | |||
|
|||
reservationTimeout := f.cfg.ReservationTimeout | |||
if f.cfg.Dev != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also make sure there is an actual value in the dev config?
E.g. if f.cfg.Dev != nil && f.cfg.Dev.ReservationTimeout != 0 {
?
Same with the sweeper interval below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or use the f.cfg.Dev.GetReservationTimeout()
method that always returns a non-zero value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I will restructure to use yy's approach.
// We received the AcceptChannel msg from our peer but we are not going | ||
// to fund this channel but instead wait for our peer to fail the | ||
// funding workflow with an internal error. | ||
ht.ReceiveOpenChannelError(chanUpdates, chanfunding.ErrRemoteCanceled) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice itest! I think there's no more cleanup needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I think so, the nodes are shut down when the context of the testcase cancels and if the test fails the CleanUp of the harness shuts them down.
2cb21f0
to
4065171
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! thanks 🙏
This adds an itest for a failed funding flow by our peer.
4065171
to
13e557d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, LGTM 🎉
@coderabbitai resolve |
…opening-issue Fix case where Opening Channels get stuck forever.
Fixes #8362 and I also #8251
So basically the
removePendingChannel
channel was not initialized causing ourreservationCoordinator
to get stuck forever. So after one call of thefailFundingFlow
we would not be able to accept or open any new channels.It did not panic because
nil
channels are acutally quite useful in golang to merge 2 channels for more information read:https://medium.com/justforfunc/why-are-there-nil-channels-in-go-9877cc0b2308
I will add an itest to test the successful removal of a failed channel opening.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation