Skip to content

Improve channel force close logging #5324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
C-Otto opened this issue May 22, 2021 · 24 comments
Open

Improve channel force close logging #5324

C-Otto opened this issue May 22, 2021 · 24 comments
Labels
beginner Issues suitable for new developers channels help wanted logging Related to the logging / debug output functionality

Comments

@C-Otto
Copy link
Contributor

C-Otto commented May 22, 2021

Background

My node force-closed a channel and the logs don't help me understand why this happened. I'd like this issue to be resolved by updating the log statements to contain the necessary information. In addition to this, I'd appreciate an explanation.

Your environment

  • lnd 0.12.1
  • Linux server 4.19.0-14-amd64 Fix name typo in README #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
  • bitcoind 0.21.0 (bitcoin core)

Expected behaviour

The logs tell me why the channel was force-closed.

Actual behaviour

Note that the force-close transaction has two outputs, one already swept by my peer, the other timelocked for me. This makes me believe that my peer was online (but not connected to my node?).

  1. HTLCs, failing with "unable to complete dance error: remote unresponsive"
  2. disabling channel
  3. Updating link policy?!
  4. GossipSyncer is active with peer?!
  5. Force-close

(BST)

2021-05-20 03:16:15.988 [DBG] HSWC: ChannelLink(__SHORT_CHAN_ID_UNRELATED_NODE1__): forwarding 1 packets to switch
2021-05-20 03:16:16.076 [DBG] HSWC: ChannelLink(__SHORT_CHAN_ID__): queueing keystone of ADD open circuit: (Chan ID=__SHORT_CHAN_ID_UNRELATED_NODE1__, HTLC ID=1835)->(Chan ID=__SHORT_CHAN_ID__, HTLC ID=36)
2021-05-20 03:16:16.437 [DBG] HSWC: ChannelLink(__SHORT_CHAN_ID__): removing Add packet (Chan ID=__SHORT_CHAN_ID_UNRELATED_NODE1__, HTLC ID=1835) from mailbox
--
2021-05-20 03:33:01.864 [DBG] HSWC: ChannelLink(__SHORT_CHAN_ID_UNRELATED_NODE2__): forwarding 1 packets to switch
2021-05-20 03:33:01.952 [DBG] HSWC: ChannelLink(__SHORT_CHAN_ID__): queueing keystone of ADD open circuit: (Chan ID=__SHORT_CHAN_ID_UNRELATED_NODE2__, HTLC ID=542)->(Chan ID=__SHORT_CHAN_ID__, HTLC ID=37)
2021-05-20 03:34:02.080 [ERR] HSWC: ChannelLink(__SHORT_CHAN_ID__): failing link: unable to complete dance with error: remote unresponsive
2021-05-20 03:34:02.081 [INF] HSWC: ChannelLink(__SHORT_CHAN_ID__): exited
2021-05-20 03:34:02.082 [INF] HSWC: Removing channel link with ChannelID(__CHANNEL_ID__)
2021-05-20 03:34:02.084 [INF] HSWC: ChannelLink(__SHORT_CHAN_ID__): stopping
2021-05-20 03:34:02.086 [DBG] HSWC: ChannelLink(__SHORT_CHAN_ID_UNRELATED_NODE2__): queueing removal of FAIL closed circuit: (Chan ID=__SHORT_CHAN_ID_UNRELATED_NODE2__, HTLC ID=542)->(Chan ID=0:0:0, HTLC ID=0)
2021-05-20 03:34:02.320 [DBG] HSWC: ChannelLink(__SHORT_CHAN_ID_UNRELATED_NODE2__): removing Fail/Settle packet (Chan ID=__SHORT_CHAN_ID_UNRELATED_NODE2__, HTLC ID=542) from mailbox
--
2021-05-20 03:54:48.643 [INF] NANN: Announcing channel(__CHAN_POINT__) disabled [detected]
---
2021-05-20 15:04:00.704 [DBG] HSWC: Unable to find ChannelPoint(__CHAN_POINT__) to update link policy
---
2021-05-20 15:46:09.083 [INF] DISC: GossipSyncer(__REMOTE_PUBKEY__): applying gossipFilter(start=2021-05-20 15:46:09.083263967 +0100 BST m=+1810838.985025483, end=2157-06-26 21:14:24.083263967 +0000 GMT)
---
2021-05-20 15:52:06.400 [DBG] HSWC: Unable to find ChannelPoint(__CHAN_POINT__) to update link policy
---
2021-05-22 14:03:37.887 [INF] CRTR: Pruning channel graph using block 00000000000000000001cec503af2292875dc44437f4875f877cb498e7e4d675 (height=684537)
2021-05-22 14:03:38.091 [INF] CRTR: Block 00000000000000000001cec503af2292875dc44437f4875f877cb498e7e4d675 (height=684537) closed 1 channels
2021-05-22 14:03:38.971 [INF] NTFN: New block: height=684537, sha=00000000000000000001cec503af2292875dc44437f4875f877cb498e7e4d675
2021-05-22 14:03:38.972 [INF] UTXN: Attempting to graduate height=684537: num_kids=0, num_babies=0
2021-05-22 14:03:38.973 [ERR] LTND: Unable to lookup witness: no witnesses
2021-05-22 14:03:39.060 [INF] CNCT: ChannelArbitrator(__CHAN_POINT__): force closing chan
2021-05-22 14:03:39.172 [INF] HSWC: Removing channel link with ChannelID(__CHANNEL_ID__)
2021-05-22 14:03:39.308 [INF] CNCT: Broadcasting force close transaction [...]
2021-05-22 14:03:39.315 [INF] LNWL: Inserting unconfirmed transaction __CLOSE_TX_HASH__
2021-05-22 14:03:39.598 [INF] CNCT: ChannelArbitrator(__CHAN_POINT__): trigger chainTrigger moving from state StateCommitmentBroadcasted to StateCommitmentBroadcasted

Related questions:

  • Why didn't lnd reconnect to the peer?
  • Why do I see policy/gossip messages for that peer, after the error?
@C-Otto
Copy link
Contributor Author

C-Otto commented May 22, 2021

I'm in contact with the peer node admin and we're comparing logs. It seems our nodes where connected the entire time, well before the "dance" issue, up to the force-close.

@C-Otto
Copy link
Contributor Author

C-Otto commented May 22, 2021

From my peer's logs:

(UTC-5)

2021-05-19 21:39:12.988 [ERR] HSWC: ChannelLink(__SHORT_CHAN_ID__): failing link: unable to complete dance with error: remote unresponsive
2021-05-19 21:39:12.988 [INF] HSWC: ChannelLink(__SHORT_CHAN_ID__): exited
2021-05-19 21:39:12.988 [INF] HSWC: Removing channel link with ChannelID(__CHAN_ID__)
2021-05-19 21:39:12.988 [INF] HSWC: ChannelLink(__SHORT_CHAN_ID__): stopping

@Roasbeef
Copy link
Member

Is this a duplicate of that other instance? Easier to keep track of things if everything is reported to a single issue.

2021-05-19 21:39:12.988 [ERR] HSWC: ChannelLink(SHORT_CHAN_ID): failing link: unable to complete dance with error: remote unresponsive

This means the remote peer didn't respond w/ a commitment signature when it needed to do so. We added this to detect stale connections, so something was up w/ their node, exactly what I'm not sure but it was effectively "stuck" for 1 minute (the current default timeout). If the node was an ACINQ node, then they've recently fixed a force close divergence related to a divergence in the way nodes sort HTLCs.

@Roasbeef Roasbeef added commitments Commitment transactions containing the state of the channel needs investigation node-management labels May 24, 2021
@C-Otto
Copy link
Contributor Author

C-Otto commented May 24, 2021

You mean #5313? Those are/were different channels and issues.

The remote node is lnd 0.13.0 rc something, YTMND, @Max in slack

@Crypt-iQ
Copy link
Collaborator

Crypt-iQ commented Jun 7, 2021

Your version (v0.12.1) and your peer's version (v0.13.0) would force close on each other in certain cases. Two v0.13.0 nodes should not, if they do please file an issue

@C-Otto
Copy link
Contributor Author

C-Otto commented Jun 7, 2021

Alright. Could you share some details on this? I guess this shouldn't happen, no matter the versions.

@C-Otto
Copy link
Contributor Author

C-Otto commented Jun 7, 2021

Just to clarify: I raised this issue so that the log output can be improved. As a user I want (need?) to know the reason for a force-close. In this specific case I still have no clue.

@Crypt-iQ
Copy link
Collaborator

Crypt-iQ commented Jun 7, 2021

Alright. Could you share some details on this? I guess this shouldn't happen, no matter the versions.

We had some faulty logic in our channel state machine that would cause a force close when going down and up, that's about all I can say without getting too deep into the inner workings of the state machine. See here: #5231

@C-Otto
Copy link
Contributor Author

C-Otto commented Jun 7, 2021

Ok. Assuming something like this might happen again, would it be possible to add some information to the logs? Something like "X happened, should not happen, force-closing channel just to be on the safe side" is better than nothing...

@Crypt-iQ Crypt-iQ added channels logging Related to the logging / debug output functionality and removed commitments Commitment transactions containing the state of the channel needs investigation node-management labels Jun 7, 2021
@wpaulino wpaulino changed the title Force-closed channel, I don't know why Improve channel force close logging Jun 8, 2021
@wpaulino wpaulino added beginner Issues suitable for new developers help wanted labels Jun 8, 2021
@Crypt-iQ
Copy link
Collaborator

There isn't a way to know if it's an erroneous force close or not. It could either be a bug in our implementation or a buggy channel peer. So it's not really possible to add useful information to the logs.

@C-Otto
Copy link
Contributor Author

C-Otto commented Jun 23, 2021

I'm not limiting this to erroneous force closes.

@C-Otto
Copy link
Contributor Author

C-Otto commented Jun 24, 2021

To clarify: aside from user triggered force-closes (lncli closechannel --force and the like), the reason for the force-close should be communicated to the user so that the underlying issue can be resolved. I don't know the specific situations that might lead to a force-close, but I believe that adding a bit of context would be possible in virtual all of them. Was it triggered by the remote? Did the remote node send a message? Was it a disagreement about something specific? Was it a protocl breach, or something where the spec requires the channel to be force-closed?

In the case that lead me to open this issue, the bug fixed in #5231 seemed to be the reason for the force-close. Having some additional information somewhere in the log ("unexpected remote commit chain" or something like that?) might have helped pin this down, and/or identify/fix the underlying issue earlier.

@Crypt-iQ
Copy link
Collaborator

If cause of force close was #5231 it would have had a log message in the link. Did the channel have active htlcs on it? Usually the cause of force close is pretty clear from the logs so not sure whats going on

@C-Otto
Copy link
Contributor Author

C-Otto commented Jun 24, 2021

I posted the logs I found above. Maybe 5231 is not the root cause, maybe it is. No matter what, a simple log message "force closing channel X because Y" is all I want, where "Y" should be as specific as possible.

@Crypt-iQ
Copy link
Collaborator

thats usually the case which is why I want to pinpoint why you don't have a log message

@cryptosharks131
Copy link
Contributor

cryptosharks131 commented Aug 19, 2021

I have also experienced this same situation recently with very little information given for context of the force closure by my node.

[ERR] LTND: Unable to lookup witness: no witnesses
[INF] CNCT: ChannelArbitrator(xxxxxxxxxxxx): force closing chan
[INF] HSWC: Removing channel link with ChannelID(xxxxxxxxxxxx)
[INF] CNCT: Broadcasting force close transaction...........

I have been on v0.13.1 (v0.12.1 previously) for a week or so already but am unsure about my peer's version on the other side.

@augustresende
Copy link

Same issue.

@C-Otto
Copy link
Contributor Author

C-Otto commented Nov 29, 2021

My node force-closed a channel to a peer that, according to LN explorer websites, also had other channels force-closed. I think this is some kind of recovery attempt from my peer. However, the logs don't indicate why the force-close was issued.

This was referenced Dec 31, 2021
@C-Otto
Copy link
Contributor Author

C-Otto commented Dec 31, 2021

Today my node force-closed a channel. The logs don't give ANY information about the reasons.

@BhaagBoseDK
Copy link
Contributor

It will be really great to have additional logging for the reasons when a force close is initiated.

@babikpatient6
Copy link

Sorry to bug. What is considered normal frequency of spurious force closures? Now they make ~67% of all closures on my node. Just over last week there was 8 of them. Both local/remote randomly. I am noob, but tried to check logs for clues and didnt see the cause. Extremely demoralising & makes operating node at net loss.

@C-Otto
Copy link
Contributor Author

C-Otto commented Jan 23, 2022

The changes I made for this ticket should help you understand the reasons for the closes. Until then I'm afraid you're a bit out of luck. Usually force-closes indicate that one of the nodes is offline for too long, or that there are connection issues. I suggest you dig through the logs and join the Slack chat. This issue isn't the place to discuss the details.

@BhaagBoseDK
Copy link
Contributor

BhaagBoseDK commented Apr 25, 2022

Today I had a force closure with Diamond Hand. In my lnd.log I could not find any useful information

lnd.log.267-2022-04-25 02:39:46.691 [INF] UTXN: Attempting to graduate height=733427: num_kids=0, num_babies=0 lnd.log.267-2022-04-25 02:39:46.693 [ERR] LTND: Unable to lookup witness: no witnesses lnd.log.267-2022-04-25 02:39:46.719 [INF] HSWC: Garbage collected 29 shared secret hashes at height=733427 lnd.log.267:2022-04-25 02:39:46.727 [INF] CNCT: ChannelArbitrator(786945c839f4818d9fff0e0ff189953368c15bd81ce158b732652afdaeef343d:1): force closing chan lnd.log.267-2022-04-25 02:39:46.748 [INF] HSWC: Removing channel link with ChannelID(3d34efaefd2a6532b758e11cd85bc168339589f10f0eff9f8d81f439c8456979) lnd.log.267:2022-04-25 02:39:46.748 [INF] HSWC: ChannelLink(786945c839f4818d9fff0e0ff189953368c15bd81ce158b732652afdaeef343d:1): stopping lnd.log.267:2022-04-25 02:39:46.749 [INF] HSWC: ChannelLink(786945c839f4818d9fff0e0ff189953368c15bd81ce158b732652afdaeef343d:1): exited

The force close transaction had one outgoing HTLC. This would have timed out in ~50 blocks so not sure why force close was required.

lnd.log.267:2022-04-25 02:39:46.813 [INF] CNCT: Broadcasting force close transaction ac4e47b80906f69b1cb8d637128401fc3b4bae757d82f94d7b2c153f5beb1f4e, ChannelPoint(786945c839f4818d9fff0e0ff189953368c15bd81ce158b732652afdaeef343d:1): (*wire.MsgTx)(0x400cbb7bc0)({
lnd.log.267- Version: (int32) 2,
lnd.log.267- TxIn: ([]*wire.TxIn) (len=1 cap=1) {
lnd.log.267- (wire.TxIn)(0x4016c831a0)({
lnd.log.267: PreviousOutPoint: (wire.OutPoint) 786945c839f4818d9fff0e0ff189953368c15bd81ce158b732652afdaeef343d:1,
lnd.log.267- SignatureScript: ([]uint8) ,
lnd.log.267- Witness: (wire.TxWitness) (len=4 cap=4) {
lnd.log.267- ([]uint8) ,
lnd.log.267- ([]uint8) (len=71 cap=144) {
lnd.log.267- 00000000 30 44 02 20 17 0a 19 4c 96 04 31 d0 c4 b6 62 e7 |0D. ...L..1...b.|
lnd.log.267- 00000010 51 db dc 79 43 0c 9c 83 83 b4 44 5f f0 4c d0 da |Q..yC.....D_.L..|
lnd.log.267- 00000020 b9 08 cc 70 02 20 6e ee 9b 90 7e d0 ce e0 a6 43 |...p. n...~....C|
lnd.log.267- 00000030 cd 18 8e 10 3e 72 84 df e4 63 35 51 6e c3 a3 2a |....>r...c5Qn..
|
lnd.log.267- 00000040 4b 9f e7 60 8c b2 01 |K..`...|
lnd.log.267- },
lnd.log.267- ([]uint8) (len=72 cap=144) {
lnd.log.267- 00000000 30 45 02 21 00 b0 34 dd 7b ef 08 27 7b 9b 53 70 |0E.!..4.{..'{.Sp|
lnd.log.267- 00000010 20 43 9c a2 14 2e e1 9e ed 17 c4 52 c5 de 2c e0 | C.........R..,.|
lnd.log.267- 00000020 6c 85 fe 9a 46 02 20 6d ac 86 1d 88 86 6d 2f f9 |l...F. m.....m/.|
lnd.log.267- 00000030 31 43 a4 4f 1b f3 7c 52 fc 35 17 74 fb 98 69 33 |1C.O..|R.5.t..i3|
lnd.log.267- 00000040 c4 9c 9a 8b 72 92 15 01 |....r...|
lnd.log.267- },
lnd.log.267- ([]uint8) (len=71 cap=500) {
lnd.log.267- 00000000 52 21 03 c2 03 fc 9a 24 6d 43 0c c7 8f 91 ec f5 |R!.....$mC......|
lnd.log.267- 00000010 83 7b cd 36 17 83 c9 ef fe d0 2e 2f 35 ab af c6 |.{.6......./5...|
lnd.log.267- 00000020 ac 88 fc 21 03 e1 53 ae 01 53 a7 62 68 d8 a0 4e |...!..S..S.bh..N|
lnd.log.267- 00000030 0d 6f 07 94 72 9d 26 53 33 75 c5 af 70 44 f4 e9 |.o..r.&S3u..pD..|
lnd.log.267- 00000040 9a bc e3 d0 2b 52 ae |....+R.|
lnd.log.267- }
lnd.log.267- },
lnd.log.267- Sequence: (uint32) 2157310309
lnd.log.267- })
lnd.log.267- },
lnd.log.267- TxOut: ([]*wire.TxOut) (len=5 cap=5) {
lnd.log.267- (wire.TxOut)(0x400be6cf40)({
lnd.log.267- Value: (int64) 330,
lnd.log.267- PkScript: ([]uint8) (len=34 cap=34) {
lnd.log.267- 00000000 00 20 25 14 8e 4a b4 62 f3 42 d4 12 d1 84 2e 2a |. %..J.b.B.....
|
lnd.log.267- 00000010 66 1b 5a 58 c0 bc d4 9c c3 10 0d 85 ec a3 38 e1 |f.ZX..........8.|
lnd.log.267- 00000020 5e 0a |^.|
lnd.log.267- }
lnd.log.267- }),
lnd.log.267- (*wire.TxOut)(0x400be6cfc0)({
lnd.log.267- Value: (int64) 330,
lnd.log.267- PkScript: ([]uint8) (len=34 cap=34) {
lnd.log.267- 00000000 00 20 5f 8c b3 da 52 cf f2 cb 3f c5 0c c5 6d 02 |. _...R...?...m.|
lnd.log.267- 00000010 af f1 10 28 47 73 e4 21 f2 ed 0f 7a bb 4a bf c2 |...(Gs.!...z.J..|
lnd.log.267- 00000020 5e 12 |^.|
lnd.log.267- }
lnd.log.267- }),
lnd.log.267- (*wire.TxOut)(0x400be6cfe0)({
lnd.log.267- Value: (int64) 8177,
lnd.log.267- PkScript: ([]uint8) (len=34 cap=34) {
lnd.log.267- 00000000 00 20 a9 15 ad 5d 99 e4 31 b5 69 0d 9c 1f ce 52 |. ...]..1.i....R|
lnd.log.267- 00000010 01 6b 1b 13 07 ce 1d e1 f0 86 6f d0 cd bf c2 1e |.k........o.....|
lnd.log.267- 00000020 8e 3d |.=|
lnd.log.267- }
lnd.log.267- }),
lnd.log.267- (*wire.TxOut)(0x400be6d000)({
lnd.log.267- Value: (int64) 715180,
lnd.log.267- PkScript: ([]uint8) (len=34 cap=34) {
lnd.log.267- 00000000 00 20 3c a4 49 4e d1 cd 41 d3 63 02 41 83 41 8b |. <.IN..A.c.A.A.|
lnd.log.267- 00000010 a4 af c8 97 6b 23 96 c4 d1 64 94 17 0d b9 5d a1 |....k#...d....].|
lnd.log.267- 00000020 05 b7 |..|
lnd.log.267- }
lnd.log.267- }),
lnd.log.267- (*wire.TxOut)(0x400be6d020)({
lnd.log.267- Value: (int64) 4744777,
lnd.log.267- PkScript: ([]uint8) (len=34 cap=34) {
lnd.log.267- 00000000 00 20 40 6d a3 35 ab b4 05 e3 bd 7d cd 15 62 a6 |. @m.5.....}..b.|
lnd.log.267- 00000010 43 64 7d ad 81 58 0f e3 c7 7e 39 89 28 1d 08 a6 |Cd}..X...~9.(...|
lnd.log.267- 00000020 9e fa |..|
lnd.log.267- }
lnd.log.267- })
lnd.log.267- },
lnd.log.267- LockTime: (uint32) 553439807
lnd.log.267-})

@Roasbeef
Copy link
Member

This PR fixes a known class of "unable to complete dance" errors: #6518

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beginner Issues suitable for new developers channels help wanted logging Related to the logging / debug output functionality
Projects
None yet
Development

No branches or pull requests

8 participants