
Identify and log information that might help peers debugging force-close issues #6363

Open
C-Otto opened this issue Mar 24, 2022 · 10 comments
Labels: brainstorming (Long term ideas/discussion/requests for feedback) · chain handling · force closes · logging (Related to the logging / debug output functionality) · p2p (Code related to the peer-to-peer behaviour)

Comments

C-Otto (Contributor) commented Mar 24, 2022

Background

My node just force-closed a channel due to a HTLC that timed out. I'd love to figure out what exactly caused my peer to not fail the HTLC. In order to do this, I'd like to help my peer dig through the logs and find the issue.

What kind of information would I need to send to a peer running lnd? c-lightning? other implementations? Where do I get this information from?

Your environment

  • version of lnd: 0.14.1-beta
C-Otto (Contributor, Author) commented Mar 24, 2022

This might help with ElementsProject/lightning#4649

C-Otto changed the title from "Identify and log information that might help debugging force-close issues" to "Identify and log information that might help peers debugging force-close issues" on Mar 24, 2022
Roasbeef (Member) commented:

> My node just force-closed a channel due to a HTLC that timed out. I'd love to figure out what exactly caused my peer to not fail the HTLC.

So if your node was the one that force closed, then the issue was actually downstream. Like if you had a channel with Bob, sent an HTLC, then either they or someone else downstream blew up, you'd eventually force close.

C-Otto (Contributor, Author) commented Mar 24, 2022

Yes. I'm pretty sure my peer (the next hop) didn't blow up. This issue is about providing helpful information so that my peer can figure out what went wrong on their end. I would like lnd to provide users (including myself) useful information that can be handed to the peer at the other end of the force-closed channel, so that said peer has enough information to investigate the reasons for the force close.

My motivation is that there are too many force-closes that shouldn't happen, like the one mentioned in the c-lightning issue above. I don't mind investing time into looking into each and every force-close on my node, but I also see that collaborating with peers is tricky. I don't know what they need to know, and possibly helpful information like the HTLC ID and creation time are hidden from me.

Roasbeef (Member) commented:

> My motivation is that there are too many force-closes that shouldn't happen,

Agreed, I think one way to start to shuffle around more debug information would be using the warning message to pass back a notification that you or your peer had to go on chain. There's some overlap here with the old "unwrap the onion" idea, but in this case we just want some basic timing information w.r.t. what actually happened (if there's an identifiable reason other than a node going offline for a period of time).

Re force closes, one thing that would be interesting to tease out is: are the force closes happening in only one area of the route, or are they actually cascading? You'd expect them to just happen in one area, with the others cancelling HTLCs back in most cases. However, a cascade event would indicate some implementation-level bug.

On our end, we recently fixed an issue where some messages would be lost internally when we go on chain, leading to the incoming channel not immediately cancelling the HTLC: #6250.

Roasbeef added the brainstorming, p2p, logging, chain handling, and force closes labels on Mar 24, 2022
C-Otto (Contributor, Author) commented Mar 28, 2022

One idea: add a reason/description entry for channels that are (force-)closed, but also in state waiting close, or pending. I'm aware that things might change before the channel is fully closed (breach/punishment being the prime example), which is why this information might need to be updated accordingly.

User story:
As a node admin with access to the information obtained from lncli pendingchannels and lncli closedchannels, I'd like to see why exactly a channel is closed/closing, so that I can investigate further. The returned information should contain a "close reason" (or similar) field, with values such as:

  • "initiated by user"
  • "remote closed on-chain in tx XXX"
  • "outgoing HTLC XXX timed out"
  • "breach detected in tx XXX"

The reason should include the additional information (XXX) that I can use to investigate further, possibly together with my peer, or to help developers identify bugs that lead to the force-close. For HTLC details this should include the HTLC ID as seen by my peer, the date the HTLC was initiated, information about the timeout (block height/delta), and possibly also the amount.

As a node admin I need to have access to this information even while the close is still pending, as waiting (up to 14 days?) for everything to be resolved wastes time that could be spent investigating.

BhaagBoseDK (Contributor) commented:

We definitely need better logging when a channel is force closed. The log message

    Local unilateral close of ChannelPoint

does not tell us anything of value for further investigation.

C-Otto (Contributor, Author) commented Apr 5, 2022
C-Otto commented Apr 5, 2022

@BhaagBoseDK see #5324. In here (6363) I'd like to concentrate on information that can be used to collaborate with a peer, i.e. this is more than just having better debug logs.

fiatjaf (Contributor) commented Apr 8, 2022

Should channels really be force-closed in case of trimmed HTLC timeouts?

Roasbeef (Member) commented Jun 6, 2022

Suggestion for the output here: #6615 (comment)

Roasbeef (Member) commented Jun 6, 2022

> Should channels really be force-closed in case of trimmed HTLC timeouts?

IMO no, see this issue: #1226

An ideal setup IMO would be: always be 100% economically rational, and only go to chain if the amount at stake (the HTLC value) exceeds the fees required to go on-chain. This would also mean mostly not caring about dust HTLCs, since you usually pay more in chain fees to force close and then sweep than the HTLC amount itself.
