
Identify and log information that might help peers debugging force-close issues #6363

Open
C-Otto opened this issue Mar 24, 2022 · 10 comments
Labels: brainstorming (Long term ideas/discussion/requests for feedback) · chain handling · force closes · logging (Related to the logging / debug output functionality) · p2p (Code related to the peer-to-peer behaviour)

Comments

C-Otto (Contributor) commented Mar 24, 2022

Background

My node just force-closed a channel due to a HTLC that timed out. I'd love to figure out what exactly caused my peer to not fail the HTLC. In order to do this, I'd like to help my peer dig through the logs and find the issue.

What kind of information would I need to send to a peer running lnd? c-lightning? other implementations? Where do I get this information from?

Your environment

  • version of lnd: 0.14.1-beta
C-Otto (Contributor, Author) commented Mar 24, 2022

This might help with ElementsProject/lightning#4649

C-Otto changed the title from "Identify and log information that might help debugging force-close issues" to "Identify and log information that might help peers debugging force-close issues" on Mar 24, 2022
Roasbeef (Member) commented:

> My node just force-closed a channel due to a HTLC that timed out. I'd love to figure out what exactly caused my peer to not fail the HTLC.

So if your node was the one that force closed, then the issue was actually downstream. Like if you had a channel with Bob, sent an HTLC, then either they or someone else downstream blew up, you'd eventually force close.

C-Otto (Contributor, Author) commented Mar 24, 2022

Yes. I'm pretty sure my peer (the next hop) didn't blow up. This issue is about providing helpful information so that my peer can figure out what went wrong on their end. I would like lnd to provide users (including myself) useful information that can be handed to the peer at the other end of the force-closed channel, so that said peer has enough information to investigate the reasons for the force close.

My motivation is that there are too many force-closes that shouldn't happen, like the one mentioned in the c-lightning issue above. I don't mind investing time into looking into each and every force-close on my node, but I also see that collaborating with peers is tricky. I don't know what they need to know, and possibly helpful information like the HTLC ID and creation time are hidden from me.

Roasbeef (Member) commented:

> My motivation is that there are too many force-closes that shouldn't happen,

Agreed, I think one way to start to shuffle around more debug information would be using the warning message to pass back a notification that you or your peer had to go on chain. There's some overlap here with the old "unwrap the onion" idea, but in this case we just want some basic timing information w.r.t. what actually happened (if there's an identifiable reason other than a node going offline for a period of time).

Re force closes, one thing that would be interesting to tease out is: are the force closes happening in only one area of the route, or are they actually cascading? You'd expect them to just happen in one area, with the others cancelling HTLCs back in most cases. However, a cascade event would indicate some implementation-level bug.

On our end, we recently fixed an issue where some messages would be lost internally when we go on chain, leading to the incoming channel not immediately cancelling the HTLC: #6250.

Roasbeef added the brainstorming, p2p, logging, chain handling, and force closes labels on Mar 24, 2022
C-Otto (Contributor, Author) commented Mar 28, 2022

One idea: add a reason/description entry for channels that are (force-)closed, but also in state waiting close, or pending. I'm aware that things might change before the channel is fully closed (breach/punishment being the prime example), which is why this information might need to be updated accordingly.

User story:
As a node admin with access to the information obtained from lncli pendingchannels and lncli closedchannels, I'd like to see why exactly a channel is closed/closing, so that I can investigate further. The returned information should contain a "close reason" (or similar) field, with values such as:

  • "initiated by user"
  • "remote closed on-chain in tx XXX"
  • "outgoing HTLC XXX timed out"
  • "breach detected in tx XXX"

The reason should include the additional information (XXX) that I can use to investigate further, possibly together with my peer, or to help developers identify bugs that lead to the force-close. For HTLC details this should include the HTLC ID as seen by my peer, the date the HTLC was initiated, information about the timeout (block height/delta), and possibly also the amount.

As a node admin I need to have access to this information even while the close is still pending, as waiting (up to 14 days?) for everything to be resolved wastes time that could be spent investigating.

BhaagBoseDK (Contributor) commented:

We definitely need better logging when a channel is force closed. The log message

    Local unilateral close of ChannelPoint

does not tell us anything of value for further investigation.

C-Otto (Contributor, Author) commented Apr 5, 2022
C-Otto commented Apr 5, 2022

@BhaagBoseDK see #5324. In here (6363) I'd like to concentrate on information that can be used to collaborate with a peer, i.e. this is more than just having better debug logs.

fiatjaf (Contributor) commented Apr 8, 2022

Should channels really be force-closed in case of trimmed HTLC timeouts?

Roasbeef (Member) commented Jun 6, 2022

Suggestion for the output here: #6615 (comment)

Roasbeef (Member) commented Jun 6, 2022

> Should channels really be force-closed in case of trimmed HTLC timeouts?

IMO no, see this issue: #1226

An ideal setup IMO would be: always be 100% economically rational, and only go to chain if the amount at stake (the HTLC value) exceeds the fees required to go on-chain. This would also mean mostly not caring about dust HTLCs, since you usually pay more in chain fees to force close and then sweep than the HTLC amount itself.
