Improve channel force close logging #5324
I'm in contact with the peer node admin and we're comparing logs. It seems our nodes were connected the entire time, well before the "dance" issue, right up to the force-close.
From my peer's logs: (UTC-5)
Is this a duplicate of that other instance? It's easier to keep track of things if everything is reported in a single issue.
This means the remote peer didn't respond w/ a commitment signature when it needed to do so. We added this check to detect stale connections, so something was up w/ their node; exactly what, I'm not sure, but it was effectively "stuck" for 1 minute (the current default timeout). If the node was an ACINQ node, then they've recently fixed a force-close bug caused by a divergence in the way nodes sort HTLCs.
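To illustrate the stale-connection detection described above, here is a minimal Go sketch of the idea; the names and wiring are hypothetical, not lnd's actual code, and only show the shape of a one-minute reply watchdog:

```go
// Hypothetical sketch, not lnd's implementation: after we send a
// commitment signature, arm a timer and treat the connection as stale
// if the peer's reply doesn't arrive within the default one-minute window.
package main

import (
	"fmt"
	"time"
)

const defaultCommitReplyTimeout = time.Minute

// watchCommitReply waits for the peer's reply on replyCh; if nothing
// arrives before the timeout, the connection is flagged as stale so it
// can be torn down and re-established.
func watchCommitReply(replyCh <-chan struct{}, disconnect func(reason string)) {
	select {
	case <-replyCh:
		fmt.Println("peer replied in time, commitment dance continues")
	case <-time.After(defaultCommitReplyTimeout):
		disconnect(fmt.Sprintf(
			"no commitment reply within %v, assuming stale connection",
			defaultCommitReplyTimeout,
		))
	}
}

func main() {
	replyCh := make(chan struct{})
	// Simulate a peer that responds promptly.
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(replyCh)
	}()
	watchCommitReply(replyCh, func(reason string) {
		fmt.Println("disconnecting:", reason)
	})
}
```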
You mean #5313? Those are/were different channels and issues. The remote node is lnd 0.13.0 rc something, YTMND.
Your version (v0.12.1) and your peer's version (v0.13.0) would force close on each other in certain cases. Two v0.13.0 nodes should not; if they do, please file an issue.
Alright. Could you share some details on this? I guess this shouldn't happen, no matter the versions.
Just to clarify: I raised this issue so that the log output can be improved. As a user I want (need?) to know the reason for a force-close. In this specific case I still have no clue.
We had some faulty logic in our channel state machine that would cause a force close when going down and up; that's about all I can say without getting too deep into the inner workings of the state machine. See here: #5231
Ok. Assuming something like this might happen again, would it be possible to add some information to the logs? Something like "X happened, should not happen, force-closing channel just to be on the safe side" is better than nothing...
There isn't a way to know if it's an erroneous force close or not. It could either be a bug in our implementation or a buggy channel peer. So it's not really possible to add useful information to the logs.
I'm not limiting this to erroneous force closes.
To clarify: aside from user-triggered force-closes (lncli closechannel --force and the like), the reason for the force-close should be communicated to the user so that the underlying issue can be resolved. I don't know the specific situations that might lead to a force-close, but I believe that adding a bit of context would be possible in virtually all of them. Was it triggered by the remote? Did the remote node send a message? Was it a disagreement about something specific? Was it a protocol breach, or something where the spec requires the channel to be force-closed? In the case that led me to open this issue, the bug fixed in #5231 seemed to be the reason for the force-close. Having some additional information somewhere in the log ("unexpected remote commit chain" or something like that?) might have helped pin this down, and/or identify/fix the underlying issue earlier.
If the cause of the force close was #5231, it would have had a log message in the linked PR. Did the channel have active HTLCs on it? Usually the cause of a force close is pretty clear from the logs, so I'm not sure what's going on.
I posted the logs I found above. Maybe #5231 is not the root cause, maybe it is. Either way, a simple log message "force closing channel X because Y" is all I want, where "Y" should be as specific as possible.
That's usually the case, which is why I want to pinpoint why you don't have a log message.
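To make the proposal above concrete, here is a hedged Go sketch of reason-tagged force-close logging; the reason names and the forceClose helper are hypothetical, not lnd's API, and only show the shape such logging could take:

```go
// Hypothetical sketch of reason-tagged force-close logging: every code
// path that triggers a force close passes an explicit reason, so the
// log always answers "force closing channel X because Y".
package main

import "log"

// CloseReason enumerates illustrative triggers for a force close.
type CloseReason int

const (
	ReasonUserInitiated   CloseReason = iota // e.g. closechannel --force
	ReasonRemoteBreach                       // remote broadcast a revoked state
	ReasonHTLCExpiry                         // an outgoing HTLC is too close to expiry
	ReasonRemoteError                        // remote sent a protocol error message
	ReasonStateDivergence                    // local and remote commitment states disagree
)

func (r CloseReason) String() string {
	switch r {
	case ReasonUserInitiated:
		return "user initiated"
	case ReasonRemoteBreach:
		return "remote broadcast revoked state"
	case ReasonHTLCExpiry:
		return "outgoing HTLC close to expiry"
	case ReasonRemoteError:
		return "remote sent error message"
	case ReasonStateDivergence:
		return "commitment state divergence"
	default:
		return "unknown"
	}
}

// forceClose logs the reason before going to chain.
func forceClose(chanPoint string, reason CloseReason) {
	log.Printf("force closing channel %s because: %s", chanPoint, reason)
	// ... broadcast the commitment transaction ...
}

func main() {
	forceClose("abcd…:0", ReasonHTLCExpiry)
}
```

With something like this in place, every force close would leave a log line of exactly the "force closing channel X because Y" form requested above.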
I have also experienced this same situation recently, with very little context given for the force closure by my node. I have been on v0.13.1 (v0.12.1 previously) for a week or so already, but am unsure about my peer's version on the other side.
Same issue. |
My node force-closed a channel to a peer that, according to LN explorer websites, also had other channels force-closed. I think this is some kind of recovery attempt by my peer. However, the logs don't indicate why the force-close was issued.
Today my node force-closed a channel. The logs don't give ANY information about the reasons. |
It would be really great to have additional logging of the reason when a force close is initiated.
Sorry to bug. What is considered a normal frequency of spurious force closures? They now make up ~67% of all closures on my node; just over the last week there were 8 of them, both local and remote, at random. I am a noob, but I tried to check the logs for clues and didn't see the cause. Extremely demoralising, and it makes operating the node a net loss.
The changes I made for this ticket should help you understand the reasons for the closes. Until then I'm afraid you're a bit out of luck. Usually force-closes indicate that one of the nodes is offline for too long, or that there are connection issues. I suggest you dig through the logs and join the Slack chat. This issue isn't the place to discuss the details. |
Today I had a force closure with Diamond Hand. In my lnd.log I could not find any useful information.
The force close transaction had one outgoing HTLC. This would have timed out in ~50 blocks, so I'm not sure why a force close was required.
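For context on the ~50 blocks remark: lnd goes to chain once the chain tip gets within a safety margin (a "broadcast delta") of an outgoing HTLC's CLTV expiry. Here is a minimal Go sketch of that check, with an assumed delta value that may differ from lnd's actual default:

```go
// Illustrative sketch of the go-to-chain decision for an outgoing HTLC.
package main

import "fmt"

// Assumed safety margin in blocks; lnd's actual default may differ.
const outgoingBroadcastDelta = 10

// shouldGoOnChain reports whether an outgoing HTLC with the given
// absolute expiry height forces a unilateral close at currentHeight.
func shouldGoOnChain(htlcExpiry, currentHeight uint32) bool {
	return currentHeight >= htlcExpiry-outgoingBroadcastDelta
}

func main() {
	// With ~50 blocks still to go before expiry, the check does not
	// trigger, which matches the confusion voiced in the comment above.
	fmt.Println(shouldGoOnChain(700_050, 700_000)) // false
}
```

If a margin like this isn't what fired, the expiry alone wouldn't explain the close, which is another reason to log the trigger explicitly.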
This PR fixes a known class of "unable to complete dance" errors: #6518 |
Background
My node force-closed a channel and the logs don't help me understand why this happened. I'd like this issue to be resolved by updating the log statements to contain the necessary information. In addition to this, I'd appreciate an explanation.
Your environment
Expected behaviour
The logs tell me why the channel was force-closed.
Actual behaviour
Note that the force-close transaction has two outputs, one already swept by my peer, the other timelocked for me. This makes me believe that my peer was online (but not connected to my node?).
(Log timestamps are in BST.)
Related questions: