Don't Always Broadcast latest state on startup if we'd previously closed-without-broadcast #1563
When we receive a `channel_reestablish` with a `data_loss_protect` that proves we're running with a stale state, instead of force-closing the channel, we immediately panic. This lines up with our refusal to run if we find a `ChannelMonitor` which is stale compared to our `ChannelManager` during `ChannelManager` deserialization. Ultimately both are an indication of the same thing - that the API requirements on `chain::Watch` were violated. In the "running with outdated state but ChannelMonitor(s) and ChannelManager lined up" case specifically, it's likely we're running off of an old backup, in which case connecting to peers with channels still live is explicitly dangerous. That said, because this could be an operator error that is correctable, panicking instead of force-closing may allow for normal operation again in the future (cc lightningdevkit#1207). In any case, we provide instructions in the panic message for how to force-close channels prior to peer connection, as well as a note on how to broadcast the latest state if users are willing to take the risk. Note that this is still somewhat unsafe until we resolve lightningdevkit#1563.
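To make the decision described above concrete, here is a minimal, self-contained sketch of the stale-state check. All types and names here are hypothetical illustrations, not LDK's actual API: the real logic lives in `Channel`'s `channel_reestablish` handling and compares the peer's claimed commitment numbers against our own.

```rust
// Illustrative sketch (hypothetical types, not LDK's actual API) of reacting
// to a `channel_reestablish` that proves our channel state is stale.

/// Hypothetical summary of the relevant field from a peer's
/// `channel_reestablish` message.
struct Reestablish {
    /// Commitment number the peer claims to have last received from us.
    next_remote_commitment_number: u64,
}

#[derive(Debug)]
enum StaleStateAction {
    /// State lines up: continue normal operation.
    Proceed,
    /// The peer proved we are behind: halt rather than risk broadcasting a
    /// revoked commitment, and tell the operator how to proceed manually.
    Panic(&'static str),
}

fn check_data_loss(our_next_commitment_number: u64, msg: &Reestablish) -> StaleStateAction {
    if msg.next_remote_commitment_number > our_next_commitment_number {
        // The peer knows about a commitment we never saw, so our state is
        // stale (e.g. we restored an old backup). Broadcasting now could put
        // a revoked state on chain, letting the peer claim our funds.
        StaleStateAction::Panic(
            "stale state detected - refusing automated broadcast; \
             force-close channels before reconnecting, or broadcast \
             manually if you accept the risk",
        )
    } else {
        StaleStateAction::Proceed
    }
}
```

The point of the sketch is the asymmetry: a matching or older counter is safe to proceed on, while a strictly greater one is proof of data loss and warrants stopping entirely.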
I want to take this as my first issue. When you say "On startup the ChannelManager will broadcast the latest state transaction(s) for any ChannelMonitor it has for which it has no Channel", I assume you mean channelmanager.rs:6823-6828, where it checks, for each channel monitor, whether its funding transaction is not in the funding transactions set, right? Then if I understand this correctly, when de-serializing the persisted state, every
Some questions I have:
Heh, 1564 didn't resolve this, it just referenced it...
Yes, your understanding is correct. As for your questions:
Probably the second. It's still important that users be able to broadcast the latest state if they want to, so we at least need an option to broadcast the state, though it also wouldn't be unreasonable to split it into two methods -
I think in that case, yes, the flag should stop us (though of course generally the flag is not set).
No, once it's unsafe it's unsafe. Unsafe here means we've lost some data, and there's no way to recover that data.
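The "once it's unsafe it's unsafe" answer above describes a one-way latch. A minimal sketch of that latch, with hypothetical names (the real state lives inside LDK's `ChannelMonitor`):

```rust
// Sketch (hypothetical names) of the one-way "broadcast is unsafe" latch:
// learning of possible data loss can only ever disable automated broadcast,
// never re-enable it, because the lost data cannot be recovered.

struct MonitorState {
    /// Latches to false once broadcasting may be unsafe; nothing ever sets
    /// it back to true.
    broadcast_safe: bool,
}

impl MonitorState {
    fn new() -> Self {
        MonitorState { broadcast_safe: true }
    }

    /// Record a `ChannelForceClosed { should_broadcast }` event. Only a
    /// `false` value has any effect: the latch never re-enables broadcast.
    fn on_force_closed(&mut self, should_broadcast: bool) {
        if !should_broadcast {
            self.broadcast_safe = false;
        }
    }

    /// Whether automated (startup-time) broadcast is still permitted.
    fn may_auto_broadcast(&self) -> bool {
        self.broadcast_safe
    }
}
```

A later `should_broadcast: true` event deliberately does not clear the latch, matching the answer above.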
Great. I'll get to work on the PR then. Thanks!
I'm having a hard time trying to test the code. I'll keep working on it but I'm posting my progress here in case anyone wants to throw help/comments/ideas. My idea would be to cover this new behavior with a simple unit test that:
My plan was to copy and adapt some other unit test for
I left a small pointer on your PR and let's continue there, but in general it's probably simplest, and gives the most coverage, to add a new functional test based on an existing one.
On startup the `ChannelManager` will broadcast the latest state transaction(s) for any `ChannelMonitor` it has for which it has no `Channel`. This is all fine and great except if we'd previously closed the `ChannelMonitor` with a `ChannelForceClosed { should_broadcast: false }` indicating it may be unsafe to broadcast. In this case, we should disable automated broadcasting. This should just require tracking the `should_broadcast` state in the `ChannelMonitor` itself and checking it before broadcasting (maybe a new `automated` flag on `broadcast_latest_holder_commitment_txn`?).
This is really "the" issue on #775 so I'm gonna close that in favor of this one.
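The behavior proposed here can be sketched in a few lines. Everything below is hypothetical and self-contained (stub types, not the actual LDK API): on startup, only auto-broadcast a monitor's latest state if it has no live `Channel` *and* it was not previously closed with `should_broadcast: false`.

```rust
// Sketch of the proposed startup behavior (all names hypothetical): skip
// automated broadcast for monitors whose close event disabled it.

struct ChannelMonitorStub {
    channel_id: u32,
    /// Persisted from a prior `ChannelForceClosed` event; true by default.
    should_broadcast: bool,
}

/// Returns the channel ids whose latest state we would broadcast on startup:
/// monitors with no live `Channel` whose broadcast was not disabled.
fn startup_broadcasts(monitors: &[ChannelMonitorStub], live_channels: &[u32]) -> Vec<u32> {
    monitors
        .iter()
        // Existing behavior: only monitors with no corresponding Channel.
        .filter(|m| !live_channels.contains(&m.channel_id))
        // The new check this issue proposes: respect the persisted flag.
        .filter(|m| m.should_broadcast)
        .map(|m| m.channel_id)
        .collect()
}
```

The second `filter` is the whole change: the first filter is the pre-existing "no Channel for this monitor" condition, and the flag only needs to be persisted in the monitor and consulted here.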