ChannelSigner::sign_holder_commitment_and_htlcs causes panic when returning Err #2520

Closed
waterson opened this issue Aug 24, 2023 · 7 comments · Fixed by #2816
Comments

@waterson
Contributor

There are currently several expect calls in OnChainTxHandler (e.g., get_fully_signed_holder_tx) that make it such that any implementation of ChannelSigner::sign_holder_commitment_and_htlcs that returns Err will cause LDK to panic.

I am wondering if it is possible to rethink how this works (e.g., caching the signatures when we advance the channel state?) so that the existing callers in OnChainTxHandler might be able to access the commitment and HTLC signatures without querying the signer.
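
For illustration, here is one possible shape of that caching idea. This is a hypothetical sketch only - the types, fields, and methods below are made up for this comment and do not match LDK's actual internals:

```rust
// Hypothetical sketch only: `Sig` stands in for an ECDSA signature type and
// none of these names exist in LDK.
use std::collections::HashMap;

struct CachedHolderSignatures<Sig> {
    commitment_sig: Sig,
    htlc_sigs: Vec<Sig>,
}

struct HolderSignatureCache<Sig> {
    // Keyed by commitment number; filled in at the moment the channel state
    // advances, which is when the signer is being queried anyway.
    by_commitment_number: HashMap<u64, CachedHolderSignatures<Sig>>,
}

impl<Sig> HolderSignatureCache<Sig> {
    fn new() -> Self {
        Self { by_commitment_number: HashMap::new() }
    }

    /// Store the signatures produced while advancing the channel state.
    fn cache(&mut self, commitment_number: u64, sigs: CachedHolderSignatures<Sig>) {
        self.by_commitment_number.insert(commitment_number, sigs);
    }

    /// Later, the on-chain broadcast path reads them back without a signer round trip.
    fn get(&self, commitment_number: u64) -> Option<&CachedHolderSignatures<Sig>> {
        self.by_commitment_number.get(&commitment_number)
    }
}
```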

@TheBlueMatt
Collaborator

We absolutely should! Ideally all the signer methods are fallible/async in some way (#2088), but this may well be a trickier part of that overall work. We do have some retry logic for broadcasting - OnchainTxHandler::rebroadcast_pending_claims - so we may be able to get away with just using that here, but we'll need to check the call tree to see where we may run into issues.

@wpaulino
Contributor

There are currently several expect calls in OnChainTxHandler (e.g., get_fully_signed_holder_tx) that make it such that any implementation of ChannelSigner::sign_holder_commitment_and_htlcs that returns Err will cause LDK to panic.

Looks like we'll also panic on failures from ChannelSigner::sign_justice_revoked_output, ChannelSigner::sign_justice_revoked_htlc, and ChannelSigner::sign_counterparty_htlc_transaction via:

```rust
let transaction = cached_request.finalize_malleable_package(
    cur_height, self, output_value, self.destination_script.clone(), logger
).unwrap();
```

So I think the idea here would be to get rid of the unwrap/expect calls, and allow the failure to propagate up the call stack. If the signer fails to sign, you can either:

  1. Track the failure, wait for the signer to recover, and manually trigger re-signing of all pending claims via ChannelMonitor::rebroadcast_pending_claims or
  2. Allow the signer to recover (no need to manually track when it does) and rely on the BackgroundProcessor to invoke ChannelMonitor::rebroadcast_pending_claims for you every 30 seconds
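
As a rough sketch of option 1 (all names here are hypothetical stand-ins, not LDK's actual API - in particular the exact signature of the rebroadcast call is elided behind a trait):

```rust
// Hypothetical application-side sketch of "track the failure, then retry".
// Stand-in trait for whatever exposes the rebroadcast entry point discussed
// above (ChannelMonitor::rebroadcast_pending_claims).
trait RebroadcastPendingClaims {
    fn rebroadcast_pending_claims(&self);
}

struct ClaimRetryState {
    needs_retry: bool,
}

impl ClaimRetryState {
    /// Called wherever a signing error propagated up instead of panicking.
    fn on_signer_failure(&mut self) {
        self.needs_retry = true;
    }

    /// Called when the signer signals recovery (or on a periodic timer).
    fn maybe_retry<M: RebroadcastPendingClaims>(&mut self, signer_available: bool, monitor: &M) {
        if self.needs_retry && signer_available {
            // Regenerates and re-signs the pending claims.
            monitor.rebroadcast_pending_claims();
            self.needs_retry = false;
        }
    }
}
```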

As for OnchainTxHandler::get_fully_signed_holder_tx, there are two existing cases not covered by the ChannelMonitor::rebroadcast_pending_claims approach:

  1. self.broadcast_latest_holder_commitment_txn(broadcaster, logger);

Instead of broadcasting manually within the monitor on pre-anchors channels, we should just queue a claim to the OnchainTxHandler as we do with anchors (only caveat is that for pre-anchors channels we should also queue the HTLC claim requests as they're not CSV'd). This would allow us to handle the retry via ChannelMonitor::rebroadcast_pending_claims.

  2. let commitment_tx = self.onchain_tx_handler.get_fully_signed_holder_tx(&self.funding_redeemscript);

Here, we already queue a claim to the OnchainTxHandler to sign and broadcast the current commitment, so there's no need to call this. It's currently being used to pass the current commitment transaction to get_broadcasted_holder_watch_outputs further below, but that doesn't require the transaction to be signed.

@rmalonson
Contributor

One issue I'm seeing with using ChannelMonitor::rebroadcast_pending_claims in its current state is that it regenerates the claim each time, which would require a new signature. I'm not sure how this would work with the async signature flow we're trying to achieve. It's not as if we'll know whether a remote signer is "online" or not - more likely we'll send a request to the remote signer for a signature and set up a handler that accepts the signature when the remote signer sends it. If we just queue this claim back into rebroadcast_pending_claims, we'll just retrigger another request for a signature and get into a cycle.
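
For illustration, a minimal sketch of that request/handler flow (all names hypothetical; nothing here is LDK API). The point is that an outstanding request is tracked, so a rebroadcast tick can see one is already in flight instead of firing a duplicate signing request:

```rust
// Hypothetical remote-signer round trip; request ids map back to claims so the
// response handler knows which claim to finalize and broadcast.
use std::collections::HashMap;

type RequestId = u64;
type ClaimId = u64;

struct RemoteSignerClient {
    next_request_id: RequestId,
    pending: HashMap<RequestId, ClaimId>,
}

impl RemoteSignerClient {
    /// Fire a signing request once and remember it, so a later retry tick can
    /// skip claims that already have an outstanding request (avoiding the cycle).
    fn request_signature(&mut self, claim_id: ClaimId, _unsigned_tx: &[u8]) -> RequestId {
        let request_id = self.next_request_id;
        self.next_request_id += 1;
        self.pending.insert(request_id, claim_id);
        // ... send (request_id, unsigned_tx) to the remote signer over the wire ...
        request_id
    }

    /// Handler invoked when the remote signer eventually responds; returns the
    /// claim the signature belongs to so the caller can finalize and broadcast
    /// it without regenerating the claim.
    fn on_signature(&mut self, request_id: RequestId, _signature: &[u8]) -> Option<ClaimId> {
        self.pending.remove(&request_id)
    }
}
```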

It seems like we'll either need to make some changes to rebroadcast_pending_claims to not always regenerate the claim (i.e. using a new param or something) or add a separate function that rebroadcasts the claim without regenerating. I know there is probably some risk that fee rates may have changed when broadcasting the old transaction without regenerating it, but fee rates seem fairly stable, so I'm not sure how big that risk is?

Am I understanding correctly? WDYT?

@wpaulino
Contributor

It seems like we'll either need to make some changes to rebroadcast_pending_claims to not always regenerate the claim (i.e. using a new param or something) or add a separate function that rebroadcasts the claim without regenerating

Hmm, I guess we could go with the latter approach. The only issue is that, since the signer only knows about the transaction itself, we can't easily map back to the originating request without doing a linear search or adding an additional tracking map for requests that failed with something like ErrSignerUnavailable.

Another approach would be to expose a future as similarly done in #1980. When the future resolves with the signature, we'd call back into LDK to do any remaining work. This could be generally applied to all signing functions, but would also result in a lot more complexity. Any thoughts @TheBlueMatt?

@wpaulino
Contributor

I know there is probably some risk that fee rates may have changed when broadcasting the old transaction without regenerating it, but fee rates seem fairly stable, so I'm not sure how big that risk is?

If we have a signing request that doesn't resolve and a new block is mined leading to a transaction bump, we'll just have another signing request to resolve. As long as we handle both as normal, it should be fine.

@waterson
Contributor Author

So I'm off on a tangent in #2487, which is trying to address the same problem, but for get_per_commitment_point and release_commitment_secret.

After some discussion there, we decided to go for an approach of soft-failing out and explicitly requiring the signer to retry. The implication of doing this is that you end up with a bunch of "retry points" for each channel. I think I can make this work for get_per_commitment_point and release_commitment_secret, but I haven't even begun to think about the other methods that request a signature.

We could obviously try extending this approach to every single method in the channel signer. That's probably not going to be super great, but if it is what we opt to do, then I think some sort of generic continuation-passing / future mechanism would be the way to go. (Maybe this overlaps with the stuff in ln/util/wakers.rs, not sure.) This was kind of the direction I was going before @devrandom and @TheBlueMatt suggested we just do retry.
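
For what it's worth, a minimal sketch of what per-channel "retry points" could look like (hypothetical names only; the real design is being worked out in #2487):

```rust
// Hypothetical per-channel retry-point tracking; each flag marks a signer call
// that soft-failed and must be re-attempted once the signer can make progress.
#[derive(Default)]
struct SignerRetryPoints {
    get_per_commitment_point: bool,
    release_commitment_secret: bool,
    sign_holder_commitment: bool,
}

struct ChannelRetryShim {
    retry: SignerRetryPoints,
}

impl ChannelRetryShim {
    /// Called when the signer signals it can make progress again.
    fn retry_pending_signer_ops(&mut self) {
        if self.retry.get_per_commitment_point {
            // ... re-run the state transition that needed the commitment point ...
            self.retry.get_per_commitment_point = false;
        }
        if self.retry.release_commitment_secret {
            // ... re-send the held-back revoke_and_ack ...
            self.retry.release_commitment_secret = false;
        }
        if self.retry.sign_holder_commitment {
            // ... re-attempt signing/broadcasting the holder commitment ...
            self.retry.sign_holder_commitment = false;
        }
    }
}
```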

@TheBlueMatt
Collaborator

After some discussion there, we decided to go for an approach of soft-failing out and explicitly requiring the signer to retry. The implication of doing this is that you end up with a bunch of "retry points" for each channel. I think I can make this work for get_per_commitment_point and release_commitment_secret, but I haven't even begun to think about the other methods that request a signature.

I think those may be the harder ones to get working. Signing the counterparty commitment already has retry points (because we do it on reconnect or on monitor update completion). I think we should see what happens if we keep going down the path you're on before we fall back to trying something else. Sadly, we can't "just" use Rust futures since we support non-async environments (and Rust async functions are colored functions), not to mention such futures require Pin and restarting them is a pain.

We don't currently have any such infrastructure in ChannelMonitor, as you note. I do think a custom future type would be pretty easy to deal with - in most cases we're not talking about actually doing anything with the signature except adding it to the transaction and broadcasting it - put the transaction and a clone of the reference to the transaction broadcaster in the future, and when the future completes, broadcast the transaction. In cases where we want a copy of the transaction for later rebroadcast, we should stop doing that and just force the user to re-sign every time we want to rebroadcast. This is useful for transaction propagation anyway.
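
A minimal sketch of that "broadcast on completion" shape (hypothetical types; not the wakers.rs API and not a proposal for LDK's exact interface):

```rust
// Hypothetical pending-broadcast object: it owns the transaction bytes and a
// reference to the broadcaster, and completing it with the signer's result
// broadcasts immediately. Nothing is cached for later rebroadcast - a
// rebroadcast simply asks the signer again.
use std::sync::Arc;

trait Broadcaster {
    fn broadcast(&self, raw_tx: &[u8]);
}

struct PendingSignedBroadcast<B: Broadcaster> {
    unsigned_tx: Vec<u8>,
    broadcaster: Arc<B>,
}

impl<B: Broadcaster> PendingSignedBroadcast<B> {
    /// Called when the signer finally produces its signature.
    fn complete(self, _signature: &[u8]) {
        // In real code the signature would be spliced into the transaction's
        // witness before handing it off; here we just broadcast the bytes.
        self.broadcaster.broadcast(&self.unsigned_tx);
    }
}
```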
