ChannelSigner::sign_holder_commitment_and_htlcs causes panic when returning Err #2520

Closed
waterson opened this issue Aug 24, 2023 · 7 comments · Fixed by #2816
Comments

@waterson
Contributor

There are currently several expect calls in OnChainTxHandler (e.g., get_fully_signed_holder_tx) that make it such that any implementation of ChannelSigner::sign_holder_commitment_and_htlcs that returns Err will cause LDK to panic.

I am wondering if it is possible to rethink how this works (e.g., caching the signatures when we advance the channel state?) so that the existing callers in OnChainTxHandler might be able to access the commitment and HTLC signatures without querying the signer.
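
For illustration, here is one possible shape of that caching idea. This is a hypothetical sketch only - the types, fields, and methods below are made up for this comment and do not match LDK's actual internals:

```rust
// Hypothetical sketch only: `Sig` stands in for an ECDSA signature type and
// none of these names exist in LDK.
use std::collections::HashMap;

struct CachedHolderSignatures<Sig> {
    commitment_sig: Sig,
    htlc_sigs: Vec<Sig>,
}

struct HolderSignatureCache<Sig> {
    // Keyed by commitment number; filled in at the moment the channel state
    // advances, which is when the signer is being queried anyway.
    by_commitment_number: HashMap<u64, CachedHolderSignatures<Sig>>,
}

impl<Sig> HolderSignatureCache<Sig> {
    fn new() -> Self {
        Self { by_commitment_number: HashMap::new() }
    }

    /// Store the signatures produced while advancing the channel state.
    fn cache(&mut self, commitment_number: u64, sigs: CachedHolderSignatures<Sig>) {
        self.by_commitment_number.insert(commitment_number, sigs);
    }

    /// Later, the on-chain broadcast path reads them back without a signer round trip.
    fn get(&self, commitment_number: u64) -> Option<&CachedHolderSignatures<Sig>> {
        self.by_commitment_number.get(&commitment_number)
    }
}
```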

@TheBlueMatt
Collaborator

We absolutely should! Ideally all the signer methods are fallible/async in some way (#2088), but this may well be a trickier part of that overall work. We do have some retry logic for broadcasting - OnchainTxHandler::rebroadcast_pending_claims - so we may be able to get away with just using that here, but we'll need to check the call tree to see where we may run into issues.

@wpaulino
Contributor

There are currently several expect calls in OnChainTxHandler (e.g., get_fully_signed_holder_tx) that make it such that any implementation of ChannelSigner::sign_holder_commitment_and_htlcs that returns Err will cause LDK to panic.

Looks like we'll also panic on failures from ChannelSigner::sign_justice_revoked_output, ChannelSigner::sign_justice_revoked_htlc, and ChannelSigner::sign_counterparty_htlc_transaction via:

```rust
let transaction = cached_request.finalize_malleable_package(
    cur_height, self, output_value, self.destination_script.clone(), logger
).unwrap();
```

So I think the idea here would be to get rid of the unwrap/expect calls, and allow the failure to propagate up the call stack. If the signer fails to sign, you can either:

  1. Track the failure, wait for the signer to recover, and manually trigger re-signing of all pending claims via ChannelMonitor::rebroadcast_pending_claims or
  2. Allow the signer to recover (no need to manually track when it does) and rely on the BackgroundProcessor to invoke ChannelMonitor::rebroadcast_pending_claims for you every 30 seconds
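
As a rough sketch of option 1 (all names here are hypothetical stand-ins, not LDK's actual API - in particular the exact signature of the rebroadcast call is elided behind a trait):

```rust
// Hypothetical application-side sketch of "track the failure, then retry".
// Stand-in trait for whatever exposes the rebroadcast entry point discussed
// above (ChannelMonitor::rebroadcast_pending_claims).
trait RebroadcastPendingClaims {
    fn rebroadcast_pending_claims(&self);
}

struct ClaimRetryState {
    needs_retry: bool,
}

impl ClaimRetryState {
    /// Called wherever a signing error propagated up instead of panicking.
    fn on_signer_failure(&mut self) {
        self.needs_retry = true;
    }

    /// Called when the signer signals recovery (or on a periodic timer).
    fn maybe_retry<M: RebroadcastPendingClaims>(&mut self, signer_available: bool, monitor: &M) {
        if self.needs_retry && signer_available {
            // Regenerates and re-signs the pending claims.
            monitor.rebroadcast_pending_claims();
            self.needs_retry = false;
        }
    }
}
```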

As for OnchainTxHandler::get_fully_signed_holder_tx, there are two existing cases not covered by the ChannelMonitor::rebroadcast_pending_claims approach:

  1. self.broadcast_latest_holder_commitment_txn(broadcaster, logger);

Instead of broadcasting manually within the monitor on pre-anchors channels, we should just queue a claim to the OnchainTxHandler as we do with anchors (only caveat is that for pre-anchors channels we should also queue the HTLC claim requests as they're not CSV'd). This would allow us to handle the retry via ChannelMonitor::rebroadcast_pending_claims.

  2. let commitment_tx = self.onchain_tx_handler.get_fully_signed_holder_tx(&self.funding_redeemscript);

Here, we already queue a claim to the OnchainTxHandler to sign and broadcast the current commitment, so there's no need to call this. It's currently being used to pass the current commitment transaction to get_broadcasted_holder_watch_outputs further below, but that doesn't require the transaction to be signed.

@rmalonson
Contributor

One issue I'm seeing with using ChannelMonitor::rebroadcast_pending_claims in its current state is that it regenerates the claim each time, which would require a new signature. I'm not sure how this would work with the async signature flow we're trying to achieve. It's not as if we'll know whether a remote signer is "online" or not - more likely we'll send a request to the remote signer for a signature and set up a handler that accepts the signature when the remote signer sends it. If we just queue this claim back into rebroadcast_pending_claims, we'll just retrigger another request for a signature and get into a cycle.
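
For illustration, a minimal sketch of that request/handler flow (all names hypothetical; nothing here is LDK API). The point is that an outstanding request is tracked, so a rebroadcast tick can see one is already in flight instead of firing a duplicate signing request:

```rust
// Hypothetical remote-signer round trip; request ids map back to claims so the
// response handler knows which claim to finalize and broadcast.
use std::collections::HashMap;

type RequestId = u64;
type ClaimId = u64;

struct RemoteSignerClient {
    next_request_id: RequestId,
    pending: HashMap<RequestId, ClaimId>,
}

impl RemoteSignerClient {
    /// Fire a signing request once and remember it, so a later retry tick can
    /// skip claims that already have an outstanding request (avoiding the cycle).
    fn request_signature(&mut self, claim_id: ClaimId, _unsigned_tx: &[u8]) -> RequestId {
        let request_id = self.next_request_id;
        self.next_request_id += 1;
        self.pending.insert(request_id, claim_id);
        // ... send (request_id, unsigned_tx) to the remote signer over the wire ...
        request_id
    }

    /// Handler invoked when the remote signer eventually responds; returns the
    /// claim the signature belongs to so the caller can finalize and broadcast
    /// it without regenerating the claim.
    fn on_signature(&mut self, request_id: RequestId, _signature: &[u8]) -> Option<ClaimId> {
        self.pending.remove(&request_id)
    }
}
```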

It seems like we'll either need to make some changes to rebroadcast_pending_claims to not always regenerate the claim (i.e. using a new param or something) or add a separate function that rebroadcasts the claim without regenerating. I know there is probably some risk that fee rates may have changed when broadcasting the old transaction without regenerating it, but fee rates seem fairly stable, so I'm not sure how big that risk is?

Am I understanding correctly? WDYT?

@wpaulino
Contributor

It seems like we'll either need to make some changes to rebroadcast_pending_claims to not always regenerate the claim (i.e. using a new param or something) or add a separate function that rebroadcasts the claim without regenerating

Hmm, I guess we could go with the latter approach. The only issue is that, since the signer only knows about the transaction itself, we can't easily map back to the originating request without doing a linear search or adding an additional tracking map for requests that failed with something like ErrSignerUnavailable.

Another approach would be to expose a future as similarly done in #1980. When the future resolves with the signature, we'd call back into LDK to do any remaining work. This could be generally applied to all signing functions, but would also result in a lot more complexity. Any thoughts @TheBlueMatt?

@wpaulino
Contributor

I know there is probably some risk that fee rates may have changed when broadcasting the old transaction without regenerating it, but fee rates seem fairly stable, so I'm not sure how big that risk is?

If we have a signing request that doesn't resolve and a new block is mined leading to a transaction bump, we'll just have another signing request to resolve. As long as we handle both as normal, it should be fine.

@waterson
Contributor Author

So I'm off on a tangent in #2487, which is trying to address the same problem, but for get_per_commitment_point and release_commitment_secret.

After some discussion there, we decided to go for an approach of soft-failing out and explicitly requiring the signer to retry. The implication of doing this is that you end up with a bunch of "retry points" for each channel. I think I can make this work for get_per_commitment_point and release_commitment_secret, but I haven't even begun to think about the other methods that request a signature.

We could obviously try extending this approach to every single method in the channel signer. That's probably not going to be super great, but if it is what we opt to do, then I think some sort of generic continuation-passing / future mechanism would be the way to go. (Maybe this overlaps with the stuff in ln/util/wakers.rs, not sure.) This was kind of the direction I was going before @devrandom and @TheBlueMatt suggested we just do retry.
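
For what it's worth, a minimal sketch of what per-channel "retry points" could look like (hypothetical names only; the real design is being worked out in #2487):

```rust
// Hypothetical per-channel retry-point tracking; each flag marks a signer call
// that soft-failed and must be re-attempted once the signer can make progress.
#[derive(Default)]
struct SignerRetryPoints {
    get_per_commitment_point: bool,
    release_commitment_secret: bool,
    sign_holder_commitment: bool,
}

struct ChannelRetryShim {
    retry: SignerRetryPoints,
}

impl ChannelRetryShim {
    /// Called when the signer signals it can make progress again.
    fn retry_pending_signer_ops(&mut self) {
        if self.retry.get_per_commitment_point {
            // ... re-run the state transition that needed the commitment point ...
            self.retry.get_per_commitment_point = false;
        }
        if self.retry.release_commitment_secret {
            // ... re-send the held-back revoke_and_ack ...
            self.retry.release_commitment_secret = false;
        }
        if self.retry.sign_holder_commitment {
            // ... re-attempt signing/broadcasting the holder commitment ...
            self.retry.sign_holder_commitment = false;
        }
    }
}
```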

@TheBlueMatt
Collaborator

After some discussion there, we decided to go for an approach of soft-failing out and explicitly requiring the signer to retry. The implication of doing this is that you end up with a bunch of "retry points" for each channel. I think I can make this work for get_per_commitment_point and release_commitment_secret, but I haven't even begun to think about the other methods that request a signature.

I think those may be the harder ones to get working. Signing the counterparty commitment already has retry points (because we do it on reconnect or on monitor update completion). I think we should see what happens if we keep going down the path you're on before we fall back to trying something else. Sadly, we can't "just" use Rust futures since we support non-async environments (and Rust async functions are colored functions), not to mention such futures require Pin and restarting them is a pain.

We don't currently have any such infrastructure in ChannelMonitor, as you note. I do think a custom future type would be pretty easy to deal with - in most cases we're not talking about actually doing anything with the signature except adding it to the transaction and broadcasting it - put the transaction and a clone of the reference to the transaction broadcaster in the future, and when the future completes, broadcast the transaction. In cases where we want a copy of the transaction for later rebroadcast, we should stop doing that and just force the user to re-sign every time we want to rebroadcast. This is useful for transaction propagation anyway.
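
A minimal sketch of that "broadcast on completion" shape (hypothetical types; not the wakers.rs API and not a proposal for LDK's exact interface):

```rust
// Hypothetical pending-broadcast object: it owns the transaction bytes and a
// reference to the broadcaster, and completing it with the signer's result
// broadcasts immediately. Nothing is cached for later rebroadcast - a
// rebroadcast simply asks the signer again.
use std::sync::Arc;

trait Broadcaster {
    fn broadcast(&self, raw_tx: &[u8]);
}

struct PendingSignedBroadcast<B: Broadcaster> {
    unsigned_tx: Vec<u8>,
    broadcaster: Arc<B>,
}

impl<B: Broadcaster> PendingSignedBroadcast<B> {
    /// Called when the signer finally produces its signature.
    fn complete(self, _signature: &[u8]) {
        // In real code the signature would be spliced into the transaction's
        // witness before handing it off; here we just broadcast the bytes.
        self.broadcaster.broadcast(&self.unsigned_tx);
    }
}
```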
