
htlcswitch: final settle signal #6517

Merged · 4 commits · Oct 27, 2022

Conversation

@joostjager (Contributor) commented May 9, 2022

This PR adds a new final settle state for incoming htlcs. An htlc reaches this state after the update_settle_htlc update has been fully locked in, off-chain or on-chain.

A new rpc LookupHtlc is added that can be used to check whether an htlc reached the new state.

Additionally an event is added to the existing SubscribeHtlcEvents stream. This allows users to wait for final settlement without polling LookupHtlc. It is important to have a non-polling mechanism because waiting for final settlement is likely to be part of the critical path of a lightning payment.

There is no FINAL_SETTLE state on the invoice level yet. To use this api to wait for final settlement of an invoice, the following steps can be used (a rough code sketch follows the list):

  1. Wait for invoice to be settled on invoices subscription stream
  2. Extract htlc ids from settle update event.
  3. Subscribe to the htlc notifier api SubscribeHtlcEvents.
  4. Wait for the htlc notifier SubscribedEvent to be received. Otherwise we may not really be subscribed yet. It is important to do this before looking up htlcs to avoid a race condition.
  5. Lookup all htlcs retrieved in step 2 via LookupHtlc
  6. If they are all settled, the invoice is settled
  7. If some are not yet settled, wait for the final settle events for those to come in through the htlc notifier api.
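For illustration, here is a minimal Go sketch of the flow above. The htlcKey type, finalSettleAPI interface and waitForFinalSettle helper are hypothetical stand-ins for the real LookupHtlc and SubscribeHtlcEvents gRPC calls; the actual lnd request/response types are deliberately not spelled out here.

```go
package finalsettle

import "fmt"

// htlcKey identifies an incoming htlc by channel and htlc index, matching the
// identifiers reported in the invoice settle update (step 2).
type htlcKey struct {
	ChanID    uint64
	HtlcIndex uint64
}

// finalSettleAPI is a hypothetical thin wrapper around the new LookupHtlc rpc
// and the SubscribeHtlcEvents stream.
type finalSettleAPI interface {
	// LookupHtlc reports whether the htlc has reached the final settle
	// state (step 5).
	LookupHtlc(key htlcKey) (settled bool, err error)

	// FinalSettleEvents subscribes to final settle events. The ready
	// channel is closed once the SubscribedEvent has been received, i.e.
	// once the subscription is actually active (steps 3 and 4).
	FinalSettleEvents() (events <-chan htlcKey, ready <-chan struct{}, err error)
}

// waitForFinalSettle blocks until every htlc of a settled invoice has been
// finally settled, following the steps above.
func waitForFinalSettle(api finalSettleAPI, htlcs []htlcKey) error {
	// Subscribe first and wait for the subscription to be confirmed, so
	// that no final settle event can be missed between lookup and
	// subscription.
	events, ready, err := api.FinalSettleEvents()
	if err != nil {
		return err
	}
	<-ready

	// Look up the current state of every htlc from the settle update.
	pending := make(map[htlcKey]struct{})
	for _, h := range htlcs {
		settled, err := api.LookupHtlc(h)
		if err != nil {
			return err
		}
		if !settled {
			pending[h] = struct{}{}
		}
	}

	// Wait for the remaining htlcs to show up as final settle events
	// (steps 6 and 7).
	for len(pending) > 0 {
		h, ok := <-events
		if !ok {
			return fmt.Errorf("htlc event stream closed")
		}
		delete(pending, h)
	}

	return nil
}
```

The important ordering is subscribe first, confirm the subscription, then look up; otherwise a final settle event can land between the lookup and the subscription and be missed.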

More background info and design discussion: https://groups.google.com/a/lightning.engineering/g/lnd/c/Sfy_M9avQoU

Fixes #6208

@otech47 (Contributor) commented May 25, 2022

Concept ACK

Is there perhaps any unconsidered complexity for situations where multiple HTLCs exist in the commitment transaction for a single channel? Or would the invoice status not depend at all on any other HTLCs in the channel?

@joostjager (Contributor, Author) commented May 25, 2022

> Is there perhaps any unconsidered complexity for situations where multiple HTLCs exist in the commitment transaction for a single channel?

There should not be such complexity. The code as is should record all final settlements of htlcs, regardless of whether they are grouped together on the same commitment.

> Or would the invoice status not depend at all on any other HTLCs in the channel?

For a multi-part payment, the invoice status could depend on multiple htlcs in the same channel. But the scope of this PR does not include invoices at all.

To determine whether a received payment is final, one should retrieve all the settled htlcs from the invoice database and then look them up one by one via for example the api that is added in this PR to see if the settlements are final.

Obviously this can be streamlined later on for invoices specifically.

@joostjager force-pushed the lookup-htlc branch 4 times, most recently from e03c8d1 to 0be85ca on July 11, 2022 14:08
@joostjager marked this pull request as ready for review on July 11, 2022 14:13
@joostjager requested a review from @Roasbeef on July 11, 2022 14:19
@Roasbeef added this to the v0.16.0 milestone on Jul 14, 2022
@Roasbeef added the safety, htlcswitch, and invoices labels on Jul 14, 2022
@joostjager (Contributor, Author) commented Jul 20, 2022

Waiting for design ack for the direction set out in this PR. In particular storing the settled htlc ids in a new bucket settled-htlcs and doing so atomically in the channel state update transaction.
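A rough sketch of that direction (illustrative only: bucket layout and names are not this PR's final schema, and it uses bbolt directly rather than lnd's kvdb wrapper). The settled htlc id is written inside the same write transaction that applies the channel state update, so the marker is committed atomically with the settle:

```go
package channeldbsketch

import (
	"encoding/binary"

	"go.etcd.io/bbolt"
)

// settledHtlcsBucketKey is an illustrative top-level bucket name.
var settledHtlcsBucketKey = []byte("settled-htlcs")

// markHtlcSettled records a finally settled incoming htlc inside an already
// open write transaction, so the write is atomic with the channel state
// update that locked in the settle.
func markHtlcSettled(tx *bbolt.Tx, chanID, htlcIndex uint64) error {
	top, err := tx.CreateBucketIfNotExists(settledHtlcsBucketKey)
	if err != nil {
		return err
	}

	// One sub-bucket per channel keeps per-channel lookups cheap.
	var chanKey [8]byte
	binary.BigEndian.PutUint64(chanKey[:], chanID)
	chanBucket, err := top.CreateBucketIfNotExists(chanKey[:])
	if err != nil {
		return err
	}

	var htlcKey [8]byte
	binary.BigEndian.PutUint64(htlcKey[:], htlcIndex)

	// The value could later carry more detail (e.g. settled vs failed);
	// a single byte is enough for a pure "settled" marker.
	return chanBucket.Put(htlcKey[:], []byte{1})
}
```

The caller would invoke this from whatever update transaction already processes the revocation, so no additional write transaction or lock is needed.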

channeldb/channel.go (outdated review thread, resolved)
@Crypt-iQ (Collaborator) commented Aug 25, 2022

I think this patch shouldn't use the shortchanid, and should instead use the funding outpoint - we shouldn't add more code that needs reorg handling. I also think there should be benchmarks for lookup latency with N settles over M channels.

In the revocation PR, you brought up the idea of looking at the resolvers instead of doing this. Why would you need this if you weren't the exit hop? Maybe this is because I don't understand the htlc interceptor settle flow where you intercept and settle back, already knowing the preimage. Another case is maybe you want to see if you've lost money as an intermediate node due to not being able to claim on the incoming?

Another thought is: could this use some sort of hash accumulator? It seems that you're only performing lookups on the set... Not sure of a golang library that can help here.

@joostjager (Contributor, Author) commented Aug 25, 2022

> I think this patch shouldn't use the shortchanid, and should instead use the funding outpoint - we shouldn't add more code that needs reorg handling.

Are reorgs even supported generally for channels that are fully confirmed?

But I get the point. I use short channel id because we also use that in the htlc interception api. These two apis are related and can be used together. Short chan id may also not have been the best decision for htlc interception?

I could convert the PR to outpoints and then leave it up to the user to do the mapping from short chan ids if necessary?

> I also think there should be benchmarks for lookup latency with N settles over M channels.

What scenario are you thinking of specifically when performance would be critical? LND internally doesn't do any lookups, it would just be load generated by a connected application.

> In the revocation PR, you brought up the idea of looking at the resolvers instead of doing this. Why would you need this if you weren't the exit hop? Maybe this is because I don't understand the htlc interceptor settle flow where you intercept and settle back, already knowing the preimage. Another case is maybe you want to see if you've lost money as an intermediate node due to not being able to claim on the incoming?

Linking the resolver idea here: #6347 (comment)

I think just looking at the resolvers is not sufficient, because you don't get a signal when an htlc settles off-chain in the happy flow. Without that signal, an application still doesn't know when the payment truly cleared.

One scenario where you need this as a routing node is if you use the htlc interceptor to create virtual channels. We use it with lnmux. I believe LSPs for mobile users have similar requirements for just-in-time opened channels. And indeed, more generally for exact accounting on a routing node you probably want to know if you were able to claim all your incoming forwards. For that though, the resolver information might be sufficient as it is not time critical. Maybe there is an edge case if the counterparty doesn't go to chain (yet) to timeout the htlc.

> Another thought is: could this use some sort of hash accumulator? It seems that you're only performing lookups on the set... Not sure of a golang library that can help here.

Not familiar with hash accumulators, but isn't that a probabilistic structure that can lead to false positives? False positives wouldn't be good for this application.

@Crypt-iQ (Collaborator):

> Are reorgs even supported generally for channels that are fully confirmed?

No, my point is that (I think) eventually we'll be moving away from SCIDs to identify a channel, so it may not make sense to make another place where that's done.

> Short chan id may also not have been the best decision for htlc interception?

I don't think you have a choice, the switch is pretty tied to SCIDs at the moment.

> What scenario are you thinking of specifically when performance would be critical? LND internally doesn't do any lookups, it would just be load generated by a connected application.

What about the rpcserver call to check if something is settled? Additionally, some nodes will be inserting frequently into this global settled bucket so that's where the concern comes from. I'm basically wondering if a large bucket here makes insertion/lookups very slow due to bbolt traversing the b+ tree and having to cache a lot of leaves.

> Not familiar with hash accumulators, but isn't that a probabilistic structure that can lead to false positives? False positives wouldn't be good for this application.

I was thinking of something like utreexo, which has inclusion proofs, but after glancing at the paper, that doesn't actually help with storage here.

@joostjager (Contributor, Author):

> No, my point is that (I think) eventually we'll be moving away from SCIDs to identify a channel, so it may not make sense to make another place where that's done.

Can you get confirmation on that before I convert this PR?

> > What scenario are you thinking of specifically when performance would be critical? LND internally doesn't do any lookups, it would just be load generated by a connected application.

> What about the rpcserver call to check if something is settled? Additionally, some nodes will be inserting frequently into this global settled bucket so that's where the concern comes from. I'm basically wondering if a large bucket here makes insertion/lookups very slow due to bbolt traversing the b+ tree and having to cache a lot of leaves.

My thinking here is that that single db lookup should be negligible compared to all the updates that are required to just lock in and settle an htlc on the channel. It can also be a batch lookup for more efficiency. The insert itself is part of an existing transaction, so at least that wouldn't add another write lock.

I can do some benchmarking once the shape of this PR is more final. Linking related PR lightninglabs/neutrino#213, which @Roasbeef brought up in the review club.
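A benchmark along these lines (illustrative only, against a plain bbolt bucket rather than lnd's actual schema) could quantify the concern: pre-fill a bucket with a large number of settle records and time point lookups.

```go
package channeldbsketch

import (
	"encoding/binary"
	"fmt"
	"path/filepath"
	"testing"

	"go.etcd.io/bbolt"
)

// BenchmarkFinalHtlcLookup measures point lookups in a bucket pre-filled with
// numSettles records, approximating a node that has settled many htlcs.
func BenchmarkFinalHtlcLookup(b *testing.B) {
	const numSettles = 100_000

	db, err := bbolt.Open(filepath.Join(b.TempDir(), "test.db"), 0600, nil)
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	bucketKey := []byte("settled-htlcs")

	// Pre-fill the bucket with numSettles dummy settle records.
	err = db.Update(func(tx *bbolt.Tx) error {
		bucket, err := tx.CreateBucketIfNotExists(bucketKey)
		if err != nil {
			return err
		}
		var key [8]byte
		for i := uint64(0); i < numSettles; i++ {
			binary.BigEndian.PutUint64(key[:], i)
			if err := bucket.Put(key[:], []byte{1}); err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		b.Fatal(err)
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		err := db.View(func(tx *bbolt.Tx) error {
			var key [8]byte
			binary.BigEndian.PutUint64(key[:], uint64(i)%numSettles)
			if tx.Bucket(bucketKey).Get(key[:]) == nil {
				return fmt.Errorf("missing settle record")
			}
			return nil
		})
		if err != nil {
			b.Fatal(err)
		}
	}
}
```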

@Crypt-iQ (Collaborator):

> Can you get confirmation on that before I convert this PR?

Yes, I confirmed it

> The insert itself is part of an existing transaction, so at least that wouldn't add another write lock.

I am concerned about the insert here

@joostjager (Contributor, Author) commented Aug 26, 2022

@Crypt-iQ if you talk about the funding outpoint, do you mean txid:index, or the "compressed" 32-byte format that xors the index into the txid? I don't think we use the 32-byte format on the rpc interface anywhere yet.

@Crypt-iQ (Collaborator):

> @Crypt-iQ if you talk about the funding outpoint, do you mean txid:index, or the "compressed" 32-byte format that xors the index into the txid? I don't think we use the 32-byte format on the rpc interface anywhere yet.

the outpoint, not the channel id (which is the 32-byte representation)
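For reference, the "compressed" 32-byte format mentioned above is the BOLT 2 channel_id: the funding output index XORed into the last two bytes of the funding txid (lnd has its own helper for this, lnwire.NewChanIDFromOutPoint). A minimal sketch of the idea, glossing over txid byte-order details:

```go
package sketch

// chanIDFromOutPoint derives the 32-byte channel id from a funding outpoint
// by folding the output index into the last two bytes of the txid. Only the
// low 16 bits of the index fit, per BOLT 2.
func chanIDFromOutPoint(fundingTxid [32]byte, outputIndex uint32) [32]byte {
	chanID := fundingTxid
	chanID[30] ^= byte(outputIndex >> 8)
	chanID[31] ^= byte(outputIndex)
	return chanID
}
```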

@joostjager (Contributor, Author) commented Aug 26, 2022

Another thing to be aware of is that we need to keep the mapping around for closed channels too. For example, when you call LookupInvoice and want to get the final status of the htlc, you still need to map the short chan ids in that output to outpoints, and the channels may be closed by that time. Do you foresee any problems with this, or is it just a matter of downloading the mapping from ClosedChannels, or looking up on-chain based on the short chan id (see the sketch below)? It's not the most convenient for the caller though.
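On the on-chain route specifically: a short channel id is just a packed (block height, tx index, output index) triple, so it can always be resolved back to the funding outpoint from chain data, even for channels long gone from the channel db. A small sketch of the unpacking, mirroring the standard BOLT 7 layout that lnd's lnwire.ShortChannelID also uses:

```go
package sketch

// scidParts splits a short channel id into the block height, transaction
// index and output index of the funding output.
func scidParts(scid uint64) (blockHeight, txIndex uint32, outputIndex uint16) {
	blockHeight = uint32(scid >> 40)
	txIndex = uint32(scid>>16) & 0xFFFFFF
	outputIndex = uint16(scid)
	return blockHeight, txIndex, outputIndex
}
```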

@saubyk removed this from the v0.16.0 milestone on Aug 27, 2022
@joostjager force-pushed the lookup-htlc branch 2 times, most recently from 923633e to 9339943 on October 26, 2022 12:57
@joostjager (Contributor, Author):

> And making this opt-in can complicate things. So I guess I'm not sure, tho leaning towards making it opt-in, at least give users an option.

Given the advanced state of this PR, would you be okay with discussing and implementing the opt-in in another PR, possibly combined with delete functionality?

@yyforyongyu (Collaborator) left a comment:

> Given the advanced state of this PR, would you be okay with discussing and implementing the opt-in in another PR, possibly combined with delete functionality?

Sure, let's continue there then. Again good work!

channeldb/db.go Outdated

finalHtlcsBucket := tx.ReadBucket(finalHtlcsBucket)
if finalHtlcsBucket == nil {
        return nil, kvdb.ErrBucketNotFound
}
Collaborator:

nit: this should be a defined error, something like ErrNoChanDBExists, otherwise we won't know which bucket is missing by looking at the error alone.

@joostjager (Author):

The caller is wrapping it:

return fmt.Errorf("cannot fetch final htlcs bucket: %w", err)

Do you think that is insufficient?

Collaborator:

That's sufficient for the caller's caller, not the method itself. So whoever calls LookupFinalHtlc understands it, but another method using fetchFinalHtlcsBucket won't, unless we wrap it there again. Plus I won't be able to tell whether it's the root bucket or the nested bucket that is missing here. This is a nit, but may give us a headache in the long run.

@joostjager (Author) commented Oct 26, 2022:

Yes, so this is getting back to #6517 (comment). Does every error need to contain an indication of the function that generated it? I am not sure if this is done generally.

@joostjager (Author):

I see there is also a problem with the pr currently, because in rpcserver.go, I still match against ErrHtlcUnknown. Bubbling up those very granular errors also requires them to be matched higher up, which can be considered a layering violation.

@joostjager (Author):

I did add specific errors for the two missing bucket cases, replacing kvdb.ErrBucketNotFound. Btw, the linter forcing errors.Is in switch statements doesn't feel completely right to me.

Collaborator:

> It looks like the inverse of how %w is commonly used. Don't you think that there is a limit to how much low-level error information you can carry across package boundaries? I'd think that logging may be a better way to keep the layers separated cleanly.

That's also true. I don't like the error chain becoming too long, and logging is definitely very helpful. But it seems like we don't do any logging in channel.go. As for the new pattern, I think we need to gather more opinions. Also I'll try it out to see if it's really better in my other PRs.

> Btw, the linter forcing errors.Is in switch statements doesn't feel completely right to me.

I get it. But I just tell myself errors.Is is just like == with the ability to compare wrapped errors.
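To make the comparison with == concrete, a small standalone example: plain equality only matches the outermost error value, while errors.Is walks the %w wrap chain, which is what matters once the bucket error gets wrapped by callers such as LookupFinalHtlc.

```go
package main

import (
	"errors"
	"fmt"
)

var errBucketNotFound = errors.New("bucket not found")

func fetchBucket() error {
	// Low-level failure.
	return errBucketNotFound
}

func lookup() error {
	if err := fetchBucket(); err != nil {
		// Wrapping with %w preserves the underlying error.
		return fmt.Errorf("cannot fetch final htlcs bucket: %w", err)
	}
	return nil
}

func main() {
	err := lookup()

	// Plain equality fails because err is the wrapping error, not the
	// sentinel itself.
	fmt.Println(err == errBucketNotFound) // false

	// errors.Is unwraps the chain and still matches the sentinel.
	fmt.Println(errors.Is(err, errBucketNotFound)) // true
}
```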

@joostjager (Author):

My personal opinion is that errors.Is everywhere does not always make it better. Also, for the common check if ctx.Err() == context.Canceled, it feels a bit weird. Almost like you don't really know what you're checking and instead use a catch-all to be sure. But ofc a minor nit in the grand scheme of things.

For this PR, shall I leave it as is now?

Collaborator:

Yeah I think it's good enough! Thanks for the work and discussion, it's been very helpful to me!

@joostjager force-pushed the lookup-htlc branch 2 times, most recently from cf8b018 to 642ee7e on October 27, 2022 06:53
@joostjager (Contributor, Author) commented Oct 27, 2022

Removed dev cli command. Saved here: bottlepay@5292416

@casey-bowman commented Nov 6, 2022

I'd like to clarify what to expect in the API of lnd 0.16 regarding this issue. I saw @joostjager's tweet:

> To address this, lnd 0.16 will contain a new api that provides insight into the final resolution of htlcs. It's another step in the maturation process of lightning. Use it to your advantage.

I'm currently using Alex Bosworth's lightning library, checking for settlement on an invoice subscription. Is there or will there be an event in the invoices subscription stream from lnd for final settlement in v0.16?

@joostjager (Contributor, Author) commented Nov 7, 2022

> Is there or will there be an event in the invoices subscription stream from lnd for final settlement in v0.16?

Currently there is not. Added steps for how to do it at the moment to the pr description.

@JssDWt (Contributor) commented Jan 16, 2023

@joostjager I seem to end up in this piece of code here for every new block for a channel that's still open. The htlc is apparently not handled.

This PR appears to fix the problem by finalizing sub dust incoming htlcs. So I'm considering applying this PR. But the following comment suggests that the channel should already be closed: https://github.com/lightningnetwork/lnd/pull/6517/files#diff-a0b8064876b1b1d6085fa7ffdbfd38c81cb06c1ca3f34a08dbaacba203cda3ebR2227-R2228

Do you have an idea whether it will be problematic that the channel is still open when handling the HtlcIncomingDustFinalAction?

@joostjager (Contributor, Author):

> I seem to end up in this piece of code here for every new block for a channel that's still open.

What is it that you try to do exactly? I could use a bit more context to answer the question.

If the channel is still open and there are no htlcs at risk (or htlcs at risk are dust), there is no action to be taken for the channel arbitrator.

@JssDWt (Contributor) commented Jan 16, 2023

> If the channel is still open and there are no htlcs at risk (or htlcs at risk are dust), there is no action to be taken for the channel arbitrator.

@joostjager I've got expired htlcs, some are even 520 days old. I get "go to chain for incoming htlc" logs for these htlcs on every new block. So it seems these htlcs are never resolved. Then I noticed that these HTLCs never get any action assigned if they're sub dust. So it never goes to chain, but also never ends up in a final state. I currently have the suspicion that many of these sub dust incoming htlcs are clogging up the work LND is doing and that this is causing new htlcs to expire. Hence I thought applying this PR, where the htlcs do end up in a final state, could help.

@joostjager (Contributor, Author) commented Jan 16, 2023

I don't think that this PR is going to help with that. It only adds reporting of the on-chain resolution to the final resolution table, but doesn't change the actual resolution / mark as complete process itself.

@JssDWt (Contributor) commented Jan 16, 2023

> I don't think that this PR is going to help with that. It only adds reporting of the on-chain resolution to the final resolution table, but doesn't change the actual resolution / mark as complete process itself.

Thanks for the quick reply.

Labels: htlcswitch, invoices, safety
Successfully merging this pull request may close these issues: Reliable HTLC outcome lookup