routing: cancelable payment loop #8734
base: master
Looking good! Only one blocking comment around removing the `context.Context` nil check. The other comment is just a suggestion.
LGTM! 🚀
Don't we have a timeout already? Why do we need this? I think once the payment is sent, there's no easy way to cancel it. Maybe related to #5461.
That's also my understanding: we can't cancel the payment once the HTLC is sent. Here, in addition to the existing timeout check that interrupts the payment loop on expiry, we pass in a context (from the RPC server client stream) that is checked for cancellation and interrupts the payment loop if the user cancelled the stream context (Ctrl+C'd the sendpayment from the CLI). This can only happen if a
Code-wise this looks good 🔥! I wanted to note that this change may break usage patterns if users expect to be able to dispatch payments in the background with `lncli sendpayment`, or if `SendPaymentV2` is somehow not waited on for responses. This would no longer be possible with the current setting, as it stops the payment loop at the next occasion once the context is cancelled. If the command is interrupted, this will lead to cancellation of the payment loop, as expected. Should we retain backward compatibility by adding a bool `cancellable` to the payment request (we could then decide whether to forward the context or to initialize a new background one)?
		return err
	}

case <-ctx.Done():
I'd argue that if the context times out, we could instead fail the payment with `FailureReasonTimeout`. We would need to check whether `ctx.Err()` is `context.DeadlineExceeded`.
Yeah, agree. We could also get rid of `p.timeoutChan` and use the context to control the timeout: instead of using `p.timeoutChan = time.After(timeout)`, we derive a timeout context from the parent context here.
Thanks for pointing this out. I've removed the `p.timeoutChan` and replaced it with a deadline-dependent context.
routing/payment_lifecycle.go
Outdated
@@ -319,27 +322,42 @@ lifecycle:
	}

// checkTimeout checks whether the payment has reached its timeout.
need to update the docs too
routing/payment_lifecycle.go
Outdated
select {
case <-p.timeoutChan:
	log.Warnf("payment attempt not completed before timeout")
	err := failPayment(channeldb.FailureReasonTimeout)
Can instead do `return failPayment(...)`.
good catch
routing/payment_lifecycle.go
Outdated
case <-p.router.quit:
	return fmt.Errorf("check payment timeout got: %w",
		ErrRouterShuttingDown)

// Fall through if we haven't hit our time limit.
// Fall through if we haven't hit our time limit.
I think `gofmt` failed here.
routing/payment_lifecycle.go
Outdated
}

case <-ctx.Done():
	log.Warnf("payment attempt context canceled")
Need to capitalize the log message and add the `p.identifier`.
routing/payment_lifecycle_test.go
Outdated
@@ -179,6 +180,34 @@ func sendPaymentAndAssertFailed(t *testing.T,
	}
}

// sendPaymentAndAssertFailed calls `resumePayment` and asserts that an error
- // sendPaymentAndAssertFailed calls `resumePayment` and asserts that an error
+ // sendPaymentAndAssertContextCancelled calls `resumePayment` and asserts that an error
I've called it `sendPaymentAndAssertError`.
routing/payment_lifecycle_test.go
Outdated
@@ -179,6 +180,34 @@ func sendPaymentAndAssertFailed(t *testing.T,
	}
}

// sendPaymentAndAssertFailed calls `resumePayment` and asserts that an error
// is returned.
func sendPaymentAndAssertContextCancelled(t *testing.T,
Looks like it's the same as `sendPaymentAndAssertFailed`?
I've consolidated the two.
routing/payment_lifecycle.go
Outdated
		return err
	}

	return ctx.Err()
Unfortunately we cannot return an error here, as it would cause the payment loop to exit. For the timeout case, we never exit the loop. Consider a payment made of two HTLC attempts:
HTLC1 ---> hop1 ---> remote
HTLC2 ---> hop2 ---> remote
Then remote settles the invoice, but hop2 is offline, so we end up in this state:
HTLC1 ---> hop1 ---> remote ---> settled
HTLC2 ---> hop2 ---> remote ---> pending
After 60s we'd time out this payment, but we won't quit the loop, as we still consider the payment in flight:
lnd/channeldb/payment_status.go
Line 148 in 9d358bc
// | true | true | false | true | StatusInFlight |
Then hop2 comes online and settles it, and we'd consider the payment successful:
lnd/channeldb/payment_status.go
Line 156 in 9d358bc
// | false | true | false | true | StatusSucceeded |
In other words, once the HTLC is sent, there's no easy way to cancel it unless you get a clear response from your peer saying it has failed or settled. If we exit here because the context is canceled, the payment would still have the status `StatusInFlight`.
Thank you for the detailed explanation. I see why we can't interrupt the entire payment loop and should instead behave exactly as we do in the timeout case.
Thanks to the reviewers so far. I think I addressed all your concerns, namely:
I tested different scenarios locally, in addition to the added test, and didn't encounter an error. Looking forward to another round of feedback.
lnrpc/routerrpc/router.proto
Outdated

/*
If set, the payment loop can be interrupted by manually canceling the
payment context, even before the payment timeout is reached.
Maybe add something like: "Note that the payment may still succeed after cancellation, as in-flight attempts can still settle afterwards. Canceling will only prevent further attempts from being sent." Similarly for the CLI flag.

// Cancel the timeout context. If the context already timed out or if
// there was no timeout provided, this will be a no-op.
cancel()
This doesn't cancel the payment directly because `trackPayment` is blocking?
This call is there to avoid a linter issue that would arise if I wrote `ctx, _ = context.WithDeadline(ctx, timeout)` above. And if I assign the cancel function, I have to call it. The right place seems to be after `trackPayment`, because by then the context can safely be canceled.
An alternative could be to place it after `sendPayment` in `SendPaymentAsync`, but I'm not sure that's a good pattern.
could also silence the linter here with a comment as to why we don't need the context in this case.
// checkContext checks whether the payment context has been canceled.
// Cancellation occurs manually or if the context times out.
func (p *paymentLifecycle) checkContext(ctx context.Context) error {
	failPayment := func(reason channeldb.FailureReason) error {
I think this can be inlined again
routing/payment_lifecycle.go
Outdated
case <-ctx.Done():
	failureReason := channeldb.FailureReasonError

	switch {
This could be an if/else instead
In this commit we set up the payment loop context according to user-provided parameters. The `cancelable` parameter indicates whether the user is able to interrupt the payment loop by cancelling the server stream context. We additionally wrap the context in a deadline if the user provided a payment timeout. We remove the timeout channel from payment_lifecycle.go in favor of the deadline context.
Fixes #8534.
While interrupting `lncli estimateroutefee` ends the client program, the route server is left behind with an abandoned payment loop. To allow users to cancel payment requests properly, the payment loop in `resumePayment` has to be abandoned as well, as soon as the user requests it.
lnd/routing/payment_lifecycle.go
Line 170 in f523f52
This PR makes the `SendPaymentV2` streaming context accessible from the payment loop and allows it to safely end if the context was cancelled.