routing: cancelable payment loop #8734

Open · wants to merge 4 commits into master

hieblmi
Collaborator

@hieblmi hieblmi commented May 7, 2024

Fixes #8534.

Interrupting lncli estimateroutefee ends the client program, but the route server is left running an abandoned payment loop.

To allow users to cancel payment requests properly, the payment loop in resumePayment also has to be abandoned as soon as the user requests it.

func (p *paymentLifecycle) resumePayment() ([32]byte, *route.Route, error) {

This PR makes the SendPaymentV2 streaming context accessible from the payment loop and allows the loop to end safely if the context is cancelled.

@hieblmi hieblmi self-assigned this May 7, 2024
@hieblmi hieblmi added this to the v0.18.1 milestone May 7, 2024
@hieblmi hieblmi added the routing label May 7, 2024
@hieblmi hieblmi force-pushed the cancel-estimateroutefee branch 2 times, most recently from 48cfd5b to 1d6e008 Compare May 7, 2024 12:15
routing/payment_lifecycle.go (outdated, resolved)
@hieblmi hieblmi force-pushed the cancel-estimateroutefee branch 2 times, most recently from dcdd913 to 02072fa Compare May 7, 2024 12:50
@hieblmi hieblmi changed the title routing: Cancelable payment loop routing: cancelable payment loop May 7, 2024
@hieblmi hieblmi added the payments Related to invoices/payments label May 7, 2024
Collaborator

@ellemouton ellemouton left a comment

Looking good!

Only one blocking comment, around removing the context.Context nil check.

The other comment is just a suggestion.

routing/payment_lifecycle.go (outdated, resolved)
routing/payment_lifecycle.go (outdated, resolved)
@hieblmi hieblmi force-pushed the cancel-estimateroutefee branch 2 times, most recently from 0bfb3db to 2ab2bd9 Compare May 10, 2024 09:15
Collaborator

@ellemouton ellemouton left a comment

LGTM! 🚀

@lightningnetwork lightningnetwork deleted a comment from coderabbitai bot May 10, 2024
Collaborator

@yyforyongyu yyforyongyu left a comment

Don't we already have a timeout? Why do we need this? I think once the payment is sent, there's no easy way to cancel it; maybe related to #5461.

@hieblmi
Collaborator Author

hieblmi commented May 11, 2024

That's also my understanding: we can't cancel the payment once the HTLC is sent. Here, in addition to the existing timeout check that interrupts the payment loop on expiry, we pass in a context (from the RPC server's client stream) that is checked for cancellation. It interrupts the payment loop if the user cancelled the stream context (e.g. by pressing Ctrl+C on sendpayment in the CLI). This can only happen once a p.sendAttempt(attempt) has completed.

Collaborator

@bitromortac bitromortac left a comment

Code-wise this looks good 🔥! I wanted to note that this code may break usage patterns if users expect to be able to dispatch payments in the background with lncli sendpayment, or if SendPaymentV2 is otherwise not waited on for responses. This would no longer be possible with the current setting, as the payment loop stops at the next opportunity once the context is cancelled. If the cmd is interrupted, this will lead to cancellation of the payment loop, as expected. Should we retain backward compatibility by adding a bool cancellable to the payment request? We could then decide whether to forward the context or to initialize a new background one.

routing/payment_lifecycle.go (outdated, resolved)
routing/payment_lifecycle_test.go (outdated, resolved)
routing/payment_lifecycle_test.go (outdated, resolved)
routing/payment_lifecycle_test.go (outdated, resolved)
routing/router.go (outdated, resolved)
return err
}

case <-ctx.Done():
Collaborator

I'd argue that if the context times out, we could instead fail the payment with FailureReasonTimeout. We would need to check whether ctx.Err() is context.DeadlineExceeded.

Collaborator

Yeah, agree. We could also get rid of p.timeoutChan and use the context to control the timeout: instead of p.timeoutChan = time.After(timeout), we derive a timeout context from the parent context here.

Collaborator Author

Thanks for pointing this out. I've removed the p.timeoutChan and replaced it with a deadline dependent context.

@@ -319,27 +322,42 @@ lifecycle:
}

// checkTimeout checks whether the payment has reached its timeout.
Collaborator

need to update the docs too

select {
case <-p.timeoutChan:
log.Warnf("payment attempt not completed before timeout")
err := failPayment(channeldb.FailureReasonTimeout)
Collaborator

can instead do return failPayment(...)

Collaborator Author

good catch

case <-p.router.quit:
return fmt.Errorf("check payment timeout got: %w",
ErrRouterShuttingDown)

// Fall through if we haven't hit our time limit.
Collaborator

think gofmt failed here

routing/payment_lifecycle.go (outdated, resolved)
}

case <-ctx.Done():
log.Warnf("payment attempt context canceled")
Collaborator

Need to capitalize the log message and add p.identifier.

@@ -179,6 +180,34 @@ func sendPaymentAndAssertFailed(t *testing.T,
}
}

// sendPaymentAndAssertFailed calls `resumePayment` and asserts that an error
Collaborator

Suggested change
// sendPaymentAndAssertFailed calls `resumePayment` and asserts that an error
// sendPaymentAndAssertContextCancelled calls `resumePayment` and asserts that an error

Collaborator Author

I've called it sendPaymentAndAssertError.

@@ -179,6 +180,34 @@ func sendPaymentAndAssertFailed(t *testing.T,
}
}

// sendPaymentAndAssertFailed calls `resumePayment` and asserts that an error
// is returned.
func sendPaymentAndAssertContextCancelled(t *testing.T,
Collaborator

looks like it's the same as sendPaymentAndAssertFailed?

Collaborator Author

I've consolidated the two.

return err
}

return ctx.Err()
Collaborator

Unfortunately we cannot return an error here, as it'd cause the payment loop to exit. In the timeout case, we never exit the loop. Consider this case: say a payment is made of two HTLC attempts,

HTLC1 ---> hop1 ---> remote
HTLC2 ---> hop2 ---> remote

Then the remote settles the invoice, but hop2 is offline, so we'd end up in this case:

HTLC1 ---> hop1 ---> remote ---> settled
HTLC2 ---> hop2 ---> remote ---> pending

After 60s, we'd time out this payment, but we won't quit the loop, as we still consider the payment in flight:

// | true | true | false | true | StatusInFlight |

Then hop2 comes online and settles it, and we'd consider the payment successful.

// | false | true | false | true | StatusSucceeded |
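The scenario above can be modelled with a deliberately simplified, hypothetical status function. It does not reproduce lnd's actual decision table (whose column headers are not shown here), and StatusFailed and StatusInitiated are assumed names. The point it illustrates is that in-flight HTLCs take precedence over a recorded failure reason:

```go
package main

import "fmt"

// paymentState is a hypothetical, simplified summary of a payment's
// attempts; it does not reproduce lnd's actual decision table.
type paymentState struct {
	inflight  int  // HTLCs sent but not yet resolved
	settled   int  // HTLCs settled by the receiver
	reasonSet bool // a failure reason (e.g. timeout) has been recorded
}

// status derives the payment status. In-flight HTLCs take precedence, so a
// recorded failure reason cannot finalize the payment while HTLCs are out.
func status(s paymentState) string {
	switch {
	case s.inflight > 0:
		return "StatusInFlight"
	case s.settled > 0:
		return "StatusSucceeded"
	case s.reasonSet:
		return "StatusFailed"
	default:
		return "StatusInitiated"
	}
}

func main() {
	// HTLC1 settled, HTLC2 stuck behind the offline hop2, timeout hit.
	p := paymentState{inflight: 1, settled: 1, reasonSet: true}
	fmt.Println(status(p))

	// hop2 comes back online and HTLC2 settles.
	p.inflight, p.settled = 0, 2
	fmt.Println(status(p))
}
```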

Collaborator

In other words, once the HTLC is sent, there's no easy way to cancel it unless you get a clear response from your peer saying it's failed/settled. If we exit here because the context is canceled, the payment would still have the status StatusInFlight.

Collaborator Author

Thank you for the detailed explanation. I see why we can't interrupt the entire payment loop and must instead behave exactly as we do in the timeout case.


coderabbitai bot commented May 16, 2024

Important

Auto Review Skipped

Auto reviews are disabled on this repository.


@hieblmi hieblmi force-pushed the cancel-estimateroutefee branch 3 times, most recently from 614296b to 83eb67a Compare May 17, 2024 08:08
@hieblmi
Collaborator Author

hieblmi commented May 17, 2024

Thanks to the reviewers so far. I think I addressed all your concerns, namely:

  • Add a cancelable flag to the SendPaymentRequest. By default, the current behavior of a long-running payment loop is preserved even if the stream context is canceled; if the flag is set to true, canceling the stream context cancels the payment loop.
  • Wrap the send payment context in a deadline context in case a timeout is set in the SendPaymentRequest.
  • Remove the timeout channel in payment_lifecycle.go in favor of the new timeout context.

In addition to the added test, I tested different scenarios locally and didn't encounter an error.

Looking forward to another round of feedback.


/*
If set, the payment loop can be interrupted by manually canceling the
payment context, even before the payment timeout is reached.
Collaborator

Maybe add something like: "Note that the payment may still succeed after cancellation, as in-flight attempts can still settle afterwards. Canceling will only prevent further attempts from being sent." Similarly for the CLI flag.


// Cancel the timeout context. If the context already timed out or if
// there was no timeout provided, this will be a no-op.
cancel()
Collaborator

this doesn't cancel the payment directly because trackPayment is blocking?

Collaborator Author

This call avoids a linter issue that would arise if I wrote ctx, _ = context.WithDeadline(ctx, timeout) above. Once I assign the cancel function, I have to call it, and the right place seems to be after trackPayment, because at that point the context can be canceled.

Collaborator

an alternative could be to place it after sendPayment in SendPaymentAsync, but not sure that's a good pattern

Collaborator Author

could also silence the linter here with a comment as to why we don't need the context in this case.

// checkContext checks whether the payment context has been canceled.
// Cancellation occurs manually or if the context times out.
func (p *paymentLifecycle) checkContext(ctx context.Context) error {
failPayment := func(reason channeldb.FailureReason) error {
Collaborator

I think this can be inlined again

case <-ctx.Done():
failureReason := channeldb.FailureReasonError

switch {
Collaborator

This could be an if/else instead

In this commit we set up the payment loop context
according to user-provided parameters. The
`cancelable` parameter indicates whether the user
is able to interrupt the payment loop by cancelling
the server stream context. We'll additionally wrap
the context in a deadline if the user provided a
payment timeout.
We remove the timeout channel in payment_lifecycle.go
in favor of the deadline context.
@hieblmi hieblmi requested a review from bitromortac May 17, 2024 15:39
Labels: payments (Related to invoices/payments), routing
Projects: None yet
Development: successfully merging this pull request may close these issues:
[feature]: Allow for cancellation of EstimateRouteFee
5 participants