Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reliable payments] router payment state machine #2761

Merged

Conversation

halseth
Copy link
Contributor

@halseth halseth commented Mar 12, 2019

This PR introduces a persistent state machine to the ChannelRouter's payment flow, ensuring we can handle payment results that are received after a restart. It also paves they way for adding cancellation of payments, and resuming payment sessions across restarts.

Problem

lnd currently runs into problems if it is restarted while an HTLC is in flight on the network. The primary reason for this is the way the router hands the HTLC to the switch. The router persist no information about the payment, so when a result eventually comes back, we risk the information needed to properly populate the OutgoingPayment in the database is lost.

Solution

With the paymentStateMachine introduced in this PR, we store two pieces of key information:

  1. The HTLC paymentID. This is used to query the Switch whether the HTLC is still active. If not active we know that we can safely retry the payment attempt. If active the router will wait for the result to be available, and store the result to the DB.
  2. The HTLC's route. This is added to the DB together with the preimage when the payment succeeds.

Note

The Switch does not currently persist the pending payment attempts across restarts. This will be added in a follow-up PR.

Replaces #2475

Builds on

@halseth halseth added this to the 0.6 milestone Mar 12, 2019
@halseth halseth added the payments Related to invoices/payments label Mar 12, 2019
@halseth halseth force-pushed the reliable-payments-router-state-machine branch from e83fd93 to 2409e3d Compare March 12, 2019 13:55
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really dig this approach! With the state machine, I find it much easier to follow than the prior attempts at a solution to this problem. I've completed an initial pass so far, and the main question in my mind is the size of the overlap between this new state machine and the existing control tower in the switch. At one point the control tower addressed a need within the codebase, but it seems like this new state machine can eventually subsume the responsibilities of the control tower.

htlcswitch/switch.go Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
htlcswitch/pending_payment.go Outdated Show resolved Hide resolved
htlcswitch/switch.go Outdated Show resolved Hide resolved
routing/route.go Outdated Show resolved Hide resolved
routing/payment_state_machine.go Outdated Show resolved Hide resolved
routing/payment_state_machine.go Outdated Show resolved Hide resolved
routing/payment_state_machine.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
@halseth halseth force-pushed the reliable-payments-router-state-machine branch 7 times, most recently from 78a0c81 to 1687cdc Compare March 20, 2019 10:22
Copy link
Collaborator

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am, as always, worried about the persistence and the work and inflexibility it may cause us in the future. Tried to mainly analyze this part of the PR.

Why don't you merge the switch pr first btw, work bottom up? Or will we have with just this PR already good working resumes of payments?

channeldb/payments.go Outdated Show resolved Hide resolved
channeldb/payments.go Outdated Show resolved Hide resolved
htlcswitch/pending_payment.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
routing/payment_state_machine.go Outdated Show resolved Hide resolved
binary.BigEndian.PutUint64(paymentIDBytes, paymentID)
// CompletePayment overwrites the OutgoingPayment stored in the DB for the
// corresponding payment hash with the completed one.
func (db *DB) CompletePayment(preimage lntypes.Preimage,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks unused

routing/router.go Show resolved Hide resolved
routing/router.go Show resolved Hide resolved
routing/payment_state_machine.go Outdated Show resolved Hide resolved
@halseth halseth force-pushed the reliable-payments-router-state-machine branch 5 times, most recently from ce35b99 to 9d1bd42 Compare March 21, 2019 14:58
Copy link
Collaborator

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely nicer without that intermediate persistent state.

Main comments:

  • Code structure in router
  • Consolidation of payment related stores

channeldb/payments.go Outdated Show resolved Hide resolved
routing/control_tower.go Outdated Show resolved Hide resolved
routing/payment_store.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
channeldb/payments.go Outdated Show resolved Hide resolved
Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice refactor of the router/switch interactions, this should make the whole payment flow much tighter!

one question i have with new state introduced in the control tower, will we just treat all payments that are currently grounded as started but never attempted? an alternative would be to rename grounded payments as failed, and introduce a new state for initiated. not sure which is better atm

routing/router.go Outdated Show resolved Hide resolved
return nil
}

ctx.router.cfg.GetPaymentResult = func(paymentID uint64) (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many of these GetPaymentResult funcs seem similar, any way we can generate the closures w/ less code duplication?

routing/router.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
routing/route_test.go Outdated Show resolved Hide resolved
rpcserver.go Show resolved Hide resolved
routing/router_test.go Outdated Show resolved Hide resolved
routing/control_tower.go Outdated Show resolved Hide resolved
routing/control_tower.go Outdated Show resolved Hide resolved
htlcswitch/switch.go Outdated Show resolved Hide resolved
@halseth halseth force-pushed the reliable-payments-router-state-machine branch 3 times, most recently from 7180772 to 7aae12a Compare March 26, 2019 10:11
channeldb/payments.go Outdated Show resolved Hide resolved
routing/router.go Show resolved Hide resolved
@halseth halseth force-pushed the reliable-payments-router-state-machine branch 2 times, most recently from fd20644 to ab0f77c Compare March 26, 2019 14:38
@Roasbeef Roasbeef removed this from the 0.6 milestone Mar 26, 2019
halseth added 25 commits May 27, 2019 20:18
migrateOutgoingPayments moves the OutgoingPayments into a new bucket format
where they all reside in a top-level bucket indexed by the payment hash. In
this sub-bucket we store information relevant to this payment, such as the
payment status.

To avoid that the router resend payments that have the status InFlight (we
cannot resume these payments for pre-migration payments) we delete those
statuses, so only Completed payments remain in the new bucket structure.
Since we have performed a migration, the db should be in a consistent
state, and we can remove the non-strict option.
This commit gives a new responsibility to the control tower, letting it
populate the payment bucket structure as the payment goes through
its different stages.

The payment will transition states Grounded->InFlight->Success/Failed,
where the CreationInfo/AttemptInfo/Preimage must be set accordingly.

This will be the main driver for the router state machine.
This encapsulates all state needed to resume a payment from any point of
the payment flow, and that must be shared between the different stages
of the execution. This is done to prepare for breaking the send loop
into smaller parts, and being able to resume the payment from any point
from persistent state.
This commit makes the router use the ControlTower to drive the payment
life cycle state machine, to keep track of active payments across
restarts.  This lets the router resume payments on startup, such that
their final results can be handled and stored when ready.
On startup the router will fetch the in-flight payments from the control
tower, and resume their execution.
TestRouterPaymentStateMachine tests that the router interacts as
expected with the ControlTower during a payment lifecycle, such that it
payment attempts are not sent twice to the switch, and results are
handled after a restart.
TestPaymentControlDeleteNonInFlight checks that calling DeletaPayments
only deletes payments from the database that are not in-flight.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
payments Related to invoices/payments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants