[reliable payments] router payment state machine #2761

Merged

@halseth
Collaborator

commented Mar 12, 2019

This PR introduces a persistent state machine to the ChannelRouter's payment flow, ensuring we can handle payment results that are received after a restart. It also paves the way for adding cancellation of payments and for resuming payment sessions across restarts.

Problem

lnd currently runs into problems if it is restarted while an HTLC is in flight on the network. The primary reason is the way the router hands the HTLC to the switch: the router persists no information about the payment, so when a result eventually comes back, we risk having lost the information needed to properly populate the OutgoingPayment in the database.

Solution

With the paymentStateMachine introduced in this PR, we store two pieces of key information:

  1. The HTLC paymentID. This is used to ask the Switch whether the HTLC is still active. If it is not active, we know that we can safely retry the payment attempt. If it is active, the router will wait for the result to become available and store it in the DB.
  2. The HTLC's route. This is added to the DB together with the preimage when the payment succeeds.
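
The decision described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `htlcQuerier` interface, its `IsActive` method, the `attemptInfo` fields and the `decide` helper are all hypothetical names chosen for the example.

```go
package main

import "fmt"

// attemptInfo is a hypothetical stand-in for the two persisted pieces of
// information: the HTLC's paymentID and its route.
type attemptInfo struct {
	PaymentID uint64
	Route     []string // hop names, for illustration only
}

// htlcQuerier is an assumed interface; the real Switch API may look different.
type htlcQuerier interface {
	// IsActive reports whether the HTLC with the given ID is still in flight.
	IsActive(paymentID uint64) bool
}

type nextStep int

const (
	retryAttempt nextStep = iota // HTLC no longer active: safe to retry
	awaitResult                  // HTLC still active: wait for its result
)

// decide picks what the router should do with a stored attempt after a
// restart, based only on whether the switch still knows the HTLC.
func decide(sw htlcQuerier, a attemptInfo) nextStep {
	if sw.IsActive(a.PaymentID) {
		return awaitResult
	}
	return retryAttempt
}

// fakeSwitch is a test double mapping payment IDs to "still active".
type fakeSwitch map[uint64]bool

func (f fakeSwitch) IsActive(paymentID uint64) bool { return f[paymentID] }

func main() {
	sw := fakeSwitch{7: true}
	inFlight := attemptInfo{PaymentID: 7, Route: []string{"alice", "bob"}}
	gone := attemptInfo{PaymentID: 8, Route: []string{"alice", "carol"}}

	fmt.Println(decide(sw, inFlight) == awaitResult)
	fmt.Println(decide(sw, gone) == retryAttempt)
}
```

The route plays no part in the decision itself; it is only carried along so it can be written to the DB together with the preimage on success.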

Note

The Switch does not currently persist the pending payment attempts across restarts. This will be added in a follow-up PR.

Replaces #2475

Builds on

@halseth halseth added this to the 0.6 milestone Mar 12, 2019

@halseth halseth added the payments label Mar 12, 2019

@halseth halseth force-pushed the halseth:reliable-payments-router-state-machine branch from e83fd93 to 2409e3d Mar 12, 2019

@Roasbeef
Member

left a comment

I really dig this approach! With the state machine, I find it much easier to follow than the prior attempts at a solution to this problem. I've completed an initial pass so far, and the main question in my mind is the size of the overlap between this new state machine and the existing control tower in the switch. At one point the control tower addressed a need within the codebase, but it seems like this new state machine can eventually subsume the responsibilities of the control tower.

@halseth halseth force-pushed the halseth:reliable-payments-router-state-machine branch 7 times, most recently from 78a0c81 to 1687cdc Mar 19, 2019

@joostjager
Collaborator

left a comment

I am, as always, worried about the persistence and the work and inflexibility it may cause us in the future. Tried to mainly analyze this part of the PR.

Why don't you merge the switch PR first, btw, and work bottom-up? Or will this PR alone already give us working resumption of payments?

binary.BigEndian.PutUint64(paymentIDBytes, paymentID)
// CompletePayment overwrites the OutgoingPayment stored in the DB for the
// corresponding payment hash with the completed one.
func (db *DB) CompletePayment(preimage lntypes.Preimage,

@joostjager

joostjager Mar 20, 2019

Collaborator

Looks unused

@halseth halseth force-pushed the halseth:reliable-payments-router-state-machine branch 5 times, most recently from ce35b99 to 9d1bd42 Mar 20, 2019

@joostjager
Collaborator

left a comment

Definitely nicer without that intermediate persistent state.

Main comments:

  • Code structure in router
  • Consolidation of payment related stores
@cfromknecht
Collaborator

left a comment

nice refactor of the router/switch interactions, this should make the whole payment flow much tighter!

one question i have with new state introduced in the control tower, will we just treat all payments that are currently grounded as started but never attempted? an alternative would be to rename grounded payments as failed, and introduce a new state for initiated. not sure which is better atm

return nil
}

ctx.router.cfg.GetPaymentResult = func(paymentID uint64) (

@cfromknecht

cfromknecht Mar 22, 2019

Collaborator

many of these GetPaymentResult funcs seem similar, any way we can generate the closures w/ less code duplication?

@halseth halseth force-pushed the halseth:reliable-payments-router-state-machine branch 3 times, most recently from 7180772 to 7aae12a Mar 22, 2019

@halseth halseth force-pushed the halseth:reliable-payments-router-state-machine branch 2 times, most recently from fd20644 to ab0f77c Mar 26, 2019

@Roasbeef Roasbeef removed this from the 0.6 milestone Mar 26, 2019

halseth added some commits May 23, 2019

channeldb/payments+control_tower: split OutgoingPayments
This commit changes the format used to store payments within the
DB. Previously this was serialized as one continuous struct
OutgoingPayment, which also contained an Invoice struct we were only
using a few fields of. We now split it up into three simpler parts:
CreationInfo, AttemptInfo and PaymentPreimage.

We also want to associate the payments more closely with payment
statuses, so we move to this hierarchy:

There's one top-level bucket "sentPaymentsBucket" which contains a set
of sub-buckets indexed by a payment's payment hash. Each such sub-bucket
contains several fields:
    paymentStatusKey       -> the payment's status
    paymentCreationInfoKey -> the payment's CreationInfo
    paymentAttemptInfoKey  -> the payment's AttemptInfo
    paymentSettleInfoKey   -> the payment's preimage (or zeroes for
                              non-settled payments)

The CreationInfo is information that is static during the whole payment
lifecycle. The attempt info is set each time a new payment attempt
(route+paymentID) is sent on the network. The preimage is information
only known when a payment succeeds.  It therefore makes sense to split
them.
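
As a rough illustration of why the split is convenient, the sketch below models one payment's sub-bucket as a Go struct. All type and field names here (`creationInfo`, `attemptInfo`, `paymentRecord`, `AmountMsat`, and so on) are assumptions made for the example and are not taken from the actual channeldb code.

```go
package main

import "fmt"

// creationInfo holds data that stays fixed for the whole payment lifecycle.
// The fields are illustrative assumptions.
type creationInfo struct {
	PaymentHash [32]byte
	AmountMsat  uint64
}

// attemptInfo holds data that changes with every attempt (route+paymentID).
type attemptInfo struct {
	PaymentID uint64
	Route     []string
}

// paymentRecord models the contents of one per-payment sub-bucket.
type paymentRecord struct {
	Creation creationInfo // written once, when the payment is created
	Attempt  *attemptInfo // overwritten per attempt; nil before the first one
	Preimage [32]byte     // all zeroes until the payment settles
}

// registerAttempt replaces the stored attempt without touching Creation.
func (p *paymentRecord) registerAttempt(a attemptInfo) { p.Attempt = &a }

// settle records the preimage learned when the payment succeeds.
func (p *paymentRecord) settle(preimage [32]byte) { p.Preimage = preimage }

// settled reports whether a non-zero preimage has been stored.
func (p *paymentRecord) settled() bool {
	var zero [32]byte
	return p.Preimage != zero
}

func main() {
	rec := paymentRecord{Creation: creationInfo{AmountMsat: 1000}}

	rec.registerAttempt(attemptInfo{PaymentID: 1, Route: []string{"a", "b"}})
	rec.registerAttempt(attemptInfo{PaymentID: 2, Route: []string{"a", "c"}})
	fmt.Println(rec.Attempt.PaymentID, rec.settled())

	rec.settle([32]byte{0x01})
	fmt.Println(rec.settled())
}
```

Each part has its own write pattern: the creation data is written once, the attempt is overwritten on retry, and the preimage is filled in only at the end.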

We keep legacy serialization code for migration purposes.

channeldb/migration: add migration for new payment bucket structure
migrateOutgoingPayments moves the OutgoingPayments into a new bucket format
where they all reside in a top-level bucket indexed by the payment hash. In
this sub-bucket we store information relevant to this payment, such as the
payment status.

To prevent the router from resending payments that have the status
InFlight (we cannot resume payments made before the migration), we delete
those statuses, so only Completed payments remain in the new bucket structure.

channeldb/control_tower: add payment information during state changes
This commit gives a new responsibility to the control tower, letting it
populate the payment bucket structure as the payment goes through
its different stages.

The payment will transition states Grounded->InFlight->Success/Failed,
where the CreationInfo/AttemptInfo/Preimage must be set accordingly.

This will be the main driver for the router state machine.
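
A minimal sketch of the transitions named above, assuming hypothetical status constants and a hypothetical `transition` helper. It encodes only the arrows listed in this commit message; whether the real control tower permits any other transitions is not covered here.

```go
package main

import (
	"errors"
	"fmt"
)

type paymentStatus int

// Status names follow the commit message; the constants are illustrative.
const (
	statusGrounded paymentStatus = iota
	statusInFlight
	statusSucceeded
	statusFailed
)

var errInvalidTransition = errors.New("invalid payment status transition")

// transition returns nil only for Grounded->InFlight, InFlight->Success
// and InFlight->Failed. Every other pair is rejected.
func transition(from, to paymentStatus) error {
	switch {
	case from == statusGrounded && to == statusInFlight:
		return nil
	case from == statusInFlight &&
		(to == statusSucceeded || to == statusFailed):
		return nil
	}
	return errInvalidTransition
}

func main() {
	fmt.Println(transition(statusGrounded, statusInFlight))
	fmt.Println(transition(statusInFlight, statusSucceeded))
	fmt.Println(transition(statusGrounded, statusSucceeded))
}
```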

channeldb/control_tower: remove non-strict option
Since we have performed a migration, the db should be in a consistent
state, and we can remove the non-strict option.

routing: extract payment flow into method on paymentLifecycle
This encapsulates all state needed to resume a payment from any point of
the payment flow, and that must be shared between the different stages
of the execution. This is done to prepare for breaking the send loop
into smaller parts, and being able to resume the payment from any point
from persistent state.

routing/router: resume payment state machine at startup
On startup the router will fetch the in-flight payments from the control
tower, and resume their execution.

routing/router: persist payment state machine
This commit makes the router use the ControlTower to drive the payment
life cycle state machine, to keep track of active payments across
restarts.  This lets the router resume payments on startup, such that
their final results can be handled and stored when ready.
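
The startup behaviour could look roughly like the sketch below. The `controlTower` interface, its `FetchInFlight` method, the `inFlightPayment` type and the `resumeAll` helper are assumptions made for illustration, not the API introduced by these commits. The sketch also waits for all resumed payments to finish so that the example is deterministic, which a real router would most likely not do during startup.

```go
package main

import (
	"fmt"
	"sync"
)

// inFlightPayment is a hypothetical summary of a payment stored as in-flight.
type inFlightPayment struct {
	PaymentHash string
	PaymentID   uint64
}

// controlTower is an assumed interface for reading persisted payment state.
type controlTower interface {
	FetchInFlight() ([]inFlightPayment, error)
}

// resumeAll fetches all in-flight payments and runs resume for each one in
// its own goroutine. It returns the number of payments that were resumed.
func resumeAll(ct controlTower, resume func(inFlightPayment)) (int, error) {
	payments, err := ct.FetchInFlight()
	if err != nil {
		return 0, err
	}

	var wg sync.WaitGroup
	for _, p := range payments {
		wg.Add(1)
		go func(p inFlightPayment) {
			defer wg.Done()
			resume(p)
		}(p)
	}
	wg.Wait()

	return len(payments), nil
}

// fakeTower is an in-memory test double for the control tower.
type fakeTower []inFlightPayment

func (f fakeTower) FetchInFlight() ([]inFlightPayment, error) {
	return f, nil
}

func main() {
	tower := fakeTower{
		{PaymentHash: "aa", PaymentID: 1},
		{PaymentHash: "bb", PaymentID: 2},
	}

	n, err := resumeAll(tower, func(p inFlightPayment) {
		fmt.Println("resuming payment", p.PaymentHash)
	})
	fmt.Println(n, err)
}
```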

routing/router_test: add TestRouterPaymentStateMachine
TestRouterPaymentStateMachine tests that the router interacts as
expected with the ControlTower during a payment lifecycle, such that
payment attempts are not sent twice to the switch, and results are
handled after a restart.

channeldb/control_tower test: add TestPaymentControlDeleteNonInFlight
TestPaymentControlDeleteNonInFlight checks that calling DeletePayments
only deletes payments from the database that are not in-flight.
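
The property this test checks can be illustrated with an in-memory stand-in. The `deleteNonInFlight` helper and the map-based store below are hypothetical; they mirror only the described behaviour, not the database code.

```go
package main

import "fmt"

type paymentStatus int

// Illustrative status constants; names follow the commit messages.
const (
	statusGrounded paymentStatus = iota
	statusInFlight
	statusSucceeded
	statusFailed
)

// deleteNonInFlight removes every payment that is not in flight from the
// store and returns how many entries were deleted.
func deleteNonInFlight(payments map[string]paymentStatus) int {
	deleted := 0
	for hash, status := range payments {
		if status == statusInFlight {
			continue // in-flight payments must survive the deletion
		}
		delete(payments, hash) // deleting while ranging is safe in Go
		deleted++
	}
	return deleted
}

func main() {
	payments := map[string]paymentStatus{
		"aa": statusSucceeded,
		"bb": statusInFlight,
		"cc": statusFailed,
	}

	fmt.Println(deleteNonInFlight(payments), len(payments))
}
```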

@halseth halseth force-pushed the halseth:reliable-payments-router-state-machine branch from 9f7b1d9 to 7cb25a5 May 27, 2019

@Roasbeef Roasbeef merged commit 19fafd7 into lightningnetwork:master May 27, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls Coverage increased (+0.2%) to 60.744%