channeldb: Index payments by sequence number #4261

Merged (6 commits) on Jun 10, 2020

Conversation

@carlaKC (Collaborator) commented May 7, 2020

First draft of migration for indexing payments, see #4164 for details.

payments-index-bucket
      <sequence nr>: <payment hash>
      <sequence nr>: <payment hash>

Since we have historic duplicate payments to the same hash, multiple entries in this index may point to the same payment. We take this into account when retrieving payments for a given hash by making sure that our sequence number matches what we're looking for.
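
As a rough illustration (not the exact code in this PR; the helper name and parameters are invented and the kvdb import path is assumed), writing one such entry could look like this, with the sequence number serialized big-endian so that key order matches numeric order:

```go
package channeldb

import (
	"encoding/binary"

	"github.com/lightningnetwork/lnd/channeldb/kvdb"
)

var paymentsIndexBucket = []byte("payments-index-bucket")

// putPaymentIndexEntry writes a single <sequence nr>: <payment hash> entry.
// Serializing the sequence number as big-endian keeps lexicographic key order
// equal to numeric order, which is what makes paginated range scans cheap.
func putPaymentIndexEntry(tx kvdb.RwTx, seqNr uint64, payHash [32]byte) error {
	indexes, err := tx.CreateTopLevelBucket(paymentsIndexBucket)
	if err != nil {
		return err
	}

	var key [8]byte
	binary.BigEndian.PutUint64(key[:], seqNr)

	return indexes.Put(key[:], payHash[:])
}
```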

This PR also includes unification of our pagination logic into a single paginator struct, since we use the same logic for invoices and payments. This could be split out into a separate PR; it was just useful to have it all in one place during development.

@carlaKC requested a review from cfromknecht May 11, 2020 08:10
@carlaKC added the v0.11 label May 12, 2020
@cfromknecht added this to the 0.11.0 milestone May 14, 2020
@carlaKC changed the title from "[WIP]: Index payments by sequence number" to "channeldb: Index payments by sequence number" May 15, 2020
@carlaKC force-pushed the 4164-indexpayments branch 2 times, most recently from d9d94d0 to 5d2b5bf on May 17, 2020 14:55
@carlaKC marked this pull request as ready for review May 18, 2020 08:13
@carlaKC force-pushed the 4164-indexpayments branch 2 times, most recently from 8771811 to f6adab1 on May 18, 2020 12:16
@joostjager (Contributor) left a comment:

A few comments are about commit structure. I think that ideally it should be possible to run every commit on its own without leaving the db in an unusable state for example. Maybe a bit more squashing can accomplish that, or a slightly different split.

channeldb/payment_control.go (resolved, outdated)
channeldb/payment_control_test.go (resolved)
channeldb/payment_control_test.go (resolved)
channeldb/payments_test.go (resolved)
channeldb/migration14/migration_test.go (resolved, outdated)
// payment hash.
// payments-sequence-index-bucket
// |--sequence-number: <payment hash>
// |--sequence-number: <payment hash>
Contributor:

two identical keys?

carlaKC (author):

Ah, wasn't clear on the schema for these comments. Wrapped in <> to indicate that they're keyed by value, not a string.

channeldb/payment_control.go (resolved)
channeldb/payment_control_test.go (resolved)

// appendDuplicatePayment adds a duplicate payment to an existing payment. Note
// that this function requires a unique sequence number.
func appendDuplicatePayment(t *testing.T, db *DB, paymentHash lntypes.Hash,
Contributor:

There is quite a bit of code in this PR to deal with duplicate payments. I am wondering for how long we are going to invest time in those? Maybe there is some kind of middle ground where we don't delete them completely, but only make them retrievable through a special endpoint. They would then also not need the indexing changes and all the associated test code.

carlaKC (author):

I think it's worth dealing with legacy in a way that is as seamless to the user as possible; extra endpoints feel like another thing for people to load up into their mental model of how lnd works. If we want to sunset duplicates, I think we'd at least need to communicate that pretty widely, then perhaps have a release which deletes and csv-dumps those payments for the user. We could go down that road, but I think it's a long term plan, and people are having issues with their ListPayments endpoint now, so I'd argue for tolerating them once more and then making a proper deprecation plan.

Contributor:

Agree on extra endpoints. We'd need to add them in 0.11 and remove the payments from the regular query calls in 0.12.

// index offset we have been provided is beyond our last
// item. Since we are paginating backwards, we simply
// start at the last item.
if indexKey == nil {
Contributor:

Not part of test case?

carlaKC (author):

This behaviour is pre-existing for our invoices pagination, so I wanted to separate it out. The previous commit moves the logic across as is, and this one improves it. But I think a better way would be to add the failing test then fix it, going to switch the order around.

@carlaKC (author) commented May 27, 2020

I think that ideally it should be possible to run every commit on its own without leaving the db in an unusable state for example.

Squashed and re-ordered a bit. The only questionable part is the first two commits (add index / delete index when we delete payments). I can squash them so that everything is done in one, but that makes for a pretty big commit.

I think the likelihood of us splitting these commits is pretty low (we wouldn't cherry-pick halfway through a migration), but it really comes down to reviewer preference, so happy to go either way (could even squash when we're closer to merge?).

@carlaKC requested a review from joostjager May 27, 2020 17:43
@cfromknecht (Contributor) left a comment:

nice changes! no major comments on the indexing itself, just some minor things focused on how we can build on the changes introduced in #4285

func createPaymentIndexEntry(tx kvdb.RwTx, sequenceNumber []byte,
	hash lntypes.Hash) error {

	indexes, err := tx.CreateTopLevelBucket(paymentsIndexBucket)
Contributor:

can we use the CreateTLB migration added in #4285 to avoid dynamic bucket creation?

channeldb/migtest/migtest.go (resolved, outdated)
channeldb/migration14/log.go (resolved, outdated)
channeldb/paginate.go (resolved)
channeldb/invoice_test.go (resolved)
@joostjager (Contributor) left a comment:

Tested migration, list payments still works. Did you test this manually with duplicates? I don't know how to get them really, maybe roll back to an old lnd release.

For commit structure, I'd squash the first three commits into one. That way it is a self-contained change that moves to the new db structure. I don't mind the commit being bigger in this case. But not a blocker.

return lntypes.Hash{}, err
}

hash, err := lntypes.MakeHash(paymentHash[:])
Contributor:

One thing to quickly check with @cfromknecht is how this payment index is going to work together with the upcoming AMP payments that don't have (a single) payment hash.

Contributor:

good point, that is still tbd as to the best way to do that. more than likely we will start to index payments by payment address since that will be a uniform identifier that also works for MPP. when you start getting to recurring payments we would need to index by set_id in order to differentiate logical payments to the same invoice.

this could be implemented by adding a type byte to the index to signify which index the following 32 bytes correspond to, but i don't know how much effort it would be for this PR. perhaps the best approach would be to deal with these details when we get there, and say that len(value) == 32 means payment hash. when we add the newer indexes, they could receive this type byte and be 33 bytes. i wouldn't want to block progress on this for some undecided details on AMP, but all in all i think we should be able to extend it as we need 👍
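
To make that concrete, a rough sketch of such a typed encoding (names and layout here are assumptions for illustration, not what this PR or the eventual AMP work settles on) could be:

```go
package channeldb

import "fmt"

type paymentIndexType uint8

const (
	// indexTypeHash marks a value that holds a 32-byte payment hash.
	indexTypeHash paymentIndexType = 0

	// indexTypePayAddr is a hypothetical future type for payment addresses.
	indexTypePayAddr paymentIndexType = 1
)

// serializeIndexValue prefixes the 32-byte identifier with a type byte so that
// legacy 32-byte values and newer 33-byte values can share the same bucket.
func serializeIndexValue(t paymentIndexType, id [32]byte) []byte {
	return append([]byte{byte(t)}, id[:]...)
}

// deserializeIndexValue accepts both forms: len == 32 is implicitly a payment
// hash, len == 33 carries an explicit type byte.
func deserializeIndexValue(value []byte) (paymentIndexType, [32]byte, error) {
	var id [32]byte
	switch len(value) {
	case 32:
		copy(id[:], value)
		return indexTypeHash, id, nil
	case 33:
		copy(id[:], value[1:])
		return paymentIndexType(value[0]), id, nil
	default:
		return 0, id, fmt.Errorf("unexpected index value length %d", len(value))
	}
}
```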

channeldb/payment_control_test.go (resolved, outdated)
channeldb/payments.go (resolved, outdated)
}

// At this point, we've exhausted the offset, so we'll
// begin collecting invoices found within the range.
resp.Invoices = append(resp.Invoices, invoice)
return 1, nil
Contributor:

I think it has a high risk of yagni

success: true,
failed: false,
success: false,
hasDuplicate: false,
Contributor:

In flight with a duplicate? Probably not going to worry about that

channeldb/paginate.go (resolved)
channeldb/payments_test.go (resolved)


channeldb/db.go (resolved)
for _, index := range indexList {
	// Write indexes for each of our sequence numbers.
	for _, duplicate := range index.sequenceNumbers {
		err := putIndex(bucket, duplicate, index.paymentHash)
Contributor:

Would it be worth keeping an in-memory set here to detect duplicate sequence numbers? As a sanity check

carlaKC (author):

Why would we need this in addition to the check in putIndex?
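
(For reference, the kind of check being pointed to might look roughly like the sketch below; it assumes putIndex refuses to overwrite an existing sequence number, and the exact signature is invented for illustration.)

```go
package channeldb

import (
	"fmt"

	"github.com/lightningnetwork/lnd/channeldb/kvdb"
)

// putIndex writes a sequence-number -> payment-hash entry and fails if the
// sequence number is already present, since each one should map to exactly
// one payment.
func putIndex(bucket kvdb.RwBucket, seqNr, paymentHash []byte) error {
	if existing := bucket.Get(seqNr); existing != nil {
		return fmt.Errorf("sequence number %x already in index", seqNr)
	}

	return bucket.Put(seqNr, paymentHash)
}
```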

@carlaKC (author) commented Jun 2, 2020

Did you test this manually with duplicates?

I haven't yet; I was thinking about manually inserting a duplicate into a local DB and then running the migration, but that probably isn't the same as a real legacy db.

For commit structure, I'd squash the first three commits into one.

Squashed 👌

@carlaKC (author) commented Jun 4, 2020

@guggero has some duplicates on his node and managed a successful dry run:

2020-06-04 12:34:52.208 [WRN] LTND: open /root/.lnd/lnd.conf: no such file or directory
2020-06-04 12:34:52.209 [INF] LTND: Version: 0.10.99-beta commit=v0.5-beta-4510-gb969eb4420354cc13f9d434da93fa093b64529ac, build=production, logging=default
2020-06-04 12:34:52.209 [INF] LTND: Active chain: Bitcoin (network=mainnet)
2020-06-04 12:34:52.209 [INF] LTND: Opening the main database, this might take a few minutes...
2020-06-04 12:34:52.487 [INF] CHDB: Checking for schema update: latest_version=16, db_version=13
2020-06-04 12:34:52.487 [INF] CHDB: Performing database schema migration
2020-06-04 12:34:52.487 [INF] CHDB: Applying migration #14
2020-06-04 12:34:52.487 [INF] CHDB: Creating top-level bucket: "pay-addr-index" ...
2020-06-04 12:34:52.487 [INF] CHDB: Created top-level bucket: "pay-addr-index"
2020-06-04 12:34:52.487 [INF] CHDB: Applying migration #15
2020-06-04 12:34:52.487 [INF] CHDB: Creating top-level bucket: "payments-index-bucket" ...
2020-06-04 12:34:52.487 [INF] CHDB: Created top-level bucket: "payments-index-bucket"
2020-06-04 12:34:52.487 [INF] CHDB: Applying migration #16
2020-06-04 12:34:52.487 [INF] CHDB: Migrating payments to add sequence number index
2020-06-04 12:34:52.494 [INF] LTND: dry run migration successful, exiting
2020-06-04 12:34:52.494 [INF] LTND: Shutdown complete

@cfromknecht (Contributor) left a comment:

mostly gtg here i think, main comment is that we can remove errNoIndexBucket because we can assume it exists due to the migration

channeldb/payment_control.go (resolved)

// fetchPaymentIndexEntry gets the payment hash for the sequence number provided
// from our payment indexes bucket.
func (p *PaymentControl) fetchPaymentIndexEntry(sequenceNumber uint64) (
Contributor:

is this method only used for testing?

channeldb/payment_control.go (resolved, outdated)

indexes := tx.ReadWriteBucket(paymentsIndexBucket)
if indexes == nil {
	return errNoIndexBucket
Contributor:

can't we assume this index always exists bc of the migration?

if err := kvdb.View(p.db, func(tx walletdb.ReadTx) error {
	indexBucket := tx.ReadBucket(paymentsIndexBucket)
	if indexBucket == nil {
		return errNoIndexBucket
Contributor:

same here wrt assuming the index exists. perhaps the error is not needed?

@carlaKC (author) commented Jun 9, 2020

remove errNoIndexBucket because we can assume it exists due to the migration

Indeed, removed this err check, don't need it anymore.

Also added a type enum for payments so that we're future-proofed for AMP. I assume we don't need a length since we'll always know what our own db types are, and I haven't bothered with splitting de/serialization out by type since we only have a single type (just read it out and sanity check for now; future changes can rework it to check the type and then read accordingly).

@carlaKC requested a review from cfromknecht June 9, 2020 17:05
@cfromknecht (Contributor) left a comment:

completed another pass, great that you were able to add the paymentIndexType so that we can extend this as we move to AMP! few last nits

channeldb/payment_control.go (resolved, outdated)

// paymentIndexTypeHash is a payment index type which indicates that we have
// created an index of payment sequence number to payment hash.
const paymentIndexTypeHash paymentIndexType = iota
Contributor:

nice addition 😎

we typically assign persisted enums to constants rather than using iota, otherwise reordering the definitions can break the mapping. not a blocker though, happy to change since i'll be extending this area in the near future
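
For example, the explicit-assignment style being suggested (shown as a sketch, not a diff from this PR) pins each persisted value:

```go
package channeldb

type paymentIndexType uint8

// Assign persisted enum values explicitly rather than relying on iota, so
// that reordering or inserting declarations cannot silently change what gets
// written to disk.
const (
	paymentIndexTypeHash paymentIndexType = 0

	// Future index types receive fixed values as well, for example:
	// paymentIndexTypePayAddr paymentIndexType = 1
)
```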

@@ -104,6 +104,14 @@ var (
	paymentsIndexBucket = []byte("payments-index-bucket")
)

var (
	errNoSequenceNumber = errors.New("sequence number not found")
Contributor:

in the next commit it looks like it's possible for these errors to be returned via the public interface? if so we should consider exposing them as well and adding godocs

Update our current tests to include lookup of duplicate payments. We
do so in preparation for changing our lookup to be based on a new
payments index. We add an append duplicate function which will add a
duplicate payment with the minimum information required to successfully
read it from disk in tests.

Add an entry to a payments index bucket which maps sequence number
to payment hash when we initiate payments. This allows for more
efficient paginated queries. We create the top level bucket in its
own migration so that we do not need to create it on the fly.

When we retry payments and provide them with a new sequence number, we
delete the index for their existing payment so that we do not have an
index that points to a non-existent payment.

If we delete a payment, we also delete its index entry. This prevents
us from looking up entries from indexes to payments that do not exist.

In our current invoice pagination logic, we would not return any
invoices if our offset index was more than 1 off our last index and we
were paginating backwards. This commit adds a test case for this
behaviour before fixing it in the next commit.

We now use the same method of pagination for invoices and payments.
Rather than duplicate logic across calls, we add a paginator struct
which can have query-specific logic plugged into it. This commit also
addresses an existing issue where a reverse query for invoices with an
offset larger than our last offset would not return any invoices. We
update this behaviour to act more like c.Seek and just start from the
last entry. This behaviour change is covered by a unit test that
previously checked for the lack of invoices.
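
A rough sketch of the shape this takes (all names invented; the real paginator in this PR differs in detail) shows both the pluggable per-query callback and the c.Seek-like fallback:

```go
package channeldb

// cursorLike is the small subset of a kvdb cursor that this sketch needs.
type cursorLike interface {
	Seek(key []byte) (k, v []byte)
	Last() (k, v []byte)
}

// paginator walks an index bucket and hands each entry to a query-specific
// callback, so that invoices and payments can share one pagination loop.
type paginator struct {
	cursor   cursorLike
	reversed bool

	// onItem holds the query-specific logic plugged into the shared loop.
	onItem func(k, v []byte) error
}

// start positions the cursor at the given offset. If the offset lies beyond
// the last entry and we are paginating backwards, we act like c.Seek and
// simply start from the last entry instead of returning nothing.
func (p *paginator) start(offset []byte) (k, v []byte) {
	k, v = p.cursor.Seek(offset)
	if k == nil && p.reversed {
		return p.cursor.Last()
	}

	return k, v
}
```
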
With our new index of sequence number to payment hash, it is possible for
more than one sequence number to point to the same hash because legacy
lnd allowed duplicate payments under the same hash. We now store these
payments in a nested bucket within the payments database. To allow
lookup of the correct payment from an index, we require matching of the
payment hash and sequence number.
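
Sketched with invented names (the real code reads payments and their legacy duplicates from the nested bucket on disk), the disambiguation step amounts to:

```go
package channeldb

import "errors"

// paymentEntry stands in for a stored payment or one of its legacy duplicates.
type paymentEntry struct {
	sequenceNr  uint64
	paymentHash [32]byte
}

// matchPayment returns the entry whose hash and sequence number both match
// the index entry, since several sequence numbers may point to the same hash.
func matchPayment(entries []paymentEntry, hash [32]byte,
	seqNr uint64) (*paymentEntry, error) {

	for i := range entries {
		if entries[i].paymentHash == hash && entries[i].sequenceNr == seqNr {
			return &entries[i], nil
		}
	}

	return nil, errors.New("no payment found for index entry")
}
```
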
Use the new paginator struct for payments. Add some tests which
specifically test cases on and around the missing index we force in our
test, to ensure that we properly handle this case. We also add a sanity
check in the test that checks that we can query when we have no
payments.
@cfromknecht (Contributor) left a comment:

LGTM 🤠
