
watchtower: reduce AckedUpdate storage footprint #7055

Merged · 12 commits merged into lightningnetwork:master on Jan 17, 2023

Conversation

@ellemouton (Collaborator) commented Oct 18, 2022

This PR contains a few migrations of the watchtower client DB.

Main Achievements:

  1. It greatly reduces the storage requirements of the session AckedUpdates through the use of a new RangeIndex model.
  2. The various indexes added in this PR nicely set us up for a follow-up PR in which we will be able to determine which sessions we can delete (and tell the tower server that it can delete) based on whether all the channels that the session has updates for have been deleted. (I.e., this is a step towards [Question] Watchtower database cleanup? #6259, [feature]: cleanup of wtclient.db / removal of watchtower #7035.)

Change Log:

1. Migrate ChanDetails Bucket

A ChannelDetails bucket is added and each registered channel gets its own sub-bucket in this bucket. The existing channel summaries are moved here under the cChannelSummary key and the old top-level cChanSummaryBkt is deleted. This is done so that we can store more details about a channel without needing to add new info to the encoded ChannelSummary.

2. Add a new Channel-ID index

A new channel-ID index is added. The index holds a mapping from a db-assigned ID (8 bytes) to the real channel ID (32 bytes). This mapping will allow us to preserve disk space in future when persisting references to channels.
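To make the shape of this index concrete, here is a minimal, hypothetical in-memory sketch. The type and method names are illustrative, not the PR's actual API, and the real index of course lives in the bolt DB (the PR encodes the db-assigned ID with BigSize encoding; a fixed 8-byte uint64 is assumed here for simplicity):

```go
package main

import "fmt"

// chanIDIndex is a hypothetical sketch of the channel-ID index: the DB
// assigns each channel a compact 8-byte ID so that other records can
// reference the channel without repeating its full 32-byte channel ID.
type chanIDIndex struct {
	nextID uint64
	toReal map[uint64][32]byte // db-assigned ID -> real channel ID
	toDB   map[[32]byte]uint64 // real channel ID -> db-assigned ID
}

func newChanIDIndex() *chanIDIndex {
	return &chanIDIndex{
		toReal: make(map[uint64][32]byte),
		toDB:   make(map[[32]byte]uint64),
	}
}

// getOrAssign returns the channel's compact ID, assigning a new one on
// first sight. In the real DB this would be bucket writes, not map inserts.
func (i *chanIDIndex) getOrAssign(realID [32]byte) uint64 {
	if id, ok := i.toDB[realID]; ok {
		return id
	}
	id := i.nextID
	i.nextID++
	i.toDB[realID] = id
	i.toReal[id] = realID
	return id
}

func main() {
	idx := newChanIDIndex()
	var chanA [32]byte
	chanA[0] = 0xaa
	fmt.Println(idx.getOrAssign(chanA)) // 0
	fmt.Println(idx.getOrAssign(chanA)) // 0 again: the mapping is stable
}
```

The important property is stability: once a channel has been assigned a compact ID, every later reference to it costs 8 bytes instead of 32.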

3. Migrate AckedUpdates to the RangeIndex model.

This is the main meat of the PR.

Currently, for each session, we store its acked updates as follows:

seqNum (2 bytes) -> encoded BackupID{channelID (32 bytes) + commitHeight (8 bytes)}

The default number of backups per session is 1024, so regardless of the number of channels the session is actually covering, this takes 1024 * (2 + 32 + 8) = 43,008 bytes (~42 KB) per session. Depending on how busy your node is, you could have many thousands of sessions (this user has 60k, which would mean ~2.5 GB!).

The only reason we want to keep track of AckedUpdates, imo, is to determine which updates of a channel have or have not been backed up, or to replay backups to a different tower. So I argue that knowing the sequence number associated with each backup is not necessary.

Given this assumption, we can store the AckedUpdates using a RangeIndex (per channel). We will also make use of the new channel index described above (in point 2 of the change log) so that we only need 8 bytes per channel.
The RangeIndex is just a map from the start height to the end height of a range. This means that in the best case, where all commit heights are present, we only need to store the start and end height. If there are holes, we store separate ranges, which is still more efficient since each range only requires 16 bytes (an 8-byte key for the start value and an 8-byte value for the end of the range). Our session can thus store its AckedUpdates as follows:

"acked-updates" => channel-ID (8 byte key sub-bucket) => start height (8 byte key) -> end-height (8 byte value)

Here is a quick example showing the evolution of an index that starts empty and then has the numbers 1, 2, 4, 3 added:

add 1:
    1 -> 1

add 2:
    1 -> 2

add 4:
    1 -> 2
    4 -> 4

add 3:
    1 -> 4

This means that every session only refers to a channel once and then just stores the set of ranges for that channel that are covered by the session.

Any additions to the index will at most require one value update & one key deletion.

The PR contains the necessary logic to make sure that the calculation of how to update an index given a new commit height is as efficient as possible.
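For illustration, here is a minimal, self-contained sketch of the in-memory side of this idea. It is a simplified stand-in, not the PR's actual wtdb.RangeIndex (which also mirrors each mutation to the KV store and is safe for concurrent use):

```go
package main

import (
	"fmt"
	"sort"
)

// rangeIndex keeps a set of uint64s as disjoint [start, end] ranges,
// stored as a start -> end map.
type rangeIndex struct {
	ranges map[uint64]uint64
}

func newRangeIndex() *rangeIndex {
	return &rangeIndex{ranges: make(map[uint64]uint64)}
}

// add inserts n into the set, merging with adjacent ranges where possible.
// Any addition needs at most one value update and one key deletion, which
// is the property that keeps the persisted updates cheap.
func (r *rangeIndex) add(n uint64) {
	// Sort the range start values so we can find n's neighbours.
	starts := make([]uint64, 0, len(r.ranges))
	for s := range r.ranges {
		starts = append(starts, s)
	}
	sort.Slice(starts, func(i, j int) bool { return starts[i] < starts[j] })

	// i is the index of the first range starting strictly after n.
	i := sort.Search(len(starts), func(j int) bool { return starts[j] > n })

	touchesLeft := false
	if i > 0 {
		leftEnd := r.ranges[starts[i-1]]
		if n <= leftEnd {
			return // n is already covered.
		}
		touchesLeft = n == leftEnd+1
	}
	touchesRight := i < len(starts) && starts[i] == n+1

	switch {
	case touchesLeft && touchesRight:
		// n joins two ranges: extend the left one, delete the right.
		r.ranges[starts[i-1]] = r.ranges[starts[i]]
		delete(r.ranges, starts[i])
	case touchesLeft:
		r.ranges[starts[i-1]] = n
	case touchesRight:
		// The right range gets a new start key.
		end := r.ranges[starts[i]]
		delete(r.ranges, starts[i])
		r.ranges[n] = end
	default:
		r.ranges[n] = n
	}
}

func main() {
	idx := newRangeIndex()
	for _, n := range []uint64{1, 2, 4, 3} {
		idx.add(n)
		fmt.Println(idx.ranges)
	}
	// Prints: map[1:1], map[1:2], map[1:2 4:4], map[1:4],
	// reproducing the trace above.
}
```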

Note:

I originally had the changes from #7059 here too, since the bug in wtclientrpc makes it difficult to use the RPC to check the correctness of this PR. I have now separated those changes into their own PR just to keep this diff a bit smaller, but I suggest we get that PR in first even though the two are not dependent on each other.

Fixes #6886

@bhandras (Collaborator) left a comment:

Very cool PR! ✨ Did just a very light pass to understand the intuition behind the changes. Concept ACK! 🌮

Will provide a more detailed review after @Roasbeef has taken another look.


In watchtower/wtdb/client_db.go:

```go
// Get the range index for the session-channel pair.
chanID := byteOrder.Uint64(dbChanID)
index, err := c.getRangeIndex(
```
Collaborator:

Here we could also use the in-memory cache exclusively if we know that all range indexes for all channels are already cached. Maybe not as critical if NumAckedUpdates isn't called too frequently.

@ellemouton (Author):

Yeah, the reason I didn't use the in-memory one here is because this iterates through all sessions (even ones we will never use again) and would result in them being stored in the cache even though they are never needed again.

@Roasbeef (Member) left a comment:

Initial pass through the diff; I need to revisit some of the existing client code to build a better mental model to review these changes.

At a glance, a few things pop out at me:

  • will there ever actually be a gap in the commitment height that a tower has ACK'd?
    • the new range index design seems to assume that a tower either deletes some random states, or that a client can somehow skip uploading a set of states (causing a gap in the set of ACK'd heights)
  • for the mapping of the 32-byte cid, why can't we just use the existing 8-byte scid instead?
  • We'll need to properly test these persistence changes to ensure they don't lead to a performance regression in practice. In particular, I'm curious to see how this fares in a remote db setting, given there seem to be a few additional layers of bucket nesting (which can mean additional round trips w/ the current kv mapping).

"sync"
)

// RangeIndex can be used go keep track of which numbers have been added to a
Member:

I think I'm missing the motivation a bit here: in which case would there be a series of disparate gaps in what we've uploaded to the tower?

In other words, afaict, even if we're down for some time, we won't have gaps in prior heights uploaded unless the tower actually deleted information.

In the case where we're switching to a tower and want to know the extent to which we've re-uploaded our prior states, we'd still only need to track a single height (the final height that we've uploaded).

@ellemouton (Author):

> in which case would there be a series of disparate gaps in what we've uploaded to the tower?

If we have more than one tower, we might switch to a session with a different tower for a bit if the first one goes down. Also, on startup we just pick a random session to start sending updates on. In the ideal case, yes, there would not be gaps, but we need to account for there being gaps. In future we may even want a more "round robin" type thing where we cycle through sessions more often & don't just use one till exhaustion before moving on to the next one.

> In other words, afaict, even if we're down for some time, we won't have gaps in prior heights uploaded unless the tower actually deleted information.

There is nothing that ensures that we send the commitment update heights in a monotonically increasing way on a single session. On startup we just pick a random non-exhausted session to continue on.

> In the case where we're switching to a tower and want to know the extent to which we've re-uploaded our prior states, we'd still only need to track a single height (the final height that we've uploaded).

Don't think I agree - if we have some of the commitments for a channel backed up to one tower and some to another, & we then want to remove the one tower, we want to know the heights we uploaded to that session so that we know which ones we need to re-upload to the new tower.

Collaborator:

> There is nothing that ensures that we send the commitment update heights in a monotonically increasing way on a single session.

Can you point to the code that makes this not monotonically increasing? I thought the queues involved in sending to the tower were FIFO?

> Don't think I agree - if we have some of the commitments for a channel backed up to one tower and some to another, & we then want to remove the one tower, we want to know the heights we uploaded to that session so that we know which ones we need to re-upload to the new tower

I couldn't find where this logic is - can you point me to it? Or is this something for the future?

@ellemouton (Author):

> I thought the queues involved in sending to the tower were FIFO?

They are FIFO. But what can happen is: we send backups for heights 1, 2, 3 on session A. Then we restart but this time choose session B as our active session (because maybe the tower for session A is temporarily down), so we send heights 4, 5, 6 on session B. Now we restart again and choose session A again, etc.
See here where we load all our candidate sessions (a map!) that we will iterate through to find one that works.

@ellemouton (Author):

> I couldn't find where this logic is - can you point me to it? Or is this something for the future?

I think my comment above answers this. Let me know if it does not

Collaborator:

So you're saying that there can be gaps on session A? This makes sense. It's been a while since I took real analysis, but what tripped me up is that you said the heights cannot be monotonically increasing. I thought, but can't find a good internet answer, that a monotonically increasing sequence could have gaps - the only constraint here being that the sequence increases. I am curious to know the answer but it doesn't matter for this patch.

@ellemouton (Author):

oh, I see what you mean. I was mixing up the definition of "monotonically".
So:

  • session seq number will be monotonically increasing without gaps.
  • commitment height of a channel within a session is monotonically increasing and could possibly have gaps.

@bhandras (Collaborator) commented Oct 28, 2022:

> We'll need to properly test these persistence changes to ensure they don't lead to a performance regression in practice. In particular, I'm curious to see how this fares in a remote db setting, given there seem to be a few additional layers of bucket nesting (which can mean additional round trips w/ the current kv mapping).

I think it's quite easy to work around that, since we can cache a range index; after that, there's no need to nest down any more if done properly, see: #7055 (comment)

@C-Otto (Contributor) commented Nov 20, 2022:

I had a look at the migrations and the general approach, and I think this is a great idea. Even though some ranges might end up being tiny (in weird multi-WT scenarios), this shouldn't be an issue when compared to the status quo. Instead of iterating through the ranges, a binary search might be better, but I don't think this is relevant for a one-off migration with just a couple of ranges (i.e. not millions).

@C-Otto (Contributor) commented Nov 28, 2022:

The main migration takes quite a bit of RAM. I don't think this is a huge issue, but it might be problematic for some larger nodes (or nodes without a lot of RAM/swap).

(attached: graph of memory usage during the migration)

@C-Otto (Contributor) commented Nov 28, 2022:

Aside from the RAM spike, the migration worked out just fine and my node did a few forwards. I restarted again to compact the DB:

13:57:44.847 [INF] CHDB: DB compaction of /home/bitcoin/.lnd/data/graph/mainnet/wtclient.db successful, 5958950912 -> 1115611136 bytes (gain=5.34x)

Before the migration the DB had a size of 4650852352 bytes, so it's more like 4.17x. Nice, thanks a lot!

PS: With gzip I was able to bring this down to 65 MByte, so maybe there's more potential.

@ellemouton (Author) commented Nov 28, 2022:

> PS: With gzip I was able to bring this down to 65 MByte, so maybe there's more potential.

Ok cool - so with gzip, the gain is about 72x? Awesome! Thanks for giving this a spin @C-Otto! We should get even more gains once the DeleteSession PR is in :)

Also - my original rough estimate for the DB gain a node would get if it only had 1 channel was 1792x. So am I correct in saying that the node you migrated has, over its lifetime, roughly always had about 24 active channels at any given time (very rough estimate)?

@C-Otto (Contributor) commented Nov 28, 2022:

For each session I checked, I see a bunch of channels - more than 30 each. But I'd say 24 active channels per session sounds very realistic to me!

@ellemouton force-pushed the wtclientMigrations branch 2 times, most recently from 1faac9f to fd37c72 on December 5, 2022
@ellemouton (Author) commented Dec 23, 2022:

Report

I wrote a script here that allows you to quickly create a large wtclient.db file. It attempts to always have about 20 active channels at any given point.

Using this script, I created a wtclient.db file of 3GB. It had 25756 sessions with updates across a total of 125 channels (with around 20 being active at any given time).

End result:

After migrating this DB and then compacting it, I ended up with a db of size 185MB (a gain of 16x).

NOTE: I found that first migrating the DB and then compacting it was significantly faster than first compacting, then migrating, then compacting again. The only downside here is that if you migrate first, your db size will grow slightly before compaction (in my example, it grew to 3.2GB).

Migrating using multiple Txs vs 1 big TX

The migration has been split up into multiple transactions so that it is less RAM-intensive.
Using my 3GB example DB, I performed the migration using both methods. With the all-in-one-tx method, the average RAM usage was around 185MB, and with the multiple-txs method (which migrates 5000 sessions per tx), the average RAM usage was around 12,800 KB (~13 MB).
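As a rough sketch of the multi-tx pattern (using plain bbolt rather than the PR's kvdb abstraction; the bucket name and the migrateSession helper are placeholders, not the PR's actual code):

```go
package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

// migrateInBatches sketches the multiple-transactions approach: handle at
// most batchSize sessions per transaction so peak RAM stays bounded, and
// resume from the last migrated key in the next transaction. It assumes
// the session keys themselves are not deleted during the migration.
func migrateInBatches(db *bolt.DB, batchSize int) error {
	var lastKey []byte
	for {
		done := true
		err := db.Update(func(tx *bolt.Tx) error {
			sessions := tx.Bucket([]byte("client-sessions")) // placeholder
			if sessions == nil {
				return nil
			}
			c := sessions.Cursor()

			k, _ := c.First()
			if lastKey != nil {
				// Resume just after the last migrated session.
				k, _ = c.Seek(lastKey)
				if k != nil {
					k, _ = c.Next()
				}
			}

			n := 0
			for ; k != nil && n < batchSize; k, _ = c.Next() {
				// Placeholder for the real per-session
				// AckedUpdates -> RangeIndex rewrite.
				if err := migrateSession(tx, k); err != nil {
					return err
				}
				lastKey = append(lastKey[:0], k...)
				n++
			}
			done = k == nil
			return nil
		})
		if err != nil {
			return err
		}
		if done {
			return nil
		}
	}
}

func migrateSession(tx *bolt.Tx, sessionKey []byte) error {
	return nil // placeholder
}

func main() {
	db, err := bolt.Open("wtclient-sketch.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := migrateInBatches(db, 5000); err != nil {
		log.Fatal(err)
	}
}
```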

Note that when @C-Otto tried the (all-in-one-tx) migration with his ~5GB db, the average RAM usage was around 3GB, but it is worth noting that the encoding of certain items has changed since then, so it should be less now. @C-Otto, if you still have a copy of your previous DB around, it would be awesome to hear how this new multi-tx approach performs, if you are willing to try it on a copy of that DB :)

cc @Crypt-iQ & @bhandras

@C-Otto (Contributor) commented Dec 23, 2022:

Thanks, @ellemouton! I don't think I have the old DB anymore, but we'll get plenty of reports once this is released :)


@bhandras (Collaborator) left a comment:

Looking really good! 🥇 Great work on the range index code too 🚀 My remaining question is whether to keep the new DB-assigned channel ID or to just use the real channel IDs.

@ellemouton force-pushed the wtclientMigrations branch 2 times, most recently from ce108a6 to 2aefe41 on January 5, 2023
@bhandras (Collaborator) left a comment:

LGTM 🚀 Excellent work @ellemouton! ✨

Quoted code:

```go
}

// Start the iteration and exit on condition.
for k := k; k != nil; k, _ = c.Next() {
```
Collaborator:

Just realized the kvdb cursor should have a Current() member so that it's easier to write these types of loops.

@ellemouton force-pushed the wtclientMigrations branch 2 times, most recently from 1aeeb50 to 7f40398 on January 6, 2023
Quoted code:

```go
// For each of these, create a new sub-bucket with this key in
// cChanDetailsBkt. In this sub-bucket, add the cChannelSummary key with
// the encoded ClientChanSummary as the value.
err = chanSummaryBkt.ForEach(func(chanID, summary []byte) error {
```
Collaborator:

This is not a RAM-intensive migration since the summaries are just small pkscripts, right? If it is intensive, could be worth splitting this ForEach into multiple transactions.

@ellemouton (Author):

I did a test that runs this migration with 500 channels & found that the path does not even show up in the heap profile graph. Same for migration 3.

Quoted code:

```go
	continue
}

err = kvdb.Update(db.bdb(), func(tx kvdb.RwTx) error {
```
Collaborator:

👍 the migrations should've been in separate tx's from the beginning


Commit messages:

  • Small refactor just to make the upcoming commit easier to parse. In this commit, we make better use of the getChanSummary helper function.
  • In this commit, a migration of the tower client db is done. The migration creates a new top-level cChanDetailsBkt bucket and, for each channel found in the old cChanSummaryBkt bucket, it creates a new sub-bucket. In the sub-bucket, the ClientChanSummary is then stored under the cChannelSummary key. The reason for this migration is that it will be useful in future when we want to store more easily accessible data under a specific channel ID.
  • In this commit, a new channel-ID index is added to the tower client db with the help of a migration. This index holds a mapping from a db-assigned ID (a uint64 encoded using BigSize encoding) to the real channel ID (32 bytes). This mapping will help us save space in future when persisting references to channels.
  • In this commit, a new concept called a RangeIndex is introduced. It provides an efficient way to keep track of numbers added to a set by keeping track of various ranges instead of individual numbers. Notably, it also provides a way to map the contents & diffs applied to the in-memory RangeIndex (which uses a sorted-array structure) to a persisted KV structure.
  • Refactor putClientSessionBody to take in a session sub-bucket rather than the top-level session bucket. This is mainly to make an upcoming commit diff easier to parse.
  • In this commit, a RangeIndex set is added to the ClientDB, along with getter methods that can be used to populate this in-memory set from the DB.
  • In preparation for an upcoming commit where some helper functions will need access to the ClientDB's ackedRangeIndex member, this commit converts those helper functions into methods on the ClientDB struct.
  • Add a shouldFail boolean parameter to the migtest ApplyMigrationWithDB helper in order to make it easier to test migration failures.
  • In this commit, we add the ability to add a wtdb version migration that is not given a transaction but rather a whole db object. This will be useful for migrations that are best done in multiple transactions in order to use less RAM.
  • In this commit, the code for migration 4 is added. This migration takes all the existing session acked updates and migrates them to be stored in the RangeIndex form instead. Note that this migration is not activated in this commit; that is done in a follow-up commit in order to keep this one smaller.
  • In this commit, a migration is done that takes all the AckedUpdates of all sessions, stores them in the RangeIndex pattern instead, and deletes each session's old AckedUpdates bucket. All the logic in the code is also updated in order to write to and read from this new structure.
@ellemouton (Author) commented:

Thanks so much for the reviews @Crypt-iQ & @bhandras !!! 🚀

@Crypt-iQ - I fixed the nits you found and left comments re your question about whether the first 2 migrations are RAM-intensive.

@Roasbeef (Member) left a comment:

LGTM 🐒

@guggero merged commit 0444792 into lightningnetwork:master on Jan 17, 2023
@ellemouton deleted the wtclientMigrations branch on January 17, 2023