
feat: Avoid loading all notes in memory #2183

Merged (1 commit, Apr 17, 2023)

Conversation

@douglaz (Contributor) commented Apr 10, 2023

Stream data from the DB in descending denomination order, then use only the required notes. Also calculate and use a summary of the notes when only metrics like the total amount are required.

Although this started as a solution to #82 and may improve the speed of select_notes, the focus is mostly on reducing memory usage.
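For illustration, the summary idea can be sketched like this (stand-in types and a hypothetical helper, not the PR's actual code): count notes per denomination and keep a running total directly from the streamed amounts, without keeping note bodies in memory.

use std::collections::BTreeMap;

// Stand-in for fedimint's Amount type.
type Amount = u64;

/// Accumulate a per-denomination count and a running total without
/// materializing the notes themselves in memory.
fn summarize(denominations: impl Iterator<Item = Amount>) -> (BTreeMap<Amount, usize>, Amount) {
    let mut tiers: BTreeMap<Amount, usize> = BTreeMap::new();
    let mut total: Amount = 0;
    for amount in denominations {
        *tiers.entry(amount).or_insert(0) += 1;
        total += amount;
    }
    (tiers, total)
}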

codecov bot commented Apr 10, 2023

Codecov Report

Patch coverage: 84.26%; project coverage change: +0.52% 🎉

Comparison is base (3c15cba) 59.22% compared to head (8bdf528) 59.75%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2183      +/-   ##
==========================================
+ Coverage   59.22%   59.75%   +0.52%     
==========================================
  Files         152      153       +1     
  Lines       31759    32213     +454     
==========================================
+ Hits        18810    19248     +438     
- Misses      12949    12965      +16     
Impacted Files Coverage Δ
fedimint-cli/src/lib.rs 5.65% <0.00%> (+0.04%) ⬆️
gateway/ln-gateway/src/actor.rs 46.19% <0.00%> (+1.35%) ⬆️
modules/fedimint-mint-client/src/lib.rs 0.51% <0.00%> (-0.02%) ⬇️
fedimint-core/src/db/mod.rs 84.97% <55.20%> (-2.63%) ⬇️
fedimint-rocksdb/src/lib.rs 81.46% <90.96%> (+9.96%) ⬆️
fedimint-client-legacy/src/mint/mod.rs 88.07% <93.13%> (+2.06%) ⬆️
fedimint-client-legacy/src/lib.rs 78.71% <100.00%> (+0.56%) ⬆️
fedimint-core/src/db/mem_impl.rs 90.40% <100.00%> (+1.01%) ⬆️
fedimint-core/src/db/notifications.rs 95.39% <100.00%> (+0.25%) ⬆️
fedimint-core/src/tiered.rs 91.66% <100.00%> (+0.30%) ⬆️
... and 2 more

... and 22 files with indirect coverage changes



let (big_note_amount, big_note, checkpoint) =
last_big_note_checkpoint.expect("to have at least one big note");
// we can't make exact change, remove all small notes after the big note
selected.truncate(checkpoint);
Contributor:

I'm confused about what's going on here. Either there's something wrong, or this function might need a few more comments to explain things to a bit slower (yet still valuable!) developers like myself. :D

Contributor (author):

Take a look at the new comments
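For readers without the diff at hand, here is a hedged sketch of the checkpoint mechanism (stand-in types and illustrative names, not the exact PR code):

use futures::{Stream, StreamExt};

// Stand-in for fedimint's Amount type; the real selector works on
// SpendableNotes rather than a generic N.
type Amount = u64;

async fn select_notes_greedy<N>(
    mut notes_by_descending_amount: impl Stream<Item = (Amount, N)> + Unpin,
    requested_amount: Amount,
) -> Vec<(Amount, N)> {
    let mut selected = Vec::new();
    let mut pending = requested_amount;
    // (amount, note, how many notes were already selected when we saw it)
    let mut last_big_note_checkpoint: Option<(Amount, N, usize)> = None;

    while pending > 0 {
        match notes_by_descending_amount.next().await {
            // The note fits: take it and shrink the pending amount.
            Some((amount, note)) if amount <= pending => {
                pending -= amount;
                selected.push((amount, note));
            }
            // The note is too big for what remains: remember it, together
            // with a checkpoint of the selection length, in case the smaller
            // notes cannot make exact change.
            Some((amount, note)) => {
                last_big_note_checkpoint = Some((amount, note, selected.len()));
            }
            // Stream exhausted without exact change: remove all small notes
            // selected after the big note and overshoot with the big note.
            None => {
                let (amount, note, checkpoint) =
                    last_big_note_checkpoint.expect("to have at least one big note");
                selected.truncate(checkpoint);
                selected.push((amount, note));
                break;
            }
        }
    }
    selected
}

The expect mirrors the snippet quoted above: assuming the total balance was already checked against the requested amount, a too-big note must have been seen whenever the smaller notes cannot cover the remainder.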


#[test_log::test(tokio::test)]
async fn select_notes_returns_exact_amount_with_minimum_notes() {
let f = || {
Contributor:

.clone() a Vec<_> instead of a lambda?

Contributor (author):

It's fine as is

.into_iter()
.flat_map(|(amount, number)| vec![(amount, 0usize); number])
.sorted()
.rev(),
Contributor:

.rev confuses me here. It says it's sorted... I didn't expect it to be in reversed order.

Contributor (author):

Renamed

futures::stream::iter(
notes
.into_iter()
.flat_map(|(amount, number)| vec![(amount, 0usize); number])
Contributor:

How ... what ... that 0usize. I don't get it. Why do we need a constant 0 here? What is that usize in futures::Stream<Item = (Amount, usize)>?

Contributor (author):

You can think of the zero as something like a SpendableNote, just a mock for test purposes. But note this code already exists; I've just changed it to a stream and moved it to perhaps a better place: https://github.com/fedimint/fedimint/blob/master/fedimint-core/src/tiered_multi.rs#L447
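For context, the test-helper pattern being discussed looks roughly like this when pulled together (a sketch with stand-in types; the real helper lives in tiered_multi.rs as linked above):

use futures::stream;
use itertools::Itertools;

// Stand-in for fedimint's Amount type.
type Amount = u64;

/// Expand (denomination, count) pairs into a descending stream of mock
/// notes; the 0usize payload stands in for a SpendableNote, since note
/// selection only looks at the amounts.
fn mock_notes(tiers: Vec<(Amount, usize)>) -> impl futures::Stream<Item = (Amount, usize)> {
    stream::iter(
        tiers
            .into_iter()
            .flat_map(|(amount, count)| vec![(amount, 0usize); count])
            .sorted()
            .rev(), // descending denomination order, as the selector expects
    )
}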

Contributor:

Could it be () if it doesn't mean anything? Or does it have to be 0usize? 🗡️

I still don't get it, but whatever - not very important.

Contributor:

A comment would be helpful; I am confused too.

Contributor (author):

See now

let stream = isolated
.raw_find_by_prefix_sorted_descending(key_prefix)
.await?
.collect::<Vec<_>>()
Contributor:

This seems to collect into a Vec, which seems undesirable. I see that raw_find_by_prefix was doing it already, but isn't this ruining the benefits of streaming? We are still going to load all the matching notes into memory at once, and just pretend later that it was streaming...

Contributor (author):

Nice catch, this was an oversight.

Contributor (author):

This is trickier than it seems. The borrow checker isn't happy if you remove the collect. I know it is feasible because if I inline the code of IsolatedDatabaseTransaction then it works: #2188

@m1sterc001guy any ideas here?

Contributor:

This is where I got stuck too :) Unless @dpc has ideas, I think it's fine to just inline the isolated code here.

Contributor (author):

Will be addressed in #2202

Contributor (author):

In fact it's blocked by that PR because raw_find_by_prefix_sorted_descending will be implemented in the same way

Contributor:

This requires a rebase now I assume?

.expect("Error doing prefix search in database")
.map(move |(key_bytes, value_bytes)| {
let key = KP::Record::from_bytes(&key_bytes, &decoders)
.expect("Unrecoverable error reading DatabaseKey");
Contributor:

Maybe let's print at least the key (in case the value is somehow sensitive) to help debugging if this ever happens? We had AbbreviateHexBytes to help with that.

Contributor (author):

Added context

let mut next_prefix = prefix.to_vec();
let mut increased = false;
for i in (0..next_prefix.len()).rev() {
if next_prefix[i] < 0xff {
Contributor:

if next_prefix[i] == 0xff seems easier to grasp.

Contributor (@dpc, Apr 10, 2023):

for i in (0..next_prefix.len()).rev() {
  let next_byte = next_prefix[i].wrapping_add(1);
  next_prefix[i] = next_byte;
  if next_byte != 0 {
    increased = true;
    break;
  }
}

or something like that?

Contributor:

That increased is kind of a meh name. We're always going to increase that number... kind of. Maybe non_zero_digit? Especially with my version it seems a bit clearer.

Contributor (author):

Take a look now

(Outdated thread on fedimint-rocksdb/src/lib.rs, resolved.)
}
}
if !increased {
bail!("Prefix is already the max one: {prefix:?}")
Contributor:

I don't think we should panic here. Seems like handling 0xff... should be taken care of. Return an Option maybe?

Contributor (author):

It isn't a panic, it's an error. In fact, it could be a panic, because it's a bug (an invalid prefix has been created).

But if we returned an Option, the caller should do the same, IMHO.

Contributor:

Huh, why is 0xff an invalid prefix? We might declare it as such because it's convenient, but it's not obviously true to me.

Correct me if I'm wrong, but in that case we might want to return the empty prefix vec![] so that we start iterating from the absolute last element when iterating in reverse order. Naturally, the 0xff prefix should come first then.
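Putting the two suggestions together, a self-contained sketch of the prefix-successor computation might look like this (an Option-returning variant, as proposed in this thread; an assumption, not the merged code):

/// Smallest byte string strictly greater than every key that starts with
/// `prefix`, i.e. the exclusive upper bound for a reverse prefix scan.
/// Returns None when the prefix is empty or all 0xff, since no such bound
/// exists; the caller decides how to handle that case.
fn next_prefix(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut next = prefix.to_vec();
    for i in (0..next.len()).rev() {
        let (byte, overflowed) = next[i].overflowing_add(1);
        next[i] = byte;
        if !overflowed {
            // This byte absorbed the increment; lower bytes already wrapped
            // to 0x00, which keeps the bound correct.
            return Some(next);
        }
    }
    None
}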

(Outdated thread on fedimint-rocksdb/src/lib.rs, resolved.)
) -> Result<PrefixStream<'_>> {
let mut str_prefix = "".to_string();
for prefix in key_prefix {
str_prefix = format!("{str_prefix}{prefix:02X?}");
Contributor:

This seems like a meh way to format this prefix, allocating a new string for every byte.

format!("{}%", to_hex(key_prefix));

kind of thing should work instead?

Contributor (author, @douglaz, Apr 10, 2023):

Yeah, it seems inefficient

Contributor (author):

Replaced by hex::encode
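The replacement is presumably along these lines (the % LIKE wildcard is taken from the suggestion above; the exact query shape is assumed):

// One allocation for the whole prefix instead of a new String per byte.
// hex::encode produces lowercase; SQLite's LIKE is ASCII case-insensitive
// by default, so the change in case is assumed to be harmless here.
let str_prefix = format!("{}%", hex::encode(key_prefix));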

let results = self.tx.fetch_all(query_prepared).await;

if results.is_err() {
warn!("sqlite find_by_prefix failed to retrieve key range. Returning empty iterator");
Contributor (@dpc, Apr 10, 2023):

Is this really an error? I don't think queries that return nothing should unconditionally lead to warning messages in the logs.

Oh, I see that it was like that in the previous code...

Oh and that formatting as well... :D ...

Contributor:

It's not mandatory, but please consider making it better (here and in the existing version) while at it. :)

Contributor:

As mentioned above, we should probably also make raw_find_by_prefix return a Result so that we can just return the error here.

Contributor (author, @douglaz, Apr 13, 2023):

I've just added a FIXME, this looks suitable to be improved in a future PR

Contributor:

You already made the return type a Result by now, it seems; could we return the error here now?

return Ok(Box::pin(stream::iter(Vec::new())));
}

let rows = results.unwrap().into_iter().map(|row| {
Contributor:

This could all be

results.unwrap_or_else(|_| vec![]).into_iter().map(|row| {

and that if results.is_err() could be avoided altogether...

Is sqlite really returning an error on an empty result? That seems... not right... how is the caller supposed to figure out whether there were no matches or someone unplugged the HDD?

Contributor (author):

Don't ask me, I've just copied the code 😅

Contributor:

If I remember correctly, it was throwing an error :) There might be a better API to use as opposed to fetch_all, but at the time this was the easiest way to get it to work.

Contributor (author):

I'll take a closer look at sqlite, perhaps we can improve that

Contributor (author):

I've just added a FIXME, this looks suitable to be improved in a future PR

@dpc (Contributor) left a comment:

I had plenty of comments (one of them being that the actual streaming of notes doesn't work like streaming and fetches all notes anyway), but putting them aside, on the higher level...

What are we optimizing for exactly here, and should we get some benchmarks?

I assume that if we are going to stream the notes, then there's going to be some extra overhead of "give me more". When fetching all notes at once, I'd assume sqlite does everything once, returns it and that's about it. With an iterator (depending on how it is actually implemented) it could be: "give me one; here it goes; give me another one; here it goes; and another one please; there you go...". Each back and forth by necessity must have some overhead. Maybe negligible, maybe amortized, but still some.

With fetching all, we obviously waste (briefly, but still) some memory (allocation), and the db needs to overfetch some data.

But with an iterator we might actually be doing many more IO operations scattered over a longer time. Total load on the system might be higher, especially if we typically need to fetch... dunno... 75% of all notes anyway (given that big notes are... well... big, so there need to be fewer of them to compose a certain amount, one would expect).

If you have 1, 2, 4, 8 notes, you have 15 msats total, but if you make payments in the 4-7 range you usually need to fetch all but one note anyway. So I expect that, even with a decent balance and smallish payments, a good chunk of notes will still need to be fetched.

So I think it's fair to ask: is it worth it? The number of notes is not going to be sky high. I suspect the 50-150 range, because:

pub const DEFAULT_MAX_NOTES_PER_DENOMINATION: u16 = 3;

and we use power-of-2 tiers by default, so we can't be going all that far w.r.t. the number of tiers.

@dpc (Contributor) commented Apr 10, 2023

Oooohhh... I was confused. So the algorithm is going from largest to smallest...

But doesn't it have the opposite problem? If you need to make an odd-value payment... you're going to need to fetch almost all notes anyway, no? Because you need that 1-value note, which will come last, no matter what you do...

And then 3 out of 4 times you need all but the 2 smallest tiers...

These exponential odds seem to conspire against us!

@dpc (Contributor) commented Apr 10, 2023

It seems to me that instead of streaming notes, if performance is the goal, it would be best to just put an in-memory write-through cache in front of that notes-in-database thing, and always have it in memory after the initial load. 🤷

@douglaz (Contributor, author) commented Apr 10, 2023

> I had plenty of comments (one of them being that the actual streaming of notes doesn't work like streaming and fetches all notes anyway), but putting them aside, on the higher level...
>
> What are we optimizing for exactly here, and should we get some benchmarks?

Average/max memory usage and average I/O operations. Benchmarks would be nice, but I guess tricky to get?

> I assume that if we are going to stream the notes, then there's going to be some extra overhead of "give me more".

This may or may not be true. But even if it's true, note that the new code won't add more overhead for a rocksdb database. It will just read less of the existing iterator.

> When fetching all notes at once, I'd assume sqlite does everything once, returns it and that's about it. With an iterator (depending on how it is actually implemented) it could be: "give me one; here it goes; give me another one; here it goes; and another one please; there you go...". Each back and forth by necessity must have some overhead. Maybe negligible, maybe amortized, but still some.

This is a good point. Some notes:

  1. Both the new code and the old code will sort the data on sqlite (but you could argue there is a possible optimization where no sorting would be needed)
  2. Both the old code and the new code won't really stream on sqlite (I didn't change the fetch_all)
  3. Given the point above, you can see I treated sqlite as a second-class database. The whole idea was focused on rocksdb-like databases (where data is naturally sorted by key).

So this implementation doesn't really make anything better for databases like sqlite, but it also doesn't make it worse performance-wise.

> With fetching all, we obviously waste (briefly, but still) some memory (allocation), and the db needs to overfetch some data.
>
> But with an iterator we might actually be doing many more IO operations scattered over a longer time. Total load on the system might be higher, especially if we typically need to fetch... dunno... 75% of all notes anyway (given that big notes are... well... big, so there need to be fewer of them to compose a certain amount, one would expect).

Look at the implementation of fetch_all: it's iterators/streams all the way down. So it won't require more IO operations.

> If you have 1, 2, 4, 8 notes, you have 15 msats total, but if you make payments in the 4-7 range you usually need to fetch all but one note anyway. So I expect that, even with a decent balance and smallish payments, a good chunk of notes will still need to be fetched.
>
> So I think it's fair to ask: is it worth it? The number of notes is not going to be sky high. I suspect the 50-150 range, because:
>
> pub const DEFAULT_MAX_NOTES_PER_DENOMINATION: u16 = 3;
>
> and we use power-of-2 tiers by default, so we can't be going all that far w.r.t. the number of tiers.

If the number of notes is, and always will be, small, then it's fair to say: don't bother.

But is this really the case? Is there no such thing as a "big wallet"?

@douglaz (Contributor, author) commented Apr 10, 2023

> It seems to me that instead of streaming notes, if performance is the goal, it would be best to just put an in-memory write-through cache in front of that notes-in-database thing, and always have it in memory after the initial load. 🤷

Although the new code has the same or better performance (as far as can be determined by simple inspection), I can't say that's the goal. See the comment above. But I agree: putting everything in memory will probably be faster if you don't care about memory consumption.

@douglaz (Contributor, author) commented Apr 11, 2023

> @douglaz Currently we use a greedy note selection algorithm, but in the future we may want to choose notes that have a large enough anonymity set. How would this approach change?

Just to be clear, this PR opens up possibilities; it doesn't restrict you in any way. More specifically:

  1. It formalizes that notes can be streamed from the DB sorted by denomination amount. This was already the case, but it was actually broken due to small implementation details.
  2. It formalizes a TieredSummary for when you just need to know how many notes of each denomination you have. This structure was mostly there (without this fancy name), but now the code calculates it without loading the full notes into memory. A further optimization would be, as suggested by @elsirion elsewhere, to skip or delay decoding the full SpendableNote in this and other cases.
  3. It uses 1) to implement the greedy selection algorithm by descending denomination amount, reading only a subset of the notes in many cases (reducing IO) and keeping the minimal number of notes in memory (reducing memory usage).

Now if you want to implement an algorithm focused on common notes, you can use the ideas of 1) and 3) but perhaps read in ascending denomination order. Then you have at least two options (see the sketch after this comment):
A) If you have an up-to-date TieredSummary, you iterate over the stream keeping exactly what you need, stopping as soon as you are done
B) Without a TieredSummary, the accumulator keeps all of the lower-denomination notes read so far, stopping as soon as you are done

And of course you can do:
C) Same as B), but always reading until the end, without stopping earlier

Note that the current code on master implements C), focusing on high-denomination notes. This PR keeps that possible but, besides implementing 3), it also makes it easier to implement A) and B).

(Other than all that, if you have a TieredSummary at hand, there are other possibilities involving more complex DB queries, but I won't detail them now.)
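For illustration, variant B) might look roughly like this (stand-in types; a sketch under the assumptions above, not the PR's code). Reading in ascending order naturally favors spending many small, common notes, which is what a larger anonymity set would want:

use futures::{Stream, StreamExt};

// Stand-in for fedimint's Amount type.
type Amount = u64;

/// Variant B): accumulate notes from the smallest denomination up and stop
/// as soon as the running total covers the request. Returns None if the
/// stream ends before the requested amount is reached.
async fn select_notes_ascending<N>(
    mut notes_by_ascending_amount: impl Stream<Item = (Amount, N)> + Unpin,
    requested_amount: Amount,
) -> Option<Vec<(Amount, N)>> {
    let mut selected = Vec::new();
    let mut total: Amount = 0;
    while total < requested_amount {
        let (amount, note) = notes_by_ascending_amount.next().await?;
        total += amount;
        selected.push((amount, note));
    }
    Some(selected)
}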

@jkitman (Contributor) commented Apr 11, 2023

@douglaz Thanks for the detailed explanation, glad we have that case covered.

@douglaz (Contributor, author) commented Apr 13, 2023

Rebased

@douglaz (Contributor, author) commented Apr 13, 2023

Addressed code review

@douglaz force-pushed the limited_memory_usage_ng branch 2 times, most recently from 2cb0a55 to be65b79, on April 13, 2023 04:57
}
None => {
assert!(pending_amount > Amount::ZERO);
if requested_amount > total_amount {
Contributor:

nit: Do we need total_amount here? To me it seems we could just check if last_big_note_checkpoint is Some, which would also avoid the expect further down:

let (big_note_amount, big_note, checkpoint) =
last_big_note_checkpoint.expect("to have at least one big note");

Contributor (author):

I thought it would make things clearer, but it isn't strictly necessary. Changed to not use it.


@@ -15,6 +15,7 @@ sqlx = { version = "0.7.0-alpha.2", default-features = false, features = ["runti
fedimint-core ={ path = "../fedimint-core" }
rand = "0.8"
tracing = "0.1.37"
hex = "0.4.3"
Contributor:

We tried to get rid of the hex dependency some time ago and instead used the impl inside bitcoin_hashes. Not sure if it's worth the hassle though… @mxz42

Contributor:

Not sure either, but at some point we can depend on their hex crate, since we depend on bitcoin_hashes anyway. We also use the format_hex macro in a few places.


@douglaz (Contributor, author) commented Apr 14, 2023

Rebased

@douglaz (Contributor, author) commented Apr 14, 2023

Addressed code review

@elsirion (Contributor) left a comment:

Looks good to me; deferring to @m1sterc001guy for the DB-specific impls.

@@ -351,8 +351,16 @@ pub trait IDatabaseTransaction<'a>: 'a + MaybeSend {

async fn raw_remove_entry(&mut self, key: &[u8]) -> Result<Option<Vec<u8>>>;

/// Returns a stream of key-value pairs with keys that start with
/// `key_prefix`. No particular ordering is guaranteed.
Contributor:

Nit: we might want to make this ascending in the future

@m1sterc001guy merged commit bc488c2 into fedimint:master on Apr 17, 2023 (21 checks passed).
@douglaz deleted the limited_memory_usage_ng branch on April 17, 2023 19:48.
elsirion added commits to elsirion/fedimint referencing this pull request (Apr 18-20, 2023): "Avoid loading all notes in memory on input creation."