fix(trie): combining state parts memory usage #3986

mikhailOK · 2021-02-20T20:49:19Z

After getting all state parts, write them to storage one at a time
instead of combining them in memory.

Test plan

test for trie seek and iteration
test that creating parts behavior is the same
test that combining parts behavior is the same

After getting all state parts, write them to storage one at a time instead of combining them in memory. Test plan --------- test for trie seek and iteration test that creating parts behavior is the same test that combining parts behavior is the same

bowenwang1996

where do we actually save the state parts to storage?

chain/chain/src/types.rs

bowenwang1996 · 2021-02-22T02:23:02Z

core/store/src/trie/iterator.rs

                }
-                TrieNode::Branch(mut children, _) => {
+                TrieNode::Branch(children, _) => {
                    if key.is_empty() {


why does it make sense for key to be empty?

key is mutable, it's the seek argument after truncating the path traversed from its prefix. Empty key means the seek ends in this node, this case is unchanged apart from a refactor.
The main change is to the invariant between key_nibbles and trail when trail ends with Extension or Leaf

bowenwang1996 · 2021-02-22T02:29:49Z

core/store/src/trie/iterator.rs

+        iterator.seek(&seek_key).unwrap();
+        let result1: Vec<_> = iterator.map(Result::unwrap).take(5).collect();
+        let result2: Vec<_> =
+            map.range(seek_key.to_vec()..).map(|(k, v)| (k.clone(), v.clone())).take(5).collect();


why do we take 5 here?

to take a few steps after seek.

core/store/src/trie/state_parts.rs

mikhailOK

part downloader logic validates parts and writes them to ColStateParts, after collecting all parts we do set_state_finalize which calls apply_state_part on each of them. apply_state_part reads from ColStateParts and writes to ColState.

mikhailOK · 2021-02-22T02:52:32Z

core/store/src/trie/iterator.rs

                }
-                TrieNode::Branch(mut children, _) => {
+                TrieNode::Branch(children, _) => {
                    if key.is_empty() {


key is mutable, it's the seek argument after truncating the path traversed from its prefix. Empty key means the seek ends in this node, this case is unchanged apart from a refactor.
The main change is to the invariant between key_nibbles and trail when trail ends with Extension or Leaf

mikhailOK · 2021-02-22T02:59:52Z

core/store/src/trie/iterator.rs

+        iterator.seek(&seek_key).unwrap();
+        let result1: Vec<_> = iterator.map(Result::unwrap).take(5).collect();
+        let result2: Vec<_> =
+            map.range(seek_key.to_vec()..).map(|(k, v)| (k.clone(), v.clone())).take(5).collect();


to take a few steps after seek.

bowenwang1996

I don't fully understand the code, but the tests look good to me. Could we start writing a test that involves syncing 30GB of state and see how it works?

mikhailOK · 2021-02-22T18:47:09Z

Manually tested with testnet, memory usage was low and combining parts took about a minute (2.5GB state)

matklad · 2021-02-24T10:40:59Z

core/store/src/trie/mod.rs

+    fn unwrap_hash(&self) -> &CryptoHash {
+        match self {
+            Self::Hash(hash) => hash,
+            Self::InMemory(_) => unreachable!(),
+        }
+    }


tiny nit, but a more idiomatic version would probably be

impl NodeHandle { fn as_hash(&self) -> Option<&CryptoHash> { match self { Self::Hash(hash) => Some(hash), Self::InMemory(_) => None, } } }

that way, we get to re-use Option's unwrap.

@matklad I guess this way we're losing important knowledge about unreachable!() situation.

There'll be .unwrap() on the call-site, so this just moves the impossibility assertion from the definition site to the call-site.

There's no much semantic difference between unreachable!() and Option::unwrap -- both mean that the programmer asserts something at runtime, which can't be explained using rust's type system.

But, again, this is at most a minor code-style point, in the grand scheme of things, both options are totally fine/

@matklad sure. My point in using unreachable!() shows that something is impossible by design, but each unwrap must be explained why it's safe. It's not about rust, it's about being clear to devs.

(My understanding might be outdated. If it's possible to point that something is not reachable by design another better way (with no using unreachable!()) please let me know.)

I understand unreachable! the same way. But, in my mind, .unwrap() also signals impossibility by design? That is, if we look at the call-site,

hash = *child.unwrap_hash(); /* versus */ hash = *child.as_hash().unwrap();

it seems me that both versions say that, by design, child is guaranteed to be Hash in this particular case.

The difference is that, when I see foo.unwrap(), I immediately know that is signals an impossibility of something. When I see foo.unwrap_hash(), I need to check the definition of unwrap_hash to realize that it has an unreachable inside.

@matklad not sure is it the standard way of understanding in the team. For example, in my latest work, there are explanations of unwrap used (https://github.com/near/nearcore/compare/master...gc_headers_2?expand=1#diff-c26b0a777803f79360d3f469862cfcd95deb3c390809095670f6db88d00332f4R236-R239) and I found them useful.

Basically, I work with concepts written by others as black boxes. That's why, in my opinion, unwraps must be explained, otherwise we may end up with lack of understanding what's possible and why. Seeing unreachable!() helps me to dive into details much better than guessing can I reuse the same pattern in another place.

One more example - we use tons of unwraps in testing because we don't want to process errors in any other way. It doesn't mean the methods used in tests should never return errors. :)

I agree that unwraps should be explained (or, better still, refactored out of existence). But I would also think that, for the same reason, a call to .unwrap_hash() should be explained.

Ie, if I see an uncommitted call to .unwrap_hash(), I understand that the author believes, for some good reason, that unreachable case won't happen here. But that's exactly the same info I get from seeing .unwrap(). That is, .unwrap() and .unwrap_hash() signal the same level of intentionallyty to me.

After getting all state parts, write them to storage one at a time instead of combining them in memory. Test plan --------- test for trie seek and iteration test that creating parts behavior is the same test that combining parts behavior is the same

mikhailOK requested review from Kouprin and bowenwang1996 February 20, 2021 20:49

mikhailOK requested review from evgenykuzyakov, frol, MaksymZavershynskyi and SkidanovAlex as code owners February 20, 2021 20:49

mikhailOK mentioned this pull request Feb 20, 2021

Combine state parts should work when the state is large #3262

Closed

mikhailOK linked an issue Feb 20, 2021 that may be closed by this pull request

Combine state parts should work when the state is large #3262

Closed

bowenwang1996 requested a review from EgorKulikov February 22, 2021 01:16

bowenwang1996 reviewed Feb 22, 2021

View reviewed changes

mikhailOK commented Feb 22, 2021

View reviewed changes

small refactor

1d4b4da

bowenwang1996 approved these changes Feb 22, 2021

View reviewed changes

Merge branch 'master' into state_parts

a48b7f7

mikhailOK merged commit a637519 into master Feb 23, 2021

mikhailOK deleted the state_parts branch February 23, 2021 00:03

matklad reviewed Feb 24, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(trie): combining state parts memory usage #3986

fix(trie): combining state parts memory usage #3986

mikhailOK commented Feb 20, 2021

bowenwang1996 left a comment

bowenwang1996 Feb 22, 2021

mikhailOK Feb 22, 2021

bowenwang1996 Feb 22, 2021

mikhailOK Feb 22, 2021

mikhailOK left a comment

mikhailOK Feb 22, 2021

mikhailOK Feb 22, 2021

bowenwang1996 left a comment

mikhailOK commented Feb 22, 2021

matklad Feb 24, 2021

Kouprin Feb 24, 2021

matklad Feb 24, 2021

Kouprin Feb 24, 2021

matklad Feb 24, 2021

Kouprin Feb 24, 2021

matklad Feb 24, 2021

fix(trie): combining state parts memory usage #3986

fix(trie): combining state parts memory usage #3986

Conversation

mikhailOK commented Feb 20, 2021

Test plan

bowenwang1996 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikhailOK left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bowenwang1996 left a comment

Choose a reason for hiding this comment

mikhailOK commented Feb 22, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment