Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(trie): combining state parts memory usage #3986

Merged
merged 3 commits into from
Feb 23, 2021
Merged

fix(trie): combining state parts memory usage #3986

merged 3 commits into from
Feb 23, 2021

Conversation

mikhailOK
Copy link
Contributor

After getting all state parts, write them to storage one at a time
instead of combining them in memory.

Test plan

test for trie seek and iteration
test that creating parts behavior is the same
test that combining parts behavior is the same

After getting all state parts, write them to storage one at a time
instead of combining them in memory.

Test plan
---------
test for trie seek and iteration
test that creating parts behavior is the same
test that combining parts behavior is the same
Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do we actually save the state parts to storage?

chain/chain/src/types.rs Outdated Show resolved Hide resolved
}
TrieNode::Branch(mut children, _) => {
TrieNode::Branch(children, _) => {
if key.is_empty() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it make sense for key to be empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key is mutable, it's the seek argument after truncating the path traversed from its prefix. Empty key means the seek ends in this node, this case is unchanged apart from a refactor.
The main change is to the invariant between key_nibbles and trail when trail ends with Extension or Leaf

iterator.seek(&seek_key).unwrap();
let result1: Vec<_> = iterator.map(Result::unwrap).take(5).collect();
let result2: Vec<_> =
map.range(seek_key.to_vec()..).map(|(k, v)| (k.clone(), v.clone())).take(5).collect();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we take 5 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to take a few steps after seek.

core/store/src/trie/state_parts.rs Outdated Show resolved Hide resolved
core/store/src/trie/state_parts.rs Outdated Show resolved Hide resolved
Copy link
Contributor Author

@mikhailOK mikhailOK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part downloader logic validates parts and writes them to ColStateParts, after collecting all parts we do set_state_finalize which calls apply_state_part on each of them. apply_state_part reads from ColStateParts and writes to ColState.

}
TrieNode::Branch(mut children, _) => {
TrieNode::Branch(children, _) => {
if key.is_empty() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key is mutable, it's the seek argument after truncating the path traversed from its prefix. Empty key means the seek ends in this node, this case is unchanged apart from a refactor.
The main change is to the invariant between key_nibbles and trail when trail ends with Extension or Leaf

iterator.seek(&seek_key).unwrap();
let result1: Vec<_> = iterator.map(Result::unwrap).take(5).collect();
let result2: Vec<_> =
map.range(seek_key.to_vec()..).map(|(k, v)| (k.clone(), v.clone())).take(5).collect();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to take a few steps after seek.

Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the code, but the tests look good to me. Could we start writing a test that involves syncing 30GB of state and see how it works?

@mikhailOK
Copy link
Contributor Author

Manually tested with testnet, memory usage was low and combining parts took about a minute (2.5GB state)

@mikhailOK mikhailOK merged commit a637519 into master Feb 23, 2021
@mikhailOK mikhailOK deleted the state_parts branch February 23, 2021 00:03
Comment on lines +66 to +71
fn unwrap_hash(&self) -> &CryptoHash {
match self {
Self::Hash(hash) => hash,
Self::InMemory(_) => unreachable!(),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit, but a more idiomatic version would probably be

impl NodeHandle {
    fn as_hash(&self) -> Option<&CryptoHash> {
        match self {
            Self::Hash(hash) => Some(hash),
            Self::InMemory(_) => None,
        }
    }
}

that way, we get to re-use Option's unwrap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matklad I guess this way we're losing important knowledge about unreachable!() situation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There'll be .unwrap() on the call-site, so this just moves the impossibility assertion from the definition site to the call-site.

There's no much semantic difference between unreachable!() and Option::unwrap -- both mean that the programmer asserts something at runtime, which can't be explained using rust's type system.

But, again, this is at most a minor code-style point, in the grand scheme of things, both options are totally fine/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matklad sure. My point in using unreachable!() shows that something is impossible by design, but each unwrap must be explained why it's safe. It's not about rust, it's about being clear to devs.

(My understanding might be outdated. If it's possible to point that something is not reachable by design another better way (with no using unreachable!()) please let me know.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand unreachable! the same way. But, in my mind, .unwrap() also signals impossibility by design? That is, if we look at the call-site,

hash = *child.unwrap_hash();

/* versus */

hash = *child.as_hash().unwrap();

it seems me that both versions say that, by design, child is guaranteed to be Hash in this particular case.

The difference is that, when I see foo.unwrap(), I immediately know that is signals an impossibility of something. When I see foo.unwrap_hash(), I need to check the definition of unwrap_hash to realize that it has an unreachable inside.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matklad not sure is it the standard way of understanding in the team. For example, in my latest work, there are explanations of unwrap used (https://github.com/near/nearcore/compare/master...gc_headers_2?expand=1#diff-c26b0a777803f79360d3f469862cfcd95deb3c390809095670f6db88d00332f4R236-R239) and I found them useful.

Basically, I work with concepts written by others as black boxes. That's why, in my opinion, unwraps must be explained, otherwise we may end up with lack of understanding what's possible and why. Seeing unreachable!() helps me to dive into details much better than guessing can I reuse the same pattern in another place.

One more example - we use tons of unwraps in testing because we don't want to process errors in any other way. It doesn't mean the methods used in tests should never return errors. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that unwraps should be explained (or, better still, refactored out of existence). But I would also think that, for the same reason, a call to .unwrap_hash() should be explained.

Ie, if I see an uncommitted call to .unwrap_hash(), I understand that the author believes, for some good reason, that unreachable case won't happen here. But that's exactly the same info I get from seeing .unwrap(). That is, .unwrap() and .unwrap_hash() signal the same level of intentionallyty to me.

bowenwang1996 pushed a commit that referenced this pull request Mar 19, 2021
After getting all state parts, write them to storage one at a time
instead of combining them in memory.

Test plan
---------
test for trie seek and iteration
test that creating parts behavior is the same
test that combining parts behavior is the same
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Combine state parts should work when the state is large
4 participants