
feat: stabilize chunk nodes cache #6628

Merged: 7 commits merged into master on Apr 19, 2022

Conversation

@Longarithm (Member) commented Apr 18, 2022

Stabilize the feature introduced in #6450.

Motivation

For each node for which we charge the touching trie node (TTN) cost, we want to charge it only once per chunk, on the first access. Based on our estimation, reading a node from the cache costs ~7x less than TTN, and this reduces Aurora costs by ~27% on a sample of receipts with an average cost of 60 Tgas.

This is achieved by inserting these nodes into the chunk_cache inside TrieCachingStorage, which is automatically dropped when chunk processing finishes. We enable such caching by switching to TrieCacheMode::CachingChunk and disable it by switching back to TrieCacheMode::CachingShard. This is currently done only during function calls, because we don't actually charge for nodes accessed elsewhere.
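
As a rough illustration of that flow, here is a minimal sketch. The type and mode names come from the description above, but the field layout and the retrieve_raw_bytes signature are simplified assumptions, not the actual nearcore implementation:

```rust
// Minimal sketch only: field layout and the retrieve_raw_bytes signature are
// simplified assumptions, not the actual nearcore implementation.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum TrieCacheMode {
    CachingShard, // default: only the bounded shard cache is consulted
    CachingChunk, // accessed nodes are additionally pinned for the rest of the chunk
}

struct TrieCachingStorage {
    mode: TrieCacheMode,
    shard_cache: HashMap<[u8; 32], Vec<u8>>, // bounded in the real implementation
    chunk_cache: HashMap<[u8; 32], Vec<u8>>, // unbounded, dropped after the chunk is processed
}

impl TrieCachingStorage {
    fn set_mode(&mut self, mode: TrieCacheMode) {
        self.mode = mode;
    }

    // Return node bytes; in chunk-caching mode the node is pinned so that every
    // later access within the same chunk is charged the cheap cached-node cost.
    fn retrieve_raw_bytes(&mut self, hash: &[u8; 32]) -> Option<Vec<u8>> {
        if let Some(bytes) = self.chunk_cache.get(hash) {
            return Some(bytes.clone()); // already touched in this chunk: cheap access
        }
        let bytes = self.shard_cache.get(hash).cloned()?; // or fall back to the DB
        if self.mode == TrieCacheMode::CachingChunk {
            self.chunk_cache.insert(*hash, bytes.clone()); // full TTN charged only here
        }
        Some(bytes)
    }
}

fn main() {
    let mut storage = TrieCachingStorage {
        mode: TrieCacheMode::CachingShard,
        shard_cache: HashMap::from([([0u8; 32], b"node".to_vec())]),
        chunk_cache: HashMap::new(),
    };
    // The runtime switches into chunk-caching mode around a function call...
    storage.set_mode(TrieCacheMode::CachingChunk);
    let _ = storage.retrieve_raw_bytes(&[0u8; 32]); // first access: full TTN cost
    let _ = storage.retrieve_raw_bytes(&[0u8; 32]); // second access: served from chunk_cache
    // ...and back afterwards; chunk_cache is dropped once chunk processing finishes.
    storage.set_mode(TrieCacheMode::CachingShard);
}
```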

We set neither a size limit nor a maximal value limit for chunk cache items. Instead, we show that its size can't exceed 153 MB due to other runtime costs. Note for reviewers: the analysis below was updated after merging #6450.

Size limit

We store two different types of items: RawTrieNodeWithSizes and values; the DB key for each of them is the value hash.

RawTrieNodeWithSize contains some data of limited size and sometimes a storage key part (Vec<u8>). For the limited-size data, the Branch node is the worst case: it stores memory_usage, 16 Option<CryptoHash>, and 1 Option<(u32, CryptoHash)>, for a total of 8 + 16 * 33 + 40 = 576 < 600 B. A node can also contain a 2 KB key part for which we may not charge the full byte cost, e.g. during a storage_has_key query. So the overall worst case is ~2 KB per node, and the limit is 500 Tgas / touching_trie_node * 2 KB = 62 MB.

The total size of values is limited by 500 Tgas / storage_read_value_byte = 500 Tgas / 5.5 Mgas ≈ 91 MB. Together this gives 153 MB per chunk and 612 MB for all chunks. Experiments show that on real data the chunk cache size doesn't exceed 1 MB.
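
For reference, a back-of-the-envelope check of these bounds. The constants below are the approximate figures quoted above (with touching_trie_node taken as ~16.1 Ggas and the per-chunk gas limit as 500 Tgas), not values read from the runtime config:

```rust
// Sanity check of the chunk cache size bounds; all constants are approximations
// taken from the analysis above, not authoritative protocol parameters.
fn main() {
    let chunk_gas_limit: u64 = 500_000_000_000_000; // 500 Tgas
    let touching_trie_node: u64 = 16_100_000_000;   // ~16.1 Ggas per first node access
    let storage_read_value_byte: u64 = 5_500_000;   // 5.5 Mgas per value byte
    let worst_case_node_size: u64 = 2_000;          // ~2 KB per cached node

    let max_nodes = chunk_gas_limit / touching_trie_node;         // ~31,000 node accesses
    let node_bytes = max_nodes * worst_case_node_size;            // ~62 MB
    let value_bytes = chunk_gas_limit / storage_read_value_byte;  // ~91 MB

    println!("nodes:  ~{:.0} MB", node_bytes as f64 / 1e6);
    println!("values: ~{:.0} MB", value_bytes as f64 / 1e6);
    println!("total:  ~{:.0} MB per chunk", (node_bytes + value_bytes) as f64 / 1e6);
}
```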

Testing

  • process_blocks::chunk_nodes_cache_test - send transactions modifying contract storage before and after the protocol upgrade and check the correctness of costs.
  • test_chunk_nodes_cache_same_common_parent_runtime - apply 3 receipts for keys with a common parent and check that the cost of the last 2 receipts is lower (a toy model of this check is sketched after this list).
  • test_chunk_nodes_cache_branch_value_runtime - apply 2 receipts for keys where one is a prefix of the other and check that the cost of the last receipt is lower.
  • test_chunk_nodes_cache_mode_runtime - check that we don't enable the chunk cache for all receipts; manually removing the switch that disables chunk cache mode makes the test fail.
  • run 200 sampled blocks on each shard and check that costs decrease, performance doesn't degrade, and the chunk cache size stays bounded.
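
A toy model of the accounting that the common-parent test relies on. This is purely illustrative: the trie shape, the receipt_cost helper, and the cost constants are made up; only the ~7x gap between the two costs mirrors the estimate in the description above:

```rust
// Toy model of the accounting checked by test_chunk_nodes_cache_same_common_parent_runtime.
// The trie shape and the receipt_cost helper are made up for illustration; only the
// ~7x gap between the two costs mirrors the estimate in the PR description.
use std::collections::HashSet;

const TTN: u64 = 16_100_000_000;       // full touching_trie_node cost (approximate)
const TTN_CACHED: u64 = 2_300_000_000; // ~7x cheaper cost for a chunk-cached node (approximate)

// Charge full TTN for the first touch of each node within the chunk and the cheap
// cached cost for every later touch.
fn receipt_cost(path: &[&str], chunk_cache: &mut HashSet<String>) -> u64 {
    path.iter()
        .map(|node| if chunk_cache.insert(node.to_string()) { TTN } else { TTN_CACHED })
        .sum()
}

fn main() {
    let mut chunk_cache = HashSet::new();
    // Three receipts reading keys whose trie paths share the nodes "root" and "branch".
    let first = receipt_cost(&["root", "branch", "leaf-a"], &mut chunk_cache);
    let second = receipt_cost(&["root", "branch", "leaf-b"], &mut chunk_cache);
    let third = receipt_cost(&["root", "branch", "leaf-b"], &mut chunk_cache);
    assert!(second < first); // shared parent nodes were cached by the first receipt
    assert!(third < second); // everything is cached by the time of the third receipt
    println!("costs: {first}, {second}, {third}");
}
```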

@Longarithm Longarithm self-assigned this Apr 18, 2022
@Longarithm Longarithm marked this pull request as ready for review April 18, 2022 22:01
@Longarithm Longarithm requested a review from a team as a code owner April 18, 2022 22:01
@mm-near (Contributor) commented Apr 19, 2022

let's also update the CHANGELOG file

@nagisa (Collaborator) left a comment

The code changes look great; can’t really comment on the decision of whether to stabilize or wait, since I haven't been participating in the development of this feature all that much.

@matklad (Contributor) commented Apr 19, 2022

Ok, this looks good to me I think! There are a couple of iffy semantic moments here that I want to surface, to make sure we are ok with them:

  1. This PR adds a cross-receipt dependency. If two receipts happen to land in the same chunk, then the cost of the second receipt might be lower, as the first one will bear the cost of populating the cache.
  2. This PR introduces a dependency on failed receipts. That is, the above consideration holds even if the first receipt contains an action which fails.

My understanding is that behavior 1 is actually pre-existing: TrieUpdate.committed has exactly the same "inter-receipt cache leakage" problem, though it requires a write rather than a read.

Behavior 2 is, I think, new (and, to be honest, this final review is the first time I noticed it; might be worth adding a test for?).

So, this gets a 👍 from me, though I'd love if someone from core/runtime took a final look here to confirm that the big picture here is exactly what we want.

@matklad (Contributor) commented Apr 19, 2022

Discussed with bowenwang1996: the above should be fine, so let's merge it, but, indeed, let's not forget to update the changelog

@near-bulldozer near-bulldozer bot merged commit 06d4303 into master Apr 19, 2022
@near-bulldozer near-bulldozer bot deleted the stab-chunk-cache branch April 19, 2022 17:23