speed up state/block loading #5207

arnetheduck · 2023-07-21T13:30:29Z

When loading blocks and states from db/era, we currently check their CRC32 redundantly - for a state (110 MB uncompressed), this presently costs 50 ms of loading time on a decent laptop.

* remove `maxDecompressedDbRecordSize` - not actually used on recent data since we store the framed format - also, we're in luck: we blew past the limit quite some time ago
* fix obsolete exception-based error checking
* avoid `zeroMem` when reading from era store

see status-im/nim-snappy#22 for benchmarks
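
For context, the check discussed here is the per-chunk checksum of the snappy framing format: a masked CRC-32C computed over the uncompressed chunk data. Below is a minimal Python sketch of that check, for illustration only. It is not the nim-snappy code, and `check_chunk` with its `verify` flag is a hypothetical helper standing in for "decode with or without integrity checking".

```python
# Illustrative sketch only; not code from nimbus-eth2 or nim-snappy.

def _make_crc32c_table():
    # Byte-wise lookup table for the reflected Castagnoli polynomial (CRC-32C).
    poly = 0x82F63B78
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    # Plain table-driven CRC-32C; simple and slow, one byte per step.
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    # Masking used by the snappy framing format:
    # rotate right by 15 bits, then add a constant (mod 2**32).
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def check_chunk(uncompressed: bytes, stored_checksum: int, verify: bool = True) -> bytes:
    # Hypothetical helper: `verify=False` models skipping the integrity
    # check and relying on a later check (e.g. a hash of the whole state).
    if verify and masked_crc32c(uncompressed) != stored_checksum:
        raise ValueError("checksum mismatch")
    return uncompressed
```

With `verify=False` the cost of walking every uncompressed byte disappears, which is the trade-off under discussion: corruption is then only caught if something downstream validates the data.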

github-actions · 2023-07-21T13:59:47Z

Unit Test Results

6 files ±0, 715 suites ±0, duration 28m 14s ⏱️ (+2m 46s)

| | Total | ✔️ Passed | 💤 Skipped | ❌ Failed |
|---|---|---|---|---|
| Tests | 3 710 ±0 | 3 431 ±0 | 279 ±0 | 0 ±0 |
| Runs | 10 536 ±0 | 10 252 ±0 | 284 ±0 | 0 ±0 |

Results for commit ead1ac6. ± Comparison against base commit df80ae6.

♻️ This comment has been updated with latest results.

zah · 2023-07-25T22:07:12Z

Can you clarify why you consider these checks redundant? Is it because the code computes hash_tree_root of the loaded state? Does this happen on every code path loading a state? Or is the claim that Sqlite3 itself verifies the integrity of the data? That doesn't seem to be the case:
https://www.sqlite.org/cksumvfs.html

arnetheduck · 2023-08-02T09:14:52Z

> Is it because the code computes hash_tree_root of the loaded state?

Yes

> Does this happen on every code path loading a state?

Sort of - it happens as soon as a slot transition or block application happens. That said, this PR does increase risk slightly: with the wrong kind of data corruption, decompression could plausibly produce a bogus state that gets read in some code path where no hash_tree_root (HTR) is computed. However, the crc32 check was never really an intended part of the database design: data corruption can equally happen when loading a public key or any other uncompressed data from disk (which does not pass through snappy). If we want consistent crc32 checking, it needs to be an explicit feature on top of all database data.

Notably, our crc32 implementation is also a ridiculously slow one - >10x faster implementations exist, like https://github.com/google/crc32c - if we go down that path, we should switch to a proper implementation.
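
To make "an explicit feature on top of all database data" concrete, here is a minimal Python sketch of a checksum wrapper applied to every record, compressed or not. It is an assumption-laden illustration, not a proposal for the actual database layout: `wrap_record` and `unwrap_record` are hypothetical names, and it uses the standard library's `zlib.crc32` (plain CRC-32, not CRC-32C) purely for convenience.

```python
import struct
import zlib


def wrap_record(payload: bytes) -> bytes:
    # Prefix the payload with a 4-byte little-endian CRC-32 of its contents.
    return struct.pack("<I", zlib.crc32(payload)) + payload


def unwrap_record(record: bytes) -> bytes:
    # Verify the stored checksum before handing the payload to the caller.
    if len(record) < 4:
        raise ValueError("record too short to contain a checksum")
    (stored,) = struct.unpack("<I", record[:4])
    payload = record[4:]
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: record is corrupt")
    return payload
```

Because the wrapper sits outside any compression step, it would cover small uncompressed values such as public keys in the same way as large compressed states.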

arnetheduck added 2 commits July 25, 2023 19:31

bump snappy

ead1ac6

zah force-pushed the no-crc branch from 697ece6 to ead1ac6 Compare July 25, 2023 16:33

zah enabled auto-merge (squash) July 25, 2023 22:07

zah disabled auto-merge July 26, 2023 07:47

zah merged commit e837938 into unstable Jul 26, 2023
10 of 12 checks passed

zah deleted the no-crc branch July 26, 2023 07:47
