Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up state/block loading #5207

Merged
merged 2 commits into from
Jul 26, 2023
Merged

speed up state/block loading #5207

merged 2 commits into from
Jul 26, 2023

Conversation

arnetheduck
Copy link
Member

When loading blocks and states from db/era, we currently redundantly check their CRC32 - for a state, this costs 50ms of loading time presently (110mb uncompressed size) on a decent laptop.

  • remove maxDecompressedDbRecordSize - not actually used on recent data since we store the framed format - also, we're in luck: we blew past the limit quite some time ago
  • fix obsolete exception-based error checking
  • avoid zeroMem when reading from era store

see status-im/nim-snappy#22 for benchmarks

@github-actions
Copy link

github-actions bot commented Jul 21, 2023

Unit Test Results

         6 files  ±0       715 suites  ±0   28m 14s ⏱️ + 2m 46s
  3 710 tests ±0    3 431 ✔️ ±0  279 💤 ±0  0 ±0 
10 536 runs  ±0  10 252 ✔️ ±0  284 💤 ±0  0 ±0 

Results for commit ead1ac6. ± Comparison against base commit df80ae6.

♻️ This comment has been updated with latest results.

When loading blocks and states from db/era, we currently redundantly
check their CRC32 - for a state, this costs 50ms of loading time
presently (110mb uncompressed size) on a decent laptop.

* remove `maxDecompressedDbRecordSize` - not actually used on recent
data since we store the framed format - also, we're in luck: we blew
past the limit quite some time ago
* fix obsolete exception-based error checking
* avoid `zeroMem` when reading from era store

see status-im/nim-snappy#22 for benchmarks
@zah
Copy link
Member

zah commented Jul 25, 2023

Can you clarify the claim why you find these checks to be redundant? Is it because the code computes hash_tree_root of the loaded state? Does this happen on every code path loading a state? Or is this a claim that Sqlite3 itself verifies the integrity of the data, which doesn't seem to be the case:
https://www.sqlite.org/cksumvfs.html

@zah zah enabled auto-merge (squash) July 25, 2023 22:07
@zah zah disabled auto-merge July 26, 2023 07:47
@zah zah merged commit e837938 into unstable Jul 26, 2023
10 of 12 checks passed
@zah zah deleted the no-crc branch July 26, 2023 07:47
@arnetheduck
Copy link
Member Author

Is it because the code computes hash_tree_root of the loaded state?
Yes

Does this happen on every code path loading a state?

Sort of - it happens as soon as a slot transition or block application happens - that said, this PR does increase risk slightly: in case of the wrong kind of data corruption, decompression could plausibly create a bogus state that gets read from in some code path where no HTR is called - however, the crc32 check was never really an intended part of the database design, ie a data corruption can happen when loading a public key or any other uncompressed data from disk (which does not pass through snappy) - if we want consistent crc32 checking, this needs to be an explicit feature on top of all database data.

Notably, our crc32 implementation is also a ridiculously slow one - >10x faster implementations exist, like https://github.com/google/crc32c - if we go down that path, we should switch to a proper implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants