Build bloom using the false positive rate (0.01) and the desired size
Store the bloom filter size in the db
Ensure `compute_bitmap_size` returns a size rounded up to nearest multiple of 8 so we can use it to find the right u64-boundaries
docs don't use stored k_num (wip)
Un-pubify bloom
Code cleanup
Remove the accounts existence bloom filter. Used to compare performance of running with no bloom to running with an optimized bloom (see https://github.com/openethereum/openethereum/pull/11581).
Don't allow bloom loading to panic the client: check bounds
Report on state memory used while importing blocks from disk
Measure time elapsed for actual import when importing from disk
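The rounding behaviour mentioned in the commit messages above can be sketched as follows. This is an illustrative stand-in for `compute_bitmap_size`, not the actual implementation, and it assumes the value being rounded is a byte count (so that multiples of 8 bytes line up with u64 word boundaries):

```rust
// Hypothetical sketch: round a byte count up to the nearest multiple
// of 8 so the bitmap can be addressed as whole u64 words.
fn round_up_to_u64_boundary(bytes: u64) -> u64 {
    (bytes + 7) & !7
}

fn main() {
    assert_eq!(round_up_to_u64_boundary(0), 0);
    assert_eq!(round_up_to_u64_boundary(1), 8);
    assert_eq!(round_up_to_u64_boundary(8), 8);
    assert_eq!(round_up_to_u64_boundary(9), 16);
    println!("all boundary checks passed");
}
```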
I would be in favor of merging the PR that removes blooms first. Currently, this bloom filter is 100% saturated, which means it is useless, so removing that code is a clear improvement over the status quo. Then, in your import benchmark, it would be nice to know what kind of non-existent account queries are triggered during import (I believe this also applies to GetNodeData requests), and how we can tune RocksDB's built-in bloom filter via settings to achieve the same improvement as building a larger bloom filter manually, or even do better. Another thing that we discussed is having a common setup in the cloud for proper deterministic and reproducible benches that we can share.
It's worth noting that the saturation of the filter depends on which network we're talking about – it might be great for Goerli, for instance – and also on how the node was synced: slow-synced nodes will have a more saturated bloom than warp-synced nodes, nodes that have seen a lot of reorgs will be worse off, etc. The static nature of the current bloom filter makes it impossible to do the right thing for all use cases. This, imo, is another strong argument for removing it.
@eduadiez Is this something you think you could help with?
The state cache won't be much use when importing a single block, since for a single block execution, all of the data that's accessed twice is in the overlay anyway. It should however help when importing a bunch of blocks that access the same storage items.
The bloom filter might be full now, but it was not around block 2.3m, when it helped a lot with underpriced queries of non-existent accounts. I'd measure the effect of its removal on the full sync, if that's still relevant. |
The numbers above were from importing 1000 blocks, although I guess your reasoning still stands: I do not know anything about the access patterns within those blocks, so maybe no transaction touched the same storage items and the cache was useless. I'm inclined to leave the state cache as is and, if anything, remove the user-visible config option: apparently messing with this setting is not trivial.
That's a really good point. I'll check the no-blooms branch and see if there is a terrible regression around that area of the chain. |
Closing in favour of #11589 |
The bloom filter used to detect whether an account exists is currently too small and is fully saturated. It needs to be rebuilt for all big chains (eth, etc, and their testnets). The only way to do this is by iterating over all accounts in the state db, which is a very lengthy process (~14h for ETH mainnet, but using multiple threads can bring this down to below 3h).
Rather than making this a mandatory migration process, this PR proposes a new command: `openethereum db rebuild-accounts-bloom`. This way node operators can perform the rebuild at a time of their choosing.

Note that the choice of bloom filter size is not trivial to determine. Currently ETH has ~85M accounts. Using a 10x bloom filter size yields a bloom filter that is ~60% full, so we likely need something like 15x to be "safe" for a while.
(Or no bloom filter at all, which is also an option.)
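For intuition on how quickly such a filter saturates, the standard textbook estimate (not code from this PR) says that after inserting n items into m bits with k hash functions, the expected fraction of set bits is roughly 1 - e^(-kn/m). A sketch with hypothetical parameters – k = 7 and ~9.6 bits per item are the textbook optima for a 1% false positive rate:

```rust
// Expected fraction of set bits in a Bloom filter after n insertions
// into m bits with k hash functions (standard approximation).
fn fill_ratio(n: f64, m_bits: f64, k: f64) -> f64 {
    1.0 - (-k * n / m_bits).exp()
}

fn main() {
    let k = 7.0;             // hash count; ~7 is optimal for p = 0.01
    let bits_per_item = 9.6; // roughly optimal for p = 0.01
    let designed_for = 85e6; // ~85M accounts, per the PR description
    let m = bits_per_item * designed_for;
    // At design capacity the filter is already about half full:
    println!("at design capacity: {:.0}% full", 100.0 * fill_ratio(designed_for, m, k));
    // Once the account count doubles, it is far more saturated:
    println!("at 2x capacity: {:.0}% full", 100.0 * fill_ratio(2.0 * designed_for, m, k));
}
```

This is why sizing with generous headroom matters: the fill ratio climbs steeply as the real item count outgrows the design count.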
Benchmark results comparing no-blooms to this branch and using an accounts existence bloom filter built for 100M accounts (which is fine for the current state of the chain but not for very long):
No blooms, import 1000 blocks, 1 warmup + 3 runs:
Rebuilt bloom, import 1000 blocks, 1 warmup + 3 runs:
The bloom filter for 100M accounts is 120 MB and is kept in RAM. Node startup is slowed down by 2-5 seconds while the bloom filter loads.
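The 120 MB figure is consistent with the standard Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2, which at p = 0.01 works out to about 9.6 bits per item. A quick check – textbook formula with a hypothetical helper name, not this codebase's code:

```rust
// Optimal Bloom filter size in bits for n items at false positive
// rate p: m = -n * ln(p) / (ln 2)^2 (standard formula).
fn optimal_bits(n: f64, p: f64) -> f64 {
    let ln2 = std::f64::consts::LN_2;
    -n * p.ln() / (ln2 * ln2)
}

fn main() {
    let bits = optimal_bits(100e6, 0.01); // 100M accounts at p = 0.01
    println!("{:.1} bits per item", bits / 100e6);  // ~9.6
    println!("~{:.0} MB total", bits / 8.0 / 1e6); // ~120, matching the figure above
}
```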
At this point I was curious to find out what impact increasing cache sizes had on performance. For importing blocks I believe the two relevant settings are `--cache-size-state` and `--cache-size-db`. The former caches accounts and smart contract code; the latter configures the `rocksdb` caches.

No blooms – 100x state cache: --cache-size-state=2500
Increasing the state cache actually makes things worse. 😨
How about the db caches?
No blooms – 10x rocksdb cache: --cache-size-db=2560
No blooms – 100x rocksdb cache: --cache-size-db=25600
So that is interesting! Let's do some more of those.
No bloom – 2…8x rocksdb cache: --cache-size-db=512…2048
So, perhaps unsurprisingly, giving rocksdb more cache memory yields an almost linear performance increase in block imports up until around 10x. The benefit drops off somewhere after that, possibly because there is too little RAM left for the OS page cache on my 32 GB laptop.