
core, eth, trie: bloom filter for trie node dedup during fast sync #19489

Merged

karalabe merged 5 commits into ethereum:master from karalabe:fast-synd-db-bloom on May 13, 2019
Conversation

@karalabe (Member) commented Apr 23, 2019

State sync (part of fast sync) works by taking the root of a state trie, downloading it from remote peers, deduplicating all children against the local database and recursing into any missing subtries. The deduplication simply checks the local database to see whether a node is already known and skips downloading it if so. Unfortunately, this check becomes exceedingly expensive as the database grows.


This PR adds a bloom filter to the state sync, so that instead of checking each and every node whether we already have it locally on disk or not, we first consult a gigantic bloom filter. If the node being deduplicated is not in the bloom, it surely is not on disk either, so we can save an expensive database lookup. If the bloom says it "might" be present, we must verify.
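To make the flow concrete, here is a minimal, self-contained sketch of that check order (membatch, then bloom, then database); the type and method names are illustrative stand-ins, not the actual go-ethereum identifiers:

package sketch

// keyValueReader stands in for the persisted trie-node database.
type keyValueReader interface {
	Has(key []byte) (bool, error)
}

// syncBloom stands in for the fast-sync bloom filter: false means the key is
// definitely absent, true means it might be present.
type syncBloom interface {
	Contains(key []byte) bool
}

type trieSync struct {
	membatch map[string]struct{} // nodes fetched in this cycle, not yet flushed
	bloom    syncBloom
	database keyValueReader
}

// needsDownload decides whether a child node must be scheduled for download.
func (s *trieSync) needsDownload(hash []byte) bool {
	// Already fetched and pending flush to disk.
	if _, ok := s.membatch[string(hash)]; ok {
		return false
	}
	// Definitely not on disk: skip the expensive database lookup entirely.
	if !s.bloom.Contains(hash) {
		return true
	}
	// Bloom says "maybe": fall back to the authoritative (and slow) check.
	ok, _ := s.database.Has(hash)
	return !ok
}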

The bloom, however, can only return correct answers (i.e. that a trie node surely was not yet downloaded) if all trie nodes already on disk have been injected into the filter too. This is the case for a fresh node with an empty database (i.e. no nodes whatsoever), but it does not hold if a node is restarted mid-sync. For those instances, the PR starts by scanning the local database and injecting any previously downloaded trie node hashes into the bloom. As long as this initialization is running, the bloom always returns "maybe", making fast sync reach down into the database instead of relying on the bloom.
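A hedged sketch of that warm-up, using stand-in types as above (the initialize/Contains split here is for illustration only; the real trie/sync_bloom.go differs in detail, and a real implementation also needs the filter itself to tolerate concurrent Add/Contains calls):

package sketch

import "sync/atomic"

// bloom is a stand-in for whatever bloom filter implementation backs the sync.
type bloom interface {
	Add(key []byte)
	Has(key []byte) bool
}

// databaseIterator stands in for a key-value iterator over the chain database.
type databaseIterator interface {
	Next() bool
	Key() []byte
}

// fastSyncBloom layers the "answer maybe until initialized" behaviour on top
// of a plain bloom filter.
type fastSyncBloom struct {
	inited uint32 // flipped to 1 once the disk scan completes
	filter bloom
}

// initialize scans the database and feeds every 32-byte key (trie nodes are
// keyed by their hash) into the filter. Meant to run in a background
// goroutine while sync is already in progress.
func (b *fastSyncBloom) initialize(it databaseIterator) {
	for it.Next() {
		if key := it.Key(); len(key) == 32 {
			b.filter.Add(key)
		}
	}
	atomic.StoreUint32(&b.inited, 1)
}

// Contains reports whether a node might already be on disk. While the scan is
// still running the filter is incomplete, so it must return "maybe" (true)
// and let callers fall back to a real database lookup.
func (b *fastSyncBloom) Contains(hash []byte) bool {
	if atomic.LoadUint32(&b.inited) == 0 {
		return true
	}
	return b.filter.Has(hash)
}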


The code is not ideal. We need to clean it up a bit, possibly thread-safety wise too. We might also consider reimplementing the bloom filter instead of pulling in the library; I'm not a fan of the dependency. The PR is mostly up for benchmarking, discussions and reviews (note: it's not bad, just not final).

@karalabe karalabe added this to the 1.9.0 milestone Apr 23, 2019

@karalabe karalabe requested review from holiman and zsfelfoldi as code owners Apr 23, 2019

@karalabe karalabe force-pushed the karalabe:fast-synd-db-bloom branch from 2b89dea to b42deae Apr 23, 2019

Resolved review threads: eth/downloader/statesync.go (outdated), trie/sync.go (two threads), trie/sync_bloom.go (two threads, outdated)
@holiman (Contributor) commented Apr 23, 2019

On my machine (a VM with NAT):

[user@work go-ethereum]$ build/bin/geth
INFO [04-23|12:55:51.680] Bumping default cache on mainnet         provided=1024 updated=4096
...
INFO [04-23|12:56:41.861] Block synchronisation started 
INFO [04-23|12:56:47.231] Allocated fast sync bloom                size=1294.00MiB
...
INFO [04-23|12:57:07.142] Imported new block headers               count=0    elapsed=6.903ms   number=1232267 hash=1edd29…4ead71 age=3y1mo1w ignored=192
INFO [04-23|12:57:07.206] Initialized fast sync bloom              items=2431935 errorrate=0.000 time=19.974763615s

Basically, it currently starts iterating the state lazily, only once we get some state from a peer. So on my machine it was 50s before I got a good peer; only then did the bloom iteration start, and it took 20s. Wouldn't it be better to start the bloom init as soon as we determine that we're going to do a fast sync?

@karalabe karalabe force-pushed the karalabe:fast-synd-db-bloom branch 2 times, most recently from 35c7f72 to 9efb0f7 Apr 23, 2019

@holiman (Contributor) commented Apr 23, 2019

It finished fast sync in roughly 8 hours, with the bloom error-rate metric reaching 100, i.e. a 1% error rate. DB reads were 1.8TB. It was fast enough that it even suffered from a leveldb-imposed write delay, which we usually do not get after Gary's fixes.

Resolved review thread: eth/downloader/downloader.go (outdated)
@@ -134,8 +140,13 @@ func (s *Sync) AddRawEntry(hash common.Hash, depth int, parent common.Hash) {
 	if _, ok := s.membatch.batch[hash]; ok {
 		return
 	}
-	if ok, _ := s.database.Has(hash.Bytes()); ok {
-		return
+	if s.bloom.Contains(hash[:]) {

@Matthalp (Contributor) commented Apr 24, 2019
I see this logic is also used at line 104. Is it worth extracting into a method, such as func (s *Sync) checkIfDuplicate(hash common.Hash) bool?

@karalabe (Author, Member) commented Apr 24, 2019
Meh, I'm not a fan of making every tiny op a method. Unless the code is really complex, I think inlining some minor things keeps the mental overhead of reading it lower.

There's also a slight difference: this code does a return, while the code a bit further down does a continue, so you'd still need to check the result and act accordingly. You'd just replace 5 LOC with a method declaration plus 3 LOC to call it.

 	}
-	written := len(s.membatch.order)
+	written := len(s.membatch.order) // TODO(karalabe): could an order change improve write performance?

@Matthalp (Contributor) commented Apr 24, 2019
FWIW, I wondered whether the order field in Batch is still adding value in the codebase, but that's probably because I haven't been contributing for that long.

@@ -292,8 +304,13 @@ func (s *Sync) children(req *request, object node) ([]*request, error) {
 	if _, ok := s.membatch.batch[hash]; ok {
 		continue
 	}
-	if ok, _ := s.database.Has(node); ok {
-		continue
+	if s.bloom.Contains(node) {

@Matthalp (Contributor) commented Apr 24, 2019
Similar comment about common code above.
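For illustration, the helper being suggested might look roughly like this (hypothetical: the name comes from the review comment above, and the database fallback mirrors the "verify on maybe" behaviour described in the PR text; each call site would still decide between return and continue):

// checkIfDuplicate reports whether the node is already queued or on disk.
func (s *Sync) checkIfDuplicate(hash common.Hash) bool {
	if _, ok := s.membatch.batch[hash]; ok {
		return true
	}
	if !s.bloom.Contains(hash[:]) {
		return false
	}
	ok, _ := s.database.Has(hash.Bytes())
	return ok
}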

Resolved review thread: trie/sync_bloom.go

// Wait one second, but check termination more frequently
for i := 0; i < 10; i++ {
if atomic.LoadUint32(&b.closed) == 1 {

@Matthalp (Contributor) commented Apr 24, 2019
FWIW, this is a place where the errgroup ctx.Done() pattern in a select block would work out nicely.

@karalabe (Author, Member) commented Apr 24, 2019
Yeah, this was me being lazy. I'll swap it out for channels; I just wanted to prep a PR for a benchmark.
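For reference, the generic shape of that pattern, sketched standalone (this is just the standard context idiom, not the code that ended up in the PR): block on a ticker and a cancellation signal instead of polling an atomic flag ten times per second.

package sketch

import (
	"context"
	"time"
)

// meter wakes up once per second to do periodic work and returns promptly
// when the context is cancelled, without any sleep-and-poll loop.
func meter(ctx context.Context, update func()) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done(): // cancellation, e.g. from errgroup.WithContext
			return
		case <-ticker.C:
			update()
		}
	}
}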

Resolved review threads: trie/sync_bloom.go (four threads, three outdated)
@Matthalp (Contributor) commented Apr 24, 2019

I think this is a great idea to pursue. I do wonder how future-proof the bloom filter and its parameters are as the state size grows. Has any analysis been done on that?
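As a back-of-the-envelope way to reason about that: the usual false-positive estimate for a bloom filter with m bits, k hash functions and n inserted keys is (1 - e^(-kn/m))^k. A toy calculation like the one below (the 1294 MiB size is taken from the log above; the hash-function count and item counts are made up for illustration) shows how the error rate climbs as the state trie grows:

package main

import (
	"fmt"
	"math"
)

// falsePositiveRate returns the classic bloom filter estimate
// (1 - e^(-k*n/m))^k for m bits, k hash functions and n inserted keys.
func falsePositiveRate(mBits, k, n float64) float64 {
	return math.Pow(1-math.Exp(-k*n/mBits), k)
}

func main() {
	const mBits = 1294 * 1024 * 1024 * 8 // ~1294 MiB filter, as in the log above
	const k = 10                         // hypothetical number of hash functions
	for _, n := range []float64{100e6, 300e6, 600e6, 1e9} { // hypothetical trie node counts
		fmt.Printf("n=%.0e  estimated false-positive rate=%.2e\n", n, falsePositiveRate(mBits, k, n))
	}
}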

@holiman (Contributor) commented Apr 24, 2019

Fast sync finished:

  • 7h 45m vs 11h 35m
  • 1.84TB db read vs 3.76TB db read
  • 1.89TB db write vs 2.76TB db write

1% error rate on the bloom filter

@karalabe karalabe force-pushed the karalabe:fast-synd-db-bloom branch 2 times, most recently from cbf4095 to 775ef62 Apr 24, 2019

karalabe added some commits Apr 23, 2019

@karalabe karalabe force-pushed the karalabe:fast-synd-db-bloom branch 2 times, most recently from 071a29e to a0fd022 May 13, 2019

@karalabe karalabe force-pushed the karalabe:fast-synd-db-bloom branch from a0fd022 to 8639d92 May 13, 2019

@holiman (Contributor) left a comment
LGTM, haven't tested the latest change yet though

@karalabe karalabe merged commit 9effd64 into ethereum:master May 13, 2019

0 of 2 checks passed:
continuous-integration/appveyor/pr: AppVeyor build failed
continuous-integration/travis-ci/pr: Travis CI build failed