eth/63 fast synchronization algorithm #1889
This PR aggregates a lot of small modifications to core, trie, eth and other packages to collectively implement the eth/63 fast synchronization algorithm.
The goal of the fast sync algorithm is to exchange processing power for bandwidth usage. Instead of processing the entire blockchain one link at a time and replaying all transactions that ever happened in history, fast syncing downloads the transaction receipts along with the blocks, and pulls an entire recent state database. This allows a fast synced node to still retain its status as an archive node containing all historical data for user queries (and thus not influence the network's health in general), but at the same time to reassemble a recent network state at a fraction of the time it would take full block processing.
An outline of the fast sync algorithm would be:

- Download and verify the header chain that makes up the blockchain, just as in classical sync.
- Instead of processing the blocks, download the blocks and the transaction receipts defined by each header.
- Store the downloaded blockchain and receipt chain, enabling historical queries.
- When the chain reaches the pivot point (a bit below the current head), download the entire state trie of that block.
- From the pivot point onwards, switch to classical block processing.
By downloading and verifying the entire header chain, we can guarantee with all the security of the classical sync that the hashes (receipts, state tries, etc.) contained within the headers are valid. Based on those hashes, we can confidently download the transaction receipts and the entire state trie afterwards. Additionally, by placing the pivot point (where fast sync switches to block processing) a bit below the current head (1024 blocks), we ensure that even larger chain reorganizations can be handled without needing a new sync (as we have all the state going that many blocks back).
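To make the pivot rule concrete, here is a minimal sketch of how the switch-over block could be chosen. The names `pivotOffset` and `selectPivot` are illustrative only, not identifiers from this PR:

```go
// Minimal illustrative sketch of the pivot selection rule described above
// (head minus 1024 blocks). The names pivotOffset and selectPivot are
// hypothetical, not identifiers from this PR.
package main

import "fmt"

// pivotOffset keeps the pivot far enough below the chain head that ordinary
// reorganizations never reach behind it.
const pivotOffset = 1024

// selectPivot returns the block number at which fast sync switches from
// receipt and state downloading to full block processing.
func selectPivot(head uint64) uint64 {
	if head <= pivotOffset {
		// Chain too short for a pivot: process everything from genesis.
		return 0
	}
	return head - pivotOffset
}

func main() {
	fmt.Println(selectPivot(1500000)) // 1498976
}
```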
The historical block-processing based synchronization mechanism has two bottlenecks of approximately similar cost: transaction processing and PoW verification. The baseline fast sync algorithm successfully circumvents transaction processing, skipping the need to iterate over every single state the system was ever in. However, verifying the proof of work associated with each header is still a notably CPU intensive operation.
However, we can notice an interesting phenomenon during header verification: with a negligible probability of error, we can still guarantee the validity of the chain by verifying only every K-th header, instead of each and every one.
Let's define the negligible probability Pn as the probability of obtaining a 256 bit SHA3 collision (i.e. the hash Ethereum is built upon): 1/2^128. The chain length N and the verification interval K then need to be chosen so that the probability of a forgery slipping past the sampled verification stays below Pn.
The above table (listing matching N and K values) should be interpreted in such a way that if we verify every K-th header out of a run of N, the probability of an undetected forgery is smaller than Pn, i.e. smaller than the probability of a SHA3 collision.
Using this caveat however would mean that the pivot point can be considered secure only after N further headers have been imported and verified on top of it.
With this caveat calculated, the fast sync should be modified so that up to the pivot point only every K-th header (chosen at random from each batch of K) has its PoW verified, while the headers surrounding the pivot point itself are verified one by one before the state database download is started.
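To make the sampled verification concrete, here is a minimal sketch of the idea; the `Header` type and `verifyPoW` helper are hypothetical stand-ins rather than this PR's downloader code:

```go
// Illustrative sketch of the "verify every K-th header" idea: one header is
// picked at random out of every batch of K and its PoW is checked. Header
// and verifyPoW are hypothetical stand-ins, not this PR's downloader code.
package main

import (
	"errors"
	"math/rand"
)

type Header struct {
	Number uint64
	// PoW related fields (nonce, mix digest, difficulty, ...) omitted.
}

// verifyPoW stands in for the real proof-of-work check of a single header.
func verifyPoW(h *Header) bool { return true }

// verifySampled checks one randomly chosen header out of every batch of K.
// If a sampled header fails, the whole run is rejected (and, per the
// description above, the recent headers must be discarded as untrusted).
func verifySampled(headers []*Header, K int) error {
	for start := 0; start < len(headers); start += K {
		end := start + K
		if end > len(headers) {
			end = len(headers)
		}
		pick := start + rand.Intn(end-start)
		if !verifyPoW(headers[pick]) {
			return errors.New("invalid proof of work in sampled header")
		}
	}
	return nil
}

func main() {
	headers := make([]*Header, 4096)
	for i := range headers {
		headers[i] = &Header{Number: uint64(i)}
	}
	if err := verifySampled(headers, 64); err != nil {
		panic(err)
	}
}
```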
Blockchain protocols in general (i.e. Bitcoin, Ethereum, and the others) are susceptible to Sybil attacks, where an attacker tries to completely isolate a node from the rest of the network, making it believe a false truth as to what the state of the real network is. This permits the attacker to spend certain funds both in the real network and in this "fake bubble". However, the attacker can only maintain this state as long as it keeps feeding new valid blocks that it itself is forging; and to successfully shadow the real network, it needs to do this with a chain height and difficulty close to the real network's. In short, to pull off a successful Sybil attack, the attacker needs to match the network's hash rate, so it's a very expensive attack.
Compared to the classical Sybil attack, fast sync provides such an attacker with one extra ability: feeding a node a view of the network that's not only different from the real network, but that might also go around the EVM mechanics. The Ethereum protocol only validates state root hashes by processing all the transactions against the previous state root, so by skipping the transaction processing we cannot prove whether the state root contained within the fast sync pivot point is valid. As long as an attacker can maintain a fake blockchain that's on par with the real network, it could create an invalid view of the network's state.
To avoid opening up nodes to this extra attacker ability, fast sync (besides being solely opt-in) will only ever run during an initial sync (i.e. when the node's own blockchain is empty). After a node has managed to successfully sync with the network, fast sync is forever disabled. This way anybody can quickly catch up with the network, but after the node has caught up, the extra attack vector is plugged. This permits users to safely use the fast sync flag (--fast) without having to worry about potential state root attacks in the future.
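A minimal sketch of such a guard is shown below; the names used (`SyncMode`, `chooseSyncMode`, `chainHeight`) are hypothetical and not this PR's actual API:

```go
// Illustrative sketch of the opt-in, initial-sync-only rule described above.
// SyncMode, chooseSyncMode and chainHeight are hypothetical names, not this
// PR's actual API.
package main

import "fmt"

type SyncMode int

const (
	FullSync SyncMode = iota // process every block from the current head
	FastSync                 // headers + receipts + pivot state download
)

// chooseSyncMode honours the rule that fast sync only ever runs during the
// initial sync: once the local chain has advanced past genesis, the node
// permanently falls back to full block processing.
func chooseSyncMode(fastRequested bool, chainHeight uint64) SyncMode {
	if fastRequested && chainHeight == 0 {
		return FastSync
	}
	return FullSync
}

func main() {
	fmt.Println(chooseSyncMode(true, 0))      // 1: fast sync on a fresh node
	fmt.Println(chooseSyncMode(true, 123456)) // 0: full sync once caught up
}
```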
To benchmark the performance of the new algorithm, four separate tests were run: full syncing from scratch on Frontier and Olympic, using both the classical sync and the new sync mechanism. In all scenarios there were two nodes running on a single machine: a seed node featuring a fully synced database, and a leech node with only the genesis block pulling the data. In all test scenarios the seed node had a fast-synced database (smaller, less disk contention) and both nodes were given a 1GB database cache.
The machine running the tests was a Zenbook Pro, Core i7 4720HQ, 12GB RAM, 256GB m.2 SSD, Ubuntu 15.04.
The resulting databases contain the entire blockchain (all blocks, all uncles, all transactions), every transaction receipt and generated log, and the entire state trie of the head 1024 blocks. This allows a fast synced node to act as a full archive node for all intents and purposes.
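As a side note on why the downloaded state can be trusted piece by piece even without block processing: every state trie node is requested by its Keccak256 hash, so each received blob can be checked against the hash it was asked for. A minimal sketch of that check (hypothetical helper, not this PR's code):

```go
// Illustrative sketch (not this PR's code): during the state download every
// trie node is requested by its Keccak256 hash, so each downloaded blob can
// be verified against the hash it was asked for, even though the pivot
// state root itself cannot be proven correct without block processing.
package main

import (
	"bytes"
	"errors"
	"fmt"

	"golang.org/x/crypto/sha3"
)

// verifyNode checks that a downloaded blob hashes to the hash it was
// requested under (Keccak256 is the hash Ethereum's trie uses for node keys).
func verifyNode(requested [32]byte, blob []byte) error {
	hasher := sha3.NewLegacyKeccak256()
	hasher.Write(blob)
	if !bytes.Equal(hasher.Sum(nil), requested[:]) {
		return errors.New("trie node does not match the requested hash")
	}
	return nil
}

func main() {
	blob := []byte("example trie node payload")

	// Compute the hash we would have requested the node by.
	hasher := sha3.NewLegacyKeccak256()
	hasher.Write(blob)
	var want [32]byte
	copy(want[:], hasher.Sum(nil))

	fmt.Println(verifyNode(want, blob)) // <nil>
}
```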
The fast sync algorithm requires the functionality defined by eth/63. Because of this, testing in the live network requires at least a handful of discoverable peers to update their nodes to eth/63. On the same note, verifying that the implementation is truly correct will also entail waiting for the wider deployment of eth/63.
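For reference, the eth/63 additions the algorithm depends on boil down to four new protocol messages for fetching receipts and state trie nodes. The sketch below lists them with the message codes given in the protocol specification; it is illustrative only, not copied from this PR's wire handlers:

```go
// Sketch of the eth/63 message codes fast sync relies on (receipts and
// state trie node retrieval). The names follow the protocol specification;
// this listing is illustrative, not copied from the PR.
package main

import "fmt"

const (
	GetNodeDataMsg = 0x0d // request state trie nodes by hash
	NodeDataMsg    = 0x0e // reply carrying the requested trie nodes
	GetReceiptsMsg = 0x0f // request the receipts of a set of blocks
	ReceiptsMsg    = 0x10 // reply carrying the requested receipts
)

func main() {
	fmt.Printf("eth/63 extends the protocol with messages 0x%02x..0x%02x\n",
		GetNodeDataMsg, ReceiptsMsg)
}
```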
Just a mental note: my chain assembly functions do not push chain events into the mux. This should probably be discussed, as to what (if anything) should be pushed. Another open ended question is how to incorporate the state download progress into the sync progress reported to the user.
Great write up! Given that the summary of the PR is quite thorough and this whole change is non-trivial, I wanted to start by nitpicking a bit about the summary itself. After review it could be compiled into a nice small paper/report PDF :)
Pivot Point at 1024
Having a point like this seems reasonable, but the choice of 1024 is arbitrary and needs better motivation. For example, in https://blog.ethereum.org/2015/09/14/on-slow-and-fast-block-times/ Vitalik argues that, assuming no attackers with hash power very close to 51%, practical finality with a 17s block time averages around 2 minutes (about 8 blocks for Ethereum).
So a much smaller number should be OK to configure here for the pivot point, if the reason for it is to avoid the final sync happening within the threshold for probable chain reorgs. Also, please consider naming this point something like "reorg threshold" or "reorg security/probability threshold" to make it more descriptive.
K-th header PoW verification
Seems reasonable. It states we have a 1-in-K chance of catching a forged header within each window of K headers, but it would help to spell out how that translates into the overall probability bound.
Clarifying this will make it easier for readers unfamiliar with blockchains and also when we later on refer to this PR post-merge.
I'd also add a reference to FIPS 202 section A.1, which is the official claim of SHA3's collision resistance; this would be good given recent discussions around SHA-1's collision resistance, which after further research turned out to be weaker than initially thought.
The table of N and K values seems to contain a small rounding issue. For example, for the smallest listed N, the printed K does not quite satisfy the required bound (K needs to be 43 to satisfy it). Finally, the analysis concludes by selecting 2048, and that choice could use an explicit motivation as well.
Good catch with the rounding error, I'll have to create a tad better code for it. Maybe I could use float64 too and not need to do this hula hoop jumping with big.Float. Regarding the Olympic sync time, I know, I started running it and ran out of disk space, so the whole thing crashed after 3 hours :P Will try to run it again tonight.
I've corrected the rounding issue in the code.
The selection of 2048 was really because at the current blockchain size, it requires approximately 5K header verifications, whereas the extremities of the listed values already approach 6K+.
The selection of
Lastly, for the pivot point... I don't think it loses us much if we process 1K blocks, but maybe on an embedded system it's more painful, so I'm happy with reducing it; let's just figure out a reasonable value to reduce it to.
From the initial comment:
This allows a fast synced node to contain all historical data for user queries (like a classically synced node, and thus not influence the network's health in general)...
@karalabe I have a question on what you mean by "contain all historical data for user queries". Does a fast synced node actually have the historical state for old blocks available, or only the state from the pivot point onwards?
@jjtny1 that would be my question too. I thought geth downloads the state in parallel with the block headers, hence it pulls some historical state on the way, but it doesn't download it completely, and the complete state download happens only at the pivot point(???)
Thanks @karalabe! Sorry to bother you 3 years later :)
Hi, I read this article and have a question: if I use fast sync, my chaindata (including state) is maybe 40GB now, but a year later it may have grown a lot again, because the historical state keeps accumulating. Is there a solution for deleting the old state history while the node is in operation? Thanks.