eth/63 fast synchronization algorithm #1889

Merged
merged 11 commits into ethereum:develop from karalabe/fast-sync-rebase on Oct 21, 2015

@karalabe
Member

karalabe commented Oct 9, 2015

This PR aggregates a lot of small modifications to core, trie, eth and other packages to collectively implement the eth/63 fast synchronization algorithm. In short, geth --fast.

Algorithm

The goal of the fast sync algorithm is to exchange processing power for bandwidth usage. Instead of processing the entire blockchain one link at a time, and replaying all transactions that ever happened in history, fast syncing downloads the transaction receipts along with the blocks, and pulls an entire recent state database. This allows a fast synced node to still retain its status as an archive node containing all historical data for user queries (and thus not influence the network's health in general), but at the same time to reassemble a recent network state at a fraction of the time it would take full block processing.

An outline of the fast sync algorithm would be:

  • Similarly to classical sync, download the block headers and bodies that make up the blockchain
  • Similarly to classical sync, verify the header chain's consistency (POW, total difficulty, etc)
  • Instead of processing the blocks, download the transaction receipts as defined by the header
  • Store the downloaded blockchain, along with the receipt chain, enabling all historical queries
  • When the chain reaches a recent enough state (head - 1024 blocks), pause for state sync:
    • Retrieve the entire Merkle Patricia state trie defined by the root hash of the pivot point
    • For every account found in the trie, retrieve its contract code and internal storage state trie
  • Upon successful trie download, mark the pivot point (head - 1024 blocks) as the current head
  • Import all remaining blocks (1024) by fully processing them as in the classical sync

Analysis

By downloading and verifying the entire header chain, we can guarantee, with all the security of the classical sync, that the hashes (receipts, state tries, etc) contained within the headers are valid. Based on those hashes, we can confidently download transaction receipts and the entire state trie afterwards. Additionally, by placing the pivoting point (where fast sync switches to block processing) a bit below the current head (1024 blocks), we can ensure that even larger chain reorganizations can be handled without needing a new sync (as we have all the state going that many blocks back).

Caveats

The historical block-processing based synchronization mechanism has two bottlenecks of approximately equal cost: transaction processing and PoW verification. The baseline fast sync algorithm successfully circumvents the transaction processing, skipping the need to iterate over every single state the system was ever in. However, verifying the proof of work associated with each header is still a notably CPU-intensive operation.

However, we can observe an interesting phenomenon during header verification. With a negligible probability of error, we can still guarantee the validity of the chain by verifying only every K-th header, instead of each and every one. By selecting a single header at random out of every K headers to verify, we guarantee the validity of an N-length chain with the probability of (1/K)^(N/K) (i.e. we have 1/K chance to spot a forgery in K blocks, a verification that's repeated N/K times).

Let's define the negligible probability Pn as the probability of obtaining a 256 bit SHA3 collision (i.e. the hash Ethereum is built upon): 1/2^128. To honor the Ethereum security requirements, we need to choose the minimum chain length N (below which we verify every header) and the maximum verification batch size K such that (1/K)^(N/K) <= Pn holds. Calculating this for various {N, K} pairs is pretty straightforward, a simple and lenient solution being http://play.golang.org/p/B-8sX_6Dq0.

   N    K      N    K      N    K      N    K
1024   43   1792   91   2560  143   3328  198
1152   51   1920   99   2688  152   3456  207
1280   58   2048  108   2816  161   3584  217
1408   66   2176  116   2944  170   3712  226
1536   74   2304  125   3072  179   3840  236
1664   82   2432  134   3200  189   3968  246

The above table should be interpreted as follows: if we verify every K-th header, then after N headers the probability of a forgery is smaller than the probability of an attacker producing a SHA3 collision. It also means that if a forgery is indeed detected, the last N headers should be discarded as not safe enough. Any {N, K} pair may be chosen from the above table, and to keep the numbers reasonable-looking, we chose N=2048, K=100. This will be fine-tuned later after being able to observe network bandwidth/latency effects and possibly behavior on more CPU-limited devices.
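The pairs in the table can be reproduced with a few lines of Go, equivalent in spirit to the linked playground snippet (`maxK` is a name used here for illustration, not the actual generator):

```go
package main

import (
	"fmt"
	"math"
)

// maxK returns the largest verification gap K for an N-header chain
// such that (1/K)^(N/K) <= Pn, where Pn = 1/2^128 is the probability
// of a 256 bit SHA3 collision. A result of 1 means every header must
// be verified.
func maxK(n int) int {
	pn := math.Pow(2, -128)
	k := 1
	// Grow K while the forgery-escape probability stays below Pn.
	for math.Pow(1/float64(k+1), float64(n)/float64(k+1)) <= pn {
		k++
	}
	return k
}

func main() {
	fmt.Println(maxK(1024), maxK(2048)) // 43 108
}
```

Running it for every N in the table reproduces the listed K values.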

Using this caveat however would mean that the pivot point can be considered secure only after N headers have been imported after the pivot itself. To prove the pivot safe sooner, we stop the "gapped verifications" X headers before the pivot point, and verify every single header onward, including an additional X headers post-pivot, before accepting the pivot's state. Given the above N and K numbers, we chose X=24 as a safe number.

With this caveat calculated, the fast sync should be modified so that up to the pivot point - X, only every K=100-th header should be verified (at random), after which all headers up to pivot point + X should be fully verified before starting the state database download. Note: if a sync fails due to header verification, the last N headers must be discarded, as they cannot be trusted enough.

Weakness

Blockchain protocols in general (i.e. Bitcoin, Ethereum, and the others) are susceptible to Sybil attacks, where an attacker tries to completely isolate a node from the rest of the network, making it believe a false truth as to what the state of the real network is. This permits the attacker to spend certain funds in both the real network and this "fake bubble". However, the attacker can only maintain this state as long as it keeps feeding new valid blocks that it itself forges; and to successfully shadow the real network, it needs to do this with a chain height and difficulty close to the real network's. In short, to pull off a successful Sybil attack, the attacker needs to match the network's hash rate, so it's a very expensive attack.

Compared to the classical Sybil attack, fast sync provides such an attacker with an extra ability: that of feeding a node a view of the network that's not only different from the real network, but that might also go around the EVM mechanics. The Ethereum protocol only validates state root hashes by processing all the transactions against the previous state root. By skipping the transaction processing, we cannot prove whether the state root contained within the fast sync pivot point is valid, so as long as an attacker can maintain a fake blockchain that's on par with the real network, it could feed the node an invalid view of the network's state.

To avoid opening up nodes to this extra attacker ability, fast sync (besides being solely opt-in) will only ever run during an initial sync (i.e. when the node's own blockchain is empty). After a node has managed to successfully sync with the network, fast sync is forever disabled. This way anybody can quickly catch up with the network, but once the node has caught up, the extra attack vector is closed. This feature permits users to safely use the fast sync flag (--fast), without having to worry about potential state root attacks happening to them in the future. As an additional safety feature, if a fast sync fails close to or after the random pivot point, fast sync is disabled as a safety precaution and the node reverts to full, block-processing based synchronization.

Performance

To benchmark the performance of the new algorithm, four separate tests were run: full syncing from scratch on Frontier and Olympic, using both the classical sync as well as the new sync mechanism. In all scenarios there were two nodes running on a single machine: a seed node featuring a fully synced database, and a leech node with only the genesis block pulling the data. In all test scenarios the seed node had a fast-synced database (smaller, less disk contention) and both nodes were given 1GB database cache (--cache=1024).

The machine running the tests was a Zenbook Pro, Core i7 4720HQ, 12GB RAM, 256GB m.2 SSD, Ubuntu 15.04.

Dataset (blocks, states)              | Normal sync (time, db) | Fast sync (time, db)
Frontier, 357677 blocks, 42.4K states | 12:21 mins, 1.6 GB     | 2:49 mins, 235.2 MB
Olympic, 837869 blocks, 10.2M states  | 4:07:55 hours, 21 GB   | 31:32 mins, 3.8 GB

The resulting databases contain the entire blockchain (all blocks, all uncles, all transactions), every transaction receipt and generated logs, and the entire state trie of the head 1024 blocks. This allows a fast synced node to act as a full archive node for all intents and purposes.

Closing remarks

The fast sync algorithm requires the functionality defined by eth/63. Because of this, testing in the live network requires at least a handful of discoverable peers to update their nodes to eth/63. On the same note, verifying that the implementation is truly correct will also entail waiting for the wider deployment of eth/63.

@robotally


robotally commented Oct 9, 2015

Vote Count Reviewers
👍 1 @Gustav-Simonsson
👎 0

Updated: Wed Oct 21 17:17:48 UTC 2015

@codecov-io


codecov-io commented Oct 9, 2015

Current coverage is 48.02%

Merging #1889 into develop will decrease coverage by 0.03% as of 0c592c7

Powered by Codecov. Updated on successful CI builds.

@karalabe karalabe added please review and removed in progress labels Oct 9, 2015

@karalabe


karalabe commented Oct 9, 2015

Just a mental note, my chain assembly functions do not push chain events into the mux. This should probably be something to discuss as to what - if anything - should be pushed. Another open ended question is how to incorporate the state download progress into eth.syncing (we have no means to know the number of states we need to pull... can we estimate it on the client side? it's something to still figure out).

@Gustav-Simonsson


Gustav-Simonsson commented Oct 12, 2015

Great write up, given that the summary of the PR is quite thorough and this whole change is non-trivial, I wanted to start by nitpicking a bit about the summary itself. After review it could be compiled into a nice small paper/report pdf :)

Pivot Point at 1024

Having a point like this seems reasonable, but the choice of 1024 is arbitrary and needs better motivation. For example, in https://blog.ethereum.org/2015/09/14/on-slow-and-fast-block-times/ Vitalik argues practical finality (assuming no attackers with hash power very close to 51%) with a 17s block time averaging 2 minutes (8 blocks for Ethereum).

So a much smaller number should be OK to configure here for the pivot point if the reason for it is to avoid the final sync happening within threshold for probable chain reorgs. Also please consider naming this point to something like "reorg threshold" or "reorg security/probability threshold" to make it more descriptive.

K-th header PoW verification

Seems reasonable. It states we have 1/K chance to spot forgery in K blocks, but I think what is meant here is 1/K being the probability of spotting a forgery in any given block. For a range of K blocks the probability should be close to 1. Please clarify this. Also the later (1/K)^(N/K) refers not to probability of spotting forgery or correctness, but the reverse: probability of an attacker getting away with forgery.

Clarifying this will make it easier for readers unfamiliar with blockchains and also when we later on refer to this PR post-merge.

I'd also add a reference to FIPS 202 section A.1 which is the official claim to SHA3's collision resistance - this can be good given recent discussions around SHA-1's collision resistance which after research ended up being less than initially thought.

The table of {N, K} pairs is not entirely correct.

For example, for N = 1024, K = 44 we get (please excuse Erlang console syntax):

1> Pn = 1/math:pow(2,128).
2.938735877055719e-39
2> f(N), N = 1024.
1024
3>  f(K), K = 44.
44
4> math:pow((1/K), (N/K)) < Pn.
false
5> f(K), K = 43.
43
6> math:pow((1/K), (N/K)) < Pn.
true
7> math:pow((1/K), (N/K)).     
1.2608347620717529e-39

(K needs to be 43 to satisfy (1/K)^(N/K) <= Pn)

For N = 2048, K = 113 it seems some error in the calculation has accumulated more, as we need K = 108 to satisfy the property:

8> f(N), N = 2048.             
2048
9> f(K), K = 113.              
113
10> math:pow((1/K), (N/K)) < Pn.
false
11> f(K), K = 108.
108
12> math:pow((1/K), (N/K)) < Pn.
true
13> math:pow((1/K), (N/K)).     
2.7558821171772803e-39

Finally, the analysis concludes by selecting N=2048, K=100 due to "keep numbers reasonably looking". This is a somewhat arbitrary argument and needs more motivation. These numbers should be fine to configure exactly to achieve a certain probability threshold, even if that results in odd-looking numbers not falling in common ranges such as powers of two or round decimal numbers. The important thing is that their configuration in code is well documented.

@Gustav-Simonsson


Gustav-Simonsson commented Oct 12, 2015

The time for normal sync on Olympic is missing.

@karalabe


karalabe commented Oct 12, 2015

Good catch with the rounding error, I'll have to create a tad better code for it. Maybe I could use float64 too and not need to do this hula hoop jumping with big.Float. Regarding the Olympic sync time, I know, I started running it and ran out of disk space, so the whole thing crashed after 3 hours :P Will try to run it again tonight.

@karalabe


karalabe commented Oct 12, 2015

I've corrected the rounding issue in the {N, K} generator, and updated the description with the new code and the corrected values. The selected values in the PR are well within range, so no stress there.

The selection of 2048 was really because at the current blockchain size, it requires approximately 5K header verifications, whereas the extremities of the listed values already approach 6+K.

The selection of K itself is not that relevant (+- a few, from the implementation's point of view), as it's just a maximum gap. The fast sync code does the verifications whenever importing a batch, and always verifies the last of a batch, so when importing 192 headers (the default batch size), we're actually doing 2 random checks + the last always; so practically, K suggests 100, but we have more like 66. Also, as we're only processing one batch of headers at a time, if you verify fewer headers than the CPU cores available, you'll have to wait for the batch to complete either way, so raising K higher doesn't gain you much. But I agree that we should decide on why it is some arbitrary value :)

Lastly for the pivot point... I don't think it loses us much if we process 1K blocks, but maybe on an embedded system it's more painful, so I'm happy with reducing it; just let's figure out a reasonable value to reduce to.

@@ -231,19 +347,12 @@ func (bc *BlockChain) Reset() {
// ResetWithGenesisBlock purges the entire blockchain, restoring it to the
// specified genesis state.
func (bc *BlockChain) ResetWithGenesisBlock(genesis *types.Block) {
// Dump the entire block chain and purge the caches
bc.SetHead(0)


@Gustav-Simonsson

Gustav-Simonsson Oct 12, 2015

Member

Should this be after getting the lock?


@karalabe

karalabe Oct 12, 2015

Member

SetHead has its own lock.


td := new(big.Int).Add(header.Difficulty, ptd)
// Make sure no inconsistent state is leaked during insertion
self.mu.Lock()


@Gustav-Simonsson

Gustav-Simonsson Oct 12, 2015

Member

Do the writes of TD and header after the if TD > ... block below need locking? If not this lock could be taken only if the TD is greater than current.


@karalabe

karalabe Oct 13, 2015

Member

Yes, they do. I've seen quite a lot of weird anomalies while testing and they turned out to be caused by other parts of the code seeing partial database writes. Of course the part in the if is much more prone to this as it's a long running op compared to the td/header writing, but they too can lead to errors due to partial views of the database. Additionally, we need to obtain the mutex already to access the current head (in the if's condition).


for i := header.Number.Uint64() + 1; GetCanonicalHash(self.chainDb, i) != (common.Hash{}); i++ {
DeleteCanonicalHash(self.chainDb, i)
}
// Overwrite any stale canonical number assignments


@Gustav-Simonsson

Gustav-Simonsson Oct 12, 2015

Member

"canonical number assignments" can probably be simplified to "headers"


@karalabe

karalabe Oct 13, 2015

Member

Canonical numbers aren't the headers. The canonical numbering is an association of block numbers to block hashes. No headers are rewritten or even touched, we only update what we believe to be the current best chain progression.

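The bookkeeping being described can be illustrated with a toy map; `pruneStaleCanonical` and the map below are hypothetical stand-ins for the database helpers in the diff, not the real API:

```go
package main

import "fmt"

// pruneStaleCanonical deletes canonical number->hash assignments above
// the new head, mirroring the deletion loop in the diff above. The map
// stands in for the chain database's number-to-hash index; no headers
// themselves are touched.
func pruneStaleCanonical(canonical map[uint64]string, newHead uint64) {
	for i := newHead + 1; ; i++ {
		if _, ok := canonical[i]; !ok {
			break // no more stale assignments
		}
		delete(canonical, i)
	}
}

func main() {
	canonical := map[uint64]string{0: "genesis", 1: "a1", 2: "b2", 3: "c3", 4: "d4"}
	pruneStaleCanonical(canonical, 2) // block 2 becomes the new canonical head
	fmt.Println(len(canonical))       // only numbers 0, 1, 2 remain
}
```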

}
}
}
// Start as many worker threads as goroutines allowed


@Gustav-Simonsson

Gustav-Simonsson Oct 12, 2015

Member

Btw, did you try to spawn less than GOMAXPROCS to see if the max is indeed optimal? Don't know too much about Go scheduling but if the node is also processing other things it could be a lower number is optimal. Would be interesting to benchmark this on a full sync.


@karalabe

karalabe Oct 13, 2015

Member

That would be a bit too much of a micro optimisation for me. Even if you benchmark that N is better than M, it will depend on your machine and the capacity of individual cores; it'll also depend on what the machine itself, as well as the node, is doing in between. Finally, even if you manage to figure out a number, any future commit might make that number completely off. IMHO we should give Go a fighting chance to run everything concurrently, but leave it to its own scheduler to refrain itself if need be.


@Gustav-Simonsson

Gustav-Simonsson Oct 13, 2015

Member

Makes sense, thanks for clarifying.


checkPow := verify[index]
var err error
if index == 0 {


@Gustav-Simonsson

Gustav-Simonsson Oct 12, 2015

Member

Should this header too be verified with its parent?


@karalabe

karalabe Oct 13, 2015

Member

Ah sorry, probably the name of the method is a bit weird. Both methods do a full verification, with the difference that for one I specifically provide the parent and for the other the parent is retrieved from the database. If index == 0, then I need to reach out to the database since I don't have the parent header myself. On the other hand, if index > 0, then there's no need to do the expensive database lookup, since I know anyway what the parent is.

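The two verification paths being described can be sketched roughly as follows (the types and names are hypothetical, not geth's actual API):

```go
package main

import "fmt"

// header is a toy stand-in for a block header, keeping only the fields
// this sketch needs.
type header struct {
	number uint64
	parent string // parent hash
	hash   string // own hash
}

// verifyWithParent stands in for a full header verification against an
// explicitly supplied parent (PoW and difficulty checks elided).
func verifyWithParent(h, parent *header) error {
	if parent == nil || h.parent != parent.hash {
		return fmt.Errorf("header %d: unknown parent", h.number)
	}
	return nil
}

// verifyBatch verifies a contiguous batch: only the first header needs
// a database lookup for its parent; the rest use the in-memory
// predecessor, avoiding the expensive lookup.
func verifyBatch(chain []*header, db map[string]*header) error {
	for i, h := range chain {
		var parent *header
		if i == 0 {
			parent = db[h.parent] // database lookup only for the first
		} else {
			parent = chain[i-1] // in-memory parent for the rest
		}
		if err := verifyWithParent(h, parent); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	gen := &header{0, "", "h0"}
	db := map[string]*header{"h0": gen}
	batch := []*header{{1, "h0", "h1"}, {2, "h1", "h2"}}
	fmt.Println(verifyBatch(batch, db)) // <nil>
}
```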

atomic.AddInt32(&stats.processed, 1)
}
}
// Start as many worker threads as goroutines allowed


@Gustav-Simonsson

Gustav-Simonsson Oct 12, 2015

Member

This one can be extracted into a helper function since it's equivalent to the one used in InsertHeaderChain


@karalabe

karalabe Oct 13, 2015

Member

I'll think about it a bit more. I'm reluctant to split it out due to making the code harder to understand, but I'll do it if I can find a clean enough way.


@karalabe


karalabe commented Oct 20, 2015

I'll squash all the commits after ca96683 when reviewers give the thumbs up.

@alexvandesande

@alexvandesande

alexvandesande commented Oct 20, 2015

👏🏻

@Gustav-Simonsson


Gustav-Simonsson commented Oct 20, 2015

aside from two small comments and questions (see gitter chat) 👍

// Generate the list of headers that should be POW verified
verify := make([]bool, len(chain))
for i := 0; i < len(verify)/checkFreq; i++ {
index := i*checkFreq + self.rand.Intn(checkFreq)


@Gustav-Simonsson

Gustav-Simonsson Oct 21, 2015

Member

Actually this makes it a bit more predictable than it could be, as it guarantees there is one and only one PoW verification in every checkFreq long interval. A better way would be to make it fully random by generating len(chain)/checkFreq indexes randomly in the range 0 to len(chain)


@karalabe

karalabe Oct 21, 2015

Member

Hmm, though then we wouldn't be able to guarantee that every K-header batch is at least minimally verified, so I'm not sure about it.


@karalabe
Member

karalabe commented Oct 21, 2015

@Gustav-Simonsson @obscuren Squashed all the review and discussion commits into one, ready for merge from my part.

@Gustav-Simonsson
Member

Gustav-Simonsson commented Oct 21, 2015

👍 to merge. The comment about random K is not a blocker, though it would be nice to improve later on, perhaps in combination with tweaks of other fast sync params.

obscuren added a commit that referenced this pull request Oct 21, 2015

Merge pull request #1889 from karalabe/fast-sync-rebase
eth/63 fast synchronization algorithm

@obscuren obscuren merged commit 0467a6c into ethereum:develop Oct 21, 2015

5 checks passed

buildbot/ARM Go pull requests DEV build done.
Details
buildbot/Linux Go pull requests DEV build done.
Details
buildbot/OSX Go pull requests DEV build done.
Details
buildbot/Windows Go pull requests DEV build done.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@obscuren obscuren removed the please review label Oct 21, 2015

@obscuren obscuren modified the milestone: 1.3.0 Oct 30, 2015

@chriseth chriseth referenced this pull request Feb 8, 2016

Closed

PV63 #177

@jamesray1

jamesray1 commented Jul 5, 2017

From the initial comment:

"This allows a fast synced node to still retain its status an an archive node containing all historical data for user queries"

Suggested rewording: "This allows a fast synced node to contain all historical data for user queries (like a classically synced node, and thus not influence the network's health in general)..."

@yihaient

yihaient commented Jan 24, 2018

Can you please explain how to do this for a beginner? I am unable to download 100% of the blocks, even though I have a brand new MacBook Pro and the latest Mist/Ethereum Wallet desktop app. I don't understand, please help! Thanks.

@yihaient

yihaient commented Jan 24, 2018

@karalabe @Gustav-Simonsson Problem above ^^^^^

@jjtny1

jjtny1 commented Feb 15, 2018

@karalabe I have a question on what you mean by "This allows a fast synced node to still retain its status an an archive node containing all historical data for user queries (and thus not influence the network's health in general)". How can a node that only has block headers and a receipt chain function as an archive node and have all historical data? You are only receiving the state for the latest Merkle Patricia trie. How are you getting historical state?

@ivica7

ivica7 commented Mar 7, 2018

@jjtny1 that would be my question too. I thought geth downloads the state in parallel with the block headers, and hence pulls some historical states along the way, but it doesn't download them completely, and the complete state download happens only at the pivot point(?)

In #15001 (comment) @karalabe talks about the difficulties caused by the state morphing during fast sync.

@helinwang

helinwang commented May 28, 2018

"But by skipping the transaction processing, we cannot prove that the state root contained within the fast sync pivot point is valid or not, so as long as an attacker can maintain a fake blockchain that's on par with the real network, it could create an invalid view of the network's state."

Thanks @karalabe! Sorry to bother you 3 years later :)
One question: isn't the state root of the pivot point valid if a normal sync from the pivot point to the tip (1024 blocks away) yields the same state root hash as the tip?

@lyh168

lyh168 commented Sep 30, 2018

Hi, I read this article and have a question: if I use fast sync, my chaindata (including state) may be around 40 GB now, but a year later it may be much larger, because stale state history keeps accumulating. Are there any solutions for deleting stale state history while the node is running? Thanks.
