implemented snapshot synchronization #1473

Merged
merged 10 commits into from Jul 7, 2022

Conversation

jeongkyun-oh
Contributor

@jeongkyun-oh jeongkyun-oh commented Jun 30, 2022

Proposed changes

Snap sync is a faster synchronisation mode derived from Ethereum. This PR ports the snap sync related changes from Ethereum.

Synchronisation methods from Ethereum

  1. full sync: fetch headers and bodies (transactions) only, then re-execute the downloaded transactions to produce transaction receipts and world states. Klaytn supports only full sync for now.
  2. fast sync: fetch headers, bodies, and receipts up to the pivot block (no trie verification), then download all the trie nodes at the pivot.
  3. snap sync: fetch headers, bodies, and receipts up to the pivot block, then download snapshot data (trie leaf nodes) at the pivot, after which the entire trie is reconstructed.

In Klaytn, staking information has to be fetched as well in order to verify headers.

Snap sync components

  1. snap protocol: a message protocol for sending snapshot data, including accounts, storage, bytecode, and trie node data. In short, it sends all the data needed to reconstruct a state trie at a specific pivot block. The snap protocol is independent of the original klay/istanbul protocol.
  2. snap syncer: schedules the fetching of snapshot data and the healing of the reconstructed trie. The healing phase is explained below.

Snap sync process

  1. Assume that the remote peers hold snapshot data and are therefore qualified to serve the snap protocol.
  2. Fetch a pivot block first. The pivot block is the specific block at which the state trie is reconstructed. The pivot is moved forward dynamically, 64 blocks at a time, because a remote peer holds snapshot data only for the most recent 128 blocks.
  3. Snap sync starts, and the snap syncer schedules the data retrieval. At the beginning, it divides the hash space (0x000...000 ... 0xfff...fff) into 16 ranges and, for each range, requests the snapshot account data whose keys fall within it. So at most 16 snap peers can serve this data concurrently.
  4. The remote peers iterate over the snapshot data in increasing key order and pack the data to send back to the requesting peer. At this point, a Merkle proof is added to the packet.
  5. After receiving the account data, the syncer stores it in the database. If an account is a smart contract, code and storage retrieval are scheduled as well. Once all related data has been downloaded, the syncer processes it by creating a partial state trie and persisting the data (range of accounts and partial trie nodes) to the database.
  6. The syncer synchronises the snapshot data against a state root hash. However, the root changes every time the pivot moves, so the downloaded snapshot data may be outdated. This data is treated as correct even though it may not be, and it is fixed in the heal stage.
  7. After all account data has been downloaded, the state trie has to be healed, since outdated accounts were inserted into the trie. In this stage, heal tasks are scheduled. The syncer first fetches the root and then its children recursively. If it meets a subtrie root, it does not descend further into that subtrie. In this way it can heal the entire trie starting from the root node.
  8. After the state trie is reconstructed, the sync mode is changed to full, and the node creates blocks as usual.
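As a rough illustration of the hash-space split described in step 3, the sketch below divides the 256-bit key space into 16 contiguous, inclusive ranges using only the Go standard library. The names `hashRange` and `splitHashSpace` are hypothetical and are not taken from this PR; the actual syncer may compute its ranges differently.

```go
package main

import (
	"fmt"
	"math/big"
)

// hashRange is a hypothetical helper type (not from the PR): an inclusive
// [Start, End] slice of the 256-bit account hash space.
type hashRange struct {
	Start, End [32]byte
}

// splitHashSpace divides 0x00..00 - 0xff..ff into n contiguous inclusive
// ranges. The last range absorbs any remainder so the whole space is covered.
func splitHashSpace(n int) []hashRange {
	if n <= 0 {
		return nil
	}
	one := big.NewInt(1)
	size := new(big.Int).Lsh(one, 256)               // 2^256
	top := new(big.Int).Sub(size, one)               // 2^256 - 1
	step := new(big.Int).Div(size, big.NewInt(int64(n)))

	ranges := make([]hashRange, 0, n)
	next := new(big.Int) // start of the current range, initially zero
	for i := 0; i < n; i++ {
		last := new(big.Int).Add(next, step)
		last.Sub(last, one)
		if i == n-1 {
			last.Set(top) // make sure the final range ends at 0xff..ff
		}
		var r hashRange
		next.FillBytes(r.Start[:]) // big-endian, zero-padded to 32 bytes
		last.FillBytes(r.End[:])
		ranges = append(ranges, r)
		next = new(big.Int).Add(last, one)
	}
	return ranges
}

func main() {
	// Print the first two bytes of each boundary to show the 16 ranges.
	for i, r := range splitHashSpace(16) {
		fmt.Printf("task %2d: %x.. - %x..\n", i, r.Start[:2], r.End[:2])
	}
}
```

Each range could then be handed to a different peer as the origin and limit of an account-range request, which is why the number of ranges bounds how many peers can be used in parallel.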

TODO

  • log enhancement
  • add metrics
  • stack trie update
  • deliver snap packet test

Types of changes

Please put an x in the boxes related to your change.

  • Bugfix
  • New feature or enhancement
  • Others

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING GUIDELINES doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes ($ make test)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

@jeongkyun-oh jeongkyun-oh mentioned this pull request Jun 30, 2022
31 tasks
@jeongkyun-oh jeongkyun-oh self-assigned this Jun 30, 2022
@aidan-kwon aidan-kwon added the need to merge Need to merge for the next time label Jul 1, 2022
node/cn/handler.go (outdated)
// Send back anything accumulated (or empty in case of errors)
return p2p.Send(peer.rw, AccountRangeMsg, &AccountRangePacket{
ID: req.ID,
Accounts: accounts,
Collaborator

accounts can be nil. Is it okay?

Collaborator

Is there any test case to cover this case?

Contributor Author

I have added test cases, and encoding/decoding nil accounts worked fine. Am I missing something? Please take another look.

Collaborator

@jeongkyun-oh I added my test case with the name of handler2_test.go

@kjhman21
Collaborator

kjhman21 commented Jul 3, 2022

@jeongkyun-oh Is it possible to turn off this feature? For example, CNs may not want to serve snapshot requests, to avoid performance degradation.

node/cn/snap/handler.go (outdated)
node/cn/snap/nodeset.go (outdated)
node/cn/snap/protocol.go (outdated)
node/cn/snap/protocol.go (outdated)
node/cn/snap/sync.go (outdated)
@jeongkyun-oh jeongkyun-oh force-pushed the 220630-impl-snap-protocol branch 3 times, most recently from c4d46e4 to 4986395 Compare July 5, 2022 00:37
}

// incHash returns the next hash, in lexicographical order (a.k.a plus one)
func incHash(h common.Hash) common.Hash {
Member

This function doesn't handle the overflow. It seems that it is intended, right? We might need a comment for this.
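To make the overflow question concrete, here is a minimal stand-alone sketch of a plus-one over a 32-byte hash that reports wrap-around explicitly. It is not the PR's code: `hash32` is a stand-in for `common.Hash` (assumed here to be a 32-byte array), and `incHashChecked` is a hypothetical name.

```go
package main

import "fmt"

// hash32 is a stand-in for common.Hash, assumed to be a 32-byte array.
type hash32 [32]byte

// incHashChecked returns h+1, treating h as a big-endian 256-bit integer.
// The second result is true when h was 0xff..ff and the result wrapped
// around to 0x00..00. The array is passed by value, so the caller's copy
// is not modified.
func incHashChecked(h hash32) (hash32, bool) {
	for i := len(h) - 1; i >= 0; i-- {
		h[i]++
		if h[i] != 0 {
			return h, false // no carry out of this byte, so we are done
		}
	}
	return h, true // every byte carried: the value wrapped to zero
}

func main() {
	var top hash32
	for i := range top {
		top[i] = 0xff
	}
	next, wrapped := incHashChecked(top)
	fmt.Printf("next starts with %x, wrapped=%v\n", next[:4], wrapped)
}
```

Whether wrapping silently is acceptable depends on the caller. If the last range always ends at 0xff..ff and is never incremented, a short comment on the function stating that would probably be enough.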

snapshots.Disable()
}
logger.Warn("Enabling snapshot sync prototype")
d.snapSync = true
Member

What is d.snapSync for? Do we have any reason to execute the above lines only once?

@jeongkyun-oh jeongkyun-oh added this to In progress in Storage via automation Jul 6, 2022
@jeongkyun-oh jeongkyun-oh added this to the v1.9.0 milestone Jul 6, 2022
ethan-kr
ethan-kr previously approved these changes Jul 7, 2022
Storage automation moved this from In progress to Review in progress Jul 7, 2022
Collaborator

@kjhman21 kjhman21 left a comment

LGTM, but we need more test cases.

@jeongkyun-oh jeongkyun-oh merged commit ebeb245 into klaytn:dev Jul 7, 2022
Storage automation moved this from Review in progress to Done Jul 7, 2022
@jeongkyun-oh jeongkyun-oh linked an issue Jul 7, 2022 that may be closed by this pull request
@blukat29 blukat29 removed the need to merge Need to merge for the next time label Feb 17, 2023
Development

Successfully merging this pull request may close these issues.

Snap synchronization support
5 participants