
Warp sync part I #9227

Merged
29 commits merged into master from a-warp-sync
Aug 2, 2021

Conversation

@arkpar (Member) commented Jun 29, 2021

Closes #1208

Initial support for syncing with the GRANDPA warp sync protocol. It works as follows:

  1. Download and verify finality proofs up to the latest finalized block.
  2. Download the whole state snapshot for the block that is proven to be final.
  3. GRANDPA and BABE are reset with authority set and epoch information read from the imported state.
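The flow can be pictured as a small phase machine, as in the sketch below (illustrative only; `WarpSyncPhase` and `on_response_verified` are hypothetical simplifications of the actual request/response-driven `WarpSync` in `client/network`):

```rust
// Illustrative state machine for the three phases above; hypothetical
// and heavily simplified relative to the real sync driver.
enum WarpSyncPhase {
    // Downloading and verifying GRANDPA finality proofs (step 1).
    Proofs,
    // Downloading the state snapshot of the proven-final block (step 2).
    State,
    // Done; GRANDPA and BABE re-initialize from imported state (step 3).
    Complete,
}

struct WarpSync {
    phase: WarpSyncPhase,
}

impl WarpSync {
    fn on_response_verified(&mut self) {
        // Advance one phase as each download completes; the real code
        // issues network requests and verifies each response first.
        self.phase = match self.phase {
            WarpSyncPhase::Proofs => WarpSyncPhase::State,
            WarpSyncPhase::State => WarpSyncPhase::Complete,
            WarpSyncPhase::Complete => WarpSyncPhase::Complete,
        };
    }
}
```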

Still left to do in the following PR:

  • Support for syncing child tries.
  • A background process to download full block history after initial sync. This will also need to reconstruct authority set changes in sc-finality-grandpa.

Marked with B0-silent as this is still WIP and will be considered complete after the following PR.

polkadot companion: paritytech/polkadot#3382

@arkpar added the following labels on Jun 29, 2021: A0-please_review (Pull request needs code review), B0-silent (Changes should not be mentioned in any release notes), C1-low (PR touches the given topic and has a low impact on builders), D3-trivial 🧸 (PR contains trivial changes in a runtime directory that do not require an audit).
@expenses (Contributor)

What's the use of warp-syncing a full node? Won't (most?) full nodes need all the blocks?

@arkpar (Member, Author) commented Jun 30, 2021

As I've mentioned in the description, full blocks will be downloaded in the background after warp sync is complete.

@arkpar mentioned this pull request Jul 6, 2021
@rphmeier (Contributor) commented Jul 9, 2021

#1208 - does this close it?

@arkpar (Member, Author) commented Jul 10, 2021

Right, closes #1208

@rphmeier (Contributor)

Since #1208 is the primary source of documentation for warp sync, it would be good to preserve that documentation either in-line in the code somewhere or in a wiki document. Otherwise, it will disappear and the basis for the code will be less apparent.

@arkpar (Member, Author) commented Jul 15, 2021

Most of what's described in #1208 was not really implemented; the actual design is buried somewhere in the middle of the discussion there.
I've added a basic description to client/network/README.md in e5991c1.

@arkpar (Member, Author) commented Jul 15, 2021

@andresilva Waiting for your review

@andresilva (Contributor) left a comment

Overall changes LGTM, just some minor nits. It would be nice to have some tests for the sync logic changes.

Sorry for the delay in reviewing 🙏.

test-utils/test-runner/src/client.rs (resolved)
primitives/consensus/common/src/import_queue.rs (resolved, outdated)
@@ -817,18 +818,6 @@ impl<B, E, Block, RA> Client<B, E, Block, RA> where
}
};

// ensure parent block is finalized to maintain invariant that
// finality is called sequentially.
if finalized {
Contributor:
Why remove this? Maybe guard it under parent_exists instead?

@arkpar (Member, Author) Jul 17, 2021:

This is a weird hack. Why finalize only a single parent? What if that parent's parent is not finalized either?
Sequential finalization is already ensured by the DB layer: the ensure_sequential_finalization function is called on block commit. I believe this code was added before ensure_sequential_finalization existed and is no longer needed.
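For illustration, a minimal sketch of the kind of guard ensure_sequential_finalization provides (hypothetical and heavily simplified; the real function lives in the database layer and checks headers inside the commit transaction):

```rust
// Hypothetical, simplified stand-in for the DB-layer check mentioned
// above: finality must advance along the chain without gaps.
fn ensure_sequential_finalization(
    last_finalized: u64,
    to_finalize: u64,
) -> Result<(), String> {
    if to_finalize != last_finalized + 1 {
        return Err(format!(
            "trying to finalize #{} but last finalized is #{}",
            to_finalize, last_finalized
        ));
    }
    Ok(())
}
```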

Contributor:

apply_finality_with_block_hash was taking the tree route from the last finalized block and making sure all intermediary blocks were finalized, but you're right that this is now guaranteed by the DB layer. The only extra thing apply_finality_with_block_hash does is send finality notifications for all these intermediary blocks (capped at 256 blocks). I'm not sure whether we should keep doing that, as some subsystems might expect these finality notifications (@rphmeier, is this needed for parachains?).
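As a sketch of the behaviour being discussed (hypothetical code; `notify` stands in for the client's finality notification sender, and the cap is the 256 mentioned above):

```rust
// Hypothetical sketch of the capped notification behaviour: walk the
// route of newly finalized block numbers and notify only the most
// recent 256 of them.
const MAX_FINALITY_NOTIFICATIONS: usize = 256;

fn notify_finalized(route: &[u64], notify: &mut dyn FnMut(u64)) {
    // If the route is longer than the cap, skip the oldest entries.
    let skip = route.len().saturating_sub(MAX_FINALITY_NOTIFICATIONS);
    for &block_number in &route[skip..] {
        notify(block_number);
    }
}
```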

@arkpar (Member, Author):

That's a good point. I'll add it back with a comment.

@@ -16,58 +16,57 @@

Contributor:

Even though there's a generic backend for generating warp proofs, this is still quite coupled to GRANDPA, and it feels a bit wrong to add it to the generic layer of networking.

@arkpar (Member, Author) Jul 17, 2021:

I could not think of a good place to put it. Leaving it in a stand-alone crate would introduce a circular dependency, because it uses types from sc-network, and sc-network is not really designed for pluggable sync algorithms. Ideally there would be a separate sc-network-api crate containing only the network interface, not the actual implementation; that, however, turned out to be a substantial refactoring outside the scope of this PR.

Contributor:

We can make this more abstract in the future; there's no point blocking this PR on it now, since GRANDPA/BABE is the only consensus engine we have that supports warping anyway.

client/consensus/babe/src/lib.rs (resolved, outdated)
Err(sp_api::ApiError::Application(_)) => {
// The new API is not supported in this runtime. Try reading directly from storage.
// This code may be removed once warp sync to an old runtime is no longer needed.
for prefix in ["GrandpaFinality", "Grandpa"] {
Contributor:

Nice!
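As a sketch of what that fallback amounts to (hypothetical; the real code hashes the pallet and item names with twox_128 to form the storage key, and "CurrentSetId" is an assumed item name used purely for illustration):

```rust
// Hypothetical expansion of the prefix fallback above. Plain
// concatenation is used only to keep the sketch short; real storage
// keys are twox_128(pallet) ++ twox_128(item).
fn read_set_id_fallback(
    read_storage: impl Fn(&[u8]) -> Option<Vec<u8>>,
) -> Option<Vec<u8>> {
    for prefix in ["GrandpaFinality", "Grandpa"] {
        let key = [prefix.as_bytes(), b"CurrentSetId".as_slice()].concat();
        if let Some(raw) = read_storage(&key) {
            // Found under this pallet name; the caller decodes the value.
            return Some(raw);
        }
    }
    None
}
```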

// Read authority list and set id from the state.
self.authority_set_hard_forks.clear();
let block_id = BlockId::hash(hash);
let authorities = self.inner.runtime_api().grandpa_authorities(&block_id)
Contributor:

The reason taking this from state works is that we're querying at a finalized block; maybe we should add a comment here to that effect.

client/informant/src/display.rs (resolved, outdated)
client/network/src/protocol/sync.rs (resolved)
client/network/src/protocol/sync.rs (resolved)
@rphmeier (Contributor)

I'm skeptical of merging anything touching sync code right now. What is our maintenance plan for this code?

@arkpar (Member, Author) commented Jul 17, 2021

@rphmeier Could you please elaborate? What's the reason for the skepticism?
The "maintenance plan" is the same as for the rest of the code base. Again, do you have reason to believe the sync code is poorly maintained?

@arkpar (Member, Author) commented Jul 17, 2021

@andresilva Tests are in the following PR #9284

@arkpar (Member, Author) commented Jul 22, 2021

@andresilva Care to take another look?

@andresilva (Contributor) left a comment

LGTM. I think there's nothing here that would break warp sync for smoldot.

@@ -1331,14 +1412,21 @@ where
match self.client.status(BlockId::Hash(hash)) {
Ok(sp_blockchain::BlockStatus::InChain) => {
// When re-importing existing block strip away intermediates.
let _ = block.take_intermediate::<BabeIntermediate<Block>>(INTERMEDIATE_KEY)?;
let _ = block.take_intermediate::<BabeIntermediate<Block>>(INTERMEDIATE_KEY);
Contributor:

IIUC you removed the `?` because it doesn't really make a difference for correctness whether we can take the intermediate or not: either it's not there in the first place or it has the wrong type.

@arkpar (Member, Author):

Right; when re-importing an existing block we assume it was validated the first time.
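A toy illustration of the best-effort semantics described here (hypothetical code; the real intermediates live on the block import params, not a plain `HashMap`):

```rust
use std::any::Any;
use std::collections::HashMap;

// Hypothetical stand-in for stripping an intermediate on re-import:
// the result is deliberately ignored, because a block that is already
// in the chain was validated when it was first imported.
fn strip_intermediate(
    intermediates: &mut HashMap<&'static [u8], Box<dyn Any>>,
    key: &'static [u8],
) {
    let _ = intermediates.remove(key);
}
```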

client/network/src/config.rs (resolved, outdated)

client/network/src/protocol/sync.rs (resolved, outdated)
client/consensus/babe/src/lib.rs (resolved, outdated)
client/service/src/builder.rs (resolved, outdated)
client/network/README.md (resolved, outdated)
client/finality-grandpa/src/import.rs (resolved, outdated)
client/db/src/lib.rs (resolved, outdated)
client/network/src/protocol/sync.rs (resolved, outdated)
}
if let Some(request) = sync.next_warp_proof_request() {
for (id, peer) in self.peers.iter_mut() {
if peer.state.is_available() && peer.best_number >= sync.target_block_number() {
Member:
target_block_number always returns 0 until we hit the phase of downloading the state proof. So we could also send such a warp proof request to a peer that is itself still syncing?

Sounds like a bug?

@arkpar (Member, Author) Jul 29, 2021:

Yes. Unfortunately, with the current GRANDPA warp sync protocol it is not possible to learn the latest finalized block number before downloading and verifying all the proofs first. In my first implementation I set a recently announced block as the target, but in general that does not work, because there's no way to know in advance whether it is finalized. I'll add a generic check that a peer is not too far behind the majority of the peer set instead.
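A hedged sketch of what that check might look like (illustrative only; `MAX_BLOCKS_BEHIND`, the median choice, and the function names are assumptions, not the actual fix):

```rust
// Hypothetical sketch of the proposed peer check: only send warp proof
// requests to peers whose best block is not too far behind the median
// best block of the peer set. The threshold is an illustrative value.
const MAX_BLOCKS_BEHIND: u64 = 64;

fn median_best(mut bests: Vec<u64>) -> Option<u64> {
    if bests.is_empty() {
        return None;
    }
    bests.sort_unstable();
    Some(bests[bests.len() / 2])
}

fn peer_eligible(peer_best: u64, all_bests: Vec<u64>) -> bool {
    match median_best(all_bests) {
        // The peer qualifies if it is within the window of the majority.
        Some(median) => peer_best + MAX_BLOCKS_BEHIND >= median,
        None => false,
    }
}
```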

@@ -620,6 +687,17 @@ impl<B: BlockT> ChainSync<B> {
return Ok(None)
}

if let SyncMode::Warp = &self.mode {
// TODO: wait for 3 peers
Member:
?

@arkpar (Member, Author):

I removed the check for local testing. Will put it back.
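For reference, the gate the TODO alludes to could be as simple as the following (hypothetical sketch; the constant and names are illustrative):

```rust
// Hypothetical sketch of the "wait for 3 peers" gate: don't start warp
// sync until enough peers have reported their best block, so the target
// isn't chosen on the word of a single (possibly malicious) peer.
const MIN_PEERS_FOR_WARP_SYNC: usize = 3;

fn can_start_warp_sync(peers_with_known_best: usize) -> bool {
    peers_with_known_best >= MIN_PEERS_FOR_WARP_SYNC
}
```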

}
if let Some(request) = sync.next_state_request() {
for (id, peer) in self.peers.iter_mut() {
if peer.state.is_available() && peer.best_number >= sync.target_block_number() {
Member:
Same

log::debug!(target: "sync", "Starting warp state sync.");
if let Some(provider) = &self.warp_sync_provider {
self.warp_sync =
Some(WarpSync::new(self.client.clone(), provider.clone()));
Member:

Does that mean we will start again from zero when the node is restarted?

@arkpar (Member, Author):

Yes. Proofs and state are downloaded into memory anyway. Block history download won't be restarted, however; that's coming in the following PR.

@arkpar requested a review from bkchr on July 30, 2021 06:14
@arkpar (Member, Author) commented Jul 30, 2021

@cheme @bkchr @andresilva Could you please take a look at the companion PR paritytech/polkadot#3382 as well?

@arkpar (Member, Author) commented Aug 2, 2021

bot merge

@ghost commented Aug 2, 2021

Trying merge.

@ghost merged commit c093de5 into master Aug 2, 2021
@ghost deleted the a-warp-sync branch August 2, 2021 08:27
ericjmiller pushed a commit to ericjmiller/substrate that referenced this pull request Aug 2, 2021
* Started warp sync

* BABE & GRANDPA recovery

* Warp sync protocol

* Sync warp proofs first

* Added basic documentation

* Apply suggestions from code review

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>

* Style changes

* Apply suggestions from code review

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>

* fmt

* Apply suggestions from code review

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>

* Fixed change trie pruning wrt missing blocks

* Restore parent finalization

* fmt

* fmt

* Revert pwasm-utils bump

* Change error type & check API version

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Build fix

* Fixed target block check

* Formatting

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
bkchr added a commit that referenced this pull request Aug 2, 2021
@nuke-web3 mentioned this pull request Oct 29, 2021
Labels: A0-please_review, B0-silent, C1-low, D3-trivial 🧸
Linked issue: GRANDPA: Warp Sync (#1208)
5 participants