Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit sync API for downloading important, possibly orphaned, forks #3633

Merged
merged 7 commits into from Sep 28, 2019

Conversation

@arkpar
Copy link
Member

commented Sep 17, 2019

Sync services provides an API to download specified block chains explicitly.

Vec<(PeerId, BlockRequest<B>)>
{
debug!(target: "sync", "Explicit sync request for block {:?} with {:?}", hash, peers);
if self.is_known(&hash) || self.is_already_downloading(&hash) {

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 17, 2019

Member

I'd say we need something a bit more explicit. The fork-sync requests should be kept around forever until removed or the block is actually known, and the is_already_downloading check should be done periodically. We can imagine a case where is_already_downloading is true at the moment, but then for some reason the node stops downloading that block. Then our requested block still wouldn't be synced.


let block_status = self.client.block_status(&BlockId::Number(number - One::one()))
.unwrap_or(BlockStatus::Unknown);
if block_status == BlockStatus::InChainPruned {

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 17, 2019

Member

As a follow-up, there should be a special mode where we just download the headers for this chain if requested, but we don't have the state to process them.

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 18, 2019

Member

#3639 for a more detailed writeup of ^

@rphmeier rphmeier added A5-grumble and removed A0-pleasereview labels Sep 17, 2019
@rphmeier

This comment has been minimized.

Copy link
Member

commented Sep 17, 2019

It's also missing the APIs for higher-level code to inform the sync service about which peers it knows has the block. It should have add/remove capability.

@arkpar

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2019

It's also missing the APIs for higher-level code to inform the sync service about which peers it knows has the block.

But sync_fork does exactly that.
What's the case for remove? Peer forgets a block?
Sync tries to download all forks that other peers have and we don't. It does not matter if GRANDPA is interested in the block or not. Do you suggest changing that?

@rphmeier

This comment has been minimized.

Copy link
Member

commented Sep 17, 2019

The set may change over time, is what I mean - are multiple calls guaranteed to alter the existing "download session" so that it will sync the fork from the union of the given peer-sets?

@arkpar

This comment has been minimized.

Copy link
Member Author

commented Sep 17, 2019

No, but I'm going to implement that along with request persistence.

@rphmeier

This comment has been minimized.

Copy link
Member

commented Sep 17, 2019

It does not matter if GRANDPA is interested in the block or not. Do you suggest changing that?

In general, I suggest sync being a little bit smarter about figuring out what "all forks" are based on what GRANDPA is interested in. Even if the fork is no longer active it should constantly be trying to sync it, even if attempts fail.


if number > peer.best_number {
peer.best_number = number;
peer.best_hash = hash.clone();

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 17, 2019

Member

why set that to the peer's best block? i don't see this working when there is 1 peer with 2 forks we want to download, if that's a pre-requisite to being able to download those blocks.

This comment has been minimized.

Copy link
@arkpar

arkpar Sep 17, 2019

Author Member

best_number is used to track global sync target. best_hash is not used for anything really, other than reporting generic peer info.

@arkpar arkpar added A3-inprogress and removed A5-grumble labels Sep 17, 2019
@rphmeier rphmeier changed the title Explicit sync API Explicit sync API for downloading important, possibly orphaned, forks Sep 18, 2019
@arkpar

This comment has been minimized.

Copy link
Member Author

commented Sep 21, 2019

Fork download requests are now persistent. Sync will continue trying even if peer reconnects.

@@ -728,7 +728,7 @@ impl<B: BlockT, S: NetworkSpecialization<B>, H: ExHashT> Protocol<B, S, H> {
match request.direction {
message::Direction::Ascending => id = BlockId::Number(number + One::one()),
message::Direction::Descending => {
if number.is_zero() {
if number.is_one() {

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 21, 2019

Member

if it is zero, will that cause an error further down the line when parent_hash gives None for the header/block?

This comment has been minimized.

Copy link
@arkpar

arkpar Sep 22, 2019

Author Member

parent_hash is not an Option. for the genesis block it is a zero hash.

This comment has been minimized.

Copy link
@arkpar

arkpar Sep 22, 2019

Author Member

That change is still wrong though. We should allow downloading the genesis block in principle.

}

let block_status = self.client.block_status(&BlockId::Number(number - One::one()))
.unwrap_or(BlockStatus::Unknown);
if block_status == BlockStatus::InChainPruned {
trace!(target: "sync", "Refusing to sync ancient block {:?}", hash);
return Vec::default();
return;

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 21, 2019

Member

this is also not the only kind of ancient fork we can encounter - what happens / is supposed to happen when the common ancestor is more than one block back?

This comment has been minimized.

Copy link
@arkpar

arkpar Sep 22, 2019

Author Member

Sync attempts to download the chain up to the last finalized block, not just one block. There's currently a limit of how many blocks these forks thare downloaded up the chain can be. We can't be allowed to download infinite chains of headers/blocks without validating them.

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 23, 2019

Member

Sync attempts to download the chain up to the last finalized block, not just one block

I don't understand what you mean. Yes, there can be a practical cutoff somewhere, and I think refusing to download forks beyond finalized_height + constant_allowance would be reasonable (with constant_allowance around 1000).

This allowance needs to be really permissive. Our nodes can afford a few extra MB of RAM if it means that you reduce the likelihood of consensus stalling by several orders of magnitude. And if consensus does stall, I don't really see any issue with downloading "infinitely" long chains as a light client, as long as it's in incremental parts, since there's nothing better the node could be doing and that would at least be trying to unstick consensus. The chains in practice aren't going to be infinitely long, as long as the block production mechanism is sane. If it's insane, then it's not the job of the sync code to work around that.

But the most recent finalized block of the fork can at most be the common ancestor (otherwise it wouldn't be a fork). So we always have to download beyond that.

if let Some(request) = self.download_unknown_stale(&peer_id, &hash) {
requests.push((peer_id, request));
if number > peer.best_number {
peer.best_number = number;

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 21, 2019

Member

What is the invariant that sync is meant to uphold?

Is this always supposed to be the peer's highest block, or is it the peer's best chain? We do have some non longest-chain fork choice rules (1. follow finality, 2. aurababous), so they are not always the same thing.

This comment has been minimized.

Copy link
@arkpar

arkpar Sep 22, 2019

Author Member

This is just the highest block currently. The sync tracks the longest fork a bit differently from the shorter forks. E.g. it supports parallel block fetch for the longest chain only. It still fetches all forks though so it should not matter which one is actually "best". There will be a refactoring to get rid of the "main" chain notion, but this is out of scope of this PR.

requests: &HashMap<B::Hash, SyncRequest<B>>,
best_num: NumberFor<B>,
finalized: &B::Hash,
attributs: &message::BlockAttributes,

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 21, 2019

Member

typo: attributes?

if !r.peers.contains(id) {
continue
}
if r.number <= best_num {

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 21, 2019

Member

questionable for the same reason as before - the orphan chain may be longer than the peer's "best" chain because of the fork-choice rule.

This comment has been minimized.

Copy link
@arkpar

arkpar Sep 22, 2019

Author Member

The other case (r.number > best_num) is handled in a different code path in peer_block_request.

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 23, 2019

Member

This is confusing. Why is the logic so scattered? I feel like I need a PhD to understand all the side effects and branches

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 23, 2019

Member

What I mean is that there's a lot of implicit "but actually" in the interface of this function.

It's an explicit sync request...but actually only returns explicit requests for chains shorter than the best.

So the peer doesn't download that chain...but actually it does, it's covered in another function in another file (peer_block_request defers to needed_blocks)

This comment has been minimized.

Copy link
@rphmeier

rphmeier Sep 23, 2019

Member

I'd recommend that the function return a special type when there is implicit context that needs to be passed from the execution of this function into the execution of another, where the reader of the code can clearly see that the handling of some case has been deferred beyond the execution of the function.

This comment has been minimized.

Copy link
@arkpar

arkpar Sep 24, 2019

Author Member

This is fundamentally caused by the same issue described here: #3633 (comment)
To be refactored.

@rphmeier rphmeier added A5-grumble and removed A0-pleasereview labels Sep 21, 2019
@rphmeier

This comment has been minimized.

Copy link
Member

commented Sep 21, 2019

Looks good but needs accounting for fork-choice rules

@arkpar arkpar added A0-pleasereview and removed A5-grumble labels Sep 22, 2019
@rphmeier rphmeier added A5-grumble and removed A0-pleasereview labels Sep 23, 2019
arkpar added 2 commits Sep 24, 2019
…explicit
@arkpar

This comment has been minimized.

Copy link
Member Author

commented Sep 24, 2019

After a chat with @rphmeier we've agreed to address all remaining issues in the following PRs

@arkpar arkpar added A0-pleasereview and removed A5-grumble labels Sep 24, 2019
Copy link
Contributor

left a comment

I am still getting familiar with the code, so please interpret some of these comments as questions instead of suggestions.

/// Request syncing for the given block from given set of peers.
// This is similar to on_block_announce with unknown parent hash.
pub fn sync_fork(&mut self, peers: Vec<PeerId>, hash: &B::Hash, number: NumberFor<B>) {
if peers.is_empty() {

This comment has been minimized.

Copy link
@mxinden

mxinden Sep 27, 2019

Contributor

Do I understand correctly that calling sync_fork with an empty peers removes it from the sync_requests set? This is not what I would have expected from the function name. How about renaming it to set_sync_request or moving the logic to a separate function cancel_sync_request?

Please ignore this in case this is a pattern within Substrate.

This comment has been minimized.

Copy link
@arkpar

arkpar Sep 27, 2019

Author Member

renamed to set_fork_sync_request

@@ -449,6 +458,51 @@ impl<B: BlockT> ChainSync<B> {
})
}

/// Request syncing for the given block from given set of peers.
// This is similar to on_block_announce with unknown parent hash.

This comment has been minimized.

Copy link
@mxinden

mxinden Sep 27, 2019

Contributor
Suggested change
// This is similar to on_block_announce with unknown parent hash.
/// This is similar to on_block_announce with unknown parent hash.

I feel like this could be a doc comment as well, right?

This comment has been minimized.

Copy link
@arkpar

arkpar Sep 27, 2019

Author Member

This comment is about the implementation detail, not the interface.

@@ -496,6 +496,18 @@ impl<B: BlockT + 'static, S: NetworkSpecialization<B>, H: ExHashT> NetworkServic
Ok(())
}

/// Adds a `PeerId` and its address as reserved.

This comment has been minimized.

Copy link
@mxinden

mxinden Sep 27, 2019

Contributor
Suggested change
/// Adds a `PeerId` and its address as reserved.

This is a copy/paste error from above, right?

arkpar added 2 commits Sep 27, 2019
…header-chain
@rphmeier rphmeier merged commit 4f850d3 into master Sep 28, 2019
10 checks passed
10 checks passed
continuous-integration/gitlab-cargo-check-benches Build stage: test; status: success
Details
continuous-integration/gitlab-cargo-check-subkey Build stage: test; status: success
Details
continuous-integration/gitlab-check-line-width Build stage: test; status: success
Details
continuous-integration/gitlab-check-runtime Build stage: test; status: success
Details
continuous-integration/gitlab-check-web-wasm Build stage: test; status: success
Details
continuous-integration/gitlab-check_warnings Build stage: build; status: success
Details
continuous-integration/gitlab-node-exits Build stage: test; status: success
Details
continuous-integration/gitlab-test-linux-stable Build stage: test; status: success
Details
continuous-integration/gitlab-test-linux-stable-int Build stage: test; status: success
Details
continuous-integration/gitlab-test-srml-staking Build stage: test; status: success
Details
@rphmeier rphmeier deleted the a-sync-explicit branch Sep 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.