
[Consensus] Change the state synchronization flow #1491

Closed
dmitri-perelman opened this issue Oct 24, 2019 · 7 comments

@dmitri-perelman (Contributor) commented Oct 24, 2019

State Synchronization without a target

Synchronization to a specific target LedgerInfo has a number of disadvantages:

  • it introduces non-trivial complexity in the state synchronizer, which needs to support two different flows for full nodes and validators;
  • it causes an unnecessary slowdown when syncing to an LI that is not the latest;
  • it opens consensus to the state sync Byzantine attack (described in #1324).

We'd like to approach this problem in a different way.
StateSynchronizer will try to synchronize to some LedgerInfo given a list of potential peers:

/// State synchronizer returns when it successfully fully synchronizes to **one** of the given peers.
/// Full synchronization means that all the retrieved transactions in the local storage
/// correspond to the latest ledger info sent by the remote peer.
/// The returned value is a pair of the latest ledger info sent by the given remote peer and its
/// peer id.
/// Note that state synchronizer is not going to return unless it succeeds in fully synchronizing
/// to some peer. It's going to retry synchronizing with different peers from the list until it
/// eventually succeeds. TODO: should we have a timeout instead?
StateSynchronizer::sync_to(peers: Vec<PeerId>) -> (LedgerInfoWithSignatures, PeerId)
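
As a rough illustration of this contract, the loop below cycles through the candidate peers until one of them succeeds. This is a sketch only: the PeerId and LedgerInfoWithSignatures stubs and the try_sync_with_peer helper are placeholders, not the actual StateSynchronizer internals.

// Sketch of the sync_to contract above, with placeholder types.
#[derive(Clone, Copy, Debug)]
struct PeerId(u64);
#[derive(Clone, Debug)]
struct LedgerInfoWithSignatures; // stand-in for the real type

fn sync_to(peers: Vec<PeerId>) -> (LedgerInfoWithSignatures, PeerId) {
    loop {
        for &peer in &peers {
            // Hypothetical helper: request chunks from `peer` until local
            // storage matches the latest ledger info that peer sent, then
            // return that ledger info.
            match try_sync_with_peer(peer) {
                Ok(latest_li) => return (latest_li, peer),
                // A failed or lagging peer is logged and skipped; the outer
                // loop keeps retrying until some peer succeeds.
                Err(e) => eprintln!("sync with {:?} failed: {}", peer, e),
            }
        }
    }
}

fn try_sync_with_peer(_peer: PeerId) -> Result<LedgerInfoWithSignatures, String> {
    unimplemented!("stand-in for the actual chunk-request exchange")
}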

Thus the SyncManager will run the following retry loop upon state synchronization:

SyncManager::synchronize() method:
// This method terminates when it succeeds in synchronizing to one of the peers that signed
// the given LedgerInfo
Retry until success:
 - invoke state synchronizer `sync_to()`
 - retrieve the 3-chain for the returned LedgerInfo from the given peer
 - rebuild the BlockTree
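
For concreteness, here is a sketch of that loop in Rust. The SyncManager shape and the two helpers are illustrative stubs, not the real API.

// Illustrative sketch of the retry loop; helper bodies are elided stubs.
impl SyncManager {
    fn synchronize(&mut self, peers: Vec<PeerId>) {
        loop {
            // Step 1: catch local storage up to *some* peer's latest LedgerInfo.
            let (ledger_info, peer) = self.state_synchronizer.sync_to(peers.clone());
            // Steps 2 and 3: fetch the 3-chain certifying that LedgerInfo from
            // the same peer and rebuild the local BlockTree from it.
            match self.retrieve_three_chain(&ledger_info, peer) {
                Ok(chain) => {
                    self.rebuild_block_tree(chain);
                    return; // synchronized successfully
                }
                // The peer may be slow or Byzantine; log and retry from scratch.
                Err(e) => eprintln!("3-chain retrieval from {:?} failed: {}", peer, e),
            }
        }
    }
}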

Note that SyncManager::synchronize() does not guarantee that it will be able to synchronize to a given LedgerInfo. This is our way to make sure the system is not stalled by the state sync Byzantine attack described in #1324.

SyncManager::synchronize() might make little or no progress in case a Byzantine remote validator provides an outdated ledger info. In today's implementation it would be the responsibility of the StateSynchronizer to retry different peers until it reaches the target. In the new implementation we're going to log the fact that a given peer provided some old ledger info and just move on. It's absolutely ok to fail state synchronization and to drop the processing of any incoming message: at some point we'll be able to pick an honest peer and catch up.

Restartability logic

Today we maintain an invariant that the root of the block tree in ConsensusDB is equal to or higher than the highest ledger info in LibraDB. This invariant is not going to hold in the new solution, because we're going to retrieve the transactions first and the 3-chain for the block tree afterwards.
We don't need this invariant: we're not using it anywhere, and everything will continue working as usual without it. The only addition we need is the list of peers for resuming state synchronization in case we crash in the middle of a state sync (the pending list of peers we're syncing with will be stored in ConsensusDB for the duration of the state synchronization).
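
A minimal sketch of that persistence step, assuming hypothetical ConsensusDB accessors (save_pending_sync_peers and delete_pending_sync_peers are made up here for illustration):

// Hypothetical wrapper: persist the peer list before syncing so a crash
// mid-sync can resume with the same peers on restart.
fn synchronize_with_restart_support(&mut self, peers: Vec<PeerId>) {
    // A non-empty list found in ConsensusDB on startup means a state sync
    // was interrupted and should be resumed with these peers.
    self.consensus_db.save_pending_sync_peers(&peers);
    self.synchronize(peers);
    // The sync, including the 3-chain retrieval, completed: clear the marker.
    self.consensus_db.delete_pending_sync_peers();
}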

@bmaurer (Contributor) commented Oct 24, 2019

Does the consensus system share its HLI with the state synchronizer at all? At least the way I'm reading this, in this design the state synchronizer has a completely different view of the universe in terms of what the latest commit is.

I guess my intuition was somewhat different here -- that we might want to think about how to unify the knowledge of the HLI. For example, if during state sync we learn about some HLI, it seems like we'd ideally want to inform consensus ASAP -- even if we haven't synced to that round, we'd want to avoid any kind of votes / processing on information in that round. I guess my intuition is that there are three basic pieces of info in the system -- the highest synced LI (the HLI that is stored locally), the highest known LI, which is the highest commit we know about (but haven't synced to), and the highest QC -- and that one component could track these pieces of info across the entire node.
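
For concreteness, the three quantities could live in one place, roughly like this (a sketch of the idea, not an existing component; QuorumCert stands for the consensus certificate type):

// Sketch: one component tracking the three commit-related facts node-wide.
struct SyncState {
    // Highest synced LI: the HLI that is stored locally.
    highest_synced_li: LedgerInfoWithSignatures,
    // Highest known LI: the highest commit we know about but haven't synced to.
    highest_known_li: LedgerInfoWithSignatures,
    // Highest quorum certificate observed so far.
    highest_qc: QuorumCert,
}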

@bmaurer (Contributor) commented Oct 24, 2019

“In the new implementation we're going to log the fact that a given peer provided some old ledger info and just move on”

Can you clarify this? How do you know it's old?

“retrieve the 3-chain for the returned LedgerInfo from the given peer”

I still don't quite understand the logic of this part of syncing (why it's important that the specific 3-chain for the commit be synced), but it's probably easier to discuss the next time we chat in person :-)

@dmitri-perelman (Contributor, Author) commented Oct 24, 2019

In case we don't have an opportunity to sync in person, just wanted to add some thoughts about it:

  • StateSynchronizer is a service that is currently used in two very different fashions. Full nodes send ChunkRequests without a specific target to their parents in the hierarchy, which respond with the latest committed information. Validators, on the other hand, send ChunkRequests for specific targets that have been determined by consensus. Having specific targets is problematic for the reasons listed in the task description. (A sketch of the two request fashions follows this list.)
  • One of the weird fundamental challenges of consensus state synchronization is that it consists of two parts: 1) retrieving the committed information from LibraDB, 2) retrieving the corresponding Block information from ConsensusDB. Consensus cannot operate without a valid block tree, which at the very least should have a root. The sad irony is that the root of a consensus block tree cannot (!) be retrieved via the state synchronizer: state sync reads the committed data from LibraDB, which does not keep consensus block information. I think this part would be easier if our LibraDB storage operated with the same blocks that we're using in consensus (in that case retrieving the committed info would automatically retrieve the root of the tree).
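
To make the two fashions concrete, today's request target could be modeled roughly as follows (a hypothetical sketch; the actual ChunkRequest message is defined differently):

// Hypothetical model of the two ChunkRequest fashions described above.
enum ChunkRequestTarget {
    // Full-node fashion: no specific target; the parent in the hierarchy
    // responds with its latest committed information.
    Latest,
    // Validator fashion: a specific target determined by consensus.
    Target(LedgerInfoWithSignatures),
}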
@bmaurer (Contributor) commented Oct 24, 2019

“Consensus cannot operate without a valid block tree, which at the very least should have a root.”

Is this a fundamental limitation of our on-chain structures, or a limitation of the block tree today? I.e., is there a strategy one can use to propose a new block with the invariant that you never request to download a block that you know to be committed?

I'm wondering if the following approach works: a leader in round R must be able to assemble a block that has the HQC as its parent. It seems one could build a block on top of the HQC by (1) syncing to the highest committed block, then (2) syncing backwards from the HQC until you reach a block whose parent_id == the block_id in the HLI (thus avoiding downloading the committed block).
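
A sketch of that backward walk, with stand-in types (fetch_block, Block, and HashValue here are placeholders, not the actual consensus APIs):

// Placeholder types for illustration; the real Libra types differ.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct HashValue(u64);
struct Block {
    id: HashValue,
    parent_id: HashValue,
}

// Walk backwards from the block certified by the HQC until we reach the
// block whose parent is the committed block referenced by the HLI, so the
// committed block itself is never downloaded.
fn backfill_from_hqc(
    hqc_block_id: HashValue,
    hli_block_id: HashValue,
    fetch_block: impl Fn(HashValue) -> Block,
) -> Vec<Block> {
    let mut chain = Vec::new();
    let mut cursor = hqc_block_id;
    loop {
        let block = fetch_block(cursor);
        let parent = block.parent_id;
        chain.push(block);
        if parent == hli_block_id {
            break; // parent is the committed block; stop here
        }
        cursor = parent;
    }
    chain.reverse(); // oldest-first, ready to build on top of the HLI block
    chain
}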

@dmitri-perelman (Contributor, Author) commented Oct 24, 2019

@bmaurer oh I think we are kind of talking about the same thing then. Do you think the differences boil down to whether we're building the block tree eagerly or lazily? (Our current paradigm is to build the block tree eagerly, while you're suggesting we could build it lazily, on demand.)

@zekun000 (Contributor) commented Oct 24, 2019

I think some of the limitation comes from the pipelining. If we operated as a 1-chain instead of a 3-chain, like block <- ledger_info <- block <- ledger_info, then the only thing we would need to know is the highest ledger info. We had some discussion about the perf implications of this; it's not very clear what we gain from the pipelining compared to making the block size 3x bigger.

For example, for the reconfiguration block we gave up the pipelining and are thus able to sync only the HLI, without a 3-chain, to continue consensus (consensus constructs a new genesis from the HLI). This shows that the 3-chain requirement exists only for the two pending blocks.

The committed block is not required; it is rather an optimization to maintain the tree property. We could retrieve a 2-chain instead, at the cost of making the block store a forest structure.

@dmitri-perelman (Contributor, Author) commented Nov 1, 2019

Unfortunately, we realized that the new flow might not work well with the way Consensus is planning to do epoch changes.
Consider the following scenario: a node at epoch n receives a proposal that is still at epoch n but carries a higher version that triggers state synchronization. By the time a remote peer receives the ChunkRequest, it might have already moved to epoch n+1. In the new flow this would mean that the state synchronizer switches to the next epoch within the same EventProcessor.
However, consensus is now trying to make sure that an EventProcessor corresponds to a specific epoch: epoch switches should happen only at the level above EventProcessor. The old flow allowed us to guarantee that we never switch epochs within the code of EventProcessor.
