This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Fail to sync when a peer joins existing shard #35

Closed
Tracked by #28
adlrocha opened this issue Oct 5, 2021 · 4 comments

Comments

@adlrocha
Collaborator

adlrocha commented Oct 5, 2021

When a peer joins a shard that has already mined a number of blocks, it does not seem to sync the previous state correctly. It successfully starts listening to new blocks on the shard's pubsub topic, but it throws the following error:

2021-10-05T12:26:56.423+0200    ERROR   chain   chain/sync.go:211       Received block with impossibly large height 163

This means that the peer did not sync with the shard.
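For context, here is a minimal sketch of how a height sanity check of this kind can reject a block. It assumes the node bounds the plausible height by the time elapsed since its own genesis timestamp; the function name, block delay and drift tolerance below are illustrative assumptions, not values taken from this repository.

```go
package main

import "fmt"

const (
	blockDelaySecs = 30 // assumed seconds between blocks (illustrative)
	maxHeightDrift = 5  // assumed tolerance in epochs (illustrative)
)

// isEpochBeyondMax reports whether epoch is higher than any height the
// chain could plausibly have reached, given the local genesis timestamp
// and the current time (both in Unix seconds).
func isEpochBeyondMax(epoch, genesisTS, now uint64) bool {
	if now < genesisTS {
		// Genesis lies in the future: only the drift tolerance is allowed.
		return epoch > maxHeightDrift
	}
	return epoch > (now-genesisTS)/blockDelaySecs+maxHeightDrift
}

func main() {
	now := uint64(10_000)

	// A peer whose genesis timestamp is old enough accepts height 163.
	fmt.Println(isEpochBeyondMax(163, now-163*blockDelaySecs, now))

	// A peer whose genesis timestamp is only a few seconds old rejects it.
	fmt.Println(isEpochBeyondMax(163, now-10, now))
}
```

Under these assumptions, a peer that holds a different (more recent) genesis than the rest of the shard would see every incoming block as too high.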

We currently share the exchange.Client between the root chain and child shards. This may be related:

exchange exchange.Client //TODO: We may need to create a new one for every shard if syncing fails

@adlrocha
Collaborator Author

adlrocha commented Oct 5, 2021

Problem found. When we start a new shard we need to:

  • Create a new exchange.Server that serves requests for the shard chain on a different protocol handler
    • func RunChainExchange(h host.Host, svc exchange.Server) {
      h.SetStreamHandler(exchange.BlockSyncProtocolID, svc.HandleStream) // old
      h.SetStreamHandler(exchange.ChainExchangeProtocolID, svc.HandleStream) // new
      }
    • Override(new(exchange.Server), exchange.NewServer),
  • Start a new exchange.Client for the shard that points to the right protocol handler.
    • Override(new(exchange.Client), exchange.NewClient),
    • We'll probably need to make the client configurable so that we can start a new client pointing to a different protocol handler.
  • waitForSync before accepting new blocks in the shard.
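The first two steps can be sketched as follows. This is a toy model that uses only the standard library: `Host`, `Handler`, `shardProtocol` and the base protocol string are hypothetical stand-ins for the libp2p host and the exchange package, meant only to show one protocol ID per chain, with each client dialing the ID that belongs to its shard.

```go
package main

import "fmt"

// ProtocolID stands in for a libp2p protocol identifier (a plain string).
type ProtocolID string

// Illustrative base ID only; the real constant lives in the exchange package.
const baseExchangeProtocol ProtocolID = "/example/chain/xchg/0.0.1"

// shardProtocol derives a distinct exchange protocol ID for a shard.
// An empty shardID stands for the root chain.
func shardProtocol(shardID string) ProtocolID {
	if shardID == "" {
		return baseExchangeProtocol
	}
	return ProtocolID(string(baseExchangeProtocol) + "/" + shardID)
}

// Handler answers a chain exchange request.
type Handler func(req string) string

// Host is a toy stand-in for a libp2p host with per-protocol handlers.
type Host struct{ handlers map[ProtocolID]Handler }

func NewHost() *Host { return &Host{handlers: make(map[ProtocolID]Handler)} }

// SetStreamHandler registers the server for one protocol ID.
func (h *Host) SetStreamHandler(id ProtocolID, fn Handler) { h.handlers[id] = fn }

// Request is what a client configured with a given protocol ID would do.
func (h *Host) Request(id ProtocolID, req string) (string, error) {
	fn, ok := h.handlers[id]
	if !ok {
		return "", fmt.Errorf("no handler for protocol %q", id)
	}
	return fn(req), nil
}

// newChainServer returns a server bound to a single chain.
func newChainServer(chain string) Handler {
	return func(req string) string { return chain + ":" + req }
}

func main() {
	h := NewHost()
	h.SetStreamHandler(shardProtocol(""), newChainServer("root"))
	h.SetStreamHandler(shardProtocol("shard-a"), newChainServer("shard-a"))

	// A shard client asks on the shard's protocol ID, so the shard server answers.
	resp, err := h.Request(shardProtocol("shard-a"), "blocks")
	fmt.Println(resp, err)
}
```

With a single shared protocol ID, a request for shard blocks would instead reach whichever server was registered on that ID, which here would be the root chain's.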

@adlrocha
Collaborator Author

adlrocha commented Oct 6, 2021

After the changes above, the shard is still unable to sync. Peers learn the current heaviest tipset from the HelloService, so we need a shard-specific Hello protocol that a peer can run when joining a shard.

func RunHello(mctx helpers.MetricsCtx, lc fx.Lifecycle, h host.Host, svc *hello.Service) error {

Override(RunHelloKey, modules.RunHello),

func NewHelloService(h host.Host, cs *store.ChainStore, syncer *chain.Syncer, cons consensus.Consensus, pmgr peermgr.MaybePeerMgr) *Service {
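A rough sketch of what a per-shard hello handshake could look like is shown below. The types and fields (`HelloMessage`, `ShardState`, `Node`) are hypothetical and simplified; they are not the signatures of the hello package referenced above. The idea is that the message names the shard it refers to, and the receiver updates its sync target only for that shard.

```go
package main

import "fmt"

// HelloMessage is a simplified, hypothetical handshake payload.
type HelloMessage struct {
	ShardID        string
	HeaviestHeight int64
	HeaviestWeight int64
	GenesisHash    string
}

// ShardState is what a node tracks for each chain it follows.
type ShardState struct {
	GenesisHash  string
	TargetHeight int64
	TargetWeight int64
}

// Node holds one state entry per shard it has joined.
type Node struct{ shards map[string]*ShardState }

// HandleHello updates the sync target of the shard named in the message.
func (n *Node) HandleHello(msg HelloMessage) error {
	st, ok := n.shards[msg.ShardID]
	if !ok {
		return fmt.Errorf("not a member of shard %q", msg.ShardID)
	}
	// Peers on different genesis blocks cannot sync with each other.
	if st.GenesisHash != msg.GenesisHash {
		return fmt.Errorf("genesis mismatch for shard %q", msg.ShardID)
	}
	// Only a heavier chain becomes the new sync target.
	if msg.HeaviestWeight > st.TargetWeight {
		st.TargetHeight = msg.HeaviestHeight
		st.TargetWeight = msg.HeaviestWeight
	}
	return nil
}

func main() {
	n := &Node{shards: map[string]*ShardState{
		"shard-a": {GenesisHash: "g1"},
	}}
	err := n.HandleHello(HelloMessage{
		ShardID: "shard-a", HeaviestHeight: 163, HeaviestWeight: 1000, GenesisHash: "g1",
	})
	fmt.Println(n.shards["shard-a"].TargetHeight, err)
}
```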

@adlrocha
Collaborator Author

adlrocha commented Oct 6, 2021

The hello protocol and chain sync fail because a new peer joining the shard generates the genesis block itself, and that generation is not deterministic (as I thought it was). The two peers end up with different genesis blocks, which makes it impossible to sync the two chains.

We need to have the same genesis available for all peers in the shard. We can either:

  • Implement a function that generates the genesis block deterministically so it is the same for all peers.
  • Generate the genesis when the new shard is generated and make it available for all peers through the shard actor.
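To illustrate why a locally generated genesis can diverge, here is a small sketch. It assumes, purely for illustration, that the source of non-determinism is a timestamp stamped at generation time; the thread does not say which field actually differs. `GenesisParams` and `genesisID` are hypothetical names, and the SHA-256 hash stands in for the real block identifier.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// GenesisParams holds the inputs that determine the genesis block (simplified).
type GenesisParams struct {
	ShardID   string
	Timestamp uint64 // Unix seconds
}

// genesisID hashes the parameters; any differing input changes the ID.
func genesisID(p GenesisParams) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%d", p.ShardID, p.Timestamp)))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Each peer stamps its own generation time: the IDs diverge.
	a := genesisID(GenesisParams{ShardID: "shard-a", Timestamp: 1633429616})
	b := genesisID(GenesisParams{ShardID: "shard-a", Timestamp: 1633433216})
	fmt.Println("local generation matches:", a == b)

	// Every input comes from shared data (for example, values recorded when
	// the shard was created): the IDs agree on all peers.
	shared := GenesisParams{ShardID: "shard-a", Timestamp: 1633429616}
	fmt.Println("shared inputs match:", genesisID(shared) == genesisID(shared))
}
```

Both options in the list above come down to the same requirement: every input to the genesis must come from data that all peers share, whether they recompute the block or fetch it from the shard actor.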

@adlrocha
Collaborator Author

adlrocha commented Oct 7, 2021

Fixed in 4309e51
