Audit how lotus picks the initial sync target #1546

Closed
Schwartz10 opened this issue Apr 14, 2020 · 5 comments
Labels
area/chain Area: Chain

Comments

@Schwartz10
Contributor

Describe the bug
After a clean install of lotus, running lotus daemon started syncing to the wrong chain.

The "wrong" chain is an assumption from running lotus sync status. The right chain (presumably) has a height diff close to chainhead from a fully synced node:

Healthy node syncing:

        Base:   [bafy2bzacedkkawxrwxua25ddmczczn2eitzipaz47bwfnidp3hyralblrxdw6]
        Target: [bafy2bzacedhgpna5ver52pesgvt6xlfpgeafijuztyasfk2d4f2dc6n2ud4f2] (25327)
        Height diff:    25327
        Stage: message sync
        Height: 1063
        Elapsed: 25m26.117600028s

Unhealthy node syncing?

worker 1:
        Base:   [bafy2bzacedkkawxrwxua25ddmczczn2eitzipaz47bwfnidp3hyralblrxdw6]
        Target: [bafy2bzacedr36iiszhhymbnv26an2ehi35oqmimbhr3adqpz63qfp6y47nlsq] (7758)
        Height diff:    7758
        Stage: message sync
        Height: 1413
        Elapsed: 48m24.022191639s
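
The comparison described above (a target height far below the head of a fully synced node suggests the wrong chain) can be sketched in Go. This is only an illustration: `TargetHeights`, `LooksStale`, and the tolerance of 100 epochs are hypothetical names and values, not part of lotus, and the regular expression assumes the `Target: [cid] (height)` line format shown in the output quoted above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// targetRe matches lines such as "Target: [bafy...] (25327)", the format
// seen in the sync status output quoted in this issue (an assumption about
// the format in general).
var targetRe = regexp.MustCompile(`Target:\s*\[([^\]]*)\]\s*\((\d+)\)`)

// TargetHeights (hypothetical helper) extracts the height of every sync
// target found in the given status text.
func TargetHeights(status string) []int64 {
	var heights []int64
	for _, m := range targetRe.FindAllStringSubmatch(status, -1) {
		h, err := strconv.ParseInt(m[2], 10, 64)
		if err != nil {
			continue // skip heights that do not fit in an int64
		}
		heights = append(heights, h)
	}
	return heights
}

// LooksStale (hypothetical helper) reports whether target is more than
// tolerance epochs below the reference head height.
func LooksStale(target, reference, tolerance int64) bool {
	return reference-target > tolerance
}

func main() {
	status := `worker 1:
        Base:   [bafy2bzacedkkawxrwxua25ddmczczn2eitzipaz47bwfnidp3hyralblrxdw6]
        Target: [bafy2bzacedr36iiszhhymbnv26an2ehi35oqmimbhr3adqpz63qfp6y47nlsq] (7758)
        Height diff:    7758`

	// Reference: the target height reported by the healthy node above.
	const reference = 25327
	for _, h := range TargetHeights(status) {
		fmt.Printf("target height %d, stale: %v\n", h, LooksStale(h, reference, 100))
	}
}
```

A check like this only flags a suspicious target; it cannot tell which chain is canonical, since the reference node could itself be on a minority fork.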

To Reproduce
Not exactly sure how we reproduced this, but it happened once on a local macOS daemon and once on a remote AWS instance in a Kubernetes cluster. It seems to happen randomly? Maybe we connected to bad bootstrap peers?

An observation:

When comparing lotus net peers on a daemon syncing to the correct chain (judging by its height diff) and a daemon syncing to (presumably) an unhealthy chain, some peers seem to overlap. In both cases (even when syncing to the presumably unhealthy chain) the peer set is large (> 20 peers).
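
To make the peer comparison concrete, here is a minimal Go sketch that computes which peer IDs two daemons have in common. It assumes the peer IDs have already been extracted from each daemon's lotus net peers output into string slices; `SharedPeers` and the peer IDs in `main` are hypothetical placeholders, not real values from this report.

```go
package main

import (
	"fmt"
	"sort"
)

// SharedPeers (hypothetical helper) returns the peer IDs present in both
// lists, de-duplicated and sorted.
func SharedPeers(a, b []string) []string {
	inA := make(map[string]struct{}, len(a))
	for _, p := range a {
		inA[p] = struct{}{}
	}

	added := make(map[string]struct{})
	var shared []string
	for _, p := range b {
		if _, ok := inA[p]; !ok {
			continue
		}
		if _, dup := added[p]; dup {
			continue
		}
		added[p] = struct{}{}
		shared = append(shared, p)
	}
	sort.Strings(shared)
	return shared
}

func main() {
	// Placeholder peer IDs for illustration only.
	healthy := []string{"peerA", "peerB", "peerC"}
	unhealthy := []string{"peerB", "peerC", "peerD"}

	shared := SharedPeers(healthy, unhealthy)
	fmt.Printf("%d shared peers: %v\n", len(shared), shared)
}
```

If the shared set is large, a bad bootstrap peer set alone is a less likely explanation, which points back at how the sync target is chosen among the heads those peers report.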

Expected behavior
We should sync to the canonical chain.

Version (run lotus --version):
lotus version 0.3.0+gitfc3b42df

@whyrusleeping
Member

@Schwartz10 Thanks for reporting! I've seen a number of people report this; we are looking into it.

@arajasek
Contributor

arajasek commented Nov 5, 2020

Interopnet doesn't exist anymore, but we should at least audit how we pick our initial sync target before closing this issue...it's a little concerning

@jennijuju jennijuju added area/chain Area: Chain need/team-input Hint: Needs Team Input need/analysis Hint: Needs Analysis and removed area/chain Area: Chain need/team-input Hint: Needs Team Input labels Nov 6, 2020
@jennijuju
Member

@arajasek should this be in Stabilization milestone?

@arajasek
Contributor

arajasek commented Nov 6, 2020

@jennijuju Nah, I don't think it's very prevalent. I think we can close this after convincing ourselves our target-picking logic is sound.

@jennijuju jennijuju added this to Backlog in Lotus+Actors Board Nov 6, 2020
@jennijuju jennijuju changed the title from "Interopnet node syncs to wrong chain" to "Audit how lotus picks the initial sync target" Jan 19, 2021
@jennijuju
Member

We think this no longer persists on mainnet.

Lotus+Actors Board automation moved this from Ready For Work to Closed Feb 24, 2021
@TippyFlitsUK TippyFlitsUK removed the need/analysis Hint: Needs Analysis label Mar 30, 2022

6 participants