Better resync checking and running #4516
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##             master   #4516   +/-  ##
========================================
  Coverage          ?   27.29%
========================================
  Files             ?      192
  Lines             ?    13201
  Branches          ?        0
========================================
  Hits              ?     3603
  Misses            ?     8964
  Partials          ?      634
```
```diff
-	if roughtime.Now().After(lastUpdated.Add(statusInterval)) {
-		if err := r.sendRPCStatusRequest(ctx, pid); err != nil {
+	if roughtime.Now().After(lastUpdated.Add(interval)) {
+		if err := r.sendRPCStatusRequest(r.ctx, pid); err != nil {
```
For each request, I don't think we should be sending in the parent service's context.
The idea of passing down the service's context is that if it is cancelled, the downstream call has a chance to notice. If we just passed context.Background() (or even a cancellable context newly minted in this function) we'd lose this functionality.
If this service's context was canceled, then we would most likely be disconnected from the peer by the time they receive the request. Maybe we could create a context with a span? That would also help us with tracing.
* Separate out fallen behind/resync check
* Remove hard-coded resync interval
* Merge branch 'master' into resync (repeated merge commits)
The check in the sync service that decides whether to call Resync() had two problems. First, it was being called once per epoch as part of a separate periodic check, and second, it didn't take into account whether the entire network was behind or just the node itself. The former caused the node to fall behind for a while before noticing, and the latter meant that attempts to resync often failed fast and looped until other peers moved ahead of us.

This patch makes two changes. First, it breaks the resync check out into its own periodic function that runs at a faster tick (16 times per epoch). Second, the periodic function itself does not loop, so if it fails to sync it gives up until the next tick.
There are also a couple of minor related tweaks. First, the resync check now also consults its knowledge of its peers to see if they are ahead of it before attempting a resync; if we're behind but so is everyone else, there is no point in attempting one. Second, the peer status update now runs twice per epoch rather than at the previous hard-coded interval. This should give us a more up-to-date view of other peers' views of the chain regardless of the configuration under which the chain is running.