Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better resync checking and running #4516

Merged
merged 11 commits into from Jan 16, 2020
Merged

Better resync checking and running #4516

merged 11 commits into from Jan 16, 2020

Conversation

mcdee
Copy link
Contributor

@mcdee mcdee commented Jan 12, 2020

The check in the sync service to decide if to call Resync() had two problems. First, it was being called once per epoch as part of a separate periodic check, and second it didn't take in to account if the entire network was behind or just the node itself. The former caused issues with falling behind for a while before noticing, and the latter mean that often attempts to resync resulted in a fast failure and loop until other peers moved ahead of us.

This patch makes two changes. First it breaks out the resync check in to its own periodic function that runs at a faster tick (16 times per epoch). Second the periodic function itself does not loop, so if it fails to sync it will give up until the next tick.

There are also a couple of minor related tweaks. First, the resync check also now checks its knowledge of its peers to see if they are ahead of it before attempting a resync; if we're behind but so is everyone else there is no point in attempting a resync. Second, the peer status update now runs twice per epoch rather than the previous hard-coded value. This should allow us to have a more up-to-date view of other peers' views of the chain regardless of the configuration under which the chain is running.

@codecov
Copy link

codecov bot commented Jan 12, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@da00f1b). Click here to learn what that means.
The diff coverage is 0%.

@@            Coverage Diff            @@
##             master    #4516   +/-   ##
=========================================
  Coverage          ?   27.29%           
=========================================
  Files             ?      192           
  Lines             ?    13201           
  Branches          ?        0           
=========================================
  Hits              ?     3603           
  Misses            ?     8964           
  Partials          ?      634

if roughtime.Now().After(lastUpdated.Add(statusInterval)) {
if err := r.sendRPCStatusRequest(ctx, pid); err != nil {
if roughtime.Now().After(lastUpdated.Add(interval)) {
if err := r.sendRPCStatusRequest(r.ctx, pid); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for each request I dont think we should be sending in the parent service's context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of passing down the service's context is if it is cancelled then the downstream call has a chance to notice it. If we just passed context.Background() (or even a cancellable context that was newly minted in this function) we'd lose this functionality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this service's context was canceled then we would most likely be disconnected with the peer by the time they receive. Maybe we could create a context with a span ? This would also help us in tracing

@prylabs-bulldozer prylabs-bulldozer bot merged commit d744aaa into prysmaticlabs:master Jan 16, 2020
cryptomental pushed a commit to cryptomental/prysm that referenced this pull request Feb 24, 2020
* Separate out fallen behind/resync check
* Remove hard-coded resync interval
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
cryptomental pushed a commit to cryptomental/prysm that referenced this pull request Feb 28, 2020
* Separate out fallen behind/resync check
* Remove hard-coded resync interval
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
* Merge branch 'master' into resync
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants