Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make validator stable when beacon node goes offline #8278

Merged
merged 21 commits into from Feb 1, 2021

Conversation

shayzluf
Copy link
Contributor

@shayzluf shayzluf commented Jan 18, 2021

What type of PR is this?

Feature

What does this PR do? Why is it needed?
Currently validator client crashes before entering the mail loop. it can crash on few cases:

  • WaitForChainStart connection issue
  • WaitForSync connection issue
  • 'WaitForActivation' has inner retry flow as well (should i remove it?)
  • CanonicalHeadSlot connection issue

while investigating the right way to implement the fail safe mechanism for beacon client availability i found a case where validator could stop getting blocks
I have marked this problematic line with a comment

I tested it in runtime while crushing the beacon node on several cases (looks stable)

This pr polls the endpoint of beacon node while it is nor reachable and waits till its up to get on with its functionality

Which issues(s) does this PR fix?

Fixes #8188

}
connectionErrorChannel := make(chan error)
go v.ReceiveBlocks(ctx, connectionErrorChannel)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was no error handling for ReceiveBlocks while it was running in a separate goroutine

@shayzluf shayzluf changed the title [POC] Make validator stable when beacon node is offline [POC] Make validator stable when beacon node goes offline Jan 18, 2021
validator/client/runner.go Outdated Show resolved Hide resolved
validator/client/runner.go Outdated Show resolved Hide resolved
continue
}
if err != nil {
log.Fatalf("Could not determine if beacon chain started: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ned to fatal and can we instead do the backoff continue?

continue
}
if err != nil {
log.Fatalf("Could not determine if beacon node synced: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

validator/client/runner.go Outdated Show resolved Hide resolved
validator/client/runner.go Outdated Show resolved Hide resolved
validator/client/runner.go Outdated Show resolved Hide resolved
validator/client/runner.go Outdated Show resolved Hide resolved
validator/client/validator.go Outdated Show resolved Hide resolved
validator/client/validator.go Outdated Show resolved Hide resolved
@shayzluf shayzluf self-assigned this Jan 21, 2021
ctx, span := trace.StartSpan(ctx, "validator.processSlot")

select {
case <-ctx.Done():
log.Info("Context canceled, stopping validator")
span.End()
cancel()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing call to cancel? @rauljordan goland detected a possible context leak

…fail_safe_validator

# Conflicts:
#	validator/client/mock_validator.go
#	validator/client/runner.go
#	validator/client/runner_test.go
@shayzluf shayzluf marked this pull request as ready for review January 25, 2021 17:49
@shayzluf shayzluf requested a review from a team as a code owner January 25, 2021 17:49
@shayzluf shayzluf changed the title [POC] Make validator stable when beacon node goes offline Make validator stable when beacon node goes offline Jan 26, 2021
rauljordan
rauljordan previously approved these changes Jan 26, 2021
Copy link
Contributor

@rauljordan rauljordan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all makes sense to me, nice tests @shayzluf - I think this will be a very nice improvement

validator/client/runner.go Outdated Show resolved Hide resolved
validator/client/runner.go Show resolved Hide resolved
validator/client/runner.go Outdated Show resolved Hide resolved
validator/client/runner.go Outdated Show resolved Hide resolved
validator/client/runner.go Outdated Show resolved Hide resolved
@rauljordan rauljordan added OK to merge Ready For Review A pull request ready for code review labels Feb 1, 2021
@prylabs-bulldozer prylabs-bulldozer bot merged commit 4595789 into develop Feb 1, 2021
@delete-merged-branch delete-merged-branch bot deleted the beacon_fail_safe_validator branch February 1, 2021 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready For Review A pull request ready for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validator Client crash-loops until beacon is accessible and sync'd (to beacon chain?)
4 participants