Validator Client crash-loops until beacon is accessible and sync'd (to beacon chain?) #8188
Hey @Karmastic, thanks for this report! We can definitely look at more graceful ways to handle situations like this, where the beacon node is not up yet, rather than letting the process crash prematurely.
Hmm, this should not be happening. Once a validator is connected, it should simply wait for the beacon node to sync. Do you mind pasting the crash logs that occur once the beacon node is online but not synced?
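As a rough illustration of the "connect, then wait for sync" behaviour described in this comment (illustrative Python, not the project's actual implementation), a validator client could poll the beacon node's sync status and log progress instead of exiting. This sketch assumes the beacon node serves the standard Beacon Node REST API endpoint `GET /eth/v1/node/syncing`, whose response carries `data.is_syncing` and `data.sync_distance`; whether a given beacon node build serves that endpoint is an assumption, and `fetch_syncing` is a hypothetical callable that returns the raw response body.

```python
import json
import time
from typing import Callable


def wait_until_synced(
    fetch_syncing: Callable[[], str],
    poll_interval_s: float = 12.0,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> int:
    """Block until the beacon node reports it is no longer syncing.

    fetch_syncing is a hypothetical callable returning the raw JSON body of
    GET /eth/v1/node/syncing, assumed to look like
    {"data": {"head_slot": "...", "sync_distance": "...", "is_syncing": true}}.
    Returns the number of polls it took.
    """
    polls = 0
    while True:
        polls += 1
        data = json.loads(fetch_syncing())["data"]
        if not data["is_syncing"]:
            return polls
        # Still syncing: log progress and keep waiting instead of exiting.
        distance = data.get("sync_distance", "?")
        log(f"beacon node still syncing (sync_distance={distance}), waiting")
        sleep(poll_interval_s)
```

Injecting `sleep` and `log` keeps the loop testable; the 12 s poll interval (one mainnet slot) is an arbitrary choice.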
Thanks for looking into this, Nishant! I'll repro this with debug logs and send them on. Do you want beacon logs as well?
In case it makes a difference: this is at least an issue when there are no validator keys present (i.e. on a clean client). It may not be an issue when there are validator keys in the wallet.
(Sent by email in reply to Nishant Das's comment of Mon, Jan 4, 2021 at 2:51 AM.)
Yep, that would be great too.
In case you didn't see, I edited my last comment to reflect that this is at least an issue when there are no validator keys. Note that the Info-level logs from the client are the first example I provided (just the entries surrounding the connect/failure).
This log would mean that the beacon node process has shut down. Is that the case in your setup? In any case, we should be
Actually, this should have been resolved in #7339; if it is coming up again, it might signify a regression. @rauljordan, any ideas on this? This might also be related to #6669.
This is not the case. The beacon nodes (primary and failover) start up and run fine, syncing with the beacon and eth1 chains. I haven't seen them crash at all.
Working on it.
🚀 Feature Request
Description
When we deploy a new eth2 cluster (redundant beacon nodes and multiple validator clients) to our platform, the beacon node's DNS name takes some time to resolve; until it does, the client crash-loops, failing to connect.
Once it can connect, the client keeps crash-looping for several more minutes until the beacon node is synced (presumably to the eth2 beacon chain). This crash-looping may be the intended behavior, but it's not user-friendly or platform-friendly.
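For the first phase (the beacon node's DNS name not resolving yet), the patient behaviour requested in this issue could look roughly like the Python sketch below, which retries with capped exponential backoff instead of exiting. `connect` is a hypothetical callable that dials the beacon node; `fail_fast=True` mimics the current exit-on-first-error behaviour, and `fail_fast=False` stands in for the proposed `--no-fail-fast` option (only a suggestion here, not an existing flag). The delays are arbitrary assumptions.

```python
import socket
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Errors worth retrying: DNS lookups that fail because the beacon node's name
# does not resolve yet (socket.gaierror), refused/reset connections, timeouts.
RETRYABLE = (socket.gaierror, ConnectionError, TimeoutError)


def connect_with_retry(
    connect: Callable[[], T],
    fail_fast: bool = True,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds.

    fail_fast=True re-raises the first retryable error, so the process exits
    and the orchestrator restarts it (the crash-loop described in this issue).
    fail_fast=False waits base, 2*base, 4*base, ... seconds (capped at
    max_delay_s) between attempts; max_attempts=None means retry forever.
    """
    delay = min(base_delay_s, max_delay_s)
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except RETRYABLE:
            if fail_fast or (max_attempts is not None and attempt >= max_attempts):
                raise
            sleep(delay)
            # Double the running delay, never exceeding the cap.
            delay = min(delay * 2, max_delay_s)
```

Doubling a running delay (rather than computing `base * 2**attempt`) keeps the value bounded no matter how long the beacon node stays unreachable.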
Example client logs after the connection is made:
When it finally stops crashing:
Describe the solution you'd like
A client should be able to start before its beacon node and be resilient and patient. If nothing else, this behavior should be optional (--no-fail-fast).
Describe alternatives you've considered
If we can't get this added, we'll need to build our own readiness probes to hold off starting the client until it's unlikely to crash. Pod crash-loops are something our platform alerts on because, in general, crashing during 'normal operation' should be avoided.
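As a stopgap along those lines, a readiness check could be a small Python sketch like the one below. It assumes the beacon node serves the standard Beacon Node REST API endpoint `GET /eth/v1/node/health`, which per that spec answers 200 when the node is ready, 206 while it is syncing and 503 when it is not initialized; `beacon_ready` and the example address are hypothetical. Only a 200 counts as ready, so the check fails while DNS is unresolved, while the node is down, and while it is still syncing.

```python
import urllib.request
from typing import Callable


def beacon_ready(
    base_url: str,
    timeout_s: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Return True only if GET {base_url}/eth/v1/node/health answers 200.

    206 (syncing) is a 2xx code, so urlopen returns it normally and the
    status check rejects it; 503 raises urllib.error.HTTPError. HTTPError,
    URLError (e.g. DNS failure, connection refused) and socket timeouts are
    all OSError subclasses, so any of them means "not ready".
    """
    try:
        with opener(f"{base_url}/eth/v1/node/health", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        return False
```

An exec probe or init container could wrap this as `sys.exit(0 if beacon_ready(url) else 1)`, since Kubernetes exec probes treat exit code 0 as success. Note that a readiness probe only gates traffic; actually delaying the client's start would need something like an init container that loops on this check.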