Implement liveness check #4007

Closed
daniellehrner opened this issue Jun 23, 2022 · 3 comments
Labels
enhancement (New feature or request), TeamChupa (GH issues worked on by Chupacabara Team)

Comments

@daniellehrner
Contributor

daniellehrner commented Jun 23, 2022

Besu offers a liveness RPC endpoint, but the endpoint does not perform any check and just returns that it is UP no matter what: source code of the check.

We need to define what a proper liveness check looks like and implement it.

@ajsutton
Contributor

That's typically all a liveness endpoint does. Something context-dependent, such as reporting whether the node is in sync, is usually a separate health check endpoint. Both are useful in different situations (i.e. you want a Docker container to consider itself started once the liveness check passes; it would time out long before the node finishes syncing and the health check returns OK).
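
To make the distinction concrete, here is a minimal sketch of the two kinds of check side by side. All names (`ProbeSketch`, `LIVENESS`, `readiness`, `httpStatus`) are hypothetical and are not Besu's API; the mapping to HTTP 200/503 is an assumption about how an orchestrator would consume the result.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names come from Besu. */
final class ProbeSketch {
  /** Liveness: the process is running and able to answer at all. */
  static final BooleanSupplier LIVENESS = () -> true;

  /** Health/readiness: context dependent, e.g. "the node is in sync". */
  static BooleanSupplier readiness(final BooleanSupplier inSync) {
    return inSync;
  }

  /** Assumed mapping: 200 when the probe passes, 503 otherwise. */
  static int httpStatus(final BooleanSupplier probe) {
    return probe.getAsBoolean() ? 200 : 503;
  }
}
```

With this split, a container can be considered started as soon as the liveness probe answers, while the health check keeps failing until syncing completes.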

@iamhsk iamhsk added the TeamChupa GH issues worked on by Chupacabara Team label Jul 14, 2022
@antonydenyer
Member

I think there's potential for some ordering issues here: the JsonRpcHttpService starts before the Ethereum main loop starts. So you could be responding with "UP" before Besu is fully up, but we're talking nanoseconds. The other possibility is that something fails after the JsonRpcHttpService starts, meaning that you could erroneously report as being up when you're not.

I think the main scenario that would be helpful is around graceful shutdowns. At the moment, when you stop Besu you may still be serving HTTP requests. Ideally you'd want to indicate that you're shutting down on the LivenessCheck. This would allow k8s/whatever to remove the instance from rotation and direct requests elsewhere; then you could gracefully shut down Besu as normal (let me know if you want me to elaborate more).
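
A rough sketch of the shutdown idea, assuming a flag that is flipped at the very start of the shutdown sequence. The class and method names are hypothetical and do not reflect Besu's actual LivenessCheck; the 200/503 mapping is likewise an assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; not Besu code. */
final class ShutdownAwareLiveness {
  private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

  /** Call first in the shutdown sequence, before stopping other services. */
  void markShuttingDown() {
    shuttingDown.set(true);
  }

  /** True while the instance should stay in rotation. */
  boolean isLive() {
    return !shuttingDown.get();
  }

  /** Assumed mapping: 200 while live, 503 once shutdown has begun. */
  int httpStatus() {
    return isLive() ? 200 : 503;
  }
}
```

One way to wire it up would be `Runtime.getRuntime().addShutdownHook(new Thread(check::markShuttingDown))`, where `check` is an instance of the class above, so the probe starts failing before the HTTP service itself stops.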

@daniellehrner
Contributor Author

We discussed this a while ago among the developers, and the reason why we always return UP is that there is no metric to tell whether a Besu node needs a restart or not. Every issue could potentially just be temporary or caused by an external factor, like bad peers or problems with the CL client. A restart would not fix that.

We have updated the documentation of the endpoint to make this behavior clear: https://besu.hyperledger.org/en/stable/public-networks/how-to/use-besu-api/json-rpc/?h=liveness#liveness
