Enhance RPC Health Status #148

Open
sreuland opened this issue Apr 19, 2024 · 1 comment

Comments

@sreuland
Contributor

What problem does your feature solve?

  • RPC's getHealth response doesn't represent the actual runtime status levels that RPC can be in.
  • RPC's getHealth requires an HTTP POST with a JSON-RPC request payload, and the client must then parse the JSON response to determine the status. This puts extra work on the operator side to support health checks. Health endpoints are usually exposed as HTTP GET endpoints, so that the operator only needs to check the HTTP response code to determine the status.
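
To make the operator-side cost concrete, here is a minimal Python sketch of what a health checker has to do today. It assumes the getHealth response carries `result.status` with the value `"healthy"`; the helper names `build_get_health_request` and `is_healthy` are hypothetical and not part of RPC.

```python
import json


def build_get_health_request(request_id: int = 1) -> bytes:
    """Serialize the JSON-RPC 2.0 body that has to be POSTed to call getHealth."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "getHealth"}
    return json.dumps(payload).encode("utf-8")


def is_healthy(response_body: bytes) -> bool:
    """Return True only if the JSON-RPC response reports a healthy status.

    Assumption: a healthy node answers with {"result": {"status": "healthy"}}.
    Anything else (error object, malformed JSON, other status) counts as unhealthy.
    """
    try:
        doc = json.loads(response_body)
    except ValueError:
        return False
    if not isinstance(doc, dict):
        return False
    result = doc.get("result")
    return isinstance(result, dict) and result.get("status") == "healthy"
```

A checker would POST the bytes from `build_get_health_request()` with a `Content-Type: application/json` header and pass the response body to `is_healthy()`. With a GET endpoint, both steps would collapse into a status-code comparison.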

What would you like to see?

Note: these requirements were drawn from the design discussion on https://github.com/stellar/kube/pull/2098#pullrequestreview-2005913742

  • RPC provides a /health/{qos_level} endpoint that is retrievable via HTTP GET.
  • RPC starts the HTTP service that publishes this endpoint immediately after the RPC process starts, with no delayed start of the HTTP service.
  • RPC responds to an HTTP GET /health/{qos_level} with a 200 HTTP response code if the level is active, or a 503 if it is not.
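
The request handling described in these requirements could be sketched roughly as follows, in Python for illustration only. The function `health_status_code` and the `is_level_active` callback are hypothetical names, and the 404 for paths outside `/health/{qos_level}` is an assumption that the requirements do not cover.

```python
from typing import Callable

HEALTH_PREFIX = "/health/"


def health_status_code(path: str, is_level_active: Callable[[str], bool]) -> int:
    """Map a GET request path to the HTTP status code for the health endpoint.

    200 means the requested QoS level is active, 503 means it is not.
    404 for anything that is not /health/{qos_level} is an assumption of this sketch.
    """
    if not path.startswith(HEALTH_PREFIX):
        return 404
    qos_level = path[len(HEALTH_PREFIX):]
    if not qos_level or "/" in qos_level:
        return 404
    return 200 if is_level_active(qos_level) else 503
```

Keeping the decision in a pure function like this means the HTTP listener can start right away and answer 503 until the service reports a level as active.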

The new endpoint supports a notion of QoS levels for representing the different potential run time states that RPC can be in:

level 1 - service is completely unhealthy: the process is running, but ingestion with the network isn't stable yet and the service is unable to process requests.
level 2 - service is running and forward ingestion with the network is happening; the data retention window is not fully caught up yet, but some JSON-RPC request endpoints can be processed.
level 3 - service is running, forward ingestion with the network is happening, the data retention window is full, and all RPC request endpoints are up.
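
One possible way to derive these levels from runtime state is sketched below in Python. `RuntimeState`, `current_qos_level` and `is_level_active` are hypothetical names. The sketch also assumes that a level counts as active once the service has reached that level or a higher one, which this issue does not spell out; an exact-match reading would be equally plausible.

```python
from dataclasses import dataclass


@dataclass
class RuntimeState:
    ingesting: bool              # forward ingestion with the network is happening
    retention_window_full: bool  # the data retention window is fully caught up


def current_qos_level(state: RuntimeState) -> int:
    """Return the highest QoS level (1 to 3) that the service has reached."""
    if not state.ingesting:
        return 1
    if not state.retention_window_full:
        return 2
    return 3


def is_level_active(requested: int, state: RuntimeState) -> bool:
    """Assumption: a level is active when the service is at that level or above."""
    return 1 <= requested <= current_qos_level(state)
```

Under this reading, a load balancer that only needs partial service could probe /health/2, while one that needs every endpoint would probe /health/3.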

What alternatives are there?

Use the current JSON-RPC getHealth.

@overcat
Contributor

overcat commented Nov 7, 2024

I strongly support this feature. When I was configuring failover for sorobanrpc.com, I had to write an additional simple API service to proxy the getHealth interface, and then have the health checker access this API service. If soroban-rpc supported direct GET access to the getHealth interface, I wouldn't need an extra API service.

(I'm unsure how many health checkers support posting JSON body during their health checks.)

Projects
Status: Backlog