Health checks http API #10496

@JaySon-Huang

Enhancement

Currently, TiFlash provides the "/tiflash/store-status" HTTP API for health checks. If the API returns "Running", TiFlash is ready to serve requests. However, this API reports only a single status string, which makes it hard to extend with additional checks.

To align with industry best practices, we propose introducing a new, more comprehensive health check API.

Proposal

Aligned with tikv/pd#9846.

We propose introducing a new set of health check endpoints, livez and readyz, aligning with Kubernetes and etcd conventions (KEP-4331).

Reference: https://kubernetes.io/docs/reference/using-api/health-checks/

Introduce a New /tiflash/readyz Endpoint

  • GET /tiflash/readyz

    • The endpoint will return 200 OK only if all underlying readiness checks pass. It acts as a logical AND for all sub-checks.
    • If any sub-check fails (e.g., leader_promotion is false), the main endpoint will return a non-200 status code.
  • GET /tiflash/readyz/<check_name>

    • This allows for checking a specific, granular readiness condition.
  • GET /tiflash/readyz?exclude=<check_name>

    • The named check is skipped when computing the overall result. The primary use case is Kubernetes readiness probes, where we want to know whether the pod is ready to serve traffic but may want to exclude the raft check. A request to GET /tiflash/readyz?exclude=raft would be suitable for this.
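The readiness semantics described above (logical AND over sub-checks, a per-check path, and an `exclude` parameter) can be sketched as follows. This is only an illustration of the proposed behavior, not TiFlash code: the function names, the check name `storage`, the 503 and 404 status codes, and the response bodies are assumptions, since the proposal only specifies "200 OK" versus "a non-200 status code".

```python
from typing import Callable, Dict, Iterable, Tuple

# A readiness sub-check returns True when the condition it guards is satisfied.
Check = Callable[[], bool]


def evaluate_readyz(checks: Dict[str, Check],
                    exclude: Iterable[str] = ()) -> Tuple[int, str]:
    """Model GET /tiflash/readyz[?exclude=<check_name>].

    Returns 200 only if every non-excluded check passes (logical AND).
    503 is an assumed choice for the non-200 failure status.
    """
    skipped = set(exclude)
    failed = [name for name, check in checks.items()
              if name not in skipped and not check()]
    if failed:
        return 503, "failed: " + ",".join(failed)
    return 200, "ok"


def evaluate_single(checks: Dict[str, Check], name: str) -> Tuple[int, str]:
    """Model GET /tiflash/readyz/<check_name> for one granular check.

    Responding 404 for an unknown check name is an assumption.
    """
    if name not in checks:
        return 404, "unknown check: " + name
    return (200, "ok") if checks[name]() else (503, "failed: " + name)


if __name__ == "__main__":
    # Hypothetical registry: "raft" is failing, "storage" is healthy.
    demo = {"raft": lambda: False, "storage": lambda: True}
    print(evaluate_readyz(demo))
    print(evaluate_readyz(demo, exclude=["raft"]))
    print(evaluate_single(demo, "storage"))
```

Keeping the checks in a name-to-callable registry is one way to make the endpoint extensible: adding a new readiness condition means registering one more entry, without changing the response contract.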

Introduce /tiflash/livez for Liveness Checks

  • GET /tiflash/livez
    • This endpoint will simply check whether the process is alive.
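As a rough illustration of how lightweight a liveness endpoint can be, the sketch below serves the proposed path with Python's standard `http.server`. It is a stand-in for discussion, not the TiFlash implementation: the handler class, the `make_server` helper, the `ok` response body, and the 404 for other paths are all assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class LivezHandler(BaseHTTPRequestHandler):
    """Answers GET /tiflash/livez with 200 as long as the process can respond."""

    def do_GET(self):
        if self.path == "/tiflash/livez":
            status, body = 200, b"ok\n"  # assumed body; the proposal does not specify one
        else:
            status, body = 404, b"not found\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress per-request logging so frequent probes do not flood stderr.
        pass


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS choose a free port."""
    return HTTPServer(("127.0.0.1", port), LivezHandler)


if __name__ == "__main__":
    server = make_server(8080)  # hypothetical port for a manual try-out
    server.serve_forever()
```

Unlike readyz, the handler consults no sub-checks: being able to answer the request at all is the liveness signal.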
