Liveness and Readiness probes

To let Kubernetes automatically detect the status of the microservices deployed in the cluster, liveness and readiness probe endpoints can be added to the deployment definitions. The following sections cover how Kubernetes handles both probes and the changes we need to make in the microservices to support them.

Concepts

Extracted from the Kubernetes documentation:

Liveness probe:

Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. Kubernetes provides liveness probes to detect and remedy such situations.

Readiness probe:

Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup. In such cases, you don’t want to kill the application, but you don’t want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.

Probe types:

There are three kinds of probes that can be created for liveness and readiness:

HTTP
TCP
Command

Liveness probe example, the tng-rep microservice:

We have three options to implement the liveness probe for the tng-rep microservice. We will cover all of them for this microservice.

HTTP:

With HTTP: tng-rep is a web-server-based microservice, and we have implemented the endpoint http://tng-rep:/ping, which responds with status code 200 and the body '{"ping": "pong"}'. If the service is not alive or has not started yet, the request times out.

NOTE: Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.
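
As an illustration of the endpoint and the success rule described above, here is a minimal sketch using only the Python standard library. It is not the actual tng-rep implementation: the names PingHandler, is_probe_success, http_probe and start_server are hypothetical, and http_probe only approximates what the kubelet does when it performs an HTTP check.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Answers GET /ping with 200 and {"ping": "pong"}; other paths get 404."""

    def do_GET(self):
        if self.path == "/ping":
            body = json.dumps({"ping": "pong"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Silence per-request logging so frequent probes do not flood the logs.
        pass


def is_probe_success(status_code):
    """Success rule from the note above: 200 <= code < 400."""
    return 200 <= status_code < 400


def http_probe(url, timeout_seconds=1.0):
    """Rough client-side equivalent of an HTTP probe (an approximation)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return is_probe_success(response.status)
    except OSError:
        # Covers refused connections, timeouts and 4xx/5xx (HTTPError).
        return False


def start_server(port=4012):
    """Starts the ping server in a background thread and returns it."""
    server = HTTPServer(("0.0.0.0", port), PingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real service the handler would be one route of the existing web server. Calling start_server(4012) and then http_probe("http://127.0.0.1:4012/ping") is a quick way to try the sketch locally.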

The configuration of the Kubernetes deployment will have the following fields for the liveness probe.

    livenessProbe:
      httpGet:
        path: /ping
        port: 4012
      initialDelaySeconds: 3
      periodSeconds: 3

The periodSeconds field specifies that the kubelet should perform a liveness probe every 3 seconds. The initialDelaySeconds field tells the kubelet to wait 3 seconds before performing the first probe.

TCP:

With TCP, the kubelet attempts to open a socket to your container on the specified port. If it can establish a connection, the container is considered healthy; if it can't, the probe counts as a failure. For this microservice, the port we will check is 4012.

The configuration of the kubernetes deployment will look like:

    livenessProbe:
      tcpSocket:
        port: 4012
      initialDelaySeconds: 15
      periodSeconds: 20

The initialDelaySeconds and periodSeconds fields have the same meaning as in the HTTP probe.
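
The TCP check described above amounts to "can a connection be opened on this port". A minimal sketch of that logic in Python follows. The function name tcp_probe is hypothetical, and the code only approximates the kubelet's behaviour.

```python
import socket


def tcp_probe(host, port, timeout_seconds=1.0):
    """Returns True if a TCP connection to host:port can be established."""
    try:
        # create_connection opens the socket; the with-block closes it again.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Connection refused, host unreachable or timeout all count as failure.
        return False
```

Running tcp_probe("127.0.0.1", 4012) from inside the container is a simple way to check by hand what a TCP probe on that port would see.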

Command:

If we can't implement HTTP or TCP probes because the service does not listen on any port, we can use a command that exits with 0 when everything is OK and with a non-zero code when something is wrong.

One example is a script that reads the PID of the service from a PID file and checks whether that PID exists in the list of processes. Another example is a script that runs checks through your own code, such as open database connections or requests in the queue. The script itself has to contain the heuristic that determines whether the service is OK.

In Kubernetes, the snippet to add a command probe is shown below as a readiness probe; the same exec block can also be used under livenessProbe:

    readinessProbe:
      exec:
        command:
        - python
        - /app/check.py
      initialDelaySeconds: 5
      periodSeconds: 5
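
The PID-file example mentioned above could look roughly like the following sketch of a check script. The PID file path and the function names are hypothetical, and the liveness test relies on the POSIX behaviour of os.kill with signal 0, so it assumes a Linux container.

```python
import os
import sys

# Hypothetical location; use whatever PID file the service really writes.
PID_FILE = "/var/run/service.pid"


def pid_is_running(pid):
    """Returns True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # Signal 0 checks for existence without signalling.
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # The process exists but belongs to another user.
    return True


def check(pid_file=PID_FILE):
    """Returns the exit code for the probe: 0 if healthy, 1 otherwise."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return 1  # Missing, unreadable or malformed PID file.
    return 0 if pid_is_running(pid) else 1


# When saved as the probe script, finish with: sys.exit(check())
```

The kubelet only looks at the exit code, so any extra heuristics (database connections, queue length) can be added inside check() as long as it still returns 0 for healthy and non-zero otherwise.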

Probe-specific parameters:

initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated.
periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
failureThreshold: When a Pod starts and the probe fails, Kubernetes will try failureThreshold times before giving up. For a liveness probe, giving up means restarting the container. For a readiness probe, the Pod is marked Unready. Defaults to 3. Minimum value is 1.

HTTP probes have additional fields that can be set on httpGet:

host: Host name to connect to, defaults to the pod IP. You probably want to set “Host” in httpHeaders instead.
scheme: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to HTTP.
path: Path to access on the HTTP server.
httpHeaders: Custom headers to set in the request. HTTP allows repeated headers.
port: Name or number of the port to access on the container. Number must be in the range 1 to 65535.

The picture below shows a case where both the readiness and liveness probes are enabled. The Pod starts, and at second 10 the readiness probe succeeds and traffic to the Pod is enabled. Ten seconds later the service stops working and the readiness probe detects it. Note that the readiness probe needs 3 failed tries (one try every 5 seconds, 15 seconds in total) before traffic to the Pod is stopped. In parallel, the liveness probe checks the status of the Pod every 10 seconds and needs 30 seconds (three retries) before the Pod is killed.

If the service recovers before the liveness probe schedules the Pod for deletion, the readiness probe reports OK again and traffic is sent to the Pod again.
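
The timing in this scenario follows from multiplying the probe period by the number of failed tries. The small sketch below reproduces that arithmetic. The function name is hypothetical, and the model is deliberately simplified: it ignores timeoutSeconds and the offset between the moment the service breaks and the next scheduled probe.

```python
def seconds_until_threshold(period_seconds, failure_threshold):
    """Approximate time for a probe to reach its failure threshold."""
    if period_seconds < 1 or failure_threshold < 1:
        raise ValueError("both values have a minimum of 1")
    return period_seconds * failure_threshold


# Values from the scenario above.
readiness_seconds = seconds_until_threshold(5, 3)   # 15 seconds
liveness_seconds = seconds_until_threshold(10, 3)   # 30 seconds
```

With these values the readiness probe reacts first, so there is a window in which the Pod receives no traffic but has not yet been restarted, and a recovering service can come back without a restart.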

Liveness and readiness probes
