[release-v1.71] Fix the slow startup and termination of the vali-0 pod #7982

Commits on May 25, 2023

  1. Fix the slow startup and termination of the vali-0 pod

    The vali-0 pod takes more than a minute to start up and 30s to terminate,
    ending in an "ungraceful" shutdown every time. It turns out that this is the
    result of a few configuration issues.
    
    <details>
    <summary>Startup phase</summary>
    
    When vali starts up in the default configuration, it waits for 1 minute before
    it reports a ready status via its `/ready` endpoint. This is reproducible
    directly with the docker image:
    
    ```text
    docker run --rm -it -p 3100:3100 ghcr.io/credativ/vali:v2.2.5
    while true; do curl localhost:3100/ready; sleep 1; done
    Ingester not ready: waiting for 1m0s after startup
    ...
    ```
    
    > This is to work around race conditions with ingesters exiting and updating the
    > ring.
    
    https://github.com/credativ/vali/blob/d8985fc135e85518a5829b255afa7757d4714349/vendor/github.com/cortexproject/cortex/pkg/ring/lifecycler.go#L89
    
    The reasoning behind this default value is not applicable in the Gardener setup
    of vali. Vali is executed in "single binary mode", so there are no other
    ingester instances involved. Hence we can set this delay to a lower value.
    
    ```text
    docker run --rm --entrypoint sh ghcr.io/credativ/vali:v2.2.5 -c "cat /etc/vali/local-config.yaml" | sed '/lifecycler:/a \    min_ready_duration: 1s' > local-config.yaml
    docker run --rm -it -p 3100:3100 -v ./local-config.yaml:/etc/vali/local-config.yaml ghcr.io/credativ/vali:v2.2.5
    while true; do curl localhost:3100/ready; sleep 1; done
    Ingester not ready: waiting for 1s after startup
    ready
    ...
    ```
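
    For illustration, a minimal sketch of how the adjusted lifecycler section of
    the vali configuration could look; only `min_ready_duration` is the actual
    change, the surrounding fields are assumptions shown for context:

    ```yaml
    # Sketch: ingester lifecycler section with the readiness delay lowered.
    # Only min_ready_duration changes; the sibling fields are illustrative.
    ingester:
      lifecycler:
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
        min_ready_duration: 1s  # default is 1m; no other ingesters in single binary mode
    ```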
    
    Furthermore, the `initialDelaySeconds` setting of the readiness probe of vali
    is removed to let it default to 0s: even instances with a significant amount
    of logs actually start up in less than 1s, so there is no need to delay the
    initial readiness probe.
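
    A sketch of the resulting readiness probe in the vali container spec; the
    probe path and port match the `/ready` endpoint used above, while the
    remaining probe parameters are illustrative assumptions:

    ```yaml
    # Readiness probe of the vali container. With initialDelaySeconds omitted it
    # defaults to 0s, so the kubelet starts probing right away and the probe
    # simply fails until vali reports ready. Timing values are illustrative.
    readinessProbe:
      httpGet:
        path: /ready
        port: 3100
      periodSeconds: 5
      failureThreshold: 3
      # initialDelaySeconds removed: defaults to 0s
    ```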
    
    </details>
    
    <details>
    <summary>Termination phase</summary>
    
    The `final-sleep` configuration parameter is changed from its default value of
    30s to 0s. With the default, whenever vali was stopped gracefully (received a
    SIGTERM signal), it needed slightly more than 30s to shut down. However, the
    pod termination grace period also defaults to 30s, so shortly before the
    graceful shutdown would have completed, the kubelet sent a SIGKILL to
    forcefully stop the process: an undesired unclean shutdown, each and every
    time.
    
    The motivation for this sleep, allowing a final scrape of the exposed metrics
    before shutdown, is not relevant in the Gardener case, given that the
    Prometheus scrape interval is 1m anyway. So this commit changes the
    `final-sleep` configuration to 0s, which also happens to be the value used in
    the upstream "single binary mode" configuration.
    
    https://github.com/credativ/vali/blob/d8985fc135e85518a5829b255afa7757d4714349/vendor/github.com/cortexproject/cortex/pkg/ring/lifecycler.go#L90
    https://github.com/credativ/vali/blob/d8985fc135e85518a5829b255afa7757d4714349/cmd/vali/vali-local-config.yaml#L17
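
    In terms of the vali configuration, the change boils down to a single line in
    the same lifecycler section (sketch):

    ```yaml
    # Sketch: skip the final sleep before shutdown, so a SIGTERM leads to an
    # immediate graceful shutdown instead of running into the termination grace
    # period. The default in the lifecycler code is 30s.
    ingester:
      lifecycler:
        final_sleep: 0s
    ```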
    
    Furthermore, the telegraf container of the vali-0 pod in the control plane
    namespace was not handling the SIGTERM signal properly: it was always killed
    by the kubelet after 30s with a SIGKILL. This was the other reason for the
    slow termination of the vali-0 pod. By letting bash execute the telegraf
    process in the background and waiting for it indefinitely with `wait`, the
    bash process with PID 1 can handle the SIGTERM signal sent by the kubelet:
    upon receiving it, it forwards SIGTERM to its child bash process, which in
    turn handles the signal as well and forwards it to its telegraf child process.
    Both bash processes wait for their child process to terminate. This way all
    processes of the telegraf container terminate gracefully, in less than 1s
    after the kubelet sends the SIGTERM signal.
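
    A sketch of the resulting telegraf container command in the pod spec; the
    command matches the process list below, while the contents of
    `/etc/telegraf/start.sh` are an assumption derived from it:

    ```yaml
    # The outer bash (PID 1 in the container) traps SIGTERM, forwards it to the
    # backgrounded start.sh (job %1) and waits for it, so the kubelet's SIGTERM
    # is propagated instead of being ignored by PID 1.
    command:
      - /bin/bash
      - -c
      - "trap 'kill %1; wait' SIGTERM; bash /etc/telegraf/start.sh & wait"
    # start.sh (assumed) applies the same pattern one level down:
    #   trap 'kill %1; wait' SIGTERM
    #   /usr/bin/telegraf --config /etc/telegraf/telegraf.conf & wait
    ```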
    
    Note that testing the signal propagation in the "minimized" telegraf image is
    tricky. On a Linux-based "local setup", where the "nested" docker and
    containerd containers are just regular processes of the host, the following
    approach can help to understand the signal propagation.
    
    Find the 3 processes of the telegraf container from the _host_:
    
    ```text
    ps aux | grep telegraf
    
    root 2483208 /bin/bash -c trap 'kill %1; wait' SIGTERM bash /etc/telegraf/start.sh & wait
    root 2483255 bash /etc/telegraf/start.sh
    root 2483260 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
    root 2494748 grep telegraf
    ```
    
    Attach strace to the 3 processes with `strace -p <pid>`. Send a SIGTERM signal
    to PID 1 with `k exec vali-0 -c telegraf -- bash -c "kill 1"` to check the
    signal propagation.
    
    </details>
    
    <details>
    <summary>With this change, the vali-0 pod starts up and terminates in less than 10
    seconds in the examples below.</summary>
    
    ```text
    k get pods vali-0 -w | ts
    
    17:25:09 vali-0   4/4     Terminating
    17:25:11 vali-0   4/4     Terminating
    17:25:11 vali-0   0/4     Terminating
    17:25:11 vali-0   0/4     Terminating
    17:25:11 vali-0   0/4     Terminating
    17:25:11 vali-0   0/4     Pending
    17:25:11 vali-0   0/4     Pending
    17:25:11 vali-0   0/4     Init:0/2
    17:25:12 vali-0   0/4     Init:0/2
    17:25:13 vali-0   0/4     Init:1/2
    17:25:14 vali-0   0/4     PodInitializing
    17:25:15 vali-0   3/4     Running
    17:25:16 vali-0   4/4     Running
    
    17:27:38 vali-0   2/2     Terminating
    17:27:41 vali-0   2/2     Terminating
    17:27:42 vali-0   0/2     Terminating
    17:27:42 vali-0   0/2     Terminating
    17:27:42 vali-0   0/2     Terminating
    17:27:42 vali-0   0/2     Pending
    17:27:42 vali-0   0/2     Pending
    17:27:42 vali-0   0/2     Init:0/2
    17:27:42 vali-0   0/2     Init:0/2
    17:27:43 vali-0   0/2     Init:1/2
    17:27:44 vali-0   0/2     PodInitializing
    17:27:45 vali-0   1/2     Running
    17:27:46 vali-0   2/2     Running
    ```
    
    </details>
    
    Co-authored-by: Istvan Zoltan Ballok <istvan.zoltan.ballok@sap.com>
    Co-authored-by: Jeremy Rickards <jeremy.rickards@sap.com>
    2 people authored and gardener-ci-robot committed May 25, 2023