
[release-v1.71] Fix the slow startup and termination of the vali-0 pod #7982

Conversation

gardener-ci-robot
Contributor

This is an automated cherry-pick of #7979

/assign shafeeqes

A configuration issue that resulted in a relatively slow startup and termination of the vali pods is fixed.

The vali-0 pod takes more than a minute to start up and 30s to terminate, and
the termination is ultimately "ungraceful". It turns out that this is the
result of a few configuration issues.

<details>
<summary>Startup phase</summary>

When vali starts up in the default configuration, it waits for 1 minute until it
reports a ready status via its `/ready` endpoint. This is reproducible directly
with the docker image:

```text
docker run --rm -it -p 3100:3100 ghcr.io/credativ/vali:v2.2.5

# in a second terminal:
while true; do curl localhost:3100/ready; sleep 1; done
Ingester not ready: waiting for 1m0s after startup
...
```

> This is to work around race conditions with ingesters exiting and updating the
> ring.

https://github.com/credativ/vali/blob/d8985fc135e85518a5829b255afa7757d4714349/vendor/github.com/cortexproject/cortex/pkg/ring/lifecycler.go#L89

The reasoning behind this default value is not applicable in the Gardener setup
of vali. Vali is executed in "single binary mode", so there are no other
ingester instances involved. Hence we can set this delay to a lower value.
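
For illustration only, the resulting configuration could look like the fragment below (the `ingester`/`lifecycler` nesting follows the upstream config structure; the exact file layout in the Gardener deployment is an assumption here):

```yaml
ingester:
  lifecycler:
    # Default is 1m, to avoid races between multiple ingesters updating the
    # ring; in "single binary mode" there is only one ingester, so 1s is safe.
    min_ready_duration: 1s
```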

```text
docker run --rm --entrypoint sh ghcr.io/credativ/vali:v2.2.5 -c "cat /etc/vali/local-config.yaml" | sed '/lifecycler:/a \    min_ready_duration: 1s' > local-config.yaml
docker run --rm -it -p 3100:3100 -v "$PWD/local-config.yaml":/etc/vali/local-config.yaml ghcr.io/credativ/vali:v2.2.5

# in a second terminal:
while true; do curl localhost:3100/ready; sleep 1; done
Ingester not ready: waiting for 1s after startup
ready
...
```

Furthermore, the `initialDelaySeconds` setting of the readiness probe of vali is
removed to let it default to 0s: even instances with a significant amount of
logs actually start up in less than 1s, so there is no need to delay the
initial readiness probe.
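
For illustration, a readiness probe without `initialDelaySeconds` might look like this (the field names are standard Kubernetes probe fields; the concrete probe spec of the vali StatefulSet is an assumption here):

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 3100
  # no initialDelaySeconds: it defaults to 0s, so probing starts immediately
  periodSeconds: 10
```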

</details>

<details>
<summary>Termination phase</summary>

The `final-sleep` configuration parameter is changed from its default value of
30s to 0s. Combined with the default 30s pod termination grace period, the
default meant that whenever vali was stopped gracefully (received a SIGTERM
signal), it waited for 30(+epsilon)s before it would have shut down gracefully.
However, just before that (after 30s), the kubelet sends a SIGKILL to forcefully
stop the process: an undesired, unclean shutdown, each and every time.

The motivation for allowing a final scrape of the exposed metrics before
shutdown is not relevant in the Gardener case, given that the Prometheus scrape
interval is 1m anyway. So this commit changes the `final-sleep` configuration
to 0s, which happens to be the default value in the "single binary mode".

https://github.com/credativ/vali/blob/d8985fc135e85518a5829b255afa7757d4714349/vendor/github.com/cortexproject/cortex/pkg/ring/lifecycler.go#L90
https://github.com/credativ/vali/blob/d8985fc135e85518a5829b255afa7757d4714349/cmd/vali/vali-local-config.yaml#L17
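
For illustration, the corresponding setting could look like this fragment (the nesting follows the upstream lifecycler config structure; the exact Gardener file layout is an assumption here):

```yaml
ingester:
  lifecycler:
    # Default is 30s, to allow a final metrics scrape before shutdown; not
    # useful here with a 1m Prometheus scrape interval, so the sleep is skipped.
    final_sleep: 0s
```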

Furthermore, the telegraf container of the vali-0 pod in the control plane
namespace was not properly handling the SIGTERM signal: it was always killed by
the kubelet after 30s with a SIGKILL. This was the other reason for the slow
termination of the vali-0 pod. By letting bash execute the telegraf process in
the background and waiting for it indefinitely with `wait`, the bash process
with PID 1 can handle the SIGTERM signal sent by the kubelet: upon receiving
SIGTERM, it forwards the signal to its child bash process, which in turn
handles it and forwards it to its telegraf child process. Both bash processes
then wait for their child process to terminate. This way, all processes of the
telegraf container terminate gracefully, in less than 1s after the kubelet
sends the SIGTERM signal.
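
The trap/wait pattern described above can be sketched as a standalone script (a hypothetical stand-in: `sleep 60` plays the role of the long-running telegraf process, and a subshell plays the pod's PID 1 receiving the kubelet's SIGTERM):

```shell
#!/bin/bash
# Simulate the pod's PID 1: trap SIGTERM, forward it to the background
# child with 'kill %1', then reap the child with 'wait'.
start=$(date +%s)

bash -c 'trap "kill %1; wait" TERM; sleep 60 & wait' &
pid=$!

sleep 1
kill -TERM "$pid"   # simulate the kubelet sending SIGTERM
wait "$pid"         # returns as soon as the child has been reaped
echo "terminated after $(( $(date +%s) - start ))s"
```

Without the trap, a bash process running as PID 1 of a container takes no action on SIGTERM (no handler is installed and PID 1 ignores default-action signals), so the kubelet falls back to SIGKILL after the grace period.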

Note that testing the signal propagation in the "minimized" telegraf image is
tricky. On a Linux-based "local setup", where the "nested" docker and
containerd containers are just regular processes of the host, the following
approach can help to understand the signal propagation.

Find the 3 processes of the telegraf container from the _host_:

```text
ps aux | grep telegraf

root 2483208 /bin/bash -c trap 'kill %1; wait' SIGTERM bash /etc/telegraf/start.sh & wait
root 2483255 bash /etc/telegraf/start.sh
root 2483260 /usr/bin/telegraf --config /etc/telegraf/telegraf.conf
root 2494748 grep telegraf
```

Attach strace to the 3 processes with `strace -p <pid>`. Send a SIGTERM signal
to PID 1 with `k exec vali-0 -c telegraf -- bash -c "kill 1"` to check the
signal propagation.

</details>

<details>
<summary>With this change, the vali-0 pod starts up and terminates in less than 10
seconds in the examples below.</summary>

```text
k get pods vali-0 -w | ts

17:25:09 vali-0   4/4     Terminating
17:25:11 vali-0   4/4     Terminating
17:25:11 vali-0   0/4     Terminating
17:25:11 vali-0   0/4     Terminating
17:25:11 vali-0   0/4     Terminating
17:25:11 vali-0   0/4     Pending
17:25:11 vali-0   0/4     Pending
17:25:11 vali-0   0/4     Init:0/2
17:25:12 vali-0   0/4     Init:0/2
17:25:13 vali-0   0/4     Init:1/2
17:25:14 vali-0   0/4     PodInitializing
17:25:15 vali-0   3/4     Running
17:25:16 vali-0   4/4     Running

17:27:38 vali-0   2/2     Terminating
17:27:41 vali-0   2/2     Terminating
17:27:42 vali-0   0/2     Terminating
17:27:42 vali-0   0/2     Terminating
17:27:42 vali-0   0/2     Terminating
17:27:42 vali-0   0/2     Pending
17:27:42 vali-0   0/2     Pending
17:27:42 vali-0   0/2     Init:0/2
17:27:42 vali-0   0/2     Init:0/2
17:27:43 vali-0   0/2     Init:1/2
17:27:44 vali-0   0/2     PodInitializing
17:27:45 vali-0   1/2     Running
17:27:46 vali-0   2/2     Running
```

</details>

Co-authored-by: Istvan Zoltan Ballok <istvan.zoltan.ballok@sap.com>
Co-authored-by: Jeremy Rickards <jeremy.rickards@sap.com>
@gardener-prow gardener-prow bot added cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels May 25, 2023
@shafeeqes
Contributor

/kind enhancement
/lgtm
/approve

@gardener-prow gardener-prow bot added kind/enhancement Enhancement, improvement, extension size/S Denotes a PR that changes 10-29 lines, ignoring generated files. lgtm Indicates that a PR is ready to be merged. labels May 25, 2023
@gardener-prow
Contributor

gardener-prow bot commented May 25, 2023

LGTM label has been added.

Git tree hash: 0341b5c6c9f01490dd0499380b763803f0920d40

@gardener-prow gardener-prow bot removed the do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label May 25, 2023
@gardener-prow
Contributor

gardener-prow bot commented May 25, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shafeeqes

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 25, 2023
@gardener-prow
Contributor

gardener-prow bot commented May 25, 2023

@gardener-ci-robot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-gardener-e2e-kind-ha-multi-zone-upgrade-release-v1-71 406b310 link false /test pull-gardener-e2e-kind-ha-multi-zone-upgrade-release-v1-71
pull-gardener-e2e-kind-ha-single-zone-upgrade-release-v1-71 406b310 link false /test pull-gardener-e2e-kind-ha-single-zone-upgrade-release-v1-71


@gardener-prow gardener-prow bot merged commit f9463f9 into gardener:release-v1.71 May 25, 2023
13 of 15 checks passed
@gardener-ci-robot gardener-ci-robot deleted the cherry-pick-7979-to-release-v1.71 branch July 14, 2023 22:26