Readiness probe stops too early at eviction #124648

Open
hshiina opened this issue Apr 30, 2024 · 6 comments · May be fixed by #125558
Labels: kind/bug, priority/backlog, sig/node, triage/accepted

Comments

hshiina commented Apr 30, 2024

What happened?

When kubelet evicts a pod, the Ready condition does not become NotReady during pod termination, even if a readinessProbe is configured.

What did you expect to happen?

The readiness probe should keep working during pod termination so that the pod becomes NotReady as early as possible.

How can we reproduce it (as minimally and precisely as possible)?

Use this readiness.yaml:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: readiness-script-configmap
data:
  readiness-script.sh: |
    #!/bin/sh
    handler() {
      rm /tmp/ready
      sleep 20
    }
    touch /tmp/ready
    trap handler SIGTERM
    while true; do
      sleep 1
    done
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: readiness-container
  name: readiness-test
spec:
  containers:
  - command:
    - sh
    - /script/readiness-script.sh
    image: busybox
    name: readiness
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/ready
      initialDelaySeconds: 3
      periodSeconds: 3
    volumeMounts:
    - name: readiness-script
      mountPath: /script
  - command:
    - sh
    - -c
    - for i in `seq 100`; do dd if=/dev/random of=file${i} bs=1048576 count=1 2>/dev/null; sleep .1; done; while true; do sleep 5; done
    name: disk-consumer
    image: busybox
    resources:
      limits:
        ephemeral-storage: "100Mi"
      requests:
        ephemeral-storage: "100Mi"
  volumes:
  - name: readiness-script
    configMap:
      name: readiness-script-configmap

Create the resources and wait for an eviction to happen (the disk-consumer container writes about 100 MiB against its 100Mi ephemeral-storage limit, which triggers the eviction):

$ kubectl create -f readiness.yaml; kubectl get pods readiness-test -w
configmap/readiness-script-configmap created
pod/readiness-test created
NAME             READY   STATUS              RESTARTS   AGE
readiness-test   0/2     ContainerCreating   0          0s
readiness-test   1/2     Running             0          3s
readiness-test   2/2     Running             0          7s
readiness-test   0/2     Error               0          46s
readiness-test   0/2     Error               0          47s

When this pod is deleted instead, the readiness probe keeps working during termination, so READY drops from 2/2 to 1/2 once the handler removes /tmp/ready, well before the containers exit:

$ kubectl create -f readiness.yaml; (sleep 15; kubectl delete pod readiness-test) & kubectl get pods -w
configmap/readiness-script-configmap created
pod/readiness-test created
[1] 137999
NAME             READY   STATUS              RESTARTS   AGE
readiness-test   0/2     ContainerCreating   0          0s
readiness-test   1/2     Running             0          2s
readiness-test   2/2     Running             0          6s
pod "readiness-test" deleted
readiness-test   2/2     Terminating         0          15s
readiness-test   1/2     Terminating         0          21s
readiness-test   0/2     Terminating         0          45s
readiness-test   0/2     Terminating         0          45s
readiness-test   0/2     Terminating         0          45s
readiness-test   0/2     Terminating         0          45s

Anything else we need to know?

I believe this issue is caused as follows:

At eviction, the pod phase is set to PodFailed internally via podStatusFn before the containers in the pod are stopped:

// kubelet: the status override (podStatusFn) is applied and pushed to the
// status manager before the pod's containers have finished stopping.
if podStatusFn != nil {
    podStatusFn(&apiPodStatus)
}
kl.statusManager.SetPodStatus(pod, apiPodStatus)

// eviction manager: the override passed to killPodFunc marks the pod as
// PodFailed as soon as the eviction starts.
err := m.killPodFunc(pod, true, &gracePeriodOverride, func(status *v1.PodStatus) {
    status.Phase = v1.PodFailed
    status.Reason = Reason
    status.Message = evictMsg
})

Because the cached pod phase is already PodFailed, the probe worker exits without probing the containers during termination:

status, ok := w.probeManager.statusManager.GetPodStatus(w.pod.UID)
if !ok {
    // Either the pod has not been created yet, or it was already deleted.
    klog.V(3).InfoS("No status for pod", "pod", klog.KObj(w.pod))
    return true
}
// Worker should terminate if pod is terminated.
if status.Phase == v1.PodFailed || status.Phase == v1.PodSucceeded {
    klog.V(3).InfoS("Pod is terminated, exiting probe worker",
        "pod", klog.KObj(w.pod), "phase", status.Phase)
    return false
}
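
For comparison, here is a rough sketch (mine, not necessarily what #125558 does) of one way this check could be relaxed: keep the worker alive while the probed container is still reported as running, and only exit once the container has actually stopped. It assumes the podutil.GetContainerStatus helper used elsewhere in the worker; other names follow the quoted code.

// Illustrative sketch only, not the actual fix: keep probing while the probed
// container is still running, so readiness can flip to NotReady during the
// termination grace period instead of freezing at its last value.
c, found := podutil.GetContainerStatus(status.ContainerStatuses, w.container.Name)
stillRunning := found && c.State.Running != nil

if (status.Phase == v1.PodFailed || status.Phase == v1.PodSucceeded) && !stillRunning {
    // Only stop the worker once the container itself has stopped.
    klog.V(3).InfoS("Pod is terminated and container has stopped, exiting probe worker",
        "pod", klog.KObj(w.pod), "phase", status.Phase)
    return false
}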

Kubernetes version

$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

Cloud provider

None. I reproduced this on my laptop.

OS version

Install tools

kind

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

hshiina added the kind/bug label on Apr 30, 2024
k8s-ci-robot added the needs-sig and needs-triage labels on Apr 30, 2024
hshiina (Author) commented Apr 30, 2024

/sig node

k8s-ci-robot added the sig/node label and removed the needs-sig label on Apr 30, 2024
SergeyKanzhelev added this to Triage in SIG Node Bugs on May 1, 2024
AnishShah (Contributor) commented:

/triage accepted
/priority backlog

k8s-ci-robot added the triage/accepted and priority/backlog labels and removed the needs-triage label on May 1, 2024
AnishShah moved this from Triage to Triaged in SIG Node Bugs on May 1, 2024
bitoku (Contributor) commented May 2, 2024

/assign

aojea (Member) commented May 15, 2024

In any case, it seems this behavior will leave the pod out of the endpoints (#110255), so it does not seem related to #116965.

smarterclayton (Contributor) commented:

I would expect readiness probes to continue to work after a pod is terminated - the ready condition is orthogonal to pod deletion status. Once the ready condition becomes false, the probe can be stopped. If the ready condition is already false, we can stop the probe right away.

External system components can use the deletion timestamp to make a decision independently of readiness (and historically do), but since we added unreadyEndpoints it is appropriate for the kubelet to keep probing ready status until the probe returns false (or the pod never started, etc.).
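
A rough sketch of that idea against the worker check quoted above (illustrative only; w.probeType, the readiness constant, and podutil are assumed from the surrounding prober code, and this is not the actual implementation):

// Illustrative sketch only: keep the readiness worker alive during termination
// until the container has been observed NotReady, then let it exit.
if status.Phase == v1.PodFailed || status.Phase == v1.PodSucceeded {
    if w.probeType != readiness {
        // Liveness and startup workers can still stop as soon as the pod is terminal.
        return false
    }
    ready := false
    if c, found := podutil.GetContainerStatus(status.ContainerStatuses, w.container.Name); found {
        ready = c.Ready
    }
    if !ready {
        // Readiness has already been observed as false; safe to stop the worker now.
        return false
    }
    // Otherwise keep probing so the container can be marked NotReady promptly.
}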

smarterclayton (Contributor) commented:

For eviction, you could maybe argue that stopping the probe is justified, but not for graceful shutdown. I think that code is simply out of date and a relic of a period before the kubelet pod workers had a more central role in the lifecycle of the pod.
