Backoff restart container with liveness probing failure #22241

Merged
15 changes: 11 additions & 4 deletions pkg/kubelet/dockertools/manager.go
@@ -1949,9 +1949,17 @@ func getUidFromUser(id string) string {
 // backoff deadline. However, because that won't cause error and the chance is really slim, we can just ignore it for now.
 // If a container is still in backoff, the function will return a brief backoff error and a detailed error message.
 func (dm *DockerManager) doBackOff(pod *api.Pod, container *api.Container, podStatus *kubecontainer.PodStatus, backOff *util.Backoff) (bool, error, string) {
-	containerStatus := podStatus.FindContainerStatusByName(container.Name)
-	if containerStatus != nil && containerStatus.State == kubecontainer.ContainerStateExited && !containerStatus.FinishedAt.IsZero() {
-		ts := containerStatus.FinishedAt
+	var cStatus *kubecontainer.ContainerStatus
+	// Use the finished time of the latest exited container as the start point to calculate whether to do back-off.
+	// TODO(random-liu): Better define backoff start point; add unit and e2e test after we finalize this. (See github issue #22240)
+	for _, c := range podStatus.ContainerStatuses {
+		if c.Name == container.Name && c.State == kubecontainer.ContainerStateExited {
+			cStatus = c
+			break
+		}
+	}
+	if cStatus != nil {
+		ts := cStatus.FinishedAt
Contributor
nit: Is this check no longer required? - !containerStatus.FinishedAt.IsZero()?

Member Author
In the original PR, FinishedAt.IsZero() was used to check whether the container status had been found in the previous loop. I kept it at first just in case, but in fact it is not required now. :)
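A standalone illustration of that point (a sketch, not part of this PR): the old guard was screening for the zero time.Time that FinishedAt carries when no real container exit was found, and the new loop only assigns cStatus for an exited container, so the nil check alone covers it.

package main

import (
	"fmt"
	"time"
)

func main() {
	// The zero time.Time is what !FinishedAt.IsZero() used to screen out:
	// a status that did not come from a real container exit.
	var neverFinished time.Time
	fmt.Println(neverFinished.IsZero()) // true

	// A FinishedAt stamped by the runtime at exit is never the zero value,
	// so once cStatus is known to refer to an exited container, the extra
	// IsZero check adds nothing.
	finishedAt := time.Now()
	fmt.Println(finishedAt.IsZero()) // false
}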

 		// found a container that requires backoff
 		dockerName := KubeletContainerName{
 			PodFullName: kubecontainer.GetPodFullName(pod),
@@ -1968,7 +1976,6 @@ func (dm *DockerManager) doBackOff(pod *api.Pod, container *api.Container, podSt
 			return true, kubecontainer.ErrCrashLoopBackOff, err.Error()
 		}
 		backOff.Next(stableName, ts)
-
 	}
 	return false, nil, ""
 }
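For readers who only want the shape of the mechanism: the timestamp selected above keys an exponential back-off on the container's stable name, and the gate either refuses the restart (CrashLoopBackOff) or records the exit time as the start of the next, longer window. Below is a self-contained toy sketch of that pattern; toyBackoff and the 10-second/5-minute values are stand-ins for the kubelet's util.Backoff and its defaults, not the real implementation.

package main

import (
	"fmt"
	"time"
)

// window is one back-off entry: when the current window started and how long it lasts.
type window struct {
	start time.Time
	delay time.Duration
}

// toyBackoff is a toy stand-in for a backoff table keyed by a stable container name.
type toyBackoff struct {
	initial, max time.Duration
	windows      map[string]window
}

// isInBackOffSince reports whether eventTime still falls inside the current window for id.
func (b *toyBackoff) isInBackOffSince(id string, eventTime time.Time) bool {
	w, ok := b.windows[id]
	return ok && eventTime.Before(w.start.Add(w.delay))
}

// next starts a new window at eventTime, doubling the delay up to the cap.
func (b *toyBackoff) next(id string, eventTime time.Time) {
	w, ok := b.windows[id]
	if !ok {
		w.delay = b.initial
	} else if w.delay *= 2; w.delay > b.max {
		w.delay = b.max
	}
	w.start = eventTime
	b.windows[id] = w
}

func main() {
	b := &toyBackoff{initial: 10 * time.Second, max: 5 * time.Minute, windows: map[string]window{}}
	exit := time.Now() // FinishedAt of the latest exited container
	for i := 0; i < 4; i++ {
		if b.isInBackOffSince("pod_container", exit) {
			fmt.Println("restart refused: CrashLoopBackOff")
			exit = exit.Add(5 * time.Second) // the next exit still lands inside the window
			continue
		}
		b.next("pod_container", exit)
		fmt.Println("restart allowed; next window:", b.windows["pod_container"].delay)
		exit = exit.Add(15 * time.Second) // the next exit lands after the first 10s window
	}
}

Running it shows the window doubling each time a restart is allowed, and a refusal whenever the next exit still falls inside the current window.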
43 changes: 1 addition & 42 deletions test/e2e/pods.go
@@ -20,7 +20,6 @@ import (
"bytes"
"fmt"
"io"
"math"
"strconv"
"strings"
"time"
@@ -199,7 +198,7 @@ func getRestartDelay(c *client.Client, pod *api.Pod, ns string, name string, con
 		if status.State.Waiting == nil && status.State.Running != nil && status.LastTerminationState.Terminated != nil && status.State.Running.StartedAt.Time.After(beginTime) {
 			startedAt := status.State.Running.StartedAt.Time
 			finishedAt := status.LastTerminationState.Terminated.FinishedAt.Time
-			Logf("getRestartDelay: finishedAt=%s restartedAt=%s (%s)", finishedAt, startedAt, startedAt.Sub(finishedAt))
+			Logf("getRestartDelay: restartCount = %d, finishedAt=%s restartedAt=%s (%s)", status.RestartCount, finishedAt, startedAt, startedAt.Sub(finishedAt))
 			return startedAt.Sub(finishedAt), nil
 		}
 	}
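For context, getRestartDelay is the polling helper the remaining back-off tests rely on: it waits until the container is running again and returns the gap between the previous exit and the new start. A typical caller looks roughly like the sketch below; apart from getRestartDelay, Logf/Failf, and the framework fields visible elsewhere in this diff, the names here are assumptions, and the helper's final parameter name is truncated in the hunk header above.

	// Sketch: measure two consecutive restart delays for the same container
	// and check that the back-off grew between them.
	delay1, err := getRestartDelay(framework.Client, pod, framework.Namespace.Name, podName, containerName)
	if err != nil {
		Failf("timed out waiting for the first restart: %v", err)
	}
	delay2, err := getRestartDelay(framework.Client, pod, framework.Namespace.Name, podName, containerName)
	if err != nil {
		Failf("timed out waiting for the second restart: %v", err)
	}
	if delay2 <= delay1 {
		Failf("restart back-off did not increase: delay1=%s delay2=%s", delay1, delay2)
	}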
@@ -982,46 +981,6 @@ var _ = Describe("Pods", func() {
 		}
 	})
 
It("should not back-off restarting a container on LivenessProbe failure [Serial]", func() {
podClient := framework.Client.Pods(framework.Namespace.Name)
podName := "pod-back-off-liveness"
containerName := "back-off-liveness"
pod := &api.Pod{
ObjectMeta: api.ObjectMeta{
Name: podName,
Labels: map[string]string{"test": "liveness"},
},
Spec: api.PodSpec{
Containers: []api.Container{
{
Name: containerName,
Image: "gcr.io/google_containers/busybox:1.24",
Command: []string{"/bin/sh", "-c", "echo ok >/tmp/health; sleep 5; rm -rf /tmp/health; sleep 600"},
LivenessProbe: &api.Probe{
Handler: api.Handler{
Exec: &api.ExecAction{
Command: []string{"cat", "/tmp/health"},
},
},
InitialDelaySeconds: 5,
},
},
},
},
}

defer func() {
By("deleting the pod")
podClient.Delete(pod.Name, api.NewDeleteOptions(0))
}()

delay1, delay2 := startPodAndGetBackOffs(framework, pod, podName, containerName, buildBackOffDuration)

if math.Abs(float64(delay2-delay1)) > float64(syncLoopFrequency) {
Failf("back-off increasing on LivenessProbe failure delay1=%s delay2=%s", delay1, delay2)
}
})

 	// Slow issue #19027 (20 mins)
 	It("should cap back-off at MaxContainerBackOff [Slow]", func() {
Contributor
Why are you removing this test?

Member Author
I discussed this with @dchen1107, and we didn't quite understand why the kubelet should not back off restarting a container on LivenessProbe failure; in fact, the test is a little out of date here. So I just removed it.

Member
This test is being removed because it tests the wrong thing.

 		podClient := framework.Client.Pods(framework.Namespace.Name)