Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop container in unknown state before recreate or remove. #73802

Merged
merged 1 commit into from Feb 14, 2019

Conversation

@Random-Liu
Copy link
Member

Random-Liu commented Feb 7, 2019

Fixes #69060.

This makes sure that containers (both regular container and init container) in unknown state will be successfully stopped once before restart or remove.

Please note that based CRI StopContainer "must not return an error if the container has stopped.", so even if the container in unknown state is already stopped, we should not get an error. This is true for both containerd and Docker. /cc @mrunalp Is this true for cri-o?

Please also note that an existing behavior that KillPod and podKiller doesn't kill containers in unknown state. https://github.com/kubernetes/kubernetes/blob/v1.14.0-alpha.2/pkg/kubelet/kubelet_pods.go#L788 (both runningPod and ConvertPodStatusToRunningPod don't include containers in unknown state). This is EXPECTED.

The reason is that we don't know whether containers in unknown state will become exited after a successful stop, this is not defined in CRI yet, and hard to enforce. Although this is true for containerd, but it may not be true for other runtimes, e.g. for docker Dead state:

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
8f431ffdec45        alpine              "sh"                41 seconds ago      Dead                                    dead-experiment
$ docker stop 8
8
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
8f431ffdec45        alpine              "sh"                About a minute ago   Dead                                    dead-experiment

remove and recreate doesn't expect the container state to transit into exited, they just expect the stop not to return error. Once the container is removed or recreated, we'll not care about the unknown state any more.

However, for killPod or killContainer does expect that. Or else, when we kill a container in unknown state, we may end up killing the container again and again, because its state may stay unknown.

Kubelet now tries to stop containers in unknown state once before restart or remove.

/cc @kubernetes/sig-node-pr-reviews

@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented Feb 7, 2019

I've verified this change with containerd:

$ crictl ps -a
CONTAINER ID        IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
d53dd39fdbdac       6dc8ef8287d38       49 years ago        Unknown             dnsmasq             0                   778e8a4e83150
9989c77318d0c       4b2e93f0133d3       49 years ago        Unknown             sidecar             0                   778e8a4e83150
aeb84b2a6394c       55a3c5209c5ea       49 years ago        Unknown             kubedns             0                   778e8a4e83150
$ crictl ps -a
CONTAINER ID        IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
a22d1329eac45       4b2e93f0133d3       2 seconds ago       Running             sidecar             1                   778e8a4e83150
3736ab301c548       6dc8ef8287d38       2 seconds ago       Running             dnsmasq             1                   778e8a4e83150
126c900939bc7       55a3c5209c5ea       3 seconds ago       Running             kubedns             1                   778e8a4e83150
9989c77318d0c       4b2e93f0133d3       49 years ago        Exited              sidecar             0                   778e8a4e83150
aeb84b2a6394c       55a3c5209c5ea       49 years ago        Exited              kubedns             0                   778e8a4e83150
d53dd39fdbdac       6dc8ef8287d38       49 years ago        Exited              dnsmasq             0                   778e8a4e83150

@Random-Liu Random-Liu force-pushed the Random-Liu:handle-unknown-state branch from 0a50ca6 to 51e2f2c Feb 7, 2019

@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented Feb 11, 2019

/test pull-kubernetes-e2e-gce

@@ -692,20 +697,21 @@ func (m *kubeGenericRuntimeManager) purgeInitContainers(pod *v1.Pod, podStatus *
}

// findNextInitContainerToRun returns the status of the last failed container, the
// next init container to start, or done if there are no further init containers.
// index of next init container to start, or done if there are no further init containers.

This comment has been minimized.

@yujuhong

yujuhong Feb 12, 2019

Member

Why changing this to index?

This comment has been minimized.

@Random-Liu

Random-Liu Feb 14, 2019

Author Member

Sorry, I remember there was a reason... But I can't remember it now... Will change it back.

This comment has been minimized.

@Random-Liu

Random-Liu Feb 14, 2019

Author Member

Done

@yujuhong

This comment has been minimized.

Copy link
Member

yujuhong commented Feb 12, 2019

Could you also make sure dockershim correctly returns no error when trying to remove a "dead" container?

@Random-Liu Random-Liu force-pushed the Random-Liu:handle-unknown-state branch from 51e2f2c to 48a2043 Feb 14, 2019

@k8s-ci-robot k8s-ci-robot added the size/L label Feb 14, 2019

@Random-Liu Random-Liu force-pushed the Random-Liu:handle-unknown-state branch from 48a2043 to de8ee94 Feb 14, 2019

@Random-Liu

This comment has been minimized.

Copy link
Member Author

Random-Liu commented Feb 14, 2019

Could you also make sure dockershim correctly returns no error when trying to remove a "dead" container?

Do you mean stop or remove? Stop is validated in the PR description. For remove, I can try to do it today. :)

@yujuhong

This comment has been minimized.

Copy link
Member

yujuhong commented Feb 14, 2019

Do you mean stop or remove? Stop is validated in the PR description. For remove, I can try to do it today. :)

Stop, actually. I missed the PR description. That's exactly what I wanted.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 14, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 14, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Random-Liu, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Feb 14, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 14, 2019

@Random-Liu: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-local-e2e-containerized de8ee94 link /test pull-kubernetes-local-e2e-containerized

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit 6a9902d into kubernetes:master Feb 14, 2019

16 of 17 checks passed

pull-kubernetes-local-e2e-containerized Job failed.
Details
cla/linuxfoundation Random-Liu authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-godeps Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped
tide In merge pool.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.