Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop container in unknown state before recreate or remove. #73802

Merged
merged 1 commit into from Feb 14, 2019

Conversation

Random-Liu
Copy link
Member

@Random-Liu Random-Liu commented Feb 7, 2019

Fixes #69060.

This makes sure that containers (both regular container and init container) in unknown state will be successfully stopped once before restart or remove.

Please note that based CRI StopContainer "must not return an error if the container has stopped.", so even if the container in unknown state is already stopped, we should not get an error. This is true for both containerd and Docker. /cc @mrunalp Is this true for cri-o?

Please also note that an existing behavior that KillPod and podKiller doesn't kill containers in unknown state. https://github.com/kubernetes/kubernetes/blob/v1.14.0-alpha.2/pkg/kubelet/kubelet_pods.go#L788 (both runningPod and ConvertPodStatusToRunningPod don't include containers in unknown state). This is EXPECTED.

The reason is that we don't know whether containers in unknown state will become exited after a successful stop, this is not defined in CRI yet, and hard to enforce. Although this is true for containerd, but it may not be true for other runtimes, e.g. for docker Dead state:

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
8f431ffdec45        alpine              "sh"                41 seconds ago      Dead                                    dead-experiment
$ docker stop 8
8
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
8f431ffdec45        alpine              "sh"                About a minute ago   Dead                                    dead-experiment

remove and recreate doesn't expect the container state to transit into exited, they just expect the stop not to return error. Once the container is removed or recreated, we'll not care about the unknown state any more.

However, for killPod or killContainer does expect that. Or else, when we kill a container in unknown state, we may end up killing the container again and again, because its state may stay unknown.

Kubelet now tries to stop containers in unknown state once before restart or remove.

/cc @kubernetes/sig-node-pr-reviews

@Random-Liu Random-Liu added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Feb 7, 2019
@Random-Liu Random-Liu added this to the v1.14 milestone Feb 7, 2019
@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 7, 2019
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Feb 7, 2019
@Random-Liu
Copy link
Member Author

I've verified this change with containerd:

$ crictl ps -a
CONTAINER ID        IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
d53dd39fdbdac       6dc8ef8287d38       49 years ago        Unknown             dnsmasq             0                   778e8a4e83150
9989c77318d0c       4b2e93f0133d3       49 years ago        Unknown             sidecar             0                   778e8a4e83150
aeb84b2a6394c       55a3c5209c5ea       49 years ago        Unknown             kubedns             0                   778e8a4e83150
$ crictl ps -a
CONTAINER ID        IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
a22d1329eac45       4b2e93f0133d3       2 seconds ago       Running             sidecar             1                   778e8a4e83150
3736ab301c548       6dc8ef8287d38       2 seconds ago       Running             dnsmasq             1                   778e8a4e83150
126c900939bc7       55a3c5209c5ea       3 seconds ago       Running             kubedns             1                   778e8a4e83150
9989c77318d0c       4b2e93f0133d3       49 years ago        Exited              sidecar             0                   778e8a4e83150
aeb84b2a6394c       55a3c5209c5ea       49 years ago        Exited              kubedns             0                   778e8a4e83150
d53dd39fdbdac       6dc8ef8287d38       49 years ago        Exited              dnsmasq             0                   778e8a4e83150

@Random-Liu
Copy link
Member Author

/test pull-kubernetes-e2e-gce

@@ -692,20 +697,21 @@ func (m *kubeGenericRuntimeManager) purgeInitContainers(pod *v1.Pod, podStatus *
}

// findNextInitContainerToRun returns the status of the last failed container, the
// next init container to start, or done if there are no further init containers.
// index of next init container to start, or done if there are no further init containers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing this to index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I remember there was a reason... But I can't remember it now... Will change it back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@yujuhong
Copy link
Contributor

Could you also make sure dockershim correctly returns no error when trying to remove a "dead" container?

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 14, 2019
@Random-Liu
Copy link
Member Author

Could you also make sure dockershim correctly returns no error when trying to remove a "dead" container?

Do you mean stop or remove? Stop is validated in the PR description. For remove, I can try to do it today. :)

@yujuhong
Copy link
Contributor

Do you mean stop or remove? Stop is validated in the PR description. For remove, I can try to do it today. :)

Stop, actually. I missed the PR description. That's exactly what I wanted.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 14, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Random-Liu, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Feb 14, 2019

@Random-Liu: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-local-e2e-containerized de8ee94 link /test pull-kubernetes-local-e2e-containerized

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants