Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove containers from deleted pods once containers have exited #53233

Merged
merged 1 commit into from
Oct 4, 2017

Conversation

dashpole
Copy link
Contributor

@dashpole dashpole commented Sep 28, 2017

Issue #51899
Since container deletion is currently done through periodic garbage collection every 30 seconds, it takes a long time for pods to be deleted, and causes the kubelet to send all delete pod requests at the same time, which has performance issues. This PR makes the kubelet actively remove containers of deleted pods rather than wait for them to be removed in periodic garbage collection.

Fixes a performance issue (#51899) identified in large-scale clusters when deleting thousands of pods simultaneously across hundreds of nodes, by actively removing containers of deleted pods, rather than waiting for periodic garbage collection and batching resulting pod API deletion requests.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 28, 2017
@k8s-github-robot k8s-github-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Sep 28, 2017
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 29, 2017
@dashpole
Copy link
Contributor Author

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 29, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 29, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 29, 2017
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 29, 2017
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 2, 2017
@dashpole dashpole changed the title [WIP] Actively remove containers from deleted pods Remove containers from deleted pods once containers have exited Oct 2, 2017
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2017
@dashpole
Copy link
Contributor Author

dashpole commented Oct 2, 2017

I tested this PR by doing the following:
I disabled periodic container gc by commenting out that code (just for my testing). I did this to ensure this PR is what is causing containers to be cleaned up.
I ran a node-e2e test which creates and deletes a pod.

I checked the kubelet log during the test, and found that the pod is stopped, cleaned up, and removed within 2 seconds of observing the deletion timestamp:
16:55:59 SyncLoop (DELETE ...
16:56:00 GenericPLEG ... running -> exited
16:56:00 Removing container ...
16:56:01 Volume detached for volume ...
16:56:01 GenericPLEG ... exited -> non-existent
16:56:01 ... fully terminated and removed from etcd

@dashpole
Copy link
Contributor Author

dashpole commented Oct 2, 2017

flake seems unrelated:
error: failed to run Kubelet: cannot create certificate signing request: Post https://35.184.83.219/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: dial tcp 35.184.83.219:443: getsockopt: connection refused

@dashpole
Copy link
Contributor Author

dashpole commented Oct 2, 2017

/test pull-kubernetes-e2e-gce-etcd3

@shyamjvs
Copy link
Member

shyamjvs commented Oct 3, 2017

Seems to help quite a bit:

without the PR:

Oct  3 17:14:21.822: INFO: WARNING Top latency metric: {Resource:pods Subresource: Verb:DELETE Scope:namespace Latency:{Perc50:4.445ms Perc90:612.189ms Perc99:5.041177s Perc100:0s} Count:124002}

with the PR:

Oct  3 14:12:06.169: INFO: Top latency metric: {Resource:pods Subresource: Verb:DELETE Scope:namespace Latency:{Perc50:3.77ms Perc90:8.846ms Perc99:426.462ms Perc100:0s} Count:124000}

@shyamjvs
Copy link
Member

shyamjvs commented Oct 3, 2017

@dashpole @dchen1107 Can we have this PR in soon'ish? Need to get the scale tests back to green.

@smarterclayton
Copy link
Contributor

No further comments from me?

@wojtek-t
Copy link
Member

wojtek-t commented Oct 3, 2017

LGTM

/approve no-issue

@wojtek-t wojtek-t added this to the v1.8 milestone Oct 3, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2017
@wojtek-t
Copy link
Member

wojtek-t commented Oct 3, 2017

Once merged we should cherrypick it to 1.8

@smarterclayton smarterclayton added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 3, 2017
@dchen1107
Copy link
Member

/lgtm

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, dchen1107, wojtek-t

Associated issue: 51899

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 53403, 53233). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 9386228 into kubernetes:master Oct 4, 2017
@liggitt
Copy link
Member

liggitt commented Oct 4, 2017

pick to 1.8 opened at #53422, and added release note (please review/tweak language as desired)

@liggitt liggitt added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Oct 4, 2017
k8s-github-robot pushed a commit that referenced this pull request Oct 4, 2017
…3-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #53233

Fixes #51899

Cherry pick of #53233 on release-1.8.

#53233: remove containers of deleted pods once all containers have

```release-note
Fixes a performance issue (#51899) identified in large-scale clusters when deleting thousands of pods simultaneously across hundreds of nodes, by actively removing containers of deleted pods, rather than waiting for periodic garbage collection and batching resulting pod API deletion requests.
```
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.8" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

openshift-merge-robot added a commit to openshift/origin that referenced this pull request Oct 18, 2017
Automatic merge from submit-queue (batch tested with PRs 16888, 16911, 16913, 16904).

UPSTREAM: 53233: Remove containers from deleted pods once containers have exited

xref kubernetes/kubernetes#53233

Note: I had to edit `podId` to `podID` to make it apply cleanly.
@dashpole dashpole deleted the kubelet_gc_faster branch October 25, 2017 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants