Kubelet: parallelize cleaning up containers in unwanted pods #7048

Merged
merged 1 commit into kubernetes:master from yujuhong:para_cleanup on Apr 20, 2015

Conversation

4 participants
@yujuhong
Contributor

yujuhong commented Apr 20, 2015

Kubelet kills unwanted pods in SyncPods, which directly impacts the latency of a
sync iteration. This change parallelizes the cleanup to lessen the effect.

Eventually, we should leverage per-pod workers for cleanup, with the exception
of truly orphaned pods.
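
For context, here is a minimal sketch of the parallelization pattern this change introduces; it is not the actual kubelet code, and killPod and cleanupUnwantedPods are hypothetical stand-ins for the kubelet's cleanup path. Each unwanted pod is killed in its own goroutine and the caller waits for all of them, so one slow pod no longer stalls the whole sync iteration.

package main

import (
    "fmt"
    "strings"
    "sync"
)

// killPod is a stand-in for the kubelet's per-pod cleanup work
// (stopping containers, tearing down networking, and so on).
func killPod(name string) error {
    return nil
}

// cleanupUnwantedPods kills each unwanted pod in its own goroutine and
// collects any errors, instead of killing pods one at a time.
func cleanupUnwantedPods(pods []string) error {
    var (
        wg   sync.WaitGroup
        mu   sync.Mutex
        errs []string
    )
    for _, pod := range pods {
        wg.Add(1)
        go func(name string) {
            defer wg.Done()
            if err := killPod(name); err != nil {
                mu.Lock()
                errs = append(errs, err.Error())
                mu.Unlock()
            }
        }(pod)
    }
    // Wait for all cleanup workers before returning.
    wg.Wait()
    if len(errs) != 0 {
        return fmt.Errorf("failed to clean up pods: %s", strings.Join(errs, ", "))
    }
    return nil
}

func main() {
    if err := cleanupUnwantedPods([]string{"pod-a", "pod-b"}); err != nil {
        fmt.Println(err)
    }
}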

@googlebot googlebot added the cla: yes label Apr 20, 2015

@vmarmol vmarmol self-assigned this Apr 20, 2015

@yujuhong
Contributor

yujuhong commented Apr 20, 2015

This addresses #6764

pkg/kubelet/kubelet.go
wg.Add(1)
go func(container *kubecontainer.Container) {
defer util.HandleCrash()
// call the networking plugin for teardown


@vmarmol

vmarmol Apr 20, 2015

Contributor

Can we add a TODO to handle this without singling out the pod infra container? The main motivation is Rocket integration, which does not have one. We're working on making the pod infra container a Docker-only concept.

@yujuhong

yujuhong Apr 20, 2015

Contributor

Done

pkg/kubelet/kubelet.go
wg.Add(1)
go func(container *kubecontainer.Container) {
defer util.HandleCrash()
// call the networking plugin for teardown
if container.Name == dockertools.PodInfraContainerName {
err := kl.networkPlugin.TearDownPod(pod.Namespace, pod.Name, dockertools.DockerID(container.ID))

@vmarmol

vmarmol Apr 20, 2015

Contributor

This change seems unrelated to the rest of this PR. Is it required by it, or can we split it out into its own logical commit/PR?

@yujuhong

yujuhong Apr 20, 2015

Contributor

I don't think it's unrelated. It was used as part of the logic to kill pods in SyncPods. I moved it here since I was going to reuse this function. I am not sure why this function didn't call TearDownPod initially, though.

@vmarmol

vmarmol Apr 20, 2015

Contributor

Ah I see what you mean.

pkg/kubelet/kubelet.go
glog.Errorf("Network plugin pre-delete method returned an error: %v", err)
}
numWorkers++
go func(pod *kubecontainer.Pod, ch chan result) {

@vmarmol

vmarmol Apr 20, 2015

Contributor

Let's break this into a different method; SyncPods() is huge already.

@yujuhong

yujuhong Apr 20, 2015

Contributor

I moved the whole block (including result aggregation) to a different function. This should improve the readability a little bit.
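
Roughly, the extracted helper follows the pattern sketched below; the names (result, killUnwantedPods, kill) are hypothetical and this is not the code from this PR. Each per-pod worker reports its outcome on a channel, and the helper aggregates the results before returning, which keeps SyncPods itself short.

package cleanup

import "fmt"

// result carries the outcome of one per-pod cleanup worker.
type result struct {
    pod string
    err error
}

// killUnwantedPods starts one goroutine per unwanted pod and aggregates the
// workers' results over a channel before returning.
func killUnwantedPods(pods []string, kill func(string) error) []error {
    ch := make(chan result, len(pods))
    numWorkers := 0
    for _, pod := range pods {
        numWorkers++
        go func(name string) {
            ch <- result{pod: name, err: kill(name)}
        }(pod)
    }
    var errs []error
    for i := 0; i < numWorkers; i++ {
        if r := <-ch; r.err != nil {
            errs = append(errs, fmt.Errorf("failed to clean up pod %q: %v", r.pod, r.err))
        }
    }
    return errs
}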

pkg/kubelet/kubelet.go
running = append(running, m.containers...)
}
if len(errs) != 0 {
err := fmt.Errorf("%s", strings.Join(errs, ", "))

@vmarmol

vmarmol Apr 20, 2015

Contributor

Consider using errors.Aggregate so we have similar error messages:

https://github.com/yujuhong/kubernetes/blob/para_cleanup/pkg/util/errors/errors.go#L23
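
For illustration, a minimal sketch of the suggested change, assuming the NewAggregate constructor from the linked util/errors package; the import path below is the package's present-day location, and the original code built the error with strings.Join and fmt.Errorf instead.

package cleanup

import (
    utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

// aggregateCleanupErrors folds the per-pod errors into a single error whose
// message format matches the rest of the codebase. NewAggregate returns nil
// for an empty list, so callers do not need a len(errs) != 0 check.
func aggregateCleanupErrors(errs []error) error {
    return utilerrors.NewAggregate(errs)
}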

@yujuhong

yujuhong Apr 20, 2015

Contributor

Done.

@vmarmol
Contributor

vmarmol commented Apr 20, 2015

Minor comments, looks good overall. Thanks @yujuhong!

for i, c := range pod.Containers {
containerIDs[i] = string(c.ID)
}
return kl.containerManager.GetRunningContainers(containerIDs)

@yifan-gu

yifan-gu Apr 20, 2015

Member

Though it's not related to this PR, I've always had a question about this: I'm not sure why the running containers are necessary for removing volumes. I suspect that if a volume is still being used by some container, TearDown() will just return an error (such as device busy). If we could remove this logic, that would be great.

As far as I can remember, this was part of a fix for volume leaks, but I'm not sure the original problem was caused by deleting live volumes.

Let me find the issue number...

@vmarmol

vmarmol Apr 20, 2015

Contributor

I believe that in theory we should be able to remove it and the volume plugins would handle it. However, it seems like we've had quite a few bugs in this area which led to data loss. That is what has kept the current logic, I believe.

@yifan-gu

yifan-gu Apr 20, 2015

Member

ack

@vmarmol
Contributor

vmarmol commented Apr 20, 2015

LGTM, we can squash and merge

Kubelet: parallelize cleaning up containers in unwanted pods
Kubelet kills unwanted pods in SyncPods, which directly impacts the latency of a
sync iteration. This change parallelizes the cleanup to lessen the effect.

Eventually, we should leverage per-pod workers for cleanup, with the exception
of truly orphaned pods.

@yujuhong
Contributor

yujuhong commented Apr 20, 2015

Squashed. Thanks!

vmarmol added a commit that referenced this pull request Apr 20, 2015

Merge pull request #7048 from yujuhong/para_cleanup
Kubelet: parallelize cleaning up containers in unwanted pods

@vmarmol vmarmol merged commit d44e9b4 into kubernetes:master Apr 20, 2015

2 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Shippable Shippable builds completed
cla/google All necessary CLAs are signed

@yujuhong yujuhong deleted the yujuhong:para_cleanup branch May 8, 2015
