Add container garbage collection. #2022
Conversation
Will this only remove images that were previously pulled by the kubelet? For example, if I have other docker processes running on the minion-host, this will not look to delete those images, correct? Just making sure I understand the code as I read through it.
This doesn't remove images at all. It only removes the write layer. -- Brendan

Also, it does filter to only those containers that are run by the kubelet. Thanks
@@ -186,6 +186,15 @@ func main() {
		*registryBurst)
	go func() {
		util.Forever(func() {
Why the anonymous func around util.Forever?
util.Forever is blocking, so this puts it in a goroutine.
I meant go util.Forever(...)
Force-pushed from 1225f62 to f7b1a4e.
Comments addressed. Please re-check. Thanks!

Unit tests added. I also tested this empirically on my own cluster. I believe this is ready to merge. Thanks!

Closes #157
		}
	}
	sort.Sort(ByCreated(dockerData))
	if len(dockerData) <= 5 {
move 5 to a constant?
done.
Comments addressed. ptal. Thanks!
@@ -67,6 +67,7 @@ var (
	registryBurst = flag.Int("registry_burst", 10, "Maximum size of a bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry_qps. Only used if --registry_qps > 0")
	runonce = flag.Bool("runonce", false, "If true, exit after spawning pods from local manifests or remote urls. Exclusive with --etcd_servers and --enable-server")
	enableDebuggingHandlers = flag.Bool("enable_debugging_handlers", true, "Enables server endpoints for log collection and local running of containers and commands")
	minimumGCAge = flag.Duration("minimum_container_gc_age", 0, "Minimum age for a finished container before it is garbage collected.")
"minimum age before removal" is a complicated way to say "ttl", whereas ttl is a well-understood concept. Any reason NOT to just say "container_ttl" or "container_attempt_ttl" ? I think "attempt" is an important concept here.
Also comment units (seconds?)
I don't like TTL because TTL implies that we will delete it when it's that old, whereas here we just guarantee that we won't delete it if it is younger than this.
Flag is a Duration object, so any units are accepted.
Fair point on implied semantics. container_attempt_min_ttl?
How does a user specify duration as a flag? "1s" ? What if they just say "1" - what does it mean?
LGTM in general. Two small things:
@@ -39,6 +40,7 @@ import (
	)

	const defaultChanSize = 1024
	const maxContainerCount = 5
Why is TTL a flag and this is not?
flagged.
Adding a max failures or max-failure/time setting is a good idea, but as a different PR. If we can't accurately count restarts, we should not offer a wrong count. My biggest concern is documenting it well enough that someone could understand it.
	}
	dockerData = dockerData[maxContainerCount:]
	for _, data := range dockerData {
		if err := kl.dockerClient.RemoveContainer(docker.RemoveContainerOptions{ID: data.ID}); err != nil {
Log?
Why? The error is going to get passed upstream, we log there.
@thockin I guess your last comments are related to my earlier comment.
Even if we can't provide an accurate restart count, a rough number gives the user an indication of whether the failure was one-time or a crash loop; for debugging usability, I think that is quite important. We could change the restart_count field from int to string later and report something like "restarted more than N times", but if we do so, we really need to embed the GC logic in SyncPods.
@@ -67,6 +67,8 @@ var (
	registryBurst = flag.Int("registry_burst", 10, "Maximum size of a bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry_qps. Only used if --registry_qps > 0")
	runonce = flag.Bool("runonce", false, "If true, exit after spawning pods from local manifests or remote urls. Exclusive with --etcd_servers and --enable-server")
	enableDebuggingHandlers = flag.Bool("enable_debugging_handlers", true, "Enables server endpoints for log collection and local running of containers and commands")
	minimumGCAge = flag.Duration("minimum_container_ttl_duration", 0, "Minimum age for a finished container before it is garbage collected. Examples: '300ms', '10s' or '2h45m'")
	maxContainerCount = flag.Int("maximum_dead_containers_per_pod", 5, "Maximum number of old containers to retain per pod. Each container takes up some disk space. Default: 5.")
If a pod has more than 5 containers and all of them are restarted, some of them might end up without any dead container left behind for debugging. Is that what we want? Also, restart count is specified per container, while this max is per pod.
Good catch! Switched to do uuid + container name, and renamed this flag.
Force-pushed from e68dc1a to 7321ed8.
LGTM. Once Travis is green, I will merge it.

This is now green (the earlier break was due to a breakage at head).
Add container garbage collection.
	RestartCount int `json:"restartCount" yaml:"restartCount"`
	State ContainerState `json:"state,omitempty" yaml:"state,omitempty"`
	// Note that this is calculated from dead containers. But those containers are subject to
	// garbage collection. This value will get capped at 5 by GC.
nit: s/5/a flag-configured limit/
Work in progress, will add unit tests this evening, assuming this looks ok.