
Add container garbage collection. #2022

Merged: 1 commit into kubernetes:master on Oct 28, 2014
Conversation

@brendandburns (Contributor)

Work in progress, will add unit tests this evening, assuming this looks ok.

@derekwaynecarr (Member)

Will this only remove images that were previously pulled by the kubelet? For example, if I have other docker processes running on the minion-host, this will not look to delete those images, correct? Making sure I understand the code when I read through it.

@brendandburns (Contributor, Author)

This doesn't remove images at all. It only removes the write layer
associated with a specific container run.

Brendan

@brendandburns (Contributor, Author)

Also, it does filter to only those containers that are run by the kubelet.

Thanks
Brendan

@@ -186,6 +186,15 @@ func main() {
*registryBurst)

go func() {
util.Forever(func() {
Contributor

Why the anonymous func around util.Forever?

Contributor Author

util.Forever is blocking, so this puts it in a goroutine.

Contributor

I meant go util.Forever(...)

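For context, a minimal sketch of why the two forms come up: util.Forever blocks its caller, so it has to be spawned in a goroutine, either by wrapping it in an anonymous func (as in the diff) or by prefixing the call itself with `go` (as the reviewer suggests). The `forever` helper below is a stand-in for util.Forever; its exact signature is an assumption, and this is not the PR's code.

```go
package main

import "time"

// forever is a stand-in for util.Forever: run f, sleep, repeat.
func forever(f func(), period time.Duration) {
	for {
		f()
		time.Sleep(period)
	}
}

func main() {
	gc := func() { /* garbage-collect dead containers */ }

	// Form 1: anonymous wrapper, as in the original diff.
	go func() {
		forever(gc, time.Minute)
	}()

	// Form 2: what the reviewer suggested; spawn the blocking call directly.
	go forever(gc, time.Minute)

	// In real code you would pick one form; both are shown for comparison.
	select {} // block so the sketch's goroutines keep running
}
```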

@brendandburns (Contributor, Author)

Comments addressed. Please re-check.

Thanks!
--brendan

@brendandburns (Contributor, Author)

Unit tests added. I also tested this empirically on my own cluster. I believe this is ready to merge.

Thanks!
-brendan

@brendandburns (Contributor, Author)

Closes #157

@brendandburns changed the title from "WIP: Add container garbage collection." to "Add container garbage collection." on Oct 28, 2014
}
}
sort.Sort(ByCreated(dockerData))
if len(dockerData) <= 5 {
Contributor

move 5 to a constant?

Contributor Author

done.

@brendandburns (Contributor, Author)

Comments addressed. ptal.

Thanks!
--brendan

@@ -67,6 +67,7 @@ var (
registryBurst = flag.Int("registry_burst", 10, "Maximum size of a bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry_qps. Only used if --registry_qps > 0")
runonce = flag.Bool("runonce", false, "If true, exit after spawning pods from local manifests or remote urls. Exclusive with --etcd_servers and --enable-server")
enableDebuggingHandlers = flag.Bool("enable_debugging_handlers", true, "Enables server endpoints for log collection and local running of containers and commands")
minimumGCAge = flag.Duration("minimum_container_gc_age", 0, "Minimum age for a finished container before it is garbage collected.")
Member

"minimum age before removal" is a complicated way to say "ttl", whereas ttl is a well-understood concept. Any reason NOT to just say "container_ttl" or "container_attempt_ttl" ? I think "attempt" is an important concept here.

Member

Also comment units (seconds?)

Contributor Author

I don't like TTL because TTL implies that we will delete it when it's that old, whereas here we just guarantee that we won't delete it if it is younger than this.

The flag is a Duration object, so any duration units are accepted.

Member

Fair point on implied semantics. container_attempt_min_ttl?

How does a user specify a duration as a flag? "1s"? What if they just say "1" - what does that mean?
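As a side note, here is a minimal sketch of how a flag.Duration behaves with the standard library's parsing; the flag name just mirrors the diff below, and this is not the PR's code:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Same shape as the flag in the diff; 0 means "no minimum age".
	minAge := flag.Duration("minimum_container_ttl_duration", 0,
		"Minimum age for a finished container before it is garbage collected.")
	flag.Parse()

	// -minimum_container_ttl_duration=30s    -> 30s
	// -minimum_container_ttl_duration=2h45m  -> 2h45m0s
	// -minimum_container_ttl_duration=1      -> rejected (missing unit in duration)
	fmt.Println("minimum GC age:", *minAge)
}
```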

@dchen1107 (Member)

LGTM in general. Two small things:

  1. This affects the correctness of the restart count reported as part of ContainerStatus, since there is no checkpoint for that data. Please add a comment to the API so that no user relies on that number.
  2. Should we also add a max cap on total dead containers? I once saw a case, due to a bug in the upper layers, where the same podSpec kept getting bound with a different UUID and it destroyed the node. I am OK with dealing with this in a later PR, with a TODO today.

@@ -39,6 +40,7 @@ import (
)

const defaultChanSize = 1024
const maxContainerCount = 5
Member

Why is TTL a flag and this is not?

Contributor Author

flagged.

@thockin (Member) commented Oct 28, 2014

Adding a max-failures or max-failures-per-time setting is a good idea, but as a different PR.

If we can't accurately count restarts, we should not offer a wrong count.

My biggest concern is documenting it well enough that someone could understand it.

}
dockerData = dockerData[maxContainerCount:]
for _, data := range dockerData {
if err := kl.dockerClient.RemoveContainer(docker.RemoveContainerOptions{ID: data.ID}); err != nil {
Member

Log?

Contributor Author

Why? The error is going to get passed upstream; we log there.
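Piecing together the two diff hunks above, the GC pass looks roughly like the sketch below: sort dead containers newest-first, keep the newest maxContainerCount, and remove the rest, never touching anything younger than minimumGCAge. The type names, the remove callback, and the exact placement of the age check are assumptions for illustration, not the PR's code.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// containerInfo is an assumed stand-in for the kubelet's docker container data.
type containerInfo struct {
	ID      string
	Created time.Time
}

// byCreated sorts newest first, mirroring ByCreated in the diff above.
type byCreated []containerInfo

func (a byCreated) Len() int           { return len(a) }
func (a byCreated) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }
func (a byCreated) Less(i, j int) bool { return a[i].Created.After(a[j].Created) }

// purgeOldest keeps the newest maxContainerCount dead containers and removes
// the rest, skipping anything younger than minimumGCAge.
func purgeOldest(dead []containerInfo, maxContainerCount int, minimumGCAge time.Duration, remove func(id string) error) error {
	sort.Sort(byCreated(dead))
	if len(dead) <= maxContainerCount {
		return nil
	}
	for _, c := range dead[maxContainerCount:] {
		if time.Since(c.Created) < minimumGCAge {
			continue // too young to collect
		}
		if err := remove(c.ID); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	dead := []containerInfo{
		{ID: "a", Created: time.Now().Add(-3 * time.Hour)},
		{ID: "b", Created: time.Now().Add(-2 * time.Hour)},
		{ID: "c", Created: time.Now().Add(-1 * time.Hour)},
	}
	err := purgeOldest(dead, 2, 10*time.Minute, func(id string) error {
		fmt.Println("removing container", id) // only "a" falls past the cap here
		return nil
	})
	if err != nil {
		fmt.Println("GC error:", err)
	}
}
```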

@dchen1107 (Member)

@thockin I guess your last comments are related to my earlier comment.

"If we can't accurately count restarts, we should not offer a wrong count."
Restart count was originally introduced because we wanted to let the user specify a maximum allowed number of failures in the containerSpec, but we later removed that feature.

Even if we can't provide an accurate restart count, a rough number gives the user an indication of whether a failure is one-time or a crash loop. For debugging usability, I think that is quite important. We could change the restart_count field from int to string later and report something like "restarted more than N times", but if we do so, we really need to embed the GC logic into SyncPods.

@@ -67,6 +67,8 @@ var (
registryBurst = flag.Int("registry_burst", 10, "Maximum size of a bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry_qps. Only used if --registry_qps > 0")
runonce = flag.Bool("runonce", false, "If true, exit after spawning pods from local manifests or remote urls. Exclusive with --etcd_servers and --enable-server")
enableDebuggingHandlers = flag.Bool("enable_debugging_handlers", true, "Enables server endpoints for log collection and local running of containers and commands")
minimumGCAge = flag.Duration("minimum_container_ttl_duration", 0, "Minimum age for a finished container before it is garbage collected. Examples: '300ms', '10s' or '2h45m'")
maxContainerCount = flag.Int("maximum_dead_containers_per_pod", 5, "Maximum number of old containers to retain per pod. Each container takes up some disk space. Default: 5.")
Member

If a pod has more than 5 containers and all of them restart, some of them might end up without any dead container left behind for debugging. Is that what we want? Also, restart count is specified per container, while this max is per pod.

Contributor Author

Good catch! Switched to key on UUID + container name, and renamed this flag.
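To make that change concrete, a small sketch of applying the retention cap per (pod UUID, container name) pair rather than per pod; the struct and key shapes are assumptions, not the PR's code:

```go
package main

import "fmt"

// deadContainer is an assumed shape for a finished container's identity.
type deadContainer struct {
	ID            string
	PodUID        string
	ContainerName string
}

// gcKey is the assumed grouping key: pod UUID plus container name.
type gcKey struct {
	podUID        string
	containerName string
}

// groupByContainer buckets dead containers so the cap applies per container,
// not per pod.
func groupByContainer(dead []deadContainer) map[gcKey][]deadContainer {
	groups := map[gcKey][]deadContainer{}
	for _, c := range dead {
		k := gcKey{podUID: c.PodUID, containerName: c.ContainerName}
		groups[k] = append(groups[k], c)
	}
	return groups
}

func main() {
	dead := []deadContainer{
		{ID: "1", PodUID: "pod-a", ContainerName: "web"},
		{ID: "2", PodUID: "pod-a", ContainerName: "web"},
		{ID: "3", PodUID: "pod-a", ContainerName: "sidecar"},
	}
	for k, group := range groupByContainer(dead) {
		// Each group would then be sorted and trimmed to the configured max,
		// as in the purge sketch earlier.
		fmt.Printf("%s/%s has %d dead container(s)\n", k.podUID, k.containerName, len(group))
	}
}
```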

@dchen1107 (Member)

LGTM. Once Travis is green, I will merge it.

@brendandburns (Contributor, Author)

This is now green (the earlier failure was due to a break at head).

dchen1107 added a commit that referenced this pull request on Oct 28, 2014: "Add container garbage collection."
@dchen1107 merged commit f6db096 into kubernetes:master on Oct 28, 2014
RestartCount int `json:"restartCount" yaml:"restartCount"`
State ContainerState `json:"state,omitempty" yaml:"state,omitempty"`
// Note that this is calculated from dead containers. But those containers are subject to
// garbage collection. This value will get capped at 5 by GC.
Member

nit: s/5/a flag-configured limit/
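For illustration, this is roughly what the field comment looks like with the reviewer's nit applied (exact wording assumed; ContainerState is a placeholder so the snippet stands alone):

```go
package api

// ContainerState is a placeholder here; the real definition lives elsewhere
// in the API package.
type ContainerState struct{}

// ContainerStatus sketch with the suggested comment wording.
type ContainerStatus struct {
	// Note that this is calculated from dead containers. But those containers
	// are subject to garbage collection. This value will get capped at a
	// flag-configured limit by GC.
	RestartCount int            `json:"restartCount" yaml:"restartCount"`
	State        ContainerState `json:"state,omitempty" yaml:"state,omitempty"`
}
```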
