Documentation for Event Compression #4372
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
# Kubernetes Event Compression | ||
|
||
This document captures the design of event compression. | ||
|
||
|
||
## Background | ||
|
||
Kubernetes components can get into a state where they generate a large number of events that are identical except for the timestamp. For example, when pulling a nonexistent image, the kubelet repeatedly generates ```image_not_existing``` and ```container_is_waiting``` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](https://github.com/GoogleCloudPlatform/kubernetes/issues/3853)). | ||
|
||
## Proposal | ||
Each binary that generates events (for example, ```kubelet```) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event. | ||
|
||
Event compression should be best effort (not guaranteed): in the worst case, ```n``` identical (minus timestamp) events may still result in ```n``` event entries. | ||
|
||
## Design | ||
Instead of a single Timestamp, each event object [contains](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/types.go#L1111) the following fields: | ||
* ```FirstTimestamp util.Time``` | ||
* The date/time of the first occurrence of the event. | ||
* ```LastTimestamp util.Time``` | ||
* The date/time of the most recent occurrence of the event. | ||
* On first occurrence, this is equal to the FirstTimestamp. | ||
* ```Count int``` | ||
* The number of occurrences of this event between FirstTimestamp and LastTimestamp. | ||
* On first occurrence, this is 1. | ||
|
||
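The three fields above can be illustrated with a minimal Go sketch. The `Event` type, `NewEvent`, `Observe`, and `demo` are hypothetical names used only for illustration, and `time.Time` stands in for `util.Time`; this is not the actual API type.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a trimmed-down, hypothetical stand-in for the API event object.
// It uses time.Time where the design uses util.Time.
type Event struct {
	Reason         string
	Message        string
	FirstTimestamp time.Time // first occurrence of the event
	LastTimestamp  time.Time // most recent occurrence of the event
	Count          int       // occurrences between FirstTimestamp and LastTimestamp
}

// NewEvent returns an event in its first-occurrence state:
// LastTimestamp equals FirstTimestamp and Count is 1.
func NewEvent(reason, message string, now time.Time) Event {
	return Event{
		Reason:         reason,
		Message:        message,
		FirstTimestamp: now,
		LastTimestamp:  now,
		Count:          1,
	}
}

// Observe records one more occurrence of the same event.
func (e *Event) Observe(now time.Time) {
	e.LastTimestamp = now
	e.Count++
}

// demo creates an event and then observes it three more times, one second apart.
func demo() Event {
	start := time.Date(2015, time.February, 12, 1, 13, 5, 0, time.UTC)
	e := NewEvent("failedScheduling", "Error scheduling: no minions available to schedule pods", start)
	for i := 1; i <= 3; i++ {
		e.Observe(start.Add(time.Duration(i) * time.Second))
	}
	return e
}

func main() {
	e := demo()
	fmt.Println(e.Count, e.FirstTimestamp.Format(time.RFC1123Z), e.LastTimestamp.Format(time.RFC1123Z))
}
```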
Each binary that generates events will: | ||
* Maintain a new global hash table to keep track of previously generated events (see ```pkg/client/record/events_cache.go```). | ||
* The code that “records/writes” events (see ```StartRecording``` in ```pkg/client/record/event.go```) uses the global hash table to check whether each new event has been seen previously. | ||
* The key for the hash table is generated from the event object minus timestamps/count/transient fields (see ```pkg/client/record/events_cache.go```). Specifically, the following event fields are used to construct a unique key for an event: | ||
* ```event.Source.Component``` | ||
* ```event.Source.Host``` | ||
* ```event.InvolvedObject.Kind``` | ||
* ```event.InvolvedObject.Namespace``` | ||
* ```event.InvolvedObject.Name``` | ||
* ```event.InvolvedObject.UID``` | ||
* ```event.InvolvedObject.APIVersion``` | ||
* ```event.Reason``` | ||
* ```event.Message``` | ||
* If the key for a new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate: | ||
* Instead of the usual POST/create event API, the new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count. | ||
* The event is also updated in the global hash table with an incremented count, updated last seen timestamp, name, and new resource version (all required to issue a future event update). | ||
* If the key for a new event does not match the key for any previously generated event (meaning at least one of the above fields differs between the new event and every previously generated event), then the event is considered to be new/unique: | ||
* The usual POST/create event API is called to create a new event entry in etcd. | ||
* An entry for the event is also added to the global hash table. | ||
|
||
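The lookup described above can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the code in ```pkg/client/record/events_cache.go```: `EventKey`, `Event`, `Cache`, `Record`, `Len`, and `demo` are hypothetical names, the key is modelled as a comparable struct used directly as the map key, and the event name and resource version that the real cache must keep for future updates are omitted.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// EventKey holds the identifying fields listed in the design (hypothetical type).
// Timestamps and count are deliberately excluded.
type EventKey struct {
	SourceComponent, SourceHost            string
	Kind, Namespace, Name, UID, APIVersion string // from InvolvedObject
	Reason, Message                        string
}

// Event is a simplified stand-in for the API event object.
type Event struct {
	EventKey
	FirstTimestamp, LastTimestamp time.Time
	Count                         int
}

// Cache remembers previously generated events, keyed by their identifying fields.
type Cache struct {
	mu   sync.Mutex
	seen map[EventKey]Event
}

func NewCache() *Cache { return &Cache{seen: make(map[EventKey]Event)} }

// Record returns the event to write and reports whether it is a duplicate,
// in which case the caller would issue an update (PUT) instead of a create (POST).
func (c *Cache) Record(k EventKey, now time.Time) (Event, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if prev, ok := c.seen[k]; ok {
		// All key fields match: bump the count and the last-seen time.
		prev.LastTimestamp = now
		prev.Count++
		c.seen[k] = prev
		return prev, true
	}
	// At least one key field differs from every cached event: new entry.
	e := Event{EventKey: k, FirstTimestamp: now, LastTimestamp: now, Count: 1}
	c.seen[k] = e
	return e, false
}

// Len returns the number of distinct events in the cache.
func (c *Cache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.seen)
}

// demo records the same event four times and a different event once.
func demo() (compressed Event, entries int) {
	c := NewCache()
	now := time.Date(2015, time.February, 12, 1, 13, 5, 0, time.UTC)
	k := EventKey{
		SourceComponent: "scheduler",
		Kind:            "Pod",
		Name:            "skydns-ls6k1",
		Reason:          "failedScheduling",
		Message:         "Error scheduling: no minions available to schedule pods",
	}
	for i := 0; i < 4; i++ {
		compressed, _ = c.Record(k, now.Add(time.Duration(i)*time.Second))
	}
	other := k
	other.Name = "kibana-logging-controller-gziey"
	c.Record(other, now)
	return compressed, c.Len()
}

func main() {
	e, n := demo()
	fmt.Printf("count=%d cache entries=%d\n", e.Count, n)
}
```

Using a struct as the map key avoids having to pick a separator for a concatenated string key; two events collide only if every listed field is equal.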
## Issues/Risks | ||
* Hash table clean up | ||
* If the component (e.g. kubelet) runs for a long period of time and generates a ton of unique events, the hash table could grow very large in memory. | ||
* *Future consideration:* remove entries from the hash table that are older than some specified time. | ||
This should be handled now, not in the future. |
||
* Event history is not preserved across application restarts | ||
* Each component keeps track of event history in memory, so a restart clears the event history. | ||
* That means that compression will not occur across component restarts. | ||
* Similarly, if in the future events are aged out of the hash table, then events will only be compressed until they age out of the hash table, at which point any new instance of the event will cause a new entry to be created in etcd. | ||
|
||
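The age-out idea raised under "Hash table clean up" could look roughly like the Go sketch below. It is only an illustration of removing entries older than a specified time; the `entry` type, `evictOlderThan`, `demo`, and the one-hour limit are assumptions, not part of the design.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a hypothetical cache record; only the last-seen time matters here.
type entry struct {
	count    int
	lastSeen time.Time
}

// evictOlderThan deletes entries whose last occurrence is older than maxAge
// and returns how many were removed.
func evictOlderThan(cache map[string]entry, now time.Time, maxAge time.Duration) int {
	removed := 0
	for k, e := range cache {
		if now.Sub(e.lastSeen) > maxAge {
			delete(cache, k) // deleting the current key while ranging is safe in Go
			removed++
		}
	}
	return removed
}

// demo builds a two-entry cache and evicts everything not seen in the last hour.
func demo() (removed, remaining int) {
	now := time.Date(2015, time.February, 12, 2, 0, 0, 0, time.UTC)
	cache := map[string]entry{
		"stale": {count: 4, lastSeen: now.Add(-2 * time.Hour)},
		"fresh": {count: 1, lastSeen: now.Add(-1 * time.Minute)},
	}
	removed = evictOlderThan(cache, now, time.Hour)
	return removed, len(cache)
}

func main() {
	removed, remaining := demo()
	fmt.Printf("removed=%d remaining=%d\n", removed, remaining)
}
```

As noted above, once an entry has aged out, the next occurrence of that event is treated as new and creates a fresh entry in etcd.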
## Example | ||
Sample kubectl output | ||
``` | ||
FIRSTTIME LASTTIME COUNT NAME KIND SUBOBJECT REASON SOURCE MESSAGE | ||
Thu, 12 Feb 2015 01:13:02 +0000 Thu, 12 Feb 2015 01:13:02 +0000 1 kubernetes-minion-4.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Starting kubelet. | ||
Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-1.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-1.c.saad-dev-vms.internal} Starting kubelet. | ||
Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-3.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-3.c.saad-dev-vms.internal} Starting kubelet. | ||
Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-2.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-2.c.saad-dev-vms.internal} Starting kubelet. | ||
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-influx-grafana-controller-0133o Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods | ||
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 elasticsearch-logging-controller-fplln Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods | ||
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 kibana-logging-controller-gziey Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods | ||
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 skydns-ls6k1 Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods | ||
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-heapster-controller-oh43e Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods | ||
Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey BoundPod implicitly required container POD pulled {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Successfully pulled image "kubernetes/pause:latest" | ||
Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey Pod scheduled {scheduler } Successfully assigned kibana-logging-controller-gziey to kubernetes-minion-4.c.saad-dev-vms.internal | ||
|
||
``` | ||
|
||
This demonstrates what would have been 20 separate entries (indicating scheduling failure) collapsed/compressed down to 5 entries. | ||
|
||
## Related Pull Requests/Issues | ||
* Issue [#4073](https://github.com/GoogleCloudPlatform/kubernetes/issues/4073): Compress duplicate events | ||
* PR [#4157](https://github.com/GoogleCloudPlatform/kubernetes/issues/4157): Add "Update Event" to Kubernetes API | ||
* PR [#4206](https://github.com/GoogleCloudPlatform/kubernetes/issues/4206): Modify Event struct to allow compressing multiple recurring events in to a single event | ||
* PR [#4306](https://github.com/GoogleCloudPlatform/kubernetes/issues/4306): Compress recurring events in to a single event to optimize etcd storage |
If we are not careful, people may put ever-changing values into the strings of Reason and/or Message.
We can wait to see if this is actually a problem, but experience suggests it will be.
Some options then are:
Actually, this is not hypothetical. I think it can easily happen today.
If a pod is repeatedly terminating and being restarted by an RC, and you have a lot of nodes, then you are going to see this Message:
Successfully assigned kibana-logging-controller-$FOO to kubernetes-minion-$BAR.c.saad-dev-vms.internal
a bunch of times with different values of $FOO and $BAR.
So, this needs attention sooner rather than later.
I have lots more to say about this topic, which might be better said tomorrow in the office, over a tasty beverage.