Today, there is no cap on incoming events to the cluster. Events are stored in etcd with a TTL and must also remain in memory. A cluster can generate more events than expire in a given time window, and thus memory grows in proportion to the event rate. Since the failure mode for many components may be a hot loop, we have to pick a window of time (the TTL) and the amount of memory we can afford, and rate limit within that budget.
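For illustration, a minimal sketch of that kind of rate limit, using a token bucket in front of whatever persists events. The `eventSink` shape and the 10/sec, burst-25 numbers are hypothetical, not the actual recorder API or any agreed limits:

```go
package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

// eventSink stands in for whatever persists events to etcd; the name is
// illustrative, not the real Kubernetes interface.
type eventSink func(msg string)

// rateLimited wraps a sink with a token bucket so a component stuck in a
// hot loop cannot add events faster than the TTL reclaims them.
func rateLimited(next eventSink, perSecond float64, burst int) eventSink {
	limiter := rate.NewLimiter(rate.Limit(perSecond), burst)
	return func(msg string) {
		if !limiter.Allow() {
			return // over budget: drop the event rather than grow etcd
		}
		next(msg)
	}
}

func main() {
	sink := rateLimited(func(msg string) { fmt.Println("stored:", msg) }, 10, 25)
	for i := 0; i < 1000; i++ {
		sink("deploymentConfig blocked") // only ~burst of these get through at once
	}
}
```

A token bucket lets normal bursts through while capping the sustained rate, which is the quantity the TTL window actually multiplies.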
There is a bug in Kubernetes where similar events are not deduplicated on the server. However, even with this bug fixed, it is possible for large volumes of non-similar events to grow the cluster's memory use.
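As a rough sketch of what deduplication buys (only the shape of the fix; the key fields and names here are illustrative, not the real correlation logic):

```go
package main

import (
	"fmt"
	"time"
)

// key is an illustrative identity for "similar" events: same object, same
// reason, same message. The real correlation logic is richer than this.
type key struct {
	object, reason, message string
}

type entry struct {
	count    int
	lastSeen time.Time
}

// dedup collapses repeats into a counter instead of a new stored object,
// so a hot loop emitting the same event costs one record, not thousands.
type dedup struct {
	seen map[key]*entry
}

func (d *dedup) record(k key) *entry {
	if e, ok := d.seen[k]; ok {
		e.count++
		e.lastSeen = time.Now()
		return e
	}
	e := &entry{count: 1, lastSeen: time.Now()}
	d.seen[k] = e
	return e
}

func main() {
	d := &dedup{seen: map[key]*entry{}}
	k := key{"dc/xxxx-nginx-rails", "FailedCreate", "blocked by multiple errors"}
	for i := 0; i < 5000; i++ {
		d.record(k)
	}
	fmt.Println("stored records:", len(d.seen), "count:", d.seen[k].count)
}
```

Note this only helps with similar events; a flood of non-similar events still needs the rate limit above.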
One reproduction scenario involved events that look like: "Deployment config \"xxxx-nginx-rails\" blocked by multiple errors:\n\n\t* \n\t* \n\t*"
"Deployment config \\\"xxxx-nginx-rails\\\" blocked by multiple errors:\\n\\n\\t* \\n\\t* \\n\\t*"
@ironcladlou @derekwaynecarr let's use this as the tracker for the unbounded growth problem (distinct from similar event compression). Can you link the compression issue here?
Client-side event correlation: kubernetes/kubernetes#16798
We need a final decision on whether we can fix this by simply reducing the default TTL, or whether we should just document the limitation.
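For context, the back-of-envelope bound behind the TTL option: with a TTL, live events are capped at roughly rate × TTL, so memory is roughly rate × TTL × average event size. A quick sketch (the rate and size numbers are assumptions, not measurements):

```go
package main

import "fmt"

// Worst-case live events under a TTL: at a sustained event rate, at most
// rate × TTL events are alive at once, each costing ~avgEventSize bytes.
func main() {
	const (
		eventsPerSec = 10.0    // sustained rate from a misbehaving component (assumed)
		avgEventSize = 1 << 10 // ~1 KiB per stored event (assumed)
	)
	for _, ttlSec := range []float64{3600, 900} { // e.g. 1h vs a reduced 15m
		live := eventsPerSec * ttlSec
		fmt.Printf("TTL %4.0fs: ~%6.0f live events, ~%.0f MiB\n",
			ttlSec, live, live*avgEventSize/(1<<20))
	}
}
```

So cutting the TTL (kube-apiserver's `--event-ttl` flag) from 1h to 15m shrinks the worst-case window by 4x without touching the event rate; the specific values above are only illustrative.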
Documenting as a release note here: openshift/openshift-docs#1131 (comment)
A better fix will land in 1.1.1.