Too many events created in a time window can result in unbounded memory growth #5694

Open
smarterclayton opened this Issue Nov 4, 2015 · 5 comments

@smarterclayton
Member

Today, there is no cap on incoming events to the cluster. Events are stored in etcd with a TTL and must remain in memory. It is possible for a cluster to generate more events than expire in a given time window, so memory grows in proportion to the event rate. Since the failure mode for many components may be a hot loop, we have to pick a window of time (TTL) and the amount of memory we can afford, and rate limit within that.

There is a bug in Kubernetes where similar events are not deduplicated on the server. However, even with this bug fixed, it's possible for large numbers of non-similar events to grow the cluster's memory usage.

@smarterclayton
Member

One reproduction scenario involved events that look like: "Deployment config \\\"xxxx-nginx-rails\\\" blocked by multiple errors:\\n\\n\\t* \\n\\t* \\n\\t*"

@smarterclayton smarterclayton added this to the 1.1.0 milestone Nov 4, 2015
@smarterclayton
Member

@ironcladlou @derekwaynecarr let's use this as the tracker for the unbounded growth problem (distinct from similar event compression). Can you link the compression issue here?

@ncdc ncdc was assigned by danmcp Nov 4, 2015
@ncdc ncdc assigned derekwaynecarr and unassigned ncdc Nov 4, 2015
@derekwaynecarr
Member

client side event correlation: kubernetes/kubernetes#16798

@smarterclayton smarterclayton modified the milestone: 1.1.x, 1.1.0 Nov 4, 2015
@smarterclayton
Member

We need a final decision on whether we can address this simply by reducing the default TTL or by documenting the problem.

@smarterclayton
Member

Documenting as a release note here: openshift/openshift-docs#1131 (comment)

A better fix will be in 1.1.1.

@smarterclayton smarterclayton modified the milestone: 1.1.x, 1.1.0 Nov 4, 2015
@danmcp danmcp added priority/P2 and removed priority/P1 labels Nov 5, 2015
@smarterclayton smarterclayton modified the milestone: 1.1.x, 1.2.x Feb 20, 2016
@smarterclayton smarterclayton modified the milestone: 1.2.x, 1.3.0 Jul 12, 2016
@liggitt liggitt modified the milestone: 1.3.0, 1.4.0 Sep 1, 2016
@smarterclayton smarterclayton modified the milestone: 1.4.0, 1.5.0 Jan 31, 2017