
Dynamic Cardinality Enforcement #1692

Merged
merged 8 commits into from May 19, 2020

Conversation

logicalhan
Member

This KEP proposes a strategy for introducing cardinality constraints on existing metric labels without having to hard-code them manually, so that metric fixes can be decoupled from the Kubernetes release cycle.

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 16, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: logicalhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Apr 16, 2020
@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 16, 2020
@logicalhan
Member Author

/cc @lilic @brancz

This design allows us to optionally adopt @lilic's excellent idea about simplifying the interface for component owners, who can then opt to just specify a metric and label pair *without* having to specify a whitelist. Personally, I like that idea since it simplifies how a component owner can implement our cardinality-enforcing helpers without having to necessarily plumb through complicated maps. It would also make it considerably easier to feed this data in through the command line, since you could do something like this:

```bash
$ kube-apiserver --bind-metric-labels "some_metric=label_too_many_values"
```
Member Author

I'm not terribly crazy about my wording on the flag here.

Member

How about --supported-label-values or --accepted-label-values?


@erain

erain commented Apr 16, 2020

/cc @erain

@k8s-ci-robot
Contributor

@erain: GitHub didn't allow me to request PR reviews from the following users: erain.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @erain

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


TL;DR: metrics with unbounded label dimensions can cause memory issues in the components they instrument.

The simple solution to this problem is to say "don't do that". We (SIG Instrumentation) have already done so in our instrumentation guidelines, which specifically state that ["one should know a comprehensive list of all possible values for a label at instrumentation time."](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/instrumentation.md#dimensionality--cardinality)
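The guideline quoted above is easiest to see with a toy model: a metric vector grows one time series per distinct label value, so a label whose values are known at instrumentation time bounds memory, while one fed from request data does not. A minimal stdlib-only sketch (no Prometheus client involved):

```go
package main

import "fmt"

// seriesFor counts the distinct label values seen, which is exactly how a
// metric vector grows: one time series per distinct value. This is a toy
// model only; it does not use the real Prometheus client.
func seriesFor(values []string) int {
	seen := map[string]struct{}{}
	for _, v := range values {
		seen[v] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Bounded: HTTP verbs are a known, finite set at instrumentation time.
	fmt.Println(seriesFor([]string{"GET", "POST", "GET", "DELETE"})) // 3

	// Unbounded: caller-controlled values create a new series per request.
	var fromRequests []string
	for i := 0; i < 10000; i++ {
		fromRequests = append(fromRequests, fmt.Sprintf("path-%d", i))
	}
	fmt.Println(seriesFor(fromRequests)) // 10000
}
```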
Contributor

This definitely seems like more of a recommendation than a hard rule. Even in components we own (e.g. kube-state-metrics), there are labels with unbounded cardinality (e.g. pod or namespace). If we really wanted to prevent unbounded cardinality, we would do so at compile-time and do another migration. But I don't think that is actually what we want... We need to be able to use labels with unbounded cardinality in some places, but want an escape hatch for when we mess up.

Member Author

Definitely the node/pod stuff is exceptional. But you guys are doing manual GC for metrics no? Manual cleanup of metrics doesn't really feel to me like the general use-case.

Contributor

Ah, got it. So if we can definitively make the statement that all calls to metrics.NewGaugeVec (for example) should have labels with a specific set of values, should we start enforcing at compile time that all metrics have a whitelist? It would obviously be more intrusive than what's proposed here, but wouldn't it actually allow us to get to 100% bounded-cardinality metrics?

Member Author

Yeah, but emphasis on the intrusive part. I don't know how realistically achievable that would be since it would mean auditing/updating every single metric in the codebase.

Contributor

got it, makes sense.

Member

FWIW, kube-state-metrics is so special it doesn't even use the Prometheus client anymore, but as already mentioned there are other exceptions to the rule, like kubelet_node_name, which is practically bound to one series per node even though the value itself is free-form.

Comment on lines 244 to 245
This design allows us to optionally adopt @lilic's excellent idea about simplifying the interface for component owners, who can then opt to just specify a metric and label pair *without* having to specify a whitelist. Personally, I like that idea since it simplifies how a component owner can implement our cardinality enforcing helpers without having to necessarily plumb through complicated maps. This would make it considerably easier to feed this data in through the command line since you could do something like this:

Member Author

Placeholder comment for the thing that @brancz and @lilic brought up in the meeting today, re: special casing buckets.

Member

Potentially we would want to treat buckets completely separately (as in a separate flag just for bucket configuration of histograms). @bboreham opened the original PR for apiserver request duration bucket reduction, maybe he has some input as well.

My biggest concern with all of this is that it's going to be super easy to end up with extremely customized Kubernetes setups where our existing dashboards and alerting rule definitions just won't apply generally anymore. I'd like to make sure we emphasize that these flags are really only meant to be used as escape hatches, and that we must always strive to truly fix the root of the issue.

Member Author

I will make it clearer that this is intended to be an escape hatch. I can't really imagine many cases where someone would make such drastic changes to their own metrics without realizing it (and hence affecting their alerts and such).

That feels inherently a lot less dangerous since you can't get to that state without some intention from a cluster admin.

@piosz piosz requested review from x13n and serathius April 17, 2020 08:41


Since we already have an interception layer built into our Kubernetes monitoring stack (from the metrics stability effort), we can leverage existing wrappers to provide a global entrypoint for intercepting and enforcing rules for individual metrics.
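As a rough illustration of what such an interception wrapper could do, here is a stdlib-only sketch — the helper name is hypothetical, and the "unexpected" sentinel follows the sample output discussed in this thread, not a confirmed API:

```go
package main

import "fmt"

// boundLabel is a hypothetical sketch of what the interception layer could
// do: return the label value unchanged when it is whitelisted, and the
// sentinel "unexpected" otherwise, so the label can never create more
// series than the whitelist size plus one.
func boundLabel(allowed map[string]struct{}, value string) string {
	if _, ok := allowed[value]; ok {
		return value
	}
	return "unexpected"
}

func main() {
	allowed := map[string]struct{}{"1": {}, "2": {}, "3": {}}
	for _, v := range []string{"1", "2", "999", "attacker-supplied"} {
		fmt.Printf("some_metric{label_too_many_values=%q} 1\n", boundLabel(allowed, v))
	}
}
```

Both "999" and "attacker-supplied" collapse into the single "unexpected" series, which is what bounds memory.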
Member

Since we already have an interception layer, why not simply enforce the cardinality limit there directly?

If we could specify that a certain metric allows up to N unique label sets (or K values per label), an attempt to add anything beyond that would just use some predefined constant instead. This has the downside (compared to your proposal) of being less predictable, because you never know what will get filtered out. However, in practice if K/N are sufficiently high, this shouldn't be a concern (all frequently used label values will show up early, so will be reported). Additionally, the approach I'm suggesting will:

  • enforce that your metrics "just work" instead of putting cluster admins in a position where they have to react to alerts when something is already broken.
  • allow monitoring entities that are of high cardinality, where the interesting label values cannot be listed up front.

If the "not predictable enough" bit is problematic, we can also combine both ideas and treat whitelists as "always working" and enforce explicit cardinality limits only on values that were not whitelisted. This would slightly change the semantics of whitelisting - lack of a whitelist would be equivalent to an empty whitelist.

Member Author

We considered this for the recent security vulnerability. The problem is exactly as you describe: it is 'not predictable enough'. We basically lose deterministic metric output if we adopt this (not to mention metric fidelity). While the remaining metrics will 'work' in the sense that they will be bound to N dimensions and will thus not cause memory leaks, they will not be reliable, since the N dimensions you end up getting will be determined at runtime.

The alternative you mention is problematic from a practical perspective (it is an alternative way of implementing the thing that @dashpole mentioned in an earlier comment). We'd effectively be blanking out all label values for every metric in the Kubernetes codebase until people explicitly laid out a whitelist. That means every single metric would have to be audited, which is quite invasive.

Alternatively, we could use a ratcheting approach, where we enforce that all new metrics must have a whitelist explicitly specified (this could potentially be a requirement of a metric being promoted to the STABLE class).

Member

You can think of my proposal as an extension of yours: instead of getting label_too_many_values right away, a value would still work until a certain label cardinality limit is reached. Whitelisting would guarantee a value will not be dropped, but other values wouldn't be dropped either unless there are too many of them. A cluster admin can configure the per-metric and per-label limits once and get alerted on "some metric labels are dropped" instead of "your metrics storage is getting blown up".
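The combined semantics described here — whitelisted values always kept, other values kept until a per-label limit is reached — could look roughly like this. An illustrative stdlib-only sketch; the type, method names, and sentinel are assumptions, not the KEP's actual API:

```go
package main

import "fmt"

// limiter sketches the combined scheme: whitelisted values are always
// kept, other values are kept until a per-label limit is reached, and
// anything beyond that collapses into a sentinel.
type limiter struct {
	whitelist map[string]struct{}
	limit     int
	seen      map[string]struct{}
}

func (l *limiter) bind(value string) string {
	if _, ok := l.whitelist[value]; ok {
		return value // whitelisted: guaranteed to survive
	}
	if _, ok := l.seen[value]; ok {
		return value // already admitted under the limit
	}
	if len(l.seen) < l.limit {
		l.seen[value] = struct{}{} // admit a new value while under the limit
		return value
	}
	return "unexpected" // over the limit: drop into the sentinel
}

func main() {
	l := &limiter{
		whitelist: map[string]struct{}{"ok": {}},
		limit:     2,
		seen:      map[string]struct{}{},
	}
	for _, v := range []string{"a", "b", "c", "ok"} {
		fmt.Println(l.bind(v)) // a, b, unexpected, ok
	}
}
```

Note the trade-off discussed above: which non-whitelisted values survive ("a" and "b" here) depends purely on arrival order, which is what makes the output non-deterministic.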

Contributor

I think the idea of specifying a cardinality limit instead of an explicit whitelist has a lot of benefits. It allows us to potentially turn any memory leak, which is a DoS security vulnerability, into a degradation of metrics, which is easy to monitor.

What if we adopted a form of both approaches? We could have a global per-label cardinality limit e.g. 1000 values. IMO it shouldn't even be configurable. It should be high enough that we are confident such a label is a "bug". This would bound the number of metric streams at compile time since metrics, labels, and now label values are all bounded. However, we should have some way for operators to "save" their metrics when labels start being messed up by the cardinality limit. This is where this proposal could help. It doesn't allow exceeding the cardinality limit; it just allows specifying which labels are kept.

Member

Having a cardinality limit plus a whitelist is essentially what I'm suggesting. I think we should have both per-label and per-metric limits, though. 1000 values on every label works fine when there is a single problematic label, but without per-metric limits, any metric with multiple labels can still eat a ton of memory.

This would be slightly more complicated to implement correctly than a purely per-label limiting approach, but it would give us a realistic hard memory limit for every metric. If I have 10 labels, each allowing up to 1000 values, the number of possible label combinations is technically bounded, but we can still eat the entire memory on any machine.
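The multiplicative effect is easy to quantify: with per-label limits only, the worst-case series count is the product of each label's cardinality. A small sketch using math/big, since the numbers overflow int64 quickly:

```go
package main

import (
	"fmt"
	"math/big"
)

// combinations returns valuesPerLabel^labels, the worst-case number of
// label-value combinations (series) for a metric when only per-label
// limits are enforced.
func combinations(labels int, valuesPerLabel int64) *big.Int {
	return new(big.Int).Exp(big.NewInt(valuesPerLabel), big.NewInt(int64(labels)), nil)
}

func main() {
	fmt.Println(combinations(2, 1000))  // 1000000: already a lot
	fmt.Println(combinations(10, 1000)) // 10^30: a per-label limit alone is no practical bound
}
```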

Member Author

@x13n's comment is actually why I don't want to have the limit. Something like a 1000-series limit isn't really going to help, since it's the multiplicative effect of the label values that causes cardinality issues.

Also, I am personally wary of setting a limit for something like request_total or request_duration, since they are disproportionately large metrics relative to the norm. We have something like 50 buckets on durations, so I suspect we'd hit 1k timeseries pretty easily/often.

Member Author

Regardless, this is a slightly orthogonal dimension to the KEP. It sounds like people are actually on board with the label whitelist approach, and adopting it will not prevent us from also introducing label limits in the future (though that's not something I am completely convinced of at the moment).

Contributor

Yep, I was thinking that as well. We can definitely consider each (label whitelist or cardinality limit) separately, and I'm in favor of the currently proposed whitelist mechanism.

Member

SGTM, limiting the cardinality can be done (or not) in a separate proposal, to switch from reacting to preventing issues. Whitelisting itself is indeed already giving cluster admins a way of manually mitigating large metric issues on selected metrics.

@logicalhan logicalhan changed the title [WIP] Dynamic Cardinality Enforcement Dynamic Cardinality Enforcement Apr 29, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 29, 2020
Member

@lilic lilic left a comment


So, tl;dr, if I am understanding it all completely:

  • we would implement a helper to enforce cardinality
  • component owners would enforce cardinality by default for certain metrics
  • a flag would allow users to disable that enforcement for a passed-in allow-list of labels?

@logicalhan
Member Author

  • we would implement a helper to enforce cardinality

Yes

  • component owners would enforce cardinality by default for certain metrics

This one is TBD, but it has been suggested. I think there is a reasonable argument for this as a stability requirement (excluding ad hoc custom collectors for reasons mentioned in comments above).

  • a flag would allow users to disable that enforcement for a passed-in allow-list of labels?

Oh, actually, the flag would enable one to specify a whitelist of allowed labels. By default, alpha metrics would not have any such restrictions.
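To make the flag idea concrete, a component could parse a spec of the form `metric,label=value1;value2` into a whitelist. The grammar below is purely illustrative — the KEP deliberately leaves the exact flag wiring to individual component owners:

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowList parses one flag entry of the hypothetical form
// "metric_name,label_name=v1;v2;v3" into its key and allowed values. The
// grammar is purely illustrative; the KEP leaves the exact wiring (flags
// vs. config files) to individual component owners.
func parseAllowList(spec string) (key string, values []string, err error) {
	parts := strings.SplitN(spec, "=", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", nil, fmt.Errorf("expected metric,label=v1;v2, got %q", spec)
	}
	return parts[0], strings.Split(parts[1], ";"), nil
}

func main() {
	key, vals, err := parseAllowList("some_metric,verb=GET;POST;DELETE")
	if err != nil {
		panic(err)
	}
	fmt.Println(key, vals)
}
```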


- @dashpole

> Should have labels with a specific set of values, should we start enforcing that all metrics have a whitelist at compile-time?
Contributor

This can be removed, or merged with @x13n's comment below. As discussed above, we can't enforce that all labels have a whitelist, as some label values are determined at runtime (e.g. node name).

@lilic
Member

lilic commented May 19, 2020

As discussed with @logicalhan, I reworded a few things to help get this merged in by today's KEP deadline. @kubernetes/sig-instrumentation-feature-requests please take a look, thanks!

The main changes:

  • wording
  • clarified that counter-type metrics will not be affected by invalidity, but gauges will be
  • removed open questions as they seem to have been solved in their respective comment threads.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 19, 2020
…fy a few points

Also removes open questions as those were solved in the discussion.
Member

@brancz brancz left a comment


I'm happy with the mechanism in general. There are a few clarifications and fixes then I'd be happy to merge this.


We will expose the machinery and tools to bind a metric's labels to a discrete set of values.

It is *not a goal* to implement and plumb this solution for each Kubernetes component (there are many SIGs and a number of verticals, which may have their own preferred ways of doing things). As such, it will be up to component owners to leverage the functionality we provide, by feeding configuration data through whatever mechanism is deemed appropriate (e.g. command-line flags or reading from a file).
Member

We should still track which components have adopted the mechanism and advocate for it, no? If we build the mechanism and nobody ends up using it, that kind of defeats the purpose, no?

Member Author

Yes, we should track it. I am personally inclined to implement this for apimachinery, so it will be used.

Member

that works for me

@brancz
Member

brancz commented May 19, 2020

/lgtm
/hold

Thanks a lot @lilic and @logicalhan for pushing this. This lgtm now.

Giving at least one other instrumentation person a chance to review though. Feel free to remove hold when that's done.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 19, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2020
```
some_metric{label_too_many_values="1"} 1
some_metric{label_too_many_values="2"} 1
some_metric{label_too_many_values="3"} 1
some_metric{label_too_many_values="unexpected"} 1000000
```
Contributor

nit: Should we blacklist this label value across the code base? Or are we OK with collisions with the "unexpected" value?

Member Author

You mean across metrics? My original intent was to target whitelists at a metric-label pair (i.e. the conjunction of 'metric' and 'label_too_many_values').

Contributor

I mean, should we disallow calling WithLabelValues("unexpected")? It doesn't return an error today, so I'm not entirely sure how we would do it. Maybe we should log a warning when someone does that?

Member Author

Won't the default behavior just be the right one?

Contributor

The default behavior would be to lump together label values that are the string literal "unexpected" with label values that were filtered out by the whitelist. Given that "unexpected" is now a special-cased value, I'm suggesting we could/should discourage its use as a normal label value.

Member Author

You'd have to explicitly whitelist a label value called 'unexpected' for that to happen.

Contributor

Agreed. It probably wouldn't ever happen. Someone would have to use the "unexpected" value in a metric label, and then want to whitelist values for that label, including the literal "unexpected"...

@dashpole
Contributor

My comment above is not blocking. This lgtm. Feel free to remove the hold when you are satisfied.

@dashpole
Contributor

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 19, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2020
@dashpole
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2020
@k8s-ci-robot k8s-ci-robot merged commit f13935c into kubernetes:master May 19, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone May 19, 2020
@ehashman
Member

/lgtm

(for completeness)

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

8 participants