[WIP] Advanced Configuration #46

DirectXMan12 · 2018-02-16T23:17:02Z

This commit introduces advanced configuration. The rate-interval and
label-prefix flags are deprecated, and replaced by a configuration file
that allows you to specify series queries and the rules for transforming
those into metrics queries and API resources.

DirectXMan12 · 2018-02-16T23:18:25Z

cc @brancz

An example config can be found in the config package, as generated by the DefaultConfig function. I want to write a few more unit tests, and maybe simplify the config a bit, and/or make whitelisting easier/less verbose.

DirectXMan12 · 2018-02-16T23:24:38Z

(P.S. feel free to tell me if you find the configuration overly convoluted)

brancz · 2018-02-17T16:07:05Z

I'm on vacation until the 26th, but I skimmed over this, and generally it goes in the same direction I was thinking, so if it's urgent, feel free to move forward without a thorough review from me.

DirectXMan12 · 2018-02-19T14:20:20Z

We can wait a bit. I'd like to get a more detailed review from you, and it gives time to get end-user feedback.

tcolgate · 2018-02-28T19:28:32Z

docs/sample-config.yaml

+  # this also introduces an implicit filter on metric family names
+  naming:
+    preifx: "container_"
+    suffix: "_seconds_total"


Could this be generalized to a regexp replace maybe?

tcolgate · 2018-02-28T19:41:17Z

This looks like a more useful approach. A couple of things that would be handy:

- Dynamic config reload: either watch the config and reload, or provide a web endpoint.
- Potentially multiple config files, to allow them to be written separately.
- The prefix/suffix options might be better done as a regex match and replace.

I need to play with it more to see how the resource mapping stuff works.
It's a shame that the custom metrics API seems to preclude the idea of a user dumping a query straight into an HPA config, but if we must preconfigure the metrics that are available, this seems like the most adaptable approach.
(I'm rebuilding our prom sidecar at the moment and am probably going to include a custom resource for providing chunks of config for k8s-prometheus-adapter, so users will be able to add custom metrics by creating these rules.)

brancz

I'd have to play with this in the wild a bit more, but broadly this is really awesome and looks good to me.

I think @tcolgate's comment on regexing is justified. Personally, I don't think hot-reloading the config is really necessary: I'd expect one to roll out a new deployment + config if there is a change, meaning a completely new pod. This makes failure scenarios a lot easier to deal with, as they appear at start rather than at runtime.

brancz · 2018-03-02T12:41:20Z

cmd/adapter/app/start.go

@@ -85,6 +86,12 @@ func NewCommandStartPrometheusAdapterServer(out, errOut io.Writer, stopCh <-chan
 	flags.StringVar(&o.LabelPrefix, "label-prefix", o.LabelPrefix,
 		"Prefix to expect on labels referring to pod resources.  For example, if the prefix is "+
 			"'kube_', any series with the 'kube_pod' label would be considered a pod metric")
+	flags.StringVar(&o.MetricsConfigFile, "metrics-config", o.MetricsConfigFile,


I'd say this should just be --config, there's nothing about metrics of the adapter in here. Just a nit, feel free to ignore if you disagree.

+1 for the name of the flag

brancz · 2018-03-02T15:49:53Z

docs/sample-config.yaml

+  # specify that the `container_` and `_seconds_total` suffixes should be removed.
+  # this also introduces an implicit filter on metric family names
+  naming:
+    preifx: "container_"


brancz · 2018-03-02T15:50:29Z

docs/sample-config.yaml

+    # attach only pod and namespace resources by mapping label names to group-resources
+    overrides:
+      namespace: {resource: "namespace"},
+      pod_name: {resource: "pod"},


Honestly, I think this is a change we should just make upstream. It's annoying that cadvisor labels don't follow the official guidelines for instrumentation; it makes joining metrics unnecessarily hard.

agreed, I've proposed it before. It'll take some doing though :-/

I've heard "metrics are completely left out of stability guarantees for Kubernetes" often enough that I think it's a legit change for the better. Let's bring it up in the next sig meeting.
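
For readers unfamiliar with what the `overrides` block in the snippet above is doing, the following is a rough sketch of a label-to-resource lookup. The `groupResource` type and `resourcesFor` function are hypothetical names chosen for illustration and do not reflect the adapter's actual code.

```go
package main

import "fmt"

// groupResource is a hypothetical stand-in for a Kubernetes group-resource.
type groupResource struct {
	Group    string
	Resource string
}

// resourcesFor returns the resources a series is attached to, given its
// labels and a table of label-name overrides. Labels without an override
// are ignored.
func resourcesFor(labels map[string]string, overrides map[string]groupResource) map[groupResource]string {
	out := make(map[groupResource]string)
	for label, value := range labels {
		if gr, ok := overrides[label]; ok {
			out[gr] = value
		}
	}
	return out
}

func main() {
	overrides := map[string]groupResource{
		"namespace": {Resource: "namespace"},
		"pod_name":  {Resource: "pod"},
	}
	labels := map[string]string{
		"namespace":      "default",
		"pod_name":       "web-0",
		"container_name": "nginx", // no override, so not treated as a resource
	}
	for gr, name := range resourcesFor(labels, overrides) {
		fmt.Printf("%s=%s\n", gr.Resource, name)
	}
}
```

If the cadvisor label were `pod` rather than `pod_name`, the second override would presumably become unnecessary, which is the point of the upstream change discussed here.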

DirectXMan12 · 2018-03-13T18:45:40Z

Does that look better re: the regex thing?

brancz

Just a quick check in from my review, I'm not completely done yet.

brancz · 2018-03-14T02:44:38Z

pkg/config/default.go

+// will be of the form `<prefix>${.Resource}$`, cadvisor series will be
+// of the form `container_`, and have the label `pod_name`.  Any series ending
+// in total will be treated as a rate metric.
+func DefaultConfig(rateInterval time.Duration, labelPrefix string) *MetricsDiscoveryConfig {


Any reason why this can't just be a default config as in a yaml file? It seems odd that we have this "default" but double-parameterizable config.

The in-program one makes the existing flags still work, so it's an easier switch-over. If we have no desire to do that, the default config can be moved to a YAML file. It also provides a nice out-of-the-box experience.

The way people break is the same once we remove the flags, so I'd say let's break immediately and use the YAML file config option.

JoelSpeed · 2018-03-16T16:18:50Z

docs/sample-config.yaml

+  # This is a Go template where the `.Series` and `.LabelMatchers` string values
+  # are available, and the delimiters are `${` and `}$` to avoid conflicts with
+  # the prometheus query language
+  metricsQuery: "sum(rate(${.Series}${${.LabelMatchers}$,container_name!="POD"}[2m])) by (${.GroupBy}$)"


I've been trying to test this PR and tried the example config. I found this line should be metricsQueries and not metricsQuery based on the value in config.go#L34.

Similarly for the other rules in this config

JoelSpeed · 2018-04-16T08:49:25Z

pkg/custom-provider/provider.go

@@ -302,17 +285,49 @@ func (l *cachingMetricsLister) RunUntil(stopChan <-chan struct{}) {
 func (l *cachingMetricsLister) updateMetrics() error {
 	startTime := pmodel.Now().Add(-1 * l.updateInterval)

-	sels := l.Selectors()
+	// don't do duplicate queries when it's just the matchers that change
+	seriesCacheByQuery := make(map[prom.Selector][]prom.Series)


This map has concurrency issues. I'm getting both "concurrent map writes" and "concurrent map read and map write" errors.

Would it be worth using a sync.Map for this? I've built locally and wrapped all uses of the map in a mutex lock/unlock cycle, which seems to fix the issue.

bradenwright · 2018-05-15T18:59:14Z

Any update on this? I'd like to start using it, and I don't mind contributing back if needed. We need to scale on RabbitMQ queue length, which currently isn't possible. I think this PR should fix #42, though.

lumeche · 2018-05-24T18:59:00Z

Do you have an ETA on when this will be fixed? I'm having issues in my environment where the prometheus adapter is taking way too much RAM.

I could do a temporary fix on my side (fork) until this is merged.

DirectXMan12 · 2018-06-22T19:41:42Z

Hey, sorry this took so long to update. I think this should address the remaining concerns (concurrent map access should be fixed, better docs are in place, the default config is just a YAML file, etc.). I still have to run through final in-cluster tests, but if those pass, this should be good to merge Monday :-).

Please take a look and let me know what you think!

NB: the delimiters in the templates in the config changed from ${ and }$ to << and >>. This should make things much easier for a human to read, while still not conflicting with PromQL.
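
To illustrate how the new delimiters read in practice, here is a small sketch using Go's standard `text/template` package with `Delims("<<", ">>")`. The `queryArgs` struct and `renderQuery` function are hypothetical; only the field names `.Series`, `.LabelMatchers`, and `.GroupBy` come from the sample config discussed earlier in this thread.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// queryArgs is a hypothetical struct carrying the values the sample config
// refers to as .Series, .LabelMatchers and .GroupBy.
type queryArgs struct {
	Series        string
	LabelMatchers string
	GroupBy       string
}

// renderQuery expands a query template that uses << and >> as delimiters,
// so PromQL's own braces pass through untouched.
func renderQuery(tmplText string, args queryArgs) (string, error) {
	tmpl, err := template.New("metrics-query").Delims("<<", ">>").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, args); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	query, err := renderQuery(
		`sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)`,
		queryArgs{
			Series:        "container_cpu_usage_seconds_total",
			LabelMatchers: `namespace="default"`,
			GroupBy:       "pod_name",
		},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(query)
	// sum(rate(container_cpu_usage_seconds_total{namespace="default",container_name!="POD"}[2m])) by (pod_name)
}
```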

brancz · 2018-06-25T07:49:18Z

If, as you said, the in-cluster tests succeed smoothly, this LGTM.

DirectXMan12 · 2018-06-27T20:58:07Z

In-cluster testing has succeeded. Need to stand up integration tests for that at some point. Merging when tests complete.

DirectXMan12 changed the title ~~Advanced Configuration~~ [WIP] Advanced Configuration Feb 16, 2018

DirectXMan12 force-pushed the feature/advanced-config branch from 0a25e1e to 659b921 Compare February 16, 2018 23:23

DirectXMan12 mentioned this pull request Feb 22, 2018

Option to treat other metric suffixes the same as _total #48

Closed

DirectXMan12 force-pushed the feature/advanced-config branch from 659b921 to eb283ba Compare February 23, 2018 21:17

DirectXMan12 force-pushed the master branch from 6cb0037 to df48f2a Compare February 28, 2018 19:22

tcolgate reviewed Feb 28, 2018

brancz reviewed Mar 2, 2018

DirectXMan12 mentioned this pull request Mar 5, 2018

Questions about name translations #51

Closed

DirectXMan12 force-pushed the feature/advanced-config branch from eb283ba to 53b7828 Compare March 12, 2018 21:25

DirectXMan12 mentioned this pull request Mar 13, 2018

Use config file defining mapping between prometheus names/labels and k8s metrics #43

Closed

DirectXMan12 force-pushed the feature/advanced-config branch 2 times, most recently from 8449f96 to af9f11c Compare March 13, 2018 18:45

brancz mentioned this pull request Mar 14, 2018

Add support for auth when connecting to Prometheus #53

Merged

brancz reviewed Mar 14, 2018

JoelSpeed reviewed Mar 16, 2018

JoelSpeed reviewed Apr 16, 2018

This was referenced Apr 24, 2018

Custom filters for the series query #63

Closed

unable to update list of all available metrics #58

Closed

DirectXMan12 force-pushed the feature/advanced-config branch 2 times, most recently from 3342bfb to 9c7f48a Compare April 30, 2018 21:43

bradenwright mentioned this pull request May 4, 2018

Need deduplication when both metric and metric_total exist #42

Closed

DirectXMan12 force-pushed the feature/advanced-config branch from 9c7f48a to 46f6349 Compare May 10, 2018 19:40

DirectXMan12 force-pushed the feature/advanced-config branch from 46f6349 to 20d6b49 Compare June 22, 2018 19:35

Advanced Configuration

2984604

This commit introduces advanced configuration. The rate-interval and label-prefix flags are removed, and replaced by a configuration file that allows you to specify series queries and the rules for transforming those into metrics queries and API resources.

DirectXMan12 force-pushed the feature/advanced-config branch from 20d6b49 to 3783784 Compare June 22, 2018 19:38

Switch to dep for dependency management

ad1837e

I find it to be a bit faster in some cases, and easier to work with.

DirectXMan12 force-pushed the feature/advanced-config branch from 3783784 to 031c793 Compare June 22, 2018 20:06

DirectXMan12 and others added 4 commits June 27, 2018 16:56

Add a helper to generate legacy configuration

40a9ee2

This moves the DefaultConfig method out into a helper to generate legacy configuration. Passing in a config file is now required.

Advanced Config Docs Updates

32e4c5b

This updates the documentation and README to have information on the configuration file format.

Use channel for series aggregation

6089fa8

This fixes asynchronous read/write issues when performing series discovery by pushing series results onto a channel, instead of trying to write them directly to a map.

Makefile with actual deps

1e5cd68

This makes the makefile's build target have actual dependencies, so that it only rebuilds any given adapter if that adapter's actual go files have changed (yes, this is mostly redundant with Go 1.10, but it makes working on read-only filesystems a bit nicer).

DirectXMan12 force-pushed the feature/advanced-config branch from 031c793 to 3dbd510 Compare June 27, 2018 20:56

Update dependencies to Kubernetes 1.11

be018f7

This updates our dependencies to the Kubernetes 1.11 versions. In the future, this will also allow us to support the external metrics API.

DirectXMan12 force-pushed the feature/advanced-config branch from 3dbd510 to be018f7 Compare June 27, 2018 20:57

DirectXMan12 merged commit 7b606a7 into master Jun 27, 2018

JoelSpeed mentioned this pull request Jul 3, 2018

Fix asynchronous read/writes to seriesCache #59

Closed

DirectXMan12 deleted the feature/advanced-config branch August 23, 2018 19:28

philipgough mentioned this pull request Feb 22, 2022

Implement configmap-reloader for prometheus-adapter prometheus-operator/kube-prometheus#302

Closed
