[WIP] Advanced Configuration #46
cc @brancz An example config can be found in the
(P.S. feel free to tell me if you find the configuration overly convoluted)
I’m on vacation until the 26th, but I skimmed this and it generally goes in the same direction I was thinking, so if it's urgent, feel free to move forward without a thorough review from me.
We can wait a bit. I'd like to get a more detailed review from you, and it gives time to get end-user feedback.
docs/sample-config.yaml
Outdated
# this also introduces an implicit filter on metric family names
naming:
  preifx: "container_"
  suffix: "_seconds_total"
Could this be generalized to a regexp replace maybe?
This looks like a more useful approach. A couple of things that would be handy:
I need to play with it more to see how the resource mapping stuff works.
I'd have to play with this in the wild a bit more, but broadly this is really awesome and looks good to me.
I think @tcolgate's comment on regexing is justified. Personally I think hot-reloading the config is not really necessary, I'd expect one to roll out a new deployment + config if there is a change, meaning a completely new pod. This makes failure scenarios a lot easier to deal with as they appear at start rather than at runtime.
cmd/adapter/app/start.go
Outdated
@@ -85,6 +86,12 @@ func NewCommandStartPrometheusAdapterServer(out, errOut io.Writer, stopCh <-chan
	flags.StringVar(&o.LabelPrefix, "label-prefix", o.LabelPrefix,
		"Prefix to expect on labels referring to pod resources. For example, if the prefix is "+
			"'kube_', any series with the 'kube_pod' label would be considered a pod metric")
	flags.StringVar(&o.MetricsConfigFile, "metrics-config", o.MetricsConfigFile,
I'd say this should just be --config; there's nothing about metrics of the adapter in here. Just a nit, feel free to ignore if you disagree.
+1 for the name of the flag
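As a rough illustration of the flag rename, the sketch below registers a `--config` flag and rejects an empty value so that a missing file is reported at startup rather than at runtime. It uses the standard library `flag` package instead of the flag set used in the diff above. The `options` type, the `parseArgs` helper, and the help text are assumptions made for this example.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// options is a hypothetical stand-in for the adapter's option struct.
type options struct {
	ConfigFile string
}

// parseArgs registers --config and fails when it is not set, so configuration
// problems surface when the process starts.
func parseArgs(args []string) (*options, error) {
	o := &options{}
	fs := flag.NewFlagSet("adapter", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example's output
	fs.StringVar(&o.ConfigFile, "config", o.ConfigFile,
		"Path to the file describing metrics discovery rules")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if o.ConfigFile == "" {
		return nil, errors.New("--config is required")
	}
	return o, nil
}

func main() {
	o, err := parseArgs([]string{"--config", "/etc/adapter/config.yaml"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using config:", o.ConfigFile)

	_, err = parseArgs(nil)
	fmt.Println("without flag:", err)
}
```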
docs/sample-config.yaml
Outdated
# specify that the `container_` and `_seconds_total` suffixes should be removed.
# this also introduces an implicit filter on metric family names
naming:
  preifx: "container_"
typo
# attach only pod and namespace resources by mapping label names to group-resources
overrides:
  namespace: {resource: "namespace"},
  pod_name: {resource: "pod"},
Honestly, I think this is a change we should just make upstream. It's annoying that cadvisor labels don't follow the official guidelines for instrumentation; it makes joining metrics unnecessarily hard.
agreed, I've proposed it before. It'll take some doing though :-/
I've heard "metrics are completely left out of stability guarantees for Kubernetes" often enough that I think it's a legit change for the better. Let's bring it up in the next sig meeting.
Does that look better re the regex thing?
Just a quick check in from my review, I'm not completely done yet.
pkg/config/default.go
Outdated
// will be of the form `<prefix>${.Resource}$`, cadvisor series will be
// of the form `container_`, and have the label `pod_name`. Any series ending
// in total will be treated as a rate metric.
func DefaultConfig(rateInterval time.Duration, labelPrefix string) *MetricsDiscoveryConfig {
Any reason why this can't just be a default config as in a yaml file? It seems odd that we have this "default" but double-parameterizable config.
The in-program one makes the existing flags still work, so it's an easier switch-over. If we have no desire to do that, the default config can be moved to a YAML file. It also provides a nice out-of-the-box experience.
The way people break is the same once we remove the flags, so I'd say let's break immediately and use the YAML file config option.
docs/sample-config.yaml
Outdated
# This is a Go template where the `.Series` and `.LabelMatchers` string values
# are available, and the delimiters are `${` and `}$` to avoid conflicts with
# the prometheus query language
metricsQuery: "sum(rate(${.Series}${${.LabelMatchers}$,container_name!="POD"}[2m])) by (${.GroupBy}$)"
I've been trying to test this PR and tried the example config. I found this line should be metricsQueries and not metricsQuery, based on the value in config.go#L34. Similarly for the other rules in this config.
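For anyone testing the sample config, the following sketch shows how a query template with the `${` and `}$` delimiters from the outdated snippet above can be expanded using Go's `text/template`. The `queryArgs` type, the `renderQuery` helper, and the example label values are assumptions for illustration. The adapter's real types and field set may differ.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// queryArgs holds the values referenced by the sample template.
// The type itself is hypothetical.
type queryArgs struct {
	Series        string
	LabelMatchers string
	GroupBy       string
}

// renderQuery expands a metrics query template that uses `${` and `}$` as
// delimiters, so PromQL's own braces pass through as plain text.
func renderQuery(tmplText string, args queryArgs) (string, error) {
	tmpl, err := template.New("metrics-query").Delims("${", "}$").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, args); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	const q = `sum(rate(${.Series}${${.LabelMatchers}$,container_name!="POD"}[2m])) by (${.GroupBy}$)`
	out, err := renderQuery(q, queryArgs{
		Series:        "container_cpu_usage_seconds_total",
		LabelMatchers: `namespace="default",pod_name=~"web-.*"`,
		GroupBy:       "pod_name",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Custom delimiters avoid having to escape the `{` and `}` that PromQL uses for label selectors.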
@@ -302,17 +285,49 @@ func (l *cachingMetricsLister) RunUntil(stopChan <-chan struct{}) {
func (l *cachingMetricsLister) updateMetrics() error {
	startTime := pmodel.Now().Add(-1 * l.updateInterval)

	sels := l.Selectors()
	// don't do duplicate queries when it's just the matchers that change
	seriesCacheByQuery := make(map[prom.Selector][]prom.Series)
This map has concurrency issues. I'm getting "concurrent map writes" and "concurrent map read and map write" errors.
Would it be worth using a syncmap for this? I've built locally and wrapped all uses within a mutex lock and unlock cycle, which seems to fix the issue.
Any update on this? I'd like to start using it, and I don't mind contributing back if needed; we need to scale on RabbitMQ queue length, which currently isn't possible. I think this PR should fix #42, though.
Do you have any ETA on when this will be fixed? I'm having issues in my environment where the prometheus adapter is taking way too much RAM. I could do a temporary fix on my side (fork) until this is merged.
This commit introduces advanced configuration. The rate-interval and label-prefix flags are removed, and replaced by a configuration file that allows you to specify series queries and the rules for transforming those into metrics queries and API resources.
Hey, sorry this took so long to update. I think this should address remaining concerns (concurrent map access should be fixed, better docs in place, default config is just a YAML file, etc). Still have to run through final in-cluster tests, but if those pass, this should be good to merge monday :-). Please take a look and let me know what you think! NB: the delimiters in the templates in the config changed from |
I find it to be a bit faster in some cases, and easier to work with.
If, as you said, the in-cluster tests succeed smoothly, this lgtm.
This moves the DefaultConfig method out into a helper to generate legacy configuration. Passing in a config file is now required.
This updates the documentation and README to have information on the configuration file format.
This fixes asynchronous read/write issues when performing series discovery by pushing series results onto a channel, instead of trying to write them directly to a map.
This makes the makefile's build target have actual dependencies, so that it only rebuilds any given adapter if that adapter's actual go files have changed (yes, this is mostly redundant with Go 1.10, but it makes working on read-only filesystems a bit nicer).
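The channel-based fix mentioned in the series-discovery commit above can be sketched as follows: workers only send results on a channel, and a single goroutine owns the map. The `result` type, the `discover` function, and the `lookup` callback are hypothetical and stand in for the adapter's actual discovery code.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs a query with the series it returned. Hypothetical type.
type result struct {
	query  string
	series []string
}

// discover runs one lookup per query concurrently. Workers never touch the
// map; the calling goroutine is its only writer, so no lock is needed.
func discover(queries []string, lookup func(string) []string) map[string][]string {
	results := make(chan result)
	var wg sync.WaitGroup
	for _, q := range queries {
		wg.Add(1)
		go func(q string) {
			defer wg.Done()
			results <- result{query: q, series: lookup(q)}
		}(q)
	}
	// Close the channel once every worker has sent its result.
	go func() {
		wg.Wait()
		close(results)
	}()

	byQuery := make(map[string][]string, len(queries))
	for r := range results {
		byQuery[r.query] = r.series
	}
	return byQuery
}

func main() {
	lookup := func(q string) []string { return []string{q + "_a", q + "_b"} }
	out := discover([]string{"cpu", "memory"}, lookup)
	fmt.Println(len(out), out["cpu"])
}
```

Compared with guarding the map with a mutex, this keeps all map writes in one place at the cost of an extra goroutine to close the channel.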
This updates our dependencies to the Kubernetes 1.11 versions. In the future, this will also allow us to support the external metrics API.
In-cluster testing has succeeded. Need to stand up integration tests for that at some point. Merging when tests complete.
…ncy-openshift-4.8-ose-prometheus-adapter Updating ose-prometheus-adapter builder & base images to be consistent with ART
This commit introduces advanced configuration. The rate-interval and
label-prefix flags are deprecated, and replaced by a configuration file
that allows you to specify series queries and the rules for transforming
those into metrics queries and API resources.