Change the instance name for standard pod scraping to be unique #261

Merged
beorn7 merged 2 commits into master from beorn7/prom-config on May 18, 2020

Conversation

@beorn7 (Contributor) commented May 15, 2020

@tomwilkie @woodsaj @malcolmholmes Please have a careful look here. This is a biggie. It changes almost every metric we have. I went through all the code underneath deployment_tools/ksonnet and tried to find any code that depends on the current instance naming. I found only grafana/loki#2080, but of course, this is subtle enough that there might be many more code paths that break due to this change. However, we have to do something about it, and I think what I propose here is the way to go.

Commit description:

Any of the potentially many containers in a pod can expose one or more
ports with Prometheus metrics. However, with our current target
labels, all of these targets get the same instance label (just the pod
name), which leads to the dreaded `PrometheusOutOfOrderTimestamps`
alert, see https://github.com/grafana/deployment_tools/issues/3441.

(In fact, if we get the alert, we are already lucky, because the
problem can go unnoticed until someone actually needs one of the time
series that receive samples from different targets, rendering them
useless.)

In practice, we rarely have more than one port to scrape per pod, but
it does happen, and it's totally within the intended usage pattern of
K8s, which means it can start happening more often at any time.

The two examples I'm aware of:

  • Kube-state-metrics (KSM) has only one container in its pod, but that
    container exposes two metrics ports (http-metrics and self-metrics).

  • Consul pods run a container with the consul-exporter and a container
    with the statsd-exporter, each exposing their metrics on a different
    port. Both ports are named http-metrics, which is possible because
    they are exposed by different containers. (This is the case that
    triggered the above linked issue.)

To avoid the metric duplication, we could add a container and a port
label, but it is a Prometheus convention that the instance label alone
should be unique within a job.

Which brings us to what I'm proposing in this commit: Create the
instance label by joining pod name, container name, and port name with
`:` in between. In most cases, the resulting instance value will
appear redundant, but I believe the consistency has some
value. Applying some magic to shorten the instance label where
possible would add complexity and remove that consistency.
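
For illustration, here is a minimal jsonnet sketch of the kind of
relabelling this describes. The field names follow the standard
Prometheus `relabel_configs` schema, but the object key and the example
value in the comment are made up for this sketch; it is not the actual
diff.

```jsonnet
// Sketch only: join pod name, container name, and port name with ':'
// into a unique instance label.
{
  instance_relabel_sketch:: [
    {
      source_labels: [
        '__meta_kubernetes_pod_name',
        '__meta_kubernetes_pod_container_name',
        '__meta_kubernetes_pod_container_port_name',
      ],
      separator: ':',
      // Resulting value looks like "consul-0:statsd-exporter:http-metrics".
      target_label: 'instance',
    },
  ],
}
```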

@gouthamve (Member) commented:

Hrm, having hit this before, I see the value. But damn, it can potentially break things in subtle ways.

Joins would be difficult b/w cAdvisor data and metrics, if we ever want to do that. Hrm. Not sure, but we should fix this imo.

"but it is a Prometheus convention that the instance label alone should be unique within a job."

Just thinking out loud: we could break this, but I realised `scrape_samples_scraped` would be broken anyway (or would it be?). Like, can we afford to break this convention? And what's the reasoning behind it?

@beorn7 (Contributor, Author) commented May 15, 2020

> Joins would be difficult b/w cAdvisor data and metrics, if we ever want to do that. Hrm. Not sure, but we should fix this imo.

That's a good point. Joins with cAdvisor metrics aren't properly possible at the moment anyway because we do not attach the container name anywhere. In the consul case, for example, you couldn't join because cAdvisor would give you metrics for both the statsd-exporter and the consul-exporter container, each with the same pod name label. The "join with cAdvisor" use case is actually a reason to have a pod and a container target label. I would still not "abuse" the instance label to be the same as the pod name.

About the convention: I guess it is often helpful in grouping and label matching to know that `any_given_metric_name{instance="something"}` will only ever have at most one match. Otherwise, you have to know what combination of labels creates a unique match, and if everyone does it differently, it gets harder to share rules and dashboards.

@beorn7 (Contributor, Author) commented May 15, 2020

I'll add a commit that also adds container and pod target labels, just for demonstration. We can rip it out of this PR if you don't like it.

This allows joining with cAdvisor metrics.
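
For illustration, a minimal jsonnet sketch of what such extra target
labels could look like, again using the standard Prometheus relabelling
fields (the object key is made up; this is not the actual commit):

```jsonnet
// Sketch only: expose pod and container as their own target labels so
// that application metrics can be joined with cAdvisor metrics, which
// carry the same pod/container information.
{
  pod_container_labels_sketch:: [
    { source_labels: ['__meta_kubernetes_pod_name'], target_label: 'pod' },
    { source_labels: ['__meta_kubernetes_pod_container_name'], target_label: 'container' },
  ],
}
```

With matching `pod` and `container` labels on both sides, a PromQL join
via `on (pod, container)` becomes straightforward.
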
@tomwilkie (Contributor) commented:

@beorn7 can you follow up to ensure the Loki scrape config is consistent?

@beorn7 (Contributor, Author) commented May 18, 2020

I did a quick check that we don't have any regular application metrics that have a container or pod label on their own.

@beorn7 (Contributor, Author) commented May 18, 2020

> can you follow up to ensure the Loki scrape config is consistent?

Working on it.

@beorn7 (Contributor, Author) commented May 18, 2020

As this needs a vendor update to push it to production, I'm merging this one already.

The big and scary change will be the vendoring update for this and the Loki changes.

@beorn7 beorn7 merged commit 811ccb0 into master May 18, 2020
@beorn7 beorn7 deleted the beorn7/prom-config branch May 18, 2020 13:06
beorn7 added a commit to grafana/loki that referenced this pull request May 18, 2020
This is triggered by grafana/jsonnet-libs#261.

The above PR changes the `instance` label to be actually unique within
a scrape config. It also adds a `pod` and a `container` target label
so that metrics can easily be joined with metrics from cAdvisor, KSM,
and the Kubelet.

This commit adds the same to the Loki scrape config. It also removes
the `container_name` label, which is the same as the `container` label
and was already added to Loki previously. However, the `container_name`
label is deprecated and has disappeared in K8s 1.16, so it will soon
become useless for direct joining.
@beorn7 (Contributor, Author) commented May 18, 2020

@tomwilkie Follow-up for Loki: grafana/loki#2091

beorn7 added a commit that referenced this pull request May 22, 2020
beorn7 added a commit to grafana/loki that referenced this pull request May 22, 2020
This is triggered by grafana/jsonnet-libs#261.

The above PR removes the `instance` label. As it has turned out (see
PR linked above), a sane `instance` label in Prometheus has to be
unique, and that includes the case where a single container exposes
metrics on two different endpoints. However, that scenario would still
only result in one log stream for Loki to scrape.

Therefore, Loki and Prometheus need to sync via target labels that
uniquely identify a container (rather than a metrics endpoint). Those
labels are `namespace`, `pod`, and `container`, which are also added here.

This commit removes the `container_name` label, which is the same as
the `container` label and was already added to Loki previously. However,
the `container_name` label is deprecated and has disappeared in K8s
1.16, so it will soon become useless for direct joining.
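
For illustration, a minimal jsonnet sketch of container-identifying
target labels like the ones described here, assuming the usual
Prometheus-style `relabel_configs` that promtail also understands (the
object key is made up; this is not the exact Loki change):

```jsonnet
// Sketch only: label every log target with namespace, pod, and
// container, the same target labels Prometheus attaches, so that log
// streams and metrics can be correlated.
{
  loki_target_labels_sketch:: [
    { source_labels: ['__meta_kubernetes_namespace'], target_label: 'namespace' },
    { source_labels: ['__meta_kubernetes_pod_name'], target_label: 'pod' },
    { source_labels: ['__meta_kubernetes_pod_container_name'], target_label: 'container' },
  ],
}
```
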
beorn7 added a commit that referenced this pull request May 26, 2020
beorn7 added a commit to grafana/loki that referenced this pull request May 27, 2020
rfratto added a commit to rfratto/agent that referenced this pull request May 28, 2020
grafana/jsonnet-libs#261 updates labels to make instance labels unique.
This commit syncs with that change, but subsequently makes an overdue
change by going through all the dashboards and fixing various issues
with them.

The new dashboards are compatible with the new labeling scheme, but also
fix some problems:

1. Make sure the unloved Agent and Agent Prometheus Remote Write
   dashboards run correct queries and account for the instance_name labels
2. Use proper graph label values in Agent Operational
3. Allow to filter Agent Operational graph by container

As part of making the Agent dashboard useful, a new metric has been
added to track samples added to the WAL over time.

Closes #73.
rfratto added a commit to grafana/agent that referenced this pull request May 28, 2020
torstenwalter pushed a commit to torstenwalter/grafana-helm-charts that referenced this pull request Oct 3, 2020
mattdurham pushed a commit to grafana/agent that referenced this pull request Nov 11, 2021