how to turn off timestamps in exported metrics #2526

Open · muzammil360 opened this issue Apr 29, 2020 · 22 comments

@muzammil360

Hi. I am using cAdvisor with Pushgateway. cAdvisor exports timestamps along with the metrics, which is a problem because Pushgateway doesn't accept metrics with timestamps. The following error is returned if I push the metrics using curl:

pushed metrics are invalid or inconsistent with existing metrics: pushed metrics must not have timestamps

Pushgateway doesn't seem to accept timestamps at all. Is there any way I can turn them off in cAdvisor? Alternatively, an option that makes Pushgateway ignore the timestamps would also be acceptable.

NOTE: I know that pushing cAdvisor metrics to Pushgateway is an anti-pattern, but for the time being I don't have much of an option.
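
For anyone hitting the same error, a rough workaround (not an official cAdvisor or Pushgateway feature) is to strip the trailing timestamps before pushing. A minimal sketch, assuming cAdvisor listens on localhost:8080 and Pushgateway on localhost:9091; hosts, ports, and the job name are placeholders:

```sh
# Scrape cAdvisor, crudely drop the trailing epoch-millisecond timestamp from each
# sample line (assumes a trailing 13-digit field is always the timestamp),
# then push the result to Pushgateway under an arbitrary job name.
curl -s http://localhost:8080/metrics \
  | sed -E 's/ [0-9]{13}$//' \
  | curl --data-binary @- http://localhost:9091/metrics/job/cadvisor
```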

@dashpole
Collaborator

Removing timestamps makes many metrics unusable, since we collect metrics out-of-band. I don't think we should support turning off timestamps.

@muzammil360
Author

muzammil360 commented Apr 29, 2020 via email

@dashpole
Collaborator

I'm not aware of any ways to make that work.

@SuperQ

SuperQ commented Oct 1, 2020

Timestamps in Prometheus metrics cause more problems than just the Pushgateway one. For example, exposing timestamps breaks staleness handling, which causes containers that have been removed to remain visible in the data for 5 minutes.

I don't know why you think the metrics are "unusable" without timestamps. For Prometheus monitoring, we expect metrics without timestamps in almost every use case. A scraped sample is meant to represent "when Prometheus last saw this data".

Exposing timestamps is causing problems for Kubernetes users.

CC @paulfantom @brian-brazil @roidelapluie @brancz

@dashpole
Collaborator

dashpole commented Oct 1, 2020

At scrape time, the metrics returned may be up to 15 seconds old. Rates (e.g. CPU usage) didn't really work without timestamps.

We might be able to drop timestamps if the metrics are collected "on demand", which was added a few releases back: #1989.

@SuperQ

SuperQ commented Oct 1, 2020

On-demand collection would be much preferred, but exposing metrics without timestamps, even if they are a few minutes stale, is just fine. I'm not sure what you did to determine that this "didn't work", but what cAdvisor is doing right now is much worse.

Exposing timestamps is a violation of Prometheus metrics best practices and should not be done. The linked Kubernetes operator issue describes this well.

@dashpole
Collaborator

dashpole commented Oct 1, 2020

See #2059. CPU rates can be inaccurate by ±50%, which most users consider "unusable".

Can you link to the best practices documentation which states that timestamps should never be done?

@SuperQ

SuperQ commented Oct 1, 2020

Pre-computed rates shouldn't be exposed on a Prometheus endpoint. They're not useful for Prometheus users. cAdvisor should only expose the raw container CPU counters.

I will try and find the documentation on timestamps.

@dashpole
Collaborator

dashpole commented Oct 1, 2020

These aren't precomputed. These are the raw CPU counters. That is exactly the problem. If the "real" time at which a counter is collected differs significantly from the scrape time, the rate won't be correct. For example, if we are collecting (in the background) and caching:

t0: 0
t10: 10
t25: 25

If we then scrape at t0, t9, t11, t24, t26 (I know Prometheus scrapes at regular intervals, but this problem still occurs, just less dramatically), we get rates:

t0-t9: 0/9 = 0
t9-t11: 10/2 = 5
t11-t24: 0/13 = 0
t24-t26: 15/2 = 7.5

The correct rate is 1 for the entire interval, but Prometheus would graph numbers that are dramatically incorrect.
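
To make the skew concrete, here is a small sketch (using the assumed collection times t=0, 10, 25 and scrape times t=0, 9, 11, 24, 26 from the example above) that reproduces the apparent rates a consumer would compute without timestamps:

```sh
awk 'BEGIN {
  ct[1] = 0;  cv[1] = 0        # background collections: time -> counter value
  ct[2] = 10; cv[2] = 10
  ct[3] = 25; cv[3] = 25
  split("0 9 11 24 26", s)     # scrape times
  for (i = 1; i <= 5; i++) {   # value served at each scrape = latest cached sample
    for (j = 1; j <= 3; j++) {
      if (ct[j] <= s[i]) v[i] = cv[j]
    }
  }
  for (i = 2; i <= 5; i++) {   # rate computed over the scrape interval, ignoring timestamps
    printf "t%d-t%d: apparent rate = %.1f (true rate 1.0)\n", s[i-1], s[i], (v[i] - v[i-1]) / (s[i] - s[i-1])
  }
}'
```

With the collection timestamps attached, the stored samples are simply (t0, 0), (t10, 10), (t25, 25), and the computed rate is 1 throughout.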

From my understanding, it is a best practice to perform collection at scrape time, and thus not expose timestamps. However, given that we do not perform collection at scrape time, it seems like we must attach timestamps so that rate computations are correct. Collecting all metrics at once causes problems when running a non-trivial number of containers (e.g. 100), which is why we don't do that by default. However, we did recently add the ability to trigger collection at scrape time. For users that are running low pod density, this could be a good option, and we could remove timestamps in that case.

Keep in mind that the Prometheus server isn't the only consumer of cAdvisor metrics. Not attaching timestamps to cached metrics would break rate calculations for all backends, so doing that across the board doesn't seem like a viable option.

@SuperQ

SuperQ commented Oct 1, 2020

I will have to look at the other issue more closely, but what you're describing is not how Prometheus does calculations.

I will have to go over the linked issue, but the conclusions in the linked PR are incorrect. There is not enough information in that PR to show what's really going on. They have one graph with 6 hours of data and one with 1 hour of data, which means the default view has a step of 14 seconds in the 1-hour view and 86 seconds in the 6-hour view.

My first guess with #2059, based on what they're showing, is that they've configured a scrape interval of 30 seconds. That will lead to the weird ±50% artifacts when the collection interval is mismatched with the scrape interval. Then, when you combine 14-second steps with Prometheus rate extrapolation, you're going to see this ±50% problem.

Basically, they've got a self-induced problem, and neither cAdvisor nor Prometheus is the cause.

@dashpole
Collaborator

dashpole commented Oct 1, 2020

Another thing that is relevant here is that cAdvisor jitters the interval to spread out load. Collection occurs every 10-15 seconds. I'm not entirely sure if that matters for this problem.

@SuperQ

SuperQ commented Oct 1, 2020

> From my understanding, it is a best practice to perform collection at scrape time, and thus not expose timestamps.

Yes, you're very right about this. The best practice is to collect at scrape time.

> Not attaching timestamps to cached metrics would break rate calculations for all backends

No, this is not likely a problem.

> Collection occurs every 10-15 seconds. I'm not entirely sure if that matters for this problem.

No, this shouldn't be a problem.

tigrannajaryan pushed a commit to open-telemetry/opentelemetry-collector that referenced this issue Oct 21, 2020
Adds a `send_timestamps` option to the prometheus exporter to allow it to send scrape timestamps.

By default, when scraping, Prometheus records data assuming that the presented sample is from the instant of the scrape. For data sources that cache the underlying information and do not refresh on scrape, this can lead to metric samples being recorded at the wrong timestamp. For example, cAdvisor caches for many seconds (4-20 in our experience), so a sample taken "now" may actually be a sample from 20s ago.

To handle this situation, the exposition format allows an exporter to advise the Prometheus server of the timestamp of the underlying scrape. OpenTelemetry is aware of the timestamp of the scrape.

This change adds an option to have OpenTelemetry send the timestamps of underlying samples out with the Prometheus exporter.

Visually, the image shows the existing behavior prior to 9.45am and the behavior with `send_timestamps: true` set from 9.45am onwards. These are metrics for a job using a single CPU.
![image](https://user-images.githubusercontent.com/3196528/95892425-62fe5a00-0d3b-11eb-9023-af8e59652157.png)

**Related issues:**
google/cadvisor#2526
orijtech/prometheus-go-metrics-exporter#11

**Testing:**
Test cases have been added. In addition, for e2e test, see screenshot from our environment above.

**Documentation:**
The `prometheusexporter` README has been updated.
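
For reference, enabling this in an OpenTelemetry Collector configuration looks roughly like the sketch below; the endpoint value is a placeholder, and the `prometheusexporter` README documents the full option set:

```yaml
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # placeholder listen address for the exported metrics
    send_timestamps: true      # forward the original sample timestamps to scrapers
```
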
@Wayde2014

I ran into the same problem. Is there a solution for it now?

Does cAdvisor support turning off timestamps in exported metrics?

@muzammil360

@muzammil360
Author

@Wayde2014 No, I didn't find anything else. But that was about two years ago; cAdvisor might have improved considerably since then. I ended up running a regexp and dropping the time field (as far as I can remember).

@muzammil360
Author

@Wayde2014 Actually, if you look just above your comment, it seems jasonk000 attempted to fix it here:

[screenshot of the commit referenced above]

@Wayde2014

Wayde2014 commented Feb 15, 2022

Thanks for the reply.

Do you still remember how the regular expression was written? Thank you very much.

Below is the command and its execution result:

$ curl -k -m 5 --header "Authorization: Bearer ${TOKEN}" "https://localhost:10250/metrics/cadvisor"
......
container_cpu_load_average_10s{container="",id="/",image="",name="",namespace="",pod=""} 0 1644893651118
container_cpu_load_average_10s{container="",id="/kubepods",image="",name="",namespace="",pod=""} 0 1644893653633
container_cpu_load_average_10s{container="",id="/kubepods/besteffort",image="",name="",namespace="",pod=""} 0 1644893651525
container_cpu_load_average_10s{container="",id="/kubepods/besteffort/pod00b3e4fe-cebd-47a5-8d8c-a2ffbccc146c",image="",name="",namespace="baetyl-edge",pod="idp-web-546d968bb4-4k7fj"} 0 1644893644069
......

@Wayde2014

> @Wayde2014 Actually, if you look just above your comment, it seems jasonk000 attempted to fix it here:

I have seen these, and I have searched online for nearly 2 days, but there is still no good solution.

Thank you very much for your help.

@muzammil360
Author

> Thank you very much for your help.

@Wayde2014 I don't exactly remember. I recall working out the pattern of the timestamps I wanted to drop: my regexp looked for a long run of digits, e.g. 1644893653633. You might want to match only timestamps between today and, say, 100 years from now, which limits the possible leading digits.

I should add that I didn't care much about compute cost for my application; consider it for yours.
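
A minimal sketch of the kind of pattern described, assuming the timestamp is always the last field on a sample line and is a millisecond epoch (host and port are placeholders):

```sh
# " [1-4][0-9]{12}$" matches a trailing 13-digit epoch-millisecond timestamp
# roughly between the years 2001 and 2128, so ordinary metric values are
# unlikely to be stripped by accident.
curl -s http://localhost:8080/metrics | sed -E 's/ [1-4][0-9]{12}$//'
```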

@kiuber

kiuber commented Jul 26, 2022

@muzammil360 Hi, would you share your regexp rule? Thank you.

@kiuber

kiuber commented Jul 26, 2022

`curl ${cadvisor_host}:${cadvisor_port}/metrics | sed -r 's/(} .*) ([0-9]*)/\1/'` works for me.

> @muzammil360 Hi, would you share your regexp rule? Thank you.

@Gypsying

> `curl ${cadvisor_host}:${cadvisor_port}/metrics | sed -r 's/(} .*) ([0-9]*)/\1/'` works for me.
>
> @muzammil360 Hi, would you share your regexp rule? Thank you.

Thanks, that works for me too.

@brancz
Contributor

brancz commented Jul 11, 2023

I actually think this can be closed on the cAdvisor side, since Prometheus supports `honor_timestamps`. That way users can choose for themselves whether to honor cAdvisor's timestamps or use the scrape time instead.
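
For reference, `honor_timestamps` is a per-job setting in the Prometheus scrape configuration; a minimal sketch, with the job name and target as placeholders:

```yaml
scrape_configs:
  - job_name: cadvisor
    honor_timestamps: false    # ignore the timestamps cAdvisor attaches; record samples at scrape time
    static_configs:
      - targets: ['cadvisor:8080']
```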
