How can I do sampling on metrics? #1808

Closed
zxwing opened this Issue Jul 13, 2016 · 16 comments

@zxwing

zxwing commented Jul 13, 2016

My application needs to show six months of monitoring history with scrape_interval = 10s, which means the range vector will have ~1.5M samples. We originally planned to use a recording rule to average the data over 15m intervals. However, because of bug #1095, we cannot have other rules that average the data over different intervals (e.g. 30m, 1h).
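
As a rough check of the ~1.5M figure, here is a minimal sketch of the arithmetic. It assumes "six months" means 26 weeks and that every scrape succeeds; the function name `sample_count` is only illustrative.

```python
from datetime import timedelta


def sample_count(span: timedelta, interval: timedelta) -> int:
    """Samples one series accumulates over `span` at one sample per `interval`."""
    return span // interval


six_months = timedelta(weeks=26)  # assumption: six months taken as 26 weeks

print(sample_count(six_months, timedelta(seconds=10)))   # raw samples at a 10s scrape interval
print(sample_count(six_months, timedelta(minutes=15)))   # points left after 15m averaging
print(sample_count(six_months, timedelta(hours=1)))      # points left after 1h averaging
```

Under these assumptions the raw series holds 1,572,480 samples, while a 15m average reduces that to 17,472 points.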

I'd like to ask for your suggestions for this situation. Is there any way to specify the sampling interval when doing avg()? That is, could avg() return a range vector instead of an instant vector? This feature is available in KairosDB; below is a screenshot.

image

@brian-brazil

Member

brian-brazil commented Jul 13, 2016

We don't and won't offer downsampling in Prometheus; however, you can call query_range with whatever interval you like.

@zxwing

Author

zxwing commented Jul 13, 2016

So a recording rule is the only way for my situation? I hope we can have a feature that sets evaluation_interval per rule soon.

@brian-brazil

Member

brian-brazil commented Jul 13, 2016

The question here is what are you trying to do?

@zxwing

Author

zxwing commented Jul 13, 2016

I have a time series with 1.5 million samples collected during the past six months. How can I display them without downsampling?

@brian-brazil

Member

brian-brazil commented Jul 13, 2016

For that you'll need to put my_series[26w] into the query endpoint and then render the samples yourself. This can't be done with a single request to query_range, as we have safeguards against returning too many points.
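
A minimal sketch of what that request could look like, assuming a server at `http://localhost:9090` (a placeholder) and the `/api/v1/query` path of the HTTP API. The helper names and the sample payload are hand-written for illustration, not captured from a real server; the payload follows the `matrix` result shape that a range selector produces.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, expr: str) -> str:
    """Build a URL for the instant-query endpoint with a URL-encoded expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def extract_points(payload: dict) -> list:
    """Flatten a matrix-type response into (timestamp, value) float pairs."""
    points = []
    for series in payload["data"]["result"]:
        for ts, value in series["values"]:  # values arrive as [timestamp, "string"]
            points.append((float(ts), float(value)))
    return points


url = instant_query_url("http://localhost:9090", "my_series[26w]")
print(url)

# Illustrative payload only: two samples of one series.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "my_series"},
                "values": [[1468368000, "1"], [1468368010, "3"]],
            }
        ],
    },
}
print(extract_points(sample_payload))
```

The sketch only builds the URL and parses a response; fetching 1.5M raw samples in one response is likely to be slow and memory-hungry on both sides.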

@zxwing

Author

zxwing commented Jul 13, 2016

Processing the raw data in my application is always doable. However, as Prometheus already has a bunch of functions and operators, I hope there would be a way to do such post-processing work in Prometheus before returning samples to API callers. To me, recording rules are designed for this, but the global evaluation_interval restricts their usage.
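
For comparison, application-side processing of raw samples could be as small as the following sketch. It averages samples into fixed-width, aligned buckets, which is one simple downsampling scheme and not necessarily what avg_over_time computes (that function averages over a trailing window at each evaluation time). The name `downsample_avg` is hypothetical.

```python
def downsample_avg(samples, interval):
    """Average (timestamp, value) samples into fixed-width buckets.

    Each bucket covers [bucket_start, bucket_start + interval) and is
    reported as (bucket_start, mean of the values that fell into it).
    """
    buckets = {}
    for ts, value in samples:
        bucket_start = ts - ts % interval
        total, count = buckets.get(bucket_start, (0.0, 0))
        buckets[bucket_start] = (total + value, count + 1)
    return [
        (start, total / count)
        for start, (total, count) in sorted(buckets.items())
    ]


# 30 minutes of synthetic samples at a 10s interval, averaged into 15m buckets.
raw = [(i * 10, float(i)) for i in range(180)]
print(downsample_avg(raw, 900))
```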

@zxwing

Author

zxwing commented Jul 13, 2016

BTW, what is the "query endpoint" you mean here? My understanding is that an application can only retrieve samples from Prometheus through the HTTP APIs. Does the "query endpoint" here stand for applications?

@brian-brazil

Member

brian-brazil commented Jul 13, 2016

The HTTP API offers several endpoints: query_range is the one usually used for graphing, and there's also query.

@brian-brazil

Member

brian-brazil commented Jul 13, 2016

It sounds like you should make multiple query_range calls, or handle the processing inside your application.
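
One way to plan multiple query_range calls is to split the time range into windows that each stay under the server's per-series point limit. The sketch below assumes a limit of 11,000 points per series, which is what recent Prometheus releases enforce as far as I know; treat `max_points` as a tunable assumption. The function name `split_range` is hypothetical.

```python
def split_range(start, end, step, max_points=11_000):
    """Yield (chunk_start, chunk_end) pairs covering [start, end].

    Each chunk holds at most `max_points` evaluation steps, and chunks do
    not overlap: the next chunk starts one step after the previous one ends.
    """
    span = step * (max_points - 1)  # a window of N points spans N - 1 steps
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + span, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + step


# 26 weeks at a 10s step, expressed in seconds from an arbitrary origin.
chunks = list(split_range(0, 26 * 7 * 24 * 3600, 10))
print(len(chunks), chunks[0], chunks[-1])
```

At the raw 10s resolution this needs 143 requests for 26 weeks; a coarser step shrinks that count proportionally.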

@zxwing

Author

zxwing commented Jul 13, 2016

"It sounds like you should make multiple query_range calls": we can't do this as it is too expensive; my dashboard will access those samples frequently.

My current solution is to write my own exporter that creates new time series from post-processed samples. The workflow is:

  1. Prometheus scrapes my exporter
  2. My exporter queries Prometheus using functions like avg_over_time()
  3. My exporter returns the processed samples to Prometheus
  4. Prometheus saves the samples in a new series

Then my application can just query and render the new series.

@brian-brazil

Member

brian-brazil commented Jul 13, 2016

I think you're greatly over-complicating the problem. Why not just make a normal graph using avg_over_time?

@zxwing

Author

zxwing commented Jul 13, 2016

avg_over_time returns an instant vector, not a range vector, and I don't want to make lots of query_range calls (tens of thousands for 1.5M samples).

@zxwing

Author

zxwing commented Jul 13, 2016

I've pasted the avg_over_time() output below to make sure I'm not misunderstanding something.
For example, I have samples for the last hour, and avg_over_time() returns only two samples, while I may want it to return 60 samples at a 1-minute interval, which is the downsampling that you said won't be supported.

image

image

@brian-brazil

Member

brian-brazil commented Jul 13, 2016

The console view uses query, the graph view uses query_range. You can ask query_range to request evaluation at a 1 minute interval.
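
A minimal sketch of such a request, assuming a server at `http://localhost:9090` (a placeholder), the `/api/v1/query_range` path with its `query`, `start`, `end` and `step` parameters, and arbitrary Unix timestamps one hour apart. The helper name `range_query_url` and the 1m averaging window are illustrative choices.

```python
from urllib.parse import urlencode


def range_query_url(base_url, expr, start, end, step):
    """Build a query_range URL that evaluates `expr` every `step` seconds."""
    params = {"query": expr, "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


start = 1468368000      # arbitrary example timestamp (seconds since the epoch)
end = start + 3600      # one hour later

# Average each trailing 1m window, evaluated once per minute.
url = range_query_url(
    "http://localhost:9090",
    "avg_over_time(my_series[1m])",
    start,
    end,
    60,
)
print(url)
```

Matching the range selector's window to the step (1m and 60 seconds here) means consecutive averages cover adjacent, non-overlapping stretches of raw samples.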

@zxwing

Author

zxwing commented Jul 13, 2016

Awesome! I wasn't aware that query_range has a step parameter, which is exactly what I want! Thanks for your patient answers! Now Prometheus solves all of my problems. I love this project!

@zxwing zxwing closed this Jul 13, 2016

@lock

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
