
Retention time configurable per series (metric, rule, ...). #1381

Open
taviLaies opened this Issue Feb 10, 2016 · 12 comments

taviLaies commented Feb 10, 2016

Hello,

I'm evaluating Prometheus as our telemetry platform and I'm looking to see if there's a way to set up Graphite-like retention.
Let's assume I have a retention period of 15d in Prometheus and I define aggregation rules that collapse the samples into 1h aggregates. Is there a way to keep this new metric around for more than 15 days?
If this is not possible, could you provide some insight on how you approach historical data in your systems?

Thank you

matthiasr (Contributor) commented Feb 10, 2016

This is not something Prometheus supports directly at the moment, nor will it for the foreseeable future. The focus right now is on operational monitoring, i.e. the "here and now".

You can get something like this by using a tiered system. The first-level Prometheus would scrape all the targets and compute the rules. A second-level Prometheus would federate from it, only fetching the result of these rules.

It can do so at a lower resolution, but keep in mind that if you set the scrape_interval to more than 5 minutes, your time series will no longer be treated as contiguous. It can also keep them for longer; theoretically this is only limited by disk space, but again, very long retention is not a focus, so YMMV.
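
For illustration, a second-level scrape configuration for this kind of federation could look roughly like the sketch below; the job name, target address, and the job:... rule-name prefix are assumptions for the example, not anything Prometheus prescribes.

```yaml
# Second-level Prometheus: pull only the pre-aggregated series from the
# first-level server via its /federate endpoint.
global:
  scrape_interval: 5m          # lower resolution, but <= 5m so series stay contiguous

scrape_configs:
  - job_name: 'federate-aggregates'      # hypothetical job name
    honor_labels: true                   # keep the labels as recorded upstream
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'         # assumes recording rules follow a job:... naming scheme
    static_configs:
      - targets:
          - 'prometheus-level1.example.com:9090'   # hypothetical first-level server
```

The second-level server would then simply be run with a longer retention setting than the first-level one.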

Additionally, the second-level Prometheus could use the (experimental) remote storage facilities to push these time series to OpenTSDB or InfluxDB as they are federated in. To query these you will need to use their own query mechanisms; there is no read-back support at the moment.

beorn7 (Member) commented Feb 13, 2016

The "5min-problem" is handled by #398. The planned grouping of rules will allow individual evaluation intervals for groups. So something like a "1 hour aggregate" can be configured in a meaningful way.

The missing piece is a per-series retention time, which I will rename this issue to reflect and treat as a feature request. We have discussed it several times. It's not a high priority right now, but certainly something we would consider.

beorn7 changed the title from "Graphite-like retention setup" to "Retention time configurable per series (metric, rule, ...)" on Feb 13, 2016

klausenbusk commented Jul 24, 2016

A per-job retention period is what I need for my use case.

I pull 4 metrics from my solar panel every 30 seconds and want to store them forever (so I can, for example, go 6 months back and see the production at that moment), but I don't need that for all the other metrics (like Prometheus's own metrics).

brian-brazil (Member) commented Jul 24, 2016

Prometheus is not intended for indefinite storage; you want #10.

klausenbusk commented Jul 24, 2016

> Prometheus is not intended for indefinite storage; you want #10.

I see that #10 makes sense if you have a lot of time series, but OpenTSDB seems like overkill just to store 4 time series forever. Isn't it just a question of allowing people to set the retention period to forever? Or do you think people would "abuse" that?

brian-brazil (Member) commented Jul 24, 2016

We make design decisions that presume that Prometheus data is ephemeral and can be lost or blown away with no impact.

onorua commented Mar 26, 2017

Coming here from a Google Groups discussion about the same topic.
I think we could use a per-series retention period for recording rules and the metrics they are based on.
We have 3k hosts reporting the country they served requests from. We aggregate these values in a recording rule and basically never need the raw metrics, but they still consume RAM, storage, etc.
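
A sketch of the kind of aggregation rule described here, with made-up metric and label names (rule-group syntax):

```yaml
groups:
  - name: country-aggregates
    rules:
      - record: country:requests_served:rate5m   # the aggregate worth keeping long-term
        expr: sum by (country) (rate(requests_served_total[5m]))
```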

gouthamve (Member) commented Nov 22, 2017

I plan to tackle this today. Essentially it would mean regularly calling the delete API and cleaning up the tombstones in the background. The question is where this should live.

My inclination is that we could leverage the delete API itself and then add a tombstone cleanup API, and add functionality to promtool to call the APIs regularly with the right matchers.

Otherwise, I would need to manipulate the blocks on disk with a separate tool, which, I must say, I'm not inclined to do.
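
As a rough sketch of what such periodic cleanup could look like from the outside, here is a small script against the TSDB admin endpoints (delete_series plus the tombstone-cleanup call); the instance URL, matcher, and retention window are made-up examples, and the admin API has to be enabled explicitly.

```python
# Periodically delete matching raw series older than a cutoff, then clean up
# the resulting tombstones. Assumes Prometheus runs with --web.enable-admin-api;
# the URL and matcher below are hypothetical examples.
import time
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090"
MATCHER = '{__name__=~"node_.*"}'      # hypothetical: raw series we no longer need
KEEP_SECONDS = 15 * 24 * 3600          # keep only the last 15 days of these series

def post(path, params=None):
    url = PROM_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params, doseq=True)
    req = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Write tombstones for everything matching MATCHER up to the cutoff timestamp.
cutoff = time.time() - KEEP_SECONDS
post("/api/v1/admin/tsdb/delete_series", {"match[]": MATCHER, "end": str(cutoff)})

# Deletions only mark data; reclaim the disk space explicitly afterwards.
post("/api/v1/admin/tsdb/clean_tombstones")
```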

/cc @brian-brazil @fabxc @juliusv @grobie

fabxc (Member) commented Nov 22, 2017

gouthamve (Member) commented Nov 22, 2017

My concern there is the edge cases: what if the request to restart compaction fails? While the tsdb tool makes perfect sense on static data, I think it would be cleaner to make it an API on top of tsdb.DB that applications built on top of it can leverage.

For the tsdb tool case, if we know we are acting on static data, then we can instantiate a DB and work with that. We could have two promtool options, delete live and delete static, though I highly doubt anybody will be working with static dirs.

Having it as an API also allows us to make it a feature of Prometheus if people care and Brian agrees ;)

brian-brazil (Member) commented Nov 22, 2017

I wouldn't object to delete and force cleanup functionality being added to promtool.

I have a general concern that users looking for this tend to be over-optimising and misunderstanding how Prometheus is intended to be used, as in the original post of this issue. I'd also have performance concerns with all this cleanup going on.

krasi-georgiev (Member) commented Nov 9, 2018

I don't think anything can be done on the tsdb side for this, so I removed the local storage label.

There doesn't seem to be much demand for this use case, and since the issue is so old, maybe we should close it and revisit if it comes up again or if @taviLaies is still interested.
