
Retention time configurable per series (metric, rule, ...). #1381

Open
taviLaies opened this issue Feb 10, 2016 · 43 comments

Comments

@taviLaies

Hello,

I'm evaluating Prometheus as our telemetry platform and I'm looking to see if there's a way to set up Graphite-like retention.
Let's assume I have a retention period of 15d in Prometheus and I define aggregation rules that collapse the samples into 1h aggregates. Is there a way to keep this new metric around for more than 15 days?
If this is not possible, could you provide some insight on how you approach historical data in your systems?

Thank you

@matthiasr
Contributor

This is not something Prometheus supports directly at the moment and for the foreseeable future. The focus right now is on operational monitoring, i.e. the "here and now".

You can get something like this by using a tiered system. The first-level Prometheus would scrape all the targets and compute the rules. A second-level Prometheus would federate from it, only fetching the result of these rules.

It can do so at a lower resolution, but keep in mind that if you set the scrape_interval to more than 5 minutes your time series will no longer be treated as contiguous. It can also keep them for longer. Theoretically this is only limited by disk space, however again, very long retention is not a focus so YMMV.

Additionally, the second-level Prometheus could use the (experimental) remote storage facilities to push these time series to OpenTSDB or InfluxDB as they are federated in. To query these you will need to use their own query mechanisms; there is no read-back support at the moment.
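A second-level scrape config for such a setup could look roughly like this (the hostname and the job: prefix for rule results are just placeholders, not something this issue prescribes):

scrape_configs:
  - job_name: 'federate-aggregates'
    scrape_interval: 5m            # stay at or below 5m so series remain contiguous
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # pull only the precomputed rule results
    static_configs:
      - targets:
        - 'first-level-prometheus:9090'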

@beorn7
Member

beorn7 commented Feb 13, 2016

The "5min-problem" is handled by #398. The planned grouping of rules will allow individual evaluation intervals for groups. So something like a "1 hour aggregate" can be configured in a meaningful way.

The missing piece is retention time per series, so I will rename this bug accordingly and make it a feature request. We discussed it several times. It's not a high priority right now, but certainly something we would consider.
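A 1-hour aggregate could then be configured roughly like this (a sketch of the planned group syntax; the rule name and expression are made up):

groups:
  - name: hourly_aggregates
    interval: 1h                   # evaluated once per hour instead of the global interval
    rules:
      - record: job:http_requests:rate1h
        expr: sum by (job) (rate(http_requests_total[1h]))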

@beorn7 changed the title from "Graphite-like retention setup" to "Retention time configurable per series (metric, rule, ...)" on Feb 13, 2016
@klausenbusk

A per-job retention period is what I need for my use case.

I pull 4 metrics from my solar panel every 30 seconds and want to store them forever (so I can, for example, go 6 months back and see the production at that moment), but I don't need that for all the other metrics (like Prometheus' own metrics).

@brian-brazil
Contributor

Prometheus is not intended for indefinite storage, you want #10.

@klausenbusk

Prometheus is not intended for indefinite storage, you want #10.

I see how #10 makes sense if you have a lot of time series, but OpenTSDB seems kind of overkill just to store 4 time series forever. Isn't it just a question of allowing people to set the retention period to forever? Or do you think people would "abuse" that?

@brian-brazil
Contributor

We make design decisions that presume that Prometheus data is ephemeral and can be lost/blown away with no impact.

@onorua

onorua commented Mar 26, 2017

Coming here from the Google Groups discussion about the same topic.
I think we could use a per-series retention period for recording rules and the metrics they are based upon.
We have 3k hosts which report the country they served requests from; we aggregate these values in a recording rule and basically never need the raw metrics. But they keep using RAM, storage, etc.

@gouthamve
Member

I plan to tackle this today. Essentially it would mean regularly calling the delete API and cleaning up the tombstones in the background. The question is where this should live.

My inclination is that we could leverage the delete API itself and then add a tombstone cleanup API, and add functionality to promtool to call the APIs regularly with the right matchers.

Else, I would need to manipulate the blocks on disk with a separate tool which I must say, I'm not inclined to do.
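Concretely, the periodic invocation could be as simple as curl against the TSDB admin API (assuming it is enabled with --web.enable-admin-api; the matcher and time window below are placeholders):

# delete matching series older than the given end time, then reclaim disk space
# (-g disables curl's bracket globbing for the match[] parameter)
curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={job="expensive_job"}&end=2017-11-01T00:00:00Z'
curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/clean_tombstones'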

/cc @brian-brazil @fabxc @juliusv @grobie

@fabxc
Contributor

fabxc commented Nov 22, 2017 via email

@gouthamve
Member

gouthamve commented Nov 22, 2017

My concern there is the edge cases: what if the request to restart compaction fails? While the tsdb tool makes perfect sense on static data, I think it would be cleaner if we could make it an API on top of tsdb.DB that applications built on top of it can leverage.

For the tsdb tool case, if we know we are acting on static data then we can instantiate a DB and work with that. We could have two promtool options, delete live and delete static, though I highly doubt anybody will be working with static dirs.

Having it as an API also allows us to make it a feature of Prometheus if people care and Brian agrees ;)

@brian-brazil
Contributor

I wouldn't object to delete and force cleanup functionality being added to promtool.

I have a general concern that users looking for this tend to be over-optimising and misunderstanding how Prometheus is intended to be used, such as the original post of this issue. I'd also have performance concerns with all this cleanup going on.

@krasi-georgiev
Contributor

I don't think anything can be done on the tsdb side for this, so I removed the local storage label.

There doesn't seem to be big demand for such a use case, and since the issue is so old maybe we should close it and revisit it if it comes up again or if @taviLaies is still interested in this.

@csmarchbanks
Member

A few of us had discussions around this at KubeCon and find dynamic retention valuable for both Prometheus and Thanos. Generally, the approach we were discussing is to include the tool within the Prometheus code as part of compaction, and to allow users to define retention with matchers. A design doc will be coming soon, but I am happy to hear any major concerns around compaction-time processing sooner rather than later so I can include them.

@brian-brazil
Contributor

brian-brazil commented Nov 23, 2019 via email

@matthiasr
Contributor

I think in this case it's justified to expose this to users – "I want to keep some metrics longer than others" is such a common use case that I don't think we should relegate it to "write your own shell scripts". The impact on query semantics doesn't have to be explicitly bound to compaction – it can simply be "samples will disappear within X hours after they have reached their retention period".

@brian-brazil
Contributor

There are a few unrelated things being tied together there. One thing we do know is that users tend to be over-aggressive in their settings, which then causes them significant performance impact. This is why we don't currently have a feature in this area; the last person to investigate it found it not to work out in practice.

is such a common use case that I don't think we should relegate it to "write your own shell scripts".

It'd be a single curl/promtool invocation, so it's not something that even really classifies as a shell script.

@matthiasr
Contributor

matthiasr commented Nov 25, 2019 via email

@brian-brazil
Contributor

It would still need to be executed regularly to fulfill the need. So it
needs to be scheduled, monitored, updated.

Cron covers that largely, plus existing disk space alerting.

When would the corresponding space be freed?

Typically it'd be freed automatically within 2 hours, unless they trigger it manually (which is where performance problems tend to come in; this gets triggered far too often).
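E.g. a single crontab entry covers the scheduling (the wrapper script path here is hypothetical; it would just contain the delete/clean-tombstones calls sketched earlier in this thread):

# run the per-series cleanup once a day at 03:00
0 3 * * * /usr/local/bin/prometheus-retention-cleanup.sh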

@fche

fche commented Nov 25, 2019

How close can one get to an ideal scenario where a user is not made to worry about what to retain for how long, but instead the system adapts to a storage quota? It could track actual query usage of metrics and their time windows, so it can predict metrics / times that are likely to be unneeded, and prefer them for disposal.

@csmarchbanks
Member

Sorry, I was unclear. The storage documentation already says that blocks will not get cleaned up for up to two hours after they have exceeded the retention setting. E.g. with a retention of 6 hours, I can still query data from 8-10 hours ago.

@brian-brazil
Contributor

That's only at the bounds of full retention, and IMHO we should keep the 1.x behaviour of having a consistent time for that. It's not the last few hours of data with typical retention times.

@csmarchbanks
Member

I have started a design doc for this work here: https://docs.google.com/document/d/1Dvn7GjUtjFlBnxCD8UWI2Q0EiCcBEx_j9eodfUkE8vg/edit?usp=sharing

All comments are appreciated!

@kovalev94

Is there any progress on this issue? I have a similar problem. I want to monitor the total error count on network switches, but on some of them there is no SNMP OID for total errors, so I have to fetch the different error types (CRC, alignment, jabber, etc.) and calculate their sum. But I want to keep only the total errors, not the others.

@csmarchbanks
Member

No progress to report; there are still many unresolved comments in the design doc I put forward, and I have not had the time or energy required to get consensus. There is some work related to this in Thanos that has been proposed for Google Summer of Code (thanos-io/thanos#903).

If you only need to delete certain well-known series, calling the delete series API on a regular schedule is an option.

@anthonyeleven
Contributor

Sorry, I was unclear. The storage documentation already says that blocks will not get cleaned up for up to two hours after they have exceeded the retention setting.

FWIW I can live with that. Ceph already behaves sort of this way when deleting RBD volumes.

In my situation, there are metrics that aren't likely to be useful past, say, 30 days, like network stats. Others could have value going back for months, e.g. certain Ceph capacity and performance metrics. Ideally I'd love to be able to downsample older metrics - maybe I only need to keep one sample per day.

Use case: feeding Grafana dashboards and ad-hoc queries. The federation idea is clever and effective, but it would complicate the heck out of queries. I would need to duplicate dashboards across the two (or more) datasources, which doesn't make for the best user experience and is prone to divergence.

@Rezeye

Rezeye commented Dec 11, 2020

Has this been looked into any further by the development team? Or have any users found any workarounds? This would help me a lot with my dashboards.

@dfredell

My workaround is to deploy a VictoriaMetrics instance next to Prometheus, then configure VictoriaMetrics to scrape Prometheus while filtering which metrics to scrape, with a different retention period and coarser granularity.

Command flags:

      - -retentionPeriod=120 # 120 months
      - -dedup.minScrapeInterval=15m

promscrape.config

scrape_configs:
  - job_name: prometheus
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__="metric1"}'
        - '{__name__="metric2"}'
    static_configs:
      - targets:
        - 'prometheus:9090'

@DEvil0000

DEvil0000 commented Dec 11, 2020 via email

@LajosCseppento

Has this been looked into any further by the development team? Or have any users found any workarounds? This would help me a lot with my dashboards.

We decided to put a clean-up policy in place (via the Prometheus REST API): the default is 12 weeks, but part of the data is cleaned up after 6 weeks. We considered this cheaper to maintain than several instances.

@csmarchbanks
Member

Has this been looked into any further by the development team?

There is a topic for a Prometheus dev summit to discuss this issue. I am hopeful that we will discuss it either this month or in January, but I cannot say for sure. After the discussion, we will be able to provide a more complete answer as to how we would like this in (or not in) Prometheus.

Or have any users found any workarounds?

The current ways to do this are either federation to a second Prometheus instance or having an external process call the admin delete API.

@m-yosefpor

m-yosefpor commented Dec 11, 2020

This would be a really great feature for Prometheus. The use case is not only long-term metrics (which, as some people argued in the comments, is not what Prometheus is intended for). There are lots of expensive metrics that we want to keep for only a single day, while keeping the rest of the metrics for 15 days, so now we have to operate two Prometheus instances. Apart from the operational overhead, some of our queries need matching operators to filter metrics based on series in the other instance, which makes them harder to write. I know Thanos offers a global query view over multiple Prometheus instances, but it is overkill to use it only because we cannot retain some metrics for a shorter time.

@bwplotka
Member

It would be nice to revisit this 🤗

There are big wins if we have something like this: prioritization, aggregations, keeping data only long enough to satisfy alerts and then discarding it, etc.

cc @csmarchbanks, wondering if it's time to resurrect your proposal (:

@csmarchbanks
Member

If it doesn't get discussed at the upcoming dev summit, perhaps let's get a few interested parties together to get it moving without a dev summit? It's been on the agenda with a fair number of votes for quite a while now, so I hope it gets discussed.

@csmarchbanks
Member

Good news!

There was consensus in today's dev summit that we would like to implement dynamic retention inside the Prometheus server. The next step is to decide how we would like to implement this feature. Right now it looks like there are two proposals in the document I linked: one for a new config format that allows reducing or extending retention based on a set of matchers, and a second building on rule evaluation to delete data older than a given age. Anyone who is interested, please provide feedback on either of those approaches (or a new one) so that implementation work can begin.

@FujishigeTemma

Hi, I would like to tackle this issue as my GSoC'21 project.
For now, I've read through the discussions and docs noted below.

Let me know if there are any other existing discussions I should read.

As a first step, I'm going to do some code reading and figure out the dependencies.
It would also be nice to learn through practice, so let me know if there are any related good first issues.

I have several questions, but I'm not sure what steps I should take, so I want to decide how to proceed first.
Since we're going to decide on the detailed specification based on the proposal sooner or later, I want to share the technical assumptions.

@csmarchbanks
Member

@FujishigeTemma Those are great discussions to start, if you have questions about GSoC feel free to reach out to me via email or in the CNCF slack. Otherwise, part of the GSoC project will be to make sure a design is accepted and then start implementing it.

@yeya24
Contributor

yeya24 commented Jul 13, 2021

As this project was not selected in GSoC this year, do we have any other updates or progress on this?

@roidelapluie
Member

As this project was not selected in GSoC this year, do we have any other updates or progress on this?

There is no progress.

@yeya24
Contributor

yeya24 commented Sep 30, 2021

Based on the design doc https://docs.google.com/document/d/1Dvn7GjUtjFlBnxCD8UWI2Q0EiCcBEx_j9eodfUkE8vg/edit#, for a config like:

retention_configs:
- retention: 1w
  matchers:
  - {job="node"}
- retention: 60d
  matchers:
  - slo_errors_total{job="my-service"}
  - slo_requests_total{job="my-service"}
- retention: 2w
  matchers:
  - {job="my-service"}

An approach would be:

  1. For any retention time < global retention time: in the reloadBlocks method, block.Delete() can be called to add tombstones to each block based on matchers and time. The actual data will be deleted during compaction or via the clean_tombstones API.
  2. For any retention time > global retention time: before deleting the block, rewrite it and keep the chunks of any matched series for the longer retention. The matched series become a new block and the original block is deleted.

To achieve 2, we can extend the compactor interface with modifiers like #9413:

type Compactor interface {
	// Write persists a Block into a directory.
	// No Block is written when resulting Block has 0 samples, and returns empty ulid.ULID{}.
	Write(dest string, b BlockReader, mint, maxt int64, parent *BlockMeta, modifiers ...Modifier) (ulid.ULID, error)
}
// Modifier modifies the index symbols and chunk series before persisting a new block during compaction.
type Modifier interface {
	Modify(sym index.StringIter, set storage.ChunkSeriesSet, changeLog ChangeLogger) (index.StringIter, storage.ChunkSeriesSet, error)
}

We can define a retention-time- and matcher-aware modifier to keep only the chunk series we want, or simply use a ChunkQuerier to get the ChunkSeriesSet for the given matchers.
This approach should work, but performance is a big issue as we would have to rewrite blocks every minute.

Implementation for modifier that goes through each series: https://github.com/yeya24/prometheus/blob/experiment-modify/tsdb/modifiers.go#L202-L277
Implementation for modifier that uses chunkQuerier: https://github.com/yeya24/prometheus/blob/experiment-modify/tsdb/modifiers.go#L291-L329
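Roughly, such a retention- and matcher-aware modifier could look like the sketch below. This is not the exact code from the branches above: retentionRule, retentionModifier and filteredChunkSeriesSet are made-up names, ChangeLogger comes from the proposed interface quoted earlier, and the import paths match the tree of that era (newer trees use model/labels instead of pkg/labels).

// Sketch of a retention-aware modifier; names are illustrative only.
package tsdb

import (
	"github.com/prometheus/prometheus/pkg/labels"
	"github.com/prometheus/prometheus/storage"
	"github.com/prometheus/prometheus/tsdb/index"
)

// retentionRule pairs a set of matchers with the oldest timestamp (ms)
// that matching series are still allowed to keep.
type retentionRule struct {
	matchers []*labels.Matcher
	minTime  int64
}

// retentionModifier drops whole series whose labels match a rule and whose
// block lies entirely before that rule's cut-off.
type retentionModifier struct {
	rules []retentionRule
	maxt  int64 // max timestamp of the block being rewritten
}

func (m *retentionModifier) Modify(sym index.StringIter, set storage.ChunkSeriesSet, changeLog ChangeLogger) (index.StringIter, storage.ChunkSeriesSet, error) {
	// changeLog is ignored in this sketch; a real implementation would record deletions.
	return sym, &filteredChunkSeriesSet{set: set, mod: m}, nil
}

// filteredChunkSeriesSet skips series that have outlived their retention.
type filteredChunkSeriesSet struct {
	set storage.ChunkSeriesSet
	mod *retentionModifier
	cur storage.ChunkSeries
}

func (f *filteredChunkSeriesSet) Next() bool {
	for f.set.Next() {
		if s := f.set.At(); !f.mod.expired(s.Labels()) {
			f.cur = s
			return true
		}
	}
	return false
}

func (f *filteredChunkSeriesSet) At() storage.ChunkSeries    { return f.cur }
func (f *filteredChunkSeriesSet) Err() error                 { return f.set.Err() }
func (f *filteredChunkSeriesSet) Warnings() storage.Warnings { return f.set.Warnings() }

func (m *retentionModifier) expired(lset labels.Labels) bool {
	for _, r := range m.rules {
		if matchesAll(r.matchers, lset) {
			// First matching rule wins; drop the series if the whole block
			// is older than that rule's cut-off.
			return m.maxt < r.minTime
		}
	}
	return false
}

func matchesAll(ms []*labels.Matcher, lset labels.Labels) bool {
	for _, m := range ms {
		if !m.Matches(lset.Get(m.Name)) {
			return false
		}
	}
	return true
}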

@shaoxt

shaoxt commented May 23, 2022

@yeya24 Is the implementation going to be merged? What is the status of the design doc?
