
Proposal: add `skip_local_storage` bool option to `remote_read` config #4456

Open
valyala opened this Issue Aug 3, 2018 · 10 comments

valyala commented Aug 3, 2018

Proposal

The issue

Sometimes the remote read API endpoint contains all the data, including the most recent samples. In that case there is no need to merge data from the local TSDB with data read from the remote read API endpoint, since the remote data fully covers the local data. Reading local data for the subsequent merge and deduplication with the remote data is then redundant.

The solution

It would be great to have a `skip_local_storage` bool option in the `remote_read` config section, which would disable reading local data when the corresponding remote read API endpoint is in use.
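A hypothetical configuration with the proposed option might look like the sketch below. Note that `skip_local_storage` is the option proposed in this issue and does not exist in Prometheus; the URL is a placeholder:

```yaml
# prometheus.yml (sketch; `skip_local_storage` is the proposed option)
remote_read:
  - url: "http://remote-storage.example.com/api/v1/read"
    # Proposed: answer queries solely from this endpoint and skip
    # reading/merging data from the local TSDB.
    skip_local_storage: true
```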

brian-brazil (Member) commented Aug 3, 2018

Can you explain more about your use case?

We have thus far explicitly not included an option to disable local storage. For reliability you always want to be able to use the data you have locally.

valyala (Author) commented Aug 3, 2018

If multiple Prometheus instances (or other third-party scrapers) scrape distinct targets and write the collected data to a single remote storage, then each Prometheus instance can query all the data written to the remote storage by all the scrapers, including data missing from its local storage. In this case, merging the remote data with the per-instance local data makes little sense from a reliability point of view: if the remote endpoint becomes unavailable, each Prometheus instance could return only the small part of the collected data that is available in its local storage.

Locally stored time series may clash unexpectedly with time series written to the remote storage by other scrapers (such as other Prometheus instances). This may result in an incorrect merge and, consequently, incorrect graphs.

The remote storage may transform its data in various ways that break the subsequent merge with Prometheus' locally stored data. For instance, the remote storage may reduce time series precision/resolution by aggregating the data over larger intervals. This breaks rate graphs, since adjacent points from remote storage and local storage may have significantly different values due to the aggregation.
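The setup described above can be sketched with an illustrative configuration; all URLs, job names, and targets here are placeholders:

```yaml
# Instance A scrapes one shard of targets;
# instance B would scrape a different shard ("targets-shard-b").
scrape_configs:
  - job_name: "targets-shard-a"
    static_configs:
      - targets: ["10.0.0.1:9100"]

# Both instances write their samples to the same remote storage.
remote_write:
  - url: "http://remote-storage.example.com/api/v1/write"

# Each instance can then read back the combined data from all
# scrapers, including series it never scraped itself.
remote_read:
  - url: "http://remote-storage.example.com/api/v1/read"
```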

brian-brazil (Member) commented Aug 3, 2018

> Locally stored time series may clash unexpectedly with time series written to the remote storage by other scrapers (such as other Prometheus instances). This may result in an incorrect merge and, consequently, incorrect graphs.

This is not a sane setup. Remote storage is meant to be an extension of local storage, not a replacement for it.

brian-brazil (Member) commented Aug 3, 2018

To expand a bit, Prometheus is explicitly not meant to be a clustered system as that has complexity and reliability implications. Accordingly we aren't going to add features to allow users to try and make it one, as that'd end us up in the situation we want to avoid.

You should probably look at Cortex rather than trying to create one of these yourself.

valyala (Author) commented Aug 3, 2018

> To expand a bit, Prometheus is explicitly not meant to be a clustered system as that has complexity and reliability implications. Accordingly we aren't going to add features to allow users to try and make it one, as that'd end us up in the situation we want to avoid.

The clustered system is already there: almost all of the existing remote storage integrations are clustered in reality, but the clustering remains outside of Prometheus. Prometheus just pushes data into these systems and may query it back via the remote storage API.

> You should probably look at Cortex rather than trying to create one of these yourself.

There is also Thanos, which provides a global query view across all Prometheus servers. I suppose it achieves this via the remote storage read API, so it may suffer from the issues outlined above. cc'ing @Bplotka and @fabxc, the Thanos developers, for their clarification on this question.

brian-brazil (Member) commented Aug 3, 2018

> Prometheus just pushes data into these systems and may query it back via the remote storage API.

Yes, but always using the local storage as the primary.

> so it may suffer from the issues outlined above

It works in a different way.

bwplotka (Contributor) commented Aug 3, 2018

> There is also Thanos, which provides a global query view across all Prometheus servers. I suppose it achieves this via the remote storage read API, so it may suffer from the issues outlined above.

Thanos does not use any of the remote read/write features, except perhaps direct usage of the protobuf-enabled /api/v1/read endpoint to fetch the data required for a global query. Actually, local storage is extremely important for Thanos, since we use it as a local persistent metric buffer.

You can look at how Thanos is designed in detail here: https://improbable.io/games/blog/thanos-prometheus-at-scale (:

henridf (Contributor) commented Aug 28, 2018

Fwiw I have one use case for this: using a Prometheus server solely as a query engine that reads from long-term storage via remote read. This server doesn't do any scraping, so it doesn't write any samples to TSDB (unless I'm missing something).

In this setup, it would be convenient to disable local storage altogether. But I do understand that this is far enough from the regular path that a fair answer might just be "build your own server using PromQL and relevant bits as libraries".

danni-m commented Dec 14, 2018

We would like the ability to use Prometheus as a reader without using the local storage (as @henridf suggested).

Also, remote read from a clustered (HA) database can solve issues when your Prometheus server is down and you are bootstrapping a new one; it really makes sense to read the short-term data from remote storage.

Also, since Prometheus is not a clustered solution, running it against a remote database without much local storage makes a lot of sense in terms of simplicity.

drewhemm (Contributor) commented Mar 3, 2019

Isn't this satisfied by the `read_recent: false` option for `remote_read`?
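For reference, `read_recent` is a real `remote_read` option, but as I understand it, it controls whether *remote* reads are issued for time ranges that local storage should fully cover; it does not stop Prometheus from reading and merging local data, so it is not quite the proposed `skip_local_storage`. A sketch (the URL is a placeholder):

```yaml
remote_read:
  - url: "http://remote-storage.example.com/api/v1/read"
    # When false (the default), Prometheus skips remote reads for time
    # ranges that the local storage should have complete data for.
    # Local data is still read and merged for all queries.
    read_recent: false
```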
