Proposal: add `skip_local_storage` bool option to `remote_read` config #4456
Comments
Can you explain more about your use case? We have thus far explicitly not included an option to disable local storage. For reliability you always want to be able to use the data you have locally.
If multiple Prometheus instances (or other third-party scrapers) scrape distinct targets and write the collected data to a single remote storage, then each Prometheus instance can query all the data written by all the scrapers from the remote storage, including data missing from its local storage. In this case, merging the remote data with the per-instance local data makes little sense:

- From the reliability point of view, if the remote endpoint becomes unavailable, each Prometheus instance could return only the small part of the collected data that is available in its local storage.
- Locally stored timeseries may clash unexpectedly with timeseries written to the remote storage by other scrapers (such as other Prometheus instances). This may result in an incorrect merge and, consequently, incorrect graphs.
- The remote storage may transform its data in various ways, which may break the subsequent merge with Prometheus' locally stored data. For instance, the remote storage may reduce timeseries precision / resolution by aggregating the data over bigger intervals; such aggregation breaks the merge with the full-resolution local data.
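The setup described above could be sketched roughly like this (the endpoint URLs and job names are hypothetical; both instances point `remote_write` and `remote_read` at the same shared storage):

```yaml
# prometheus-a.yml -- instance A scrapes its own targets only
scrape_configs:
  - job_name: app-a
    static_configs:
      - targets: ['app-a:9100']

# Both instances write to the same shared remote storage...
remote_write:
  - url: http://remote-storage.example.com/api/v1/write

# ...and can read back the full data set, including samples
# collected by the other instance.
remote_read:
  - url: http://remote-storage.example.com/api/v1/read
```

A second instance (`prometheus-b.yml`) would look identical except for its `scrape_configs`. With this layout, every query on either instance merges the complete remote data with that instance's partial local data, which is the redundancy this proposal wants to skip.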
This is not a sane setup. Remote storage is meant to be an extension of local storage, not to replace it.
To expand a bit, Prometheus is explicitly not meant to be a clustered system, as that has complexity and reliability implications. Accordingly, we aren't going to add features that let users try to make it one, as that would put us in exactly the situation we want to avoid. You should probably look at Cortex rather than trying to create one of these yourself.
The clustered system is already there - almost all the existing remote storage integrations are in reality clustered. But the clustering remains outside of Prometheus: Prometheus just pushes data into these systems and may query it back via the remote storage API.
There is also Thanos, which provides a global query view across all Prometheus servers. I suppose it achieves this via the remote read API, so it may suffer from the issues outlined above. cc'ing @Bplotka and @fabxc
Yes, but always using the local storage as the primary.
It works in a different way.
Thanos does not use any of the remote read/write features, except maybe direct usage of … You can look at how Thanos is designed in detail here: https://improbable.io/games/blog/thanos-prometheus-at-scale (:
Fwiw I have one use case for this - using a Prometheus server solely as a query engine that reads from long-term storage via remote read. This server doesn't do any scraping, so it doesn't write any samples to TSDB (unless I'm missing something). In this setup, it would be convenient to disable local storage altogether. But I do understand that this is far enough from the regular path that a fair answer might just be "build your own server using PromQL and the relevant bits as libraries".
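The "query-only" server described here could be approximated today with a config along these lines (the URL is a placeholder; `read_recent` is an existing `remote_read` option that makes Prometheus query the remote endpoint even for time ranges covered by local storage):

```yaml
# A "query-only" Prometheus: no scrape_configs, so nothing is
# written to the local TSDB; all data is served via remote read.
remote_read:
  - url: http://long-term-storage.example.com/api/v1/read
    read_recent: true  # also fetch recent data from the remote endpoint
```

Even so, the (empty) local TSDB is still opened and merged into every query, which is why this issue asks for a way to skip it entirely.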
danni-m commented Dec 14, 2018 (edited)
We would like to have the ability to use Prometheus as a reader without using the local storage (as @henridf suggested). Also, remote read from a clustered (HA) database can help when your Prometheus server is down and you are bootstrapping a new one: it really makes sense to read the short-term data from remote storage. And since Prometheus is not a clustered solution, running it against a remote DB without much local storage makes a lot of sense in terms of simplicity.
Isn't this satisfied by the …
valyala commented Aug 3, 2018 (edited)
Proposal
The issue
Sometimes the remote read API endpoint contains all the data, including the most recent data. Then there is no need to merge the data from the local TSDB with the data read from the remote read API endpoint, since the remote data fully covers the local data. Reading local data for the subsequent merge and deduplication with the remote data therefore becomes redundant.
The solution
It would be great to have a `skip_local_storage` bool option in the `remote_read` config section, which would disable reading local data when the corresponding remote read API endpoint is in use.
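A sketch of what the proposed config could look like (the option does not exist in Prometheus; the URL is a placeholder):

```yaml
remote_read:
  - url: http://remote-storage.example.com/api/v1/read
    # Proposed option: serve queries from this endpoint only,
    # skipping the read from (and merge with) the local TSDB.
    skip_local_storage: true
```

Since the option would be per-endpoint, a server could still merge local data for other `remote_read` entries that don't set it.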