Remote Storage reads based on time range in local storage or not #3041

Closed
wleese opened this Issue Aug 9, 2017 · 6 comments

wleese commented Aug 9, 2017

What did you do?
Configured Prometheus with remote storage to push data to InfluxDB (purely for long retention purposes)

What did you expect to see?
Some kind of 'intelligent' awareness of where to query data from.

What did you see instead? Under which circumstances?
All read queries are simply pushed to remote storage

  • Prometheus version:
    1.7.1

It would seem to me that currently a common setup would be:
[ Prometheus ] -> [ Remote Storage Adapter ] -> [ InfluxDB ]

Where InfluxDB or any other solution would purely be used when wanting more data than just Prometheus' retention period.

A truly intelligent solution would no doubt be hard to create, but if this setup is indeed common, it could be valuable to introduce a setting that makes Prometheus aware it is running in such a setup, so that queries within Prometheus' retention range are handled solely by Prometheus.

A simple workaround (?) would be an extra setting in the Remote Storage adapter that makes it return an empty data set if the query time range is less than $config_file_setting.
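The adapter-side workaround could be sketched roughly as follows. This is only an illustration: `skipRemoteRead` and `localWindowMs` are hypothetical names, and it assumes "query time range is less than the setting" means that the query's start time lies within the last `localWindowMs` milliseconds. Timestamps are Unix milliseconds.

```go
package main

import "fmt"

// skipRemoteRead reports whether the adapter should answer with an empty
// result because the query starts inside the window that the local
// Prometheus is assumed to cover (hypothetical helper, not adapter API).
// All arguments are in milliseconds.
func skipRemoteRead(queryStartMs, nowMs, localWindowMs int64) bool {
	return queryStartMs >= nowMs-localWindowMs
}

func main() {
	const hour = int64(60 * 60 * 1000)
	now := int64(1502280000000) // arbitrary "now" for the example
	window := 15 * 24 * hour    // hypothetical setting: 15 days

	fmt.Println(skipRemoteRead(now-6*hour, now, window))     // true: starts 6h ago
	fmt.Println(skipRemoteRead(now-30*24*hour, now, window)) // false: starts 30d ago
}
```

An adapter using such a check would still forward any query that reaches further back than the configured window.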

Member

brian-brazil commented Aug 9, 2017

This needs to be configurable, as not all remote adapters are remote storage.

Member

tomwilkie commented Aug 9, 2017

We could quite easily add a "low water mark" field to the remote read config. It would be a duration, and we would only read from remotes if the range is outside of now - low water mark.

Alternatively, we could use the configured retention period, although I think having a separate duration might be more useful for cases where you've blown your Prometheus data away. WDYT?
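A minimal sketch of the "low water mark" idea, assuming one such duration per remote read entry. The `remoteReadConfig` type, its `LowWaterMarkMs` field, and the example URLs are hypothetical and do not reflect the actual Prometheus configuration structs. A value of 0 here means the remote is consulted for any query that starts in the past, which would suit remotes that are not long-term storage.

```go
package main

import "fmt"

// remoteReadConfig is a hypothetical, trimmed-down stand-in for one remote
// read entry; LowWaterMarkMs is the proposed duration in milliseconds.
type remoteReadConfig struct {
	URL            string
	LowWaterMarkMs int64
}

// remotesToQuery returns the remotes whose low water mark is crossed, i.e.
// the query starts earlier than now - low water mark for that remote.
func remotesToQuery(cfgs []remoteReadConfig, queryStartMs, nowMs int64) []remoteReadConfig {
	var out []remoteReadConfig
	for _, c := range cfgs {
		if queryStartMs < nowMs-c.LowWaterMarkMs {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	const day = int64(24 * 60 * 60 * 1000)
	now := 100 * day
	cfgs := []remoteReadConfig{
		{URL: "http://long-term.example/read", LowWaterMarkMs: 15 * day},
		{URL: "http://other.example/read", LowWaterMarkMs: 0},
	}
	// A query covering only the last day skips the long-term remote.
	for _, c := range remotesToQuery(cfgs, now-day, now) {
		fmt.Println(c.URL) // prints http://other.example/read
	}
}
```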

Member

brian-brazil commented Aug 9, 2017

We need to know when the local storage was initialised (or its oldest timestamp), and then use whichever is newer of that and the retention period. Plus some slack.

This can't be configured as a simple option, as the range changes over time.
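One way to read this rule, as a sketch only: compute the start of local coverage on every query from the oldest local timestamp and the retention period, take the newer of the two, and add slack. The function names are hypothetical, and the direction of the slack (moving the boundary later, so that remote reads happen a little more often) is an assumption. Timestamps are Unix milliseconds.

```go
package main

import "fmt"

// localCoverageStartMs estimates the earliest timestamp local storage can
// answer for: the newer of the oldest local sample and now - retention,
// shifted later by some slack (assumed direction).
func localCoverageStartMs(oldestLocalMs, nowMs, retentionMs, slackMs int64) int64 {
	start := nowMs - retentionMs
	if oldestLocalMs > start {
		start = oldestLocalMs
	}
	return start + slackMs
}

// needsRemoteRead reports whether a query reaches back before the point
// that local storage is assumed to cover.
func needsRemoteRead(queryStartMs, coverageStartMs int64) bool {
	return queryStartMs < coverageStartMs
}

func main() {
	const hour = int64(60 * 60 * 1000)
	const day = 24 * hour
	now := 100 * day

	// Local storage was wiped two days ago; retention is 15 days.
	cover := localCoverageStartMs(now-2*day, now, 15*day, hour)

	fmt.Println(needsRemoteRead(now-7*day, cover)) // true: older than local data
	fmt.Println(needsRemoteRead(now-day, cover))   // false: covered locally
}
```

Because the boundary is recomputed from the current time and the current oldest timestamp, it moves as the local data ages, which a fixed setting cannot express.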

Thib17 added a commit to Thib17/prometheus that referenced this issue Aug 23, 2017

[Draft] Remote storage reads based on oldest timestamp in local storage
Currently all read queries are simply pushed to remote storage.
We need instead to compare the oldest timestamp in local storage
with the query range lower boundary. If the oldest timestamp is
older than the from parameter, then there is no need for remote
storage calls.
This commit adds FirstTime() to the SeriesIterator interface,
and the logic to use it.

Fixes prometheus#3041.

Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>

Contributor

Thib17 commented Aug 28, 2017

Hello,
Sorry for the noise on this issue. I'm working on a draft that could fix it. It appears that I misunderstood the last comment and tried to implement a solution that uses the oldest timestamp of each time series. That doesn't seem to be the easiest or best way to make remote reads only when they are needed. Therefore I'm now focusing on comparing the retention period and the oldest timestamp of the whole local storage. @brian-brazil do you have more insights about it? For example, should I try to get the timestamp of the oldest chunk, or should the timestamp of the local data storage initialization be stored somewhere as metadata?

Member

brian-brazil commented Aug 28, 2017

> For example, should I try to get the timestamp of the oldest chunk or should the timestamp of the local data storage initialization be stored somewhere as metadata?

Either would work; the oldest chunk is probably wisest. That would need to be plumbed through from the tsdb repo.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
