Determine whether remote/local data overlap should be smaller with `read_recent: false` #4184
juliusv added the kind/bug, priority/P1 and component/remote storage labels on May 23, 2018
We always need to query both, as there could be some data in remote storage that isn't in local storage. However, the merge logic is meant to let the local data win.
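A minimal sketch of a "local data wins" merge, assuming both inputs are slices of samples sorted by timestamp with no duplicates within a single input. The `Sample` type and `mergeLocalWins` function are hypothetical illustrations, not Prometheus's actual merge code.

```go
package main

import "fmt"

// Sample is a hypothetical (timestamp in ms, value) pair; it is not a Prometheus type.
type Sample struct {
	T int64
	V float64
}

// mergeLocalWins merges two timestamp-sorted series. When both series contain
// a sample with the same timestamp, only the local sample is kept.
// Assumes neither input contains duplicate timestamps on its own.
func mergeLocalWins(local, remote []Sample) []Sample {
	out := make([]Sample, 0, len(local)+len(remote))
	i, j := 0, 0
	for i < len(local) && j < len(remote) {
		switch {
		case local[i].T < remote[j].T:
			out = append(out, local[i])
			i++
		case local[i].T > remote[j].T:
			out = append(out, remote[j])
			j++
		default: // same timestamp: keep local, drop the remote duplicate
			out = append(out, local[i])
			i++
			j++
		}
	}
	// Append whatever is left over from either series.
	out = append(out, local[i:]...)
	out = append(out, remote[j:]...)
	return out
}

func main() {
	local := []Sample{{T: 3000, V: 30}, {T: 4000, V: 40}}
	remote := []Sample{{T: 1000, V: 10}, {T: 2000, V: 20}, {T: 3000, V: 31}}
	fmt.Println(mergeLocalWins(local, remote)) // [{1000 10} {2000 20} {3000 30} {4000 40}]
}
```

The remote-only samples at 1000 and 2000 ms survive, while the overlapping sample at 3000 ms is taken from local storage.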
@brian-brazil If that is so, then this snippet in our docs is incorrect or misleading:
/cc @tomwilkie
I was presuming that `read_recent` was true.
@brian-brazil I did mention it was not true :) But yeah, with it being left to default
The git bisect gave the wrong result because the problem was masked by broken remote write in v2.2.0. But @tomwilkie fixed one half of the problem (the deduping) in #4185. Now the
`read_recent` is never going to be perfect: due to various races, we need to request some data for which we already have local blocks covering the same time range.
The `read_recent` overlap is explained by the start-time margin, which is 2 * the minimum block duration (2h by default).
So the remaining question is: should it really be such a long (4h) overlap, or a bit shorter? Otherwise this can be closed.
brian-brazil added the kind/enhancement and priority/P3 labels and removed the kind/bug and priority/P1 labels on Jun 15, 2018
Core issue is resolved, so this is now just a discussion of a potential optimisation.
juliusv changed the title from "Remote read queries recent points and doesn't dedupe properly" to "Determine whether remote/local data overlap should be smaller with `read_recent: false`" on Jun 19, 2018
Changed the title to reflect this.
R4scal commented on Jan 24, 2019:
I have a problem with the long overlap: when I query 2 days or more, rate/increase queries show incorrect data because of the 4h overlap. Would it be possible to add an optional setting to disable querying the remote for data that already exists in local storage?
juliusv commented on May 23, 2018:
On current master, remote read seems to read all points (even recent ones, and without `read_recent: true`) from local and remote storage, and it also doesn't merge/dedupe them properly. I simply followed https://www.robustperception.io/using-the-cratedb-prometheus-adapter/, and when I execute `prometheus_tsdb_head_samples_appended_total[1m]` as an instant query in Prometheus, I get the following samples (note the duplication):

As a result of the duplicate samples, `irate()` shows no output (since it cannot compute a rate between identical samples).

The expected behavior would be to query the remote read endpoint only for samples that are older than what is in the local TSDB, and, when we do query the remote endpoint, to dedupe the results properly.
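`irate()` only looks at the last two samples in the range, so if those two are identical (same timestamp), the time delta is zero and no rate can be computed. A simplified sketch of that failure and of how deduping fixes it, assuming the duplicates share a timestamp. `Sample`, `irate` and `dedupe` are hypothetical helpers, not Prometheus's PromQL engine code, and the sample values are made up.

```go
package main

import "fmt"

// Sample is a hypothetical (timestamp in ms, value) pair.
type Sample struct {
	T int64
	V float64
}

// irate returns the per-second rate between the last two samples, and false
// when it cannot be computed (fewer than two samples, or both samples share
// a timestamp, as happens with un-deduplicated local+remote data).
func irate(samples []Sample) (float64, bool) {
	if len(samples) < 2 {
		return 0, false
	}
	last, prev := samples[len(samples)-1], samples[len(samples)-2]
	interval := last.T - prev.T
	if interval == 0 {
		return 0, false // would divide by zero
	}
	delta := last.V - prev.V
	if delta < 0 { // counter reset: count from zero
		delta = last.V
	}
	return delta / (float64(interval) / 1000), true
}

// dedupe drops consecutive samples with the same timestamp from a sorted series.
func dedupe(samples []Sample) []Sample {
	out := make([]Sample, 0, len(samples))
	for _, s := range samples {
		if n := len(out); n > 0 && out[n-1].T == s.T {
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	// Each counter sample appears twice (once from local, once from remote).
	dup := []Sample{{T: 0, V: 100}, {T: 0, V: 100}, {T: 15000, V: 130}, {T: 15000, V: 130}}
	_, ok := irate(dup)
	fmt.Println(ok) // false
	r, ok := irate(dedupe(dup))
	fmt.Println(r, ok) // 2 true
}
```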