
Determine whether remote/local data overlap should be smaller with `read_recent: false` #4184

Open
juliusv opened this Issue May 23, 2018 · 12 comments

juliusv commented May 23, 2018

On current master, remote read seems to read all points (even recent ones, and without read_recent: true) from both local and remote storage, and also doesn't merge/dedupe them properly. I simply followed https://www.robustperception.io/using-the-cratedb-prometheus-adapter/ and when I execute prometheus_tsdb_head_samples_appended_total[1m] as an instant query in Prometheus, I get the following samples (note the duplication):

18221 @1527071278.548
18221 @1527071278.548
18740 @1527071283.548
18740 @1527071283.548
19259 @1527071284.136
19259 @1527071284.136
19778 @1527071289.136
19778 @1527071289.136
20297 @1527071294.136
20297 @1527071294.136
20816 @1527071299.136
20816 @1527071299.136
21335 @1527071304.136
21335 @1527071304.136
21854 @1527071309.136
21854 @1527071309.136
22373 @1527071314.136
22373 @1527071314.136
22892 @1527071319.136
22892 @1527071319.136
23411 @1527071324.136
23411 @1527071324.136
23930 @1527071329.136
23930 @1527071329.136
24449 @1527071334.136
24449 @1527071334.136

As a result of the duplicate samples, irate() shows no output, since it cannot compute a rate between two samples with identical timestamps.
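To illustrate why the duplicates make irate() go silent, here is a simplified model of irate() (not PromQL's actual implementation): it looks at the last two samples in the range, and when those two samples share a timestamp the time delta is zero, so no rate can be produced.

```python
# Simplified sketch of irate(): use the last two samples in the range.
# With duplicated samples, the last two share a timestamp, so the time
# delta is zero and no result can be returned.

def irate(samples):
    """samples: (timestamp_seconds, value) pairs, sorted by timestamp."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    if t2 == t1:
        return None  # no output, as observed above
    return (v2 - v1) / (t2 - t1)

deduped = [(1527071329.136, 23930.0), (1527071334.136, 24449.0)]
duplicated = [(1527071334.136, 24449.0), (1527071334.136, 24449.0)]

print(irate(deduped))     # -> ~103.8 (519 samples over ~5s)
print(irate(duplicated))  # -> None
```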

The expected behavior would be to query the remote read endpoint only for samples older than what is in the local TSDB, and when we do query the remote endpoint, to dedupe the results properly.

brian-brazil commented May 23, 2018

The expected behavior would be to only query the remote read endpoint for samples that are older than what is in the local TSDB and if we query the remote endpoint, we should dedupe things properly.

We always need to query both, as there could be some data in the remote store that's not in the local one. However, the merge logic is meant to have the local data win.
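The intended merge semantics can be sketched as follows. This is a hypothetical Python illustration (Prometheus itself is written in Go, and this is not its actual implementation): both stores are queried, the streams are combined into one sorted series, and on a timestamp collision the local sample wins.

```python
# Sketch of "local data wins" merge/dedupe semantics for remote read.

def merge_samples(local, remote):
    """Merge two (timestamp, value) lists sorted by timestamp.
    Local samples take precedence on duplicate timestamps."""
    local_ts = {ts for ts, _ in local}
    merged = local + [(ts, v) for ts, v in remote if ts not in local_ts]
    return sorted(merged)

local = [(300, 12.0), (315, 13.0)]                # recent data in the local TSDB
remote = [(270, 10.0), (285, 11.0), (300, 12.5)]  # overlapping remote data

# The sample at t=300 appears in both; the local value (12.0) wins,
# and no timestamp appears twice in the output.
print(merge_samples(local, remote))
# -> [(270, 10.0), (285, 11.0), (300, 12.0), (315, 13.0)]
```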

juliusv commented May 23, 2018

@brian-brazil If that is so, then this snippet in our docs is incorrect or misleading:

# Whether reads should be made for queries for time ranges that
# the local storage should have complete data for.
[ read_recent: <boolean> | default = false ]
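For reference, a minimal remote_read block showing where that setting lives (the adapter URL is a placeholder, not from this thread):

```yaml
remote_read:
  - url: "http://localhost:9268/read"  # placeholder adapter endpoint
    # Per the docs snippet above: with the default of false, queries for
    # recent time ranges that local storage should fully cover are
    # expected to skip the remote endpoint.
    read_recent: false
```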
juliusv commented May 23, 2018

A git bisect determined that this got broken in 97a5fc8, as part of #3941.

/cc @tomwilkie

brian-brazil commented May 23, 2018

I was presuming that read_recent was true.

juliusv commented May 23, 2018

@brian-brazil I did mention it was not true :) But yes, read_recent was left at its default of false.

juliusv commented May 23, 2018

The git bisect gave the wrong result because the problem was masked by broken remote write in v2.2.0.

But @tomwilkie fixed one half of the problem (the deduping) in #4185.

Now the read_recent behavior still needs to be fixed.

brian-brazil commented May 23, 2018

read_recent is never going to be perfect: due to various races, we need to request some data for which we already have local blocks covering the same time range.

tomwilkie commented May 23, 2018

The read_recent overlap is explained by the start time margin being 2 * the minimum block duration, which defaults to 2h.
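The arithmetic behind that comment can be sketched as follows. This is a hedged illustration: the 2h MinBlockDuration is the TSDB default mentioned above, but the function names and the exact anchor point of the margin are assumptions, not Prometheus's actual implementation.

```python
# Sketch of the start-time margin: with read_recent: false, remote reads
# still extend a margin of 2 * min block duration past the point where
# local data begins, producing the 4h local/remote overlap discussed here.

MIN_BLOCK_DURATION_MS = 2 * 60 * 60 * 1000  # TSDB default: 2h, in ms

def overlap_window_ms():
    """Width of the local/remote overlap with read_recent: false."""
    return 2 * MIN_BLOCK_DURATION_MS  # 2 * 2h = 4h

def remote_read_maxt(local_start_ms, query_maxt_ms):
    """Newest timestamp still requested from the remote endpoint (assumed
    semantics): capped a margin past the local storage's start time, so
    the two stores overlap by overlap_window_ms()."""
    return min(query_maxt_ms, local_start_ms + overlap_window_ms())

print(overlap_window_ms() / 3_600_000)  # -> 4.0 (hours of overlap)
```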

juliusv commented May 23, 2018

So the remaining question is: should the overlap really be that long (4h), or somewhat shorter? Otherwise this can be closed.

brian-brazil commented Jun 15, 2018

The core issue is resolved, so this is now just a discussion of a potential optimisation.

@juliusv juliusv changed the title Remote read queries recent points and doesn't dedupe properly Determine whether remote/local data overlap should be smaller with `read_recent: false` Jun 19, 2018

juliusv commented Jun 19, 2018

Changed the title to reflect this.

R4scal commented Jan 24, 2019

I have a problem with the long overlap:
1. Local storage with 24h retention and a 2h block duration.
2. Remote storage with downsampled data (10s resolution locally -> 60s remotely).

When querying 2 days or more, rate/increase queries show incorrect data within the 4h overlap.

Would it be possible to add an optional setting to disable querying the remote storage for data that already exists in local storage?
