Add out-of-order sample support to the TSDB #11075

Merged: 28 commits, Sep 20, 2022

Member

@jesusvazquez jesusvazquez commented Aug 1, 2022

Closes #8535

This implementation is based on this design doc:
https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing

This commit adds support for accepting out-of-order ("OOO") samples into the TSDB
up to a configurable time allowance. If OOO is enabled, overlapping queries
are enabled automatically.

@jesusvazquez jesusvazquez force-pushed the jvp/out-of-order-support branch 2 times, most recently from 51ad64d to 595485f Compare August 1, 2022 14:56
@jesusvazquez jesusvazquez force-pushed the jvp/out-of-order-support branch 10 times, most recently from ebd6ca7 to f1d5706 Compare August 2, 2022 13:44
@jesusvazquez
Member Author

I'm having some trouble getting the Windows tests to pass. I'll focus on this soon, but need to address other work first.

@jesusvazquez jesusvazquez force-pushed the jvp/out-of-order-support branch 9 times, most recently from 7993db6 to faa9001 Compare August 17, 2022 18:42
@jesusvazquez jesusvazquez marked this pull request as ready for review August 18, 2022 08:39
@jesusvazquez
Member Author

This is now ready for review 🎉 cc @codesome @bwplotka (I'm tagging you both since you are the TSDB maintainers)

@jesusvazquez jesusvazquez deleted the jvp/out-of-order-support branch September 22, 2022 10:55
bboreham added a commit to bboreham/prometheus that referenced this pull request Oct 1, 2022
This call was added by PR prometheus#11075 merged before prometheus#11318 which changed all
similar calls to `sort.Sort` into a faster one.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
LeviHarrison pushed a commit that referenced this pull request Oct 1, 2022
This call was added by PR #11075 merged before #11318 which changed all
similar calls to `sort.Sort` into a faster one.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

@wgliang
Contributor

wgliang commented Nov 14, 2022

Can anyone say which release this new feature is targeted for?

@codesome
Member

It is in 2.39.x

@juliusv
Member

juliusv commented May 20, 2023

@jesusvazquez one question: in

prometheus/scrape/scrape.go

Lines 1453 to 1457 in 92d6980

// Call sl.append again with an empty scrape to trigger stale markers.
// If the target has since been recreated and scraped, the
// stale markers will be out of order and ignored.
// sl.context would have been cancelled, hence using sl.appenderCtx.
app := sl.appender(sl.appenderCtx)
we rely on the TSDB rejecting out-of-order samples (to not mark series from targets as stale when the target disappears briefly and then reappears a short time later). Does that behavior break when turning on out-of-order appends?

@roidelapluie
Member

@jesusvazquez one question: in

prometheus/scrape/scrape.go

Lines 1453 to 1457 in 92d6980

// Call sl.append again with an empty scrape to trigger stale markers.
// If the target has since been recreated and scraped, the
// stale markers will be out of order and ignored.
// sl.context would have been cancelled, hence using sl.appenderCtx.
app := sl.appender(sl.appenderCtx)
we rely on the TSDB rejecting out-of-order samples (to not mark series from targets as stale when the target disappears briefly and then reappears a short time later). Does that behavior break when turning on out-of-order appends?

Yes #11730

@kushWithoutWax

Hi. A couple of questions on this:

  1. The document mentions two problems: ingesting OOO samples and ingesting old samples.
    Is ingesting old samples also implemented as part of this PR?
  2. Is there a flag to enable this feature?

@jesusvazquez
Member Author

The document mentions two problems: ingesting OOO samples and ingesting old samples.
Is ingesting old samples also implemented as part of this PR?

Yes, it's implemented. A sample is too old if it's older than the window you configure. Otherwise it's ingested and is simply out of order.

Is there a flag to enable this feature?

You can enable it by adding this to your config file:

storage:
  tsdb:
    out_of_order_time_window: 30m

@kushWithoutWax

Is there a max value for out_of_order_time_window?

@jesusvazquez
Member Author

No, there is no limit. I only have experience running this in production with a maximum window of 1 month. Larger windows should be feasible, and I don't expect a big impact on resources.

@kushWithoutWax

Thank you for the info!

@syedishaq13129

syedishaq13129 commented Mar 14, 2024

How does it work? We are also facing an out-of-order issue while scraping EKS metrics, and after enabling out-of-order ingestion we are getting this error:

ts=2024-03-14T09:32:59.364Z caller=dedupe.go:112 component=remote level=error remote_name=8843e5 url=http://10.0.36.190:9090/api/v1/write msg="non-recoverable error" count=2000 exemplarCount=0 err="server returned HTTP status 400 Bad Request: too old sample"

No, there is no limit. I have only experience running this in production with a max window of 1 month. More should be feasible. No big impact on resources expected.

How does the out-of-order parameter work? We are facing the same issue.

@jesusvazquez
Member Author

I'm helping @syedishaq13129 on the CNCF slack.

We think he might need to increase his out-of-order window to resolve the error he is getting.

Successfully merging this pull request may close these issues.

[tsdb] Ingest out of order samples and samples from a few hours ago