
Out-of-order support can create data gaps #2238

Closed
Tracked by #2241
codesome opened this issue Jun 27, 2022 · 1 comment · Fixed by grafana/mimir-prometheus#277

Comments

@codesome
Member

codesome commented Jun 27, 2022

Describe the bug

EDIT: The data gaps appear in the in-order data, not in the OOO data.

We initialize the head block with minValidTime set to the last block's maxt:
https://github.com/grafana/mimir-prometheus/blob/1446b53d874c0309d8f99749ced5e1c0637cf245/tsdb/db.go#L839-L847
This means that during WAL replay, all samples older than minValidTime are discarded.

This is fine when there is only in-order data. With OOO data, however, the OOO head is compacted after the in-order head's compaction, and the newest sample in the OOO head could be just a minute old. The resulting block then contains data that is only a minute old, and because it is now the last block, its maxt becomes the minValidTime on the next startup.

So if you happen to restart right after compaction, then instead of recovering an hour of data from the WAL replay, an in-order series may be left with only a few samples, because everything older than the out-of-order block's maxt is discarded.

This is a blocker for 2.2; we will work on a fix today. cc @colega

To Reproduce

We are still working on reproducing this, but these are the likely steps:

  1. Ingest OOO samples that are only a few minutes old, along with in-order data. Ideally, include some in-order series that have no OOO samples, so the gaps are clearly visible.
  2. Right after a compaction cycle, restart the ingesters.
  3. The in-order series should have some gaps now.
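To check step 3, one option is to fetch the raw sample timestamps of an in-order series after the restart and look for intervals much longer than the scrape interval. The helper below is a hypothetical sketch, assuming sorted millisecond timestamps and a 15s scrape interval; how the timestamps are fetched from Mimir is left out.

```go
package main

import "fmt"

// gap describes a hole between two consecutive samples.
type gap struct {
	From, To int64 // timestamps of the samples on either side of the gap
}

// findGaps reports every pair of consecutive timestamps (assumed sorted,
// in milliseconds) that are further apart than maxInterval.
func findGaps(ts []int64, maxInterval int64) []gap {
	var gaps []gap
	for i := 1; i < len(ts); i++ {
		if ts[i]-ts[i-1] > maxInterval {
			gaps = append(gaps, gap{From: ts[i-1], To: ts[i]})
		}
	}
	return gaps
}

func main() {
	const scrape = int64(15_000) // hypothetical 15s scrape interval, in ms
	// Samples every 15s, then nothing for ten minutes, then samples resume.
	ts := []int64{0, 15_000, 30_000, 630_000, 645_000}
	// Allow for jitter: only flag holes longer than two scrape intervals.
	for _, g := range findGaps(ts, 2*scrape) {
		fmt.Printf("gap from %d to %d (%d s)\n", g.From, g.To, (g.To-g.From)/1000)
	}
}
```

With the sample data it reports a single 600 s gap between timestamps 30000 and 630000.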

Expected behavior

No gaps/loss after a restart.

Environment

  • Mimir r190
@pstibrany
Member

Technically it will be fixed by #2243.
