Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement vertical query merging and compaction #90

Closed
fabxc opened this Issue Jun 6, 2017 · 12 comments

Comments

Projects
None yet
5 participants
@fabxc
Copy link
Member

fabxc commented Jun 6, 2017

We should add support to query and compact time-overlapping blocks. Probably no need for sophisticated handling of overlapping chunks.

This is a rather complex one and we have to discuss specifics. But once done, it makes our life for restoring from backups etc. a lot easier.

@gouthamve

This comment has been minimized.

Copy link
Member

gouthamve commented May 7, 2018

This should be relatively straightforward from what I can see. For us, BlockReader is an interface and we need to make sure the overlapping blocks turnout to be a single block of that interface.

What we essentially want is I hope clear from the following pictures:
screen shot 2018-05-07 at 1 00 45 pm
screen shot 2018-05-07 at 1 00 54 pm

Now if the same series exists in overlapping blocks, we want to merge the data, but if there exists samples with the same time-stamp but different values, we just drop the "second" (arbitrarily choose one) one. This is okay as it violates the TSDB invariants.

/cc @krasi-georgiev @fabxc

PS: This might take some extra memory, but that is okay given how we don't want to recommend this.

@gouthamve

This comment has been minimized.

Copy link
Member

gouthamve commented May 7, 2018

From IRC, @brian-brazil on the semantics:

bulk inserted data wins or the insert fails.

We can do the same thing mentioned above, but, we delete the data that is overlapping in the existing blocks in promtool when inserting data using promtool. Little more work, but we can start with the proposal above, and then when integrating with promtool, implement the deletes.

@bwplotka

This comment has been minimized.

Copy link
Contributor

bwplotka commented May 8, 2018

I can see on IRC that the main question about this issues is "why we need that", can we enumerate the use cases, before starting to designing this?

For me this issues is also about the assumptions taken by compaction. Basically, I would love to have compactor be resilient on eventual storage write-read consistency. Why? Because so far it (lack of support for it) produced major release issue for 2.2.0 as well as major compaction issues on Thanos a month ago.

Basically:

  • if compactor sees block for T and for T - 2 it automatically assumes that T-1 is missing for sure.
  • it compacts T and T -2 together. Now when T - 1 is back we have broken state.
    As a result, query is utterly broken (sometimes showing sometimes <T, T-2> or things like that) and further compactions are dropping data around (most likely that T-1 block).

Making compactor aware that overlaping block is a think will basically reduce damage radius if any of these bugs happen again.

@gouthamve can you explain more, how bulk insert can result in overlapping (same) data? (I think I am missing the context on how bulk insert is exactly planned to be done)

I would also keep in mind that TSDB is a library and while mainly used for Prometheus, other use it as well. For Thanos this feature would be extremely valuable, because:

  • Compaction would be able to handle eventual consistency (improbable-eng/thanos#298). That will simplify our code and prevent super easy to be introduced bugs.
  • Compaction would be able to "self heal" if the blocks are overlapping. Currently I was required to implement repair job for it: improbable-eng/thanos#206

So yea. It is needed for Thanos for sure, any other use cases? Issue was created Jan 2017, so @fabxc had some certain use cases in mind for sure (:

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented May 15, 2018

I can work on this once we define the why/what/how
@gouthamve waiting for your input 🙏

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented May 16, 2018

just watched @gouthamve's video that add's another use case - back-filling
https://youtu.be/0UvKEHFNu4Q?t=1219

@fabxc

This comment has been minimized.

Copy link
Member Author

fabxc commented May 16, 2018

@bwplotka

This comment has been minimized.

Copy link
Contributor

bwplotka commented May 16, 2018

Yea, but we need to make sure compaction does not add regression when you put overlapping blocks. Currently, with overlap check we added, we just block the compaction flow. It is not safe to just unblock compaction in the current form in case of overlapping blocks

@bwplotka

This comment has been minimized.

Copy link
Contributor

bwplotka commented May 25, 2018

Any ideas for this? It is blocking: improbable-eng/thanos#348

@codesome

This comment has been minimized.

Copy link
Member

codesome commented Aug 31, 2018

I would like to work on this, will come up with ideas for this soon.
This would also help implement bulk loading (#24, prometheus/prometheus#535)

@codesome codesome referenced this issue Aug 31, 2018

Open

Bulk Imports #24

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Aug 31, 2018

great, I will find time to review it.

I'd think in general handling this at query time first will be by far
sufficient. Compaction adds lots of attack surface for data loss and
regressions and we don't have a use case where we'd be dealing with high
fragmentation.

don't forget to sync with @gouthamve and @fabxc tsdb gurus for the best design approach 😜

@codesome

This comment has been minimized.

Copy link
Member

codesome commented Aug 31, 2018

@fabxc @gouthamve
Are we looking at only vertical query merging? With bulk import I feel vertical compaction would be helpful.

@bwplotka

This comment has been minimized.

Copy link
Contributor

bwplotka commented Aug 31, 2018

Again, IMO vertical compaction is must-have. (: For bulk import if you want bulk import further in past, but also more resilient compaction overall.

Plus improbable-eng/thanos#348

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.