
Stop compactions if there's a block to write #13754

Merged: 3 commits merged into prometheus:main on Apr 7, 2024

Conversation

prymitive (Contributor):

db.Compact() checks if there's a block to write with HEAD chunks before calling db.compactBlocks(). This is to ensure that if we need to write a block then it happens ASAP, otherwise memory usage might keep growing.

But what can also happen is that we don't need to write any block, we start db.compactBlocks(), compaction takes hours, and in the meantime HEAD needs to write out chunks to a block.

This can be especially problematic if, for example, you run a Thanos sidecar that's uploading blocks, which requires that compactions are disabled. Then you disable the Thanos sidecar and re-enable compactions. When db.compactBlocks() is finally called it might have a huge number of blocks to compact, which might take a very long time, during which HEAD cannot write out chunks to a new block. In such a case memory usage will keep growing until either:

  • compactions are finally finished and HEAD can write a block
  • we run out of memory and Prometheus gets OOM-killed

This change adds a check for pending HEAD block writes inside db.compactBlocks(), so that we bail out early if there are still compactions to run, but we also need to write a new block.

machine424 (Collaborator) left a comment:

I'm curious, could you share more info: how many blocks are there at the beginning, what's the time range (2h, I suppose), and what's making block compactions take more than 2h (slow disk, CPU, or is that normal)? If you have any graphs you could share, that would help.

Review thread on the new check in db.compactBlocks():

    // If we have a lot of blocks to compact the whole process might take
    // long enough that we end up with a HEAD block that needs to be written.
    // Check if that's the case and stop compactions early.
    if db.head.compactable() {
machine424 (Collaborator):

I'd add this at the end of the iteration to make sure block compaction has a chance to run at least once per cycle.

machine424 (Collaborator):

I'd also add a regression test.

prymitive (Contributor, author):

> I'd add this at the end of the iteration to make sure block compaction has a chance to run at least once per cycle.

I disagree; I think it's better to never run any compaction than to never write a block. Writing the block should be the highest priority, and there's no way to know how long any compaction will take.

prymitive (Contributor, author):

> I'd also add a regression test.

I would too, but I'm not sure how to write such a test without just replicating implementation details there. Any ideas?

bwplotka (Member):

Happy with putting it here, but as for a regression test: what about using the Compactor interface with a mocked Plan or Compact method that runs once, and then the second run makes the head compactable, and we expect compaction to run only 2 times, not 3? 🤔

prymitive (Contributor, author):

> Happy with putting it here, but as for a regression test: what about using the Compactor interface with a mocked Plan or Compact method that runs once, and then the second run makes the head compactable, and we expect compaction to run only 2 times, not 3? 🤔

Good idea, I've added a mock compactor that always finds something to compact, but after a few cycles it tricks the HEAD into thinking it needs to be compacted.

machine424 (Collaborator):

> I disagree; I think it's better to never run any compaction than to never write a block. Writing the block should be the highest priority, and there's no way to know how long any compaction will take.

Perhaps in other setups or for different use cases, it's the opposite. They might depend on block compactions being prioritized, and we wouldn’t want to disrupt that.
Maintaining the check at the beginning won’t prevent scenarios where the Head becomes compactable while that lengthy block compaction is underway.

Since this hasn't been reported before (I didn't find an issue), we need to determine whether the situation you described is common enough to justify a change in the default behavior or the introduction of a flag.

> I would too, but I'm not sure how to write such a test without just replicating implementation details there. Any ideas?

I'd initialize a DB with blocks (createBlock may help) that can be merged, make the Head compactable, then call compactBlocks and check.

prymitive (Contributor, author):

> Perhaps in other setups or for different use cases, it's the opposite. They might depend on block compactions being prioritized, and we wouldn't want to disrupt that. Maintaining the check at the beginning won't prevent scenarios where the Head becomes compactable while that lengthy block compaction is underway.
>
> Since this hasn't been reported before (I didn't find an issue), we need to determine whether the situation you described is common enough to justify a change in the default behavior or the introduction of a flag.

I think this is overcomplicating a very simple (IMHO) issue.
There is already a check before compactions start:

    if !db.head.compactable() {

so the intention is to skip compactions if there is a block to write. What's not accounted for is the fact that compactions might run for a while in a loop that's never interrupted; this change adds that missing check.
IMHO it doesn't matter how often it happens; it can happen, and compactions can take any amount of time. At the same time, compactions are not a critical feature; they are simply a way to deduplicate some on-disk data. I don't think anyone relies on compactions happening. I wouldn't go down the rabbit hole of adding flags here. (A sketch of the surrounding flow follows below.)

> I'd initialize a DB with blocks (createBlock may help) that can be merged, make the Head compactable, then call compactBlocks and check.

I don't think it's as simple as that. You need a HEAD that doesn't need to write a block, then a call to compactBlocks() must start a compaction, and then (while compactBlocks() is running) the HEAD must suddenly start needing a block write. All of that is time-sensitive.

machine424 (Collaborator) commented Mar 18, 2024:

> I think this is overcomplicating a very simple (IMHO) issue.

I'm simply trying to make sure it's a real/common issue that justifies the change. As I mentioned, this is the first time I've heard about it.
I also want to see if the change might disrupt some existing use cases. I was under the impression that reviews were intended for such discussions ;)

Let's see what the other maintainers think about this.


> There is already a check before compactions start.

I don't think the check you mentioned has the same purpose; to me it's just there to break the loop: run as many Head compactions as needed until the head is no longer compactable.

I don't think block compactions are only meant to deduplicate data.

> I don't think it's as simple as that. You need a HEAD that doesn't need to write a block, then a call to compactBlocks() must start a compaction, and then (while compactBlocks() is running) the HEAD must suddenly start needing a block write. All of that is time-sensitive.

I believe what's important is the behavior when the Head is compactable. Ensuring that compactBlocks runs no block compactions when the head is compactable can still serve as a good regression test. If you wish, you can even add a 'positive control' test case to confirm that compactBlocks does compact when the head isn't compactable. If one day compactBlocks disappears, we can eliminate the test, but at least if the logic inside it changes, the test will help document that. (The proof that I'm not against the change, if justified, is that I'm discussing how to test it :))

bboreham (Member) left a comment:

For me, the fundamental change here is prioritising head compactions over multi-block compactions, which seems to me the correct choice.

The main thing I might add is to make the behaviour more observable, either in log lines or metrics.

bwplotka (Member) left a comment:

(Hello from the bug scrub meeting)

LGTM mod tests ideally (see @bboreham comment)

prymitive (Contributor, author):

> For me, the fundamental change here is prioritising head compactions over multi-block compactions, which seems to me the correct choice.
>
> The main thing I might add is to make the behaviour more observable, either in log lines or metrics.

I've added a log line; I don't think this needs a dedicated metric.

bboreham merged commit 277f04f into prometheus:main on Apr 7, 2024 (24 checks passed).
prymitive deleted the compact branch on May 9, 2024.