
Stop compactions if there's a block to write #13754

Merged: 3 commits merged into prometheus:main on Apr 7, 2024

Conversation

prymitive (Contributor):

db.Compact() checks if there's a block to write with HEAD chunks before calling db.compactBlocks(). This is to ensure that if we need to write a block then it happens ASAP, otherwise memory usage might keep growing.

But what can also happen is that we don't need to write any block, we start db.compactBlocks(), compaction takes hours, and in the meantime HEAD needs to write out chunks to a block.

This can be especially problematic if, for example, you run a Thanos sidecar that's uploading blocks, which requires that compactions are disabled. Then you disable the Thanos sidecar and re-enable compactions. When db.compactBlocks() is finally called it might have a huge number of blocks to compact, which might take a very long time, during which HEAD cannot write out chunks to a new block. In such a case memory usage will keep growing until either:

  • compactions are finally finished and HEAD can write a block
  • we run out of memory and Prometheus gets OOM-killed

This change adds a check for pending HEAD block writes inside db.compactBlocks(), so that we bail out early if there are still compactions to run, but we also need to write a new block.

machine424 (Collaborator) left a comment:

I'm curious, could you share more info: how many blocks are there at the beginning, what's the time range (2h, I suppose), and what's making block compactions take more than 2h (slow disk, CPU, or is that normal)? If you have any graphs you could share, that would help.

Review thread on the new check in db.compactBlocks():

    // If we have a lot of blocks to compact the whole process might take
    // long enough that we end up with a HEAD block that needs to be written.
    // Check if that's the case and stop compactions early.
    if db.head.compactable() {
machine424 (Collaborator):

I'd add this at the end of the iteration to make sure block compaction has a chance to run at least once per cycle.

machine424 (Collaborator):

I'd also add a regression test.

prymitive (Contributor, author):

> I'd add this at the end of the iteration to make sure block compaction has a chance to run at least once per cycle.

I disagree; I think it's better to never run any compaction than to never write a block. Writing the block should be the highest priority, and there's no way to know how long any compaction will take.

prymitive (Contributor, author):

> I'd also add a regression test.

I would too, but I'm not sure how to write such a test without just replicating implementation details there. Any ideas?

bwplotka (Member):

Happy with putting it here, but as for a regression test: what about using the Compactor interface with a mocked Plan or Compact method that runs once, and then the second run makes the head compactable, and we expect compaction to run only 2 times, not 3? 🤔

prymitive (Contributor, author):

> Happy with putting it here, but as for a regression test: what about using the Compactor interface with a mocked Plan or Compact method that runs once, and then the second run makes the head compactable, and we expect compaction to run only 2 times, not 3? 🤔

Good idea, I've added a mock compactor that always finds something to compact, but after a few cycles it tricks the HEAD into thinking it needs to be compacted.

machine424 (Collaborator):

> I disagree; I think it's better to never run any compaction than to never write a block. Writing the block should be the highest priority, and there's no way to know how long any compaction will take.

Perhaps in other setups or for different use cases, it's the opposite. They might depend on block compactions being prioritized, and we wouldn’t want to disrupt that.
Maintaining the check at the beginning won’t prevent scenarios where the Head becomes compactable while that lengthy block compaction is underway.

Since this hasn't been reported before (I didn't find an issue), we need to determine whether the situation you described is common enough to justify a change in the default behavior or the introduction of a flag.

> I would too, but I'm not sure how to write such a test without just replicating implementation details there. Any ideas?

I'd initialize a DB with blocks (createBlock may help) that can be merged, make the Head compactable, then call compactBlocks and check.

prymitive (Contributor, author):

> Perhaps in other setups or for different use cases, it's the opposite. They might depend on block compactions being prioritized, and we wouldn't want to disrupt that. Maintaining the check at the beginning won't prevent scenarios where the Head becomes compactable while that lengthy block compaction is underway.
>
> Since this hasn't been reported before (I didn't find an issue), we need to determine whether the situation you described is common enough to justify a change in the default behavior or the introduction of a flag.

I think this is overcomplicating a very simple (IMHO) issue.
There is already a check before compactions start:

    if !db.head.compactable() {

so the intention is to skip compactions if there is a block to write. What's not accounted for is the fact that compactions might run for a while in a loop that's never interrupted; this change adds that missing check.
IMHO it doesn't matter how often it happens; it can happen, and compactions can take any amount of time. At the same time, compactions are not a critical feature; they are simply a way to deduplicate some on-disk data. I don't think anyone relies on compactions happening. I wouldn't go down the rabbit hole of adding flags here. (A sketch of the surrounding flow follows below.)

> I'd initialize a DB with blocks (createBlock may help) that can be merged, make the Head compactable, then call compactBlocks and check.

I don't think it's as simple as that. You need a HEAD that doesn't need to write a block, then a call to compactBlocks() must start a compaction, and then (while compactBlocks() is running) the HEAD must suddenly start needing a block write. All of that is time-sensitive.

machine424 (Collaborator) commented Mar 18, 2024:

> I think this is overcomplicating a very simple (IMHO) issue.

I'm simply trying to make sure it's a real/common issue that justifies the change. As I mentioned, this is the first time I've heard about it.
I also want to see if the change might disrupt some existing use cases. I was under the impression that reviews were intended for such discussions ;)

Let's see what the other maintainers think about this.


> There is already a check before compactions start.

I don't think the check you mentioned has the same purpose; to me it's just there to break the loop: run as many Head compactions as needed until the head is no longer compactable.

I don't think block compactions are only meant to deduplicate data.

> I don't think it's as simple as that. You need a HEAD that doesn't need to write a block, then a call to compactBlocks() must start a compaction, and then (while compactBlocks() is running) the HEAD must suddenly start needing a block write. All of that is time-sensitive.

I believe what's important is the behavior when the Head is compactable. Ensuring that compactBlocks runs no block compactions when the head is compactable can still serve as a good regression test. If you wish, you can even add a 'positive control' test case to confirm that compactBlocks does compact when the head isn't compactable. If one day compactBlocks disappears, we can eliminate the test, but at least if the logic inside it changes, the test will help document that. (The proof that I'm not against the change, if justified, is that I'm discussing how to test it :))

bboreham (Member) left a comment:

For me, the fundamental change here is prioritising head compactions over multi-block compactions, which seems to me the correct choice.

The main thing I might add is to make the behaviour more observable, either in log lines or metrics.

bwplotka (Member) left a comment:

(Hello from the bug scrub meeting)

LGTM mod tests ideally (see @bboreham comment)

prymitive (Contributor, author):

> For me, the fundamental change here is prioritising head compactions over multi-block compactions, which seems to me the correct choice.
>
> The main thing I might add is to make the behaviour more observable, either in log lines or metrics.

I've added a log line; I don't think this needs a dedicated metric.

bboreham merged commit 277f04f into prometheus:main on Apr 7, 2024 (24 checks passed).
prymitive deleted the compact branch on May 9, 2024.