Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compactor: mark corrupted blocks for no compaction to avoid blocking #6588

Merged
merged 12 commits into from
Nov 27, 2023

Conversation

ortuman
Copy link
Contributor

@ortuman ortuman commented Nov 7, 2023

What this PR does

This PR changes the compactor's behavior to mark blocks as non-compact if any sort of critical issue is detected during block health check, in order to prevent the compactor from getting blocked in future runs and thus avoid widespread impact on the read path performance.

Which issue(s) this PR fixes or relates to

Fixes N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ortuman ortuman added enhancement New feature or request component/compactor labels Nov 7, 2023
@ortuman ortuman requested a review from a team as a code owner November 7, 2023 14:27
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think increasing the cortex_compactor_blocks_marked_for_no_compaction_total metric and then alerting on the increase is very unreliable way of notifying us about this problem. This will be a short blip, that autoresolves quickly, so many people will just ignore it.

In this PR, we should update the alert to work with new "critical" reason (actually, let's just make it general to alert on any increase of cortex_compactor_blocks_marked_for_no_compaction_total).

In the next PR I would suggest that we introduce new metric to keep track how many blocks with non-compact marker there are in total, and fire an alert as long as it's not 0. To make that work, I would suggest to include non-compact marks in bucket index, so that we don't need to refetch them on every blocks scan.

pkg/compactor/bucket_compactor.go Outdated Show resolved Hide resolved
pkg/compactor/bucket_compactor.go Outdated Show resolved Hide resolved
pkg/compactor/bucket_compactor.go Outdated Show resolved Hide resolved
pkg/compactor/bucket_compactor.go Outdated Show resolved Hide resolved
@pstibrany
Copy link
Member

Thank you for working on this. PR looks good, I've left some minor comments. I think we should rethink how we notify operators about this though (see my previous comment).

@ortuman ortuman force-pushed the ortuman/mark-unhealthy-blocks-for-no-compaction branch from c99cb18 to 5d2384a Compare November 13, 2023 09:10
@ortuman ortuman force-pushed the ortuman/mark-unhealthy-blocks-for-no-compaction branch from dac612f to 0e8728b Compare November 13, 2023 09:19
@ortuman ortuman requested a review from a team as a code owner November 13, 2023 09:57
@ortuman ortuman force-pushed the ortuman/mark-unhealthy-blocks-for-no-compaction branch from 9c4ef9c to e5bca8d Compare November 13, 2023 10:04
@ortuman ortuman force-pushed the ortuman/mark-unhealthy-blocks-for-no-compaction branch 3 times, most recently from f419d6f to bbbee24 Compare November 13, 2023 11:28
@ortuman
Copy link
Contributor Author

ortuman commented Nov 13, 2023

@pstibrany @aldernero this is ready for review again.

Also, following Peter's suggestion in comment #6588 (review), I've renamed and updated MimirCompactorSkippedBlocksWithOutOfOrderChunks to MimirCompactorSkippedUnhealthyBlocks to alert on any increase in cortex_compactor_blocks_marked_for_no_compaction_total. Once this PR gets merged, I'll proceed to include the new suggested metric and adjust the alert accordingly.

Copy link
Contributor

@aldernero aldernero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks!

Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, lgtm with some minor comments.

pkg/compactor/bucket_compactor.go Show resolved Hide resolved
pkg/compactor/bucket_compactor.go Show resolved Hide resolved
operations/mimir-mixin/alerts/compactor.libsonnet Outdated Show resolved Hide resolved
operations/mimir-mixin/alerts/compactor.libsonnet Outdated Show resolved Hide resolved
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
ref: #6588 (comment)

Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
ref: #6588 (comment)

Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
…ppedUnhealthyBlocks`

ref: #6588 (review)

Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
ref: #6588 (comment)

Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
@ortuman ortuman force-pushed the ortuman/mark-unhealthy-blocks-for-no-compaction branch from d99d9a0 to aa8f52e Compare November 27, 2023 09:08
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
@ortuman ortuman merged commit e17dde3 into main Nov 27, 2023
30 checks passed
@ortuman ortuman deleted the ortuman/mark-unhealthy-blocks-for-no-compaction branch November 27, 2023 10:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants