compactor: mark corrupted blocks for no compaction to avoid blocking #6588
I think incrementing the `cortex_compactor_blocks_marked_for_no_compaction_total` metric and then alerting on the increase is a very unreliable way of notifying us about this problem. It will be a short blip that auto-resolves quickly, so many people will just ignore it.
In this PR, we should update the alert to work with the new "critical" reason (actually, let's just make it general and alert on any increase of `cortex_compactor_blocks_marked_for_no_compaction_total`).
In the next PR, I would suggest that we introduce a new metric to keep track of how many blocks with a non-compact marker there are in total, and fire an alert as long as it's not 0. To make that work, I would suggest including non-compact marks in the bucket index, so that we don't need to refetch them on every blocks scan.
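To illustrate the concern above, here is a minimal Go sketch of why an alert on a counter increase resolves on its own while an alert on a "currently marked" gauge keeps firing. It is only a simulation under stated assumptions: the `sample` type, the `markedCurrent` gauge, and both helper functions are hypothetical names, and no real Prometheus or Mimir API is used.

```go
package main

import "fmt"

// sample is one scrape of two hypothetical metrics.
type sample struct {
	markedTotal   int // counter: blocks ever marked for no compaction
	markedCurrent int // gauge (assumed, not in this PR): blocks that currently carry a no-compact marker
}

// counterAlertFiring reports whether the counter increased over the last
// `window` scrapes ending at index i, similar in spirit to alerting on increase().
func counterAlertFiring(samples []sample, i, window int) bool {
	start := i - window
	if start < 0 {
		start = 0
	}
	return samples[i].markedTotal-samples[start].markedTotal > 0
}

// gaugeAlertFiring reports whether any block is still marked at scrape i.
func gaugeAlertFiring(samples []sample, i int) bool {
	return samples[i].markedCurrent > 0
}

func main() {
	// One block is marked at scrape 1 and the marker is removed at scrape 4.
	samples := []sample{{0, 0}, {1, 1}, {1, 1}, {1, 1}, {1, 0}}
	for i := range samples {
		fmt.Printf("scrape %d: counter alert=%v gauge alert=%v\n",
			i, counterAlertFiring(samples, i, 2), gaugeAlertFiring(samples, i))
	}
}
```

In this simulation the counter-based alert stops firing once the increase leaves the window, even though the block is still marked, whereas the gauge-based alert stays active until the marker is gone.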
Thank you for working on this. The PR looks good; I've left some minor comments. I think we should rethink how we notify operators about this, though (see my previous comment).
@pstibrany @aldernero this is ready for review again. Also, following Peter's suggestion in comment #6588 (review), I've renamed and updated |
This looks good to me, thanks!
Thank you, LGTM with some minor comments.
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
ref: #6588 (comment) Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
…ppedUnhealthyBlocks` ref: #6588 (review) Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
What this PR does
This PR changes the compactor's behavior to mark blocks for no compaction if any critical issue is detected during the block health check. This prevents the compactor from getting blocked in future runs and thus avoids a widespread impact on read path performance.
Which issue(s) this PR fixes or relates to
Fixes N/A
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`