storcon: add metric for long running reconciles #9207

Merged
VladLazar merged 4 commits into main from vlad/long-running-reconcile-alert
Oct 2, 2024

Conversation

@VladLazar VladLazar commented Sep 30, 2024

Problem

We don't have an alert for long-running reconciles. Stuck reconciles are problematic,
as we saw in a recent incident.

Summary of changes

Add a new metric storage_controller_reconcile_long_running_total with labels: {tenant_id, shard_number, seq}.
The series for a given reconcile is removed once that long-running reconcile finishes. These events should be rare,
so we won't break the bank on cardinality.
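
The lifecycle described above (increment a labelled counter while a reconcile is long-running, then drop that label set when it finishes) can be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not the code from this PR: the `ReconcileLabels` and `LongRunningReconciles` types, their method names, and the field types are all hypothetical, and a real service would use its metrics library's labelled counter instead of a `HashMap`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Label set named in the PR description: {tenant_id, shard_number, seq}.
/// Field types are assumptions made for this sketch.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ReconcileLabels {
    tenant_id: String,
    shard_number: u8,
    seq: u64,
}

/// Hypothetical stand-in for a labelled counter: one series per label set.
#[derive(Default)]
struct LongRunningReconciles {
    series: Mutex<HashMap<ReconcileLabels, u64>>,
}

impl LongRunningReconciles {
    /// Record one more observation of this reconcile exceeding the threshold.
    fn inc(&self, labels: &ReconcileLabels) {
        let mut series = self.series.lock().unwrap();
        *series.entry(labels.clone()).or_insert(0) += 1;
    }

    /// Drop the series once the reconcile finishes, so cardinality stays bounded.
    /// Returns true if a series existed for these labels.
    fn remove(&self, labels: &ReconcileLabels) -> bool {
        self.series.lock().unwrap().remove(labels).is_some()
    }

    /// Current counter value for a label set, if the series exists.
    fn get(&self, labels: &ReconcileLabels) -> Option<u64> {
        self.series.lock().unwrap().get(labels).copied()
    }

    /// Number of live series, i.e. reconciles currently flagged as long-running.
    fn cardinality(&self) -> usize {
        self.series.lock().unwrap().len()
    }
}
```

Because every finished reconcile removes its own series, the number of live series tracks the number of reconciles that are currently stuck or slow, which is what keeps the per-reconcile `seq` label affordable.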

Related #9150

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires a public announcement, mark it with the /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@VladLazar VladLazar requested a review from a team as a code owner September 30, 2024 17:01
@VladLazar VladLazar requested review from arssher, jcsp and yliang412 and removed request for arssher September 30, 2024 17:01
github-actions bot commented Sep 30, 2024

5022 tests run: 4864 passed, 0 failed, 158 skipped (full report)


Flaky tests (3)

Postgres 17

Postgres 14

Code coverage* (full report)

  • functions: 31.3% (7492 of 23902 functions)
  • lines: 49.6% (60118 of 121310 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
966d775 at 2024-10-01T09:18:27.981Z :recycle:

@yliang412 yliang412 left a comment

LGTM

@VladLazar VladLazar changed the title storcon: add alert for long running reconciles storcon: add metric for long running reconciles Oct 2, 2024
@VladLazar VladLazar merged commit 38a8dca into main Oct 2, 2024
79 checks passed
@VladLazar VladLazar deleted the vlad/long-running-reconcile-alert branch October 2, 2024 16:25