Improve slow operations observability in safekeepers #8188

Merged: 1 commit, Jun 27, 2024

@petuhovskiy petuhovskiy commented Jun 27, 2024

After #8022 was deployed to staging, I noticed many timeouts. After inspecting the logs, I realized that some operations take ~20 seconds and run while holding the shared state lock. This usually happens right after a redeploy, because compute reconnections put high load on the disks. This commit tries to improve observability around slow operations.
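For illustration only, here is a minimal sketch of the general idea of timing an operation that runs under a lock and warning when it is slow. It is not the code from this commit: `timed_op`, `Timed`, the 10 ms threshold, and the `Mutex<u64>` stand-in for the shared state are all hypothetical, and it uses only the standard library instead of the async runtime and metrics the safekeeper would use.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Outcome of a timed operation: its result, how long it took, and whether
/// it exceeded the slow-operation threshold. (Hypothetical helper type.)
struct Timed<T> {
    value: T,
    elapsed: Duration,
    slow: bool,
}

/// Runs `op`, measures its wall-clock duration, and logs a warning to stderr
/// if it took longer than `threshold`.
fn timed_op<T>(name: &str, threshold: Duration, op: impl FnOnce() -> T) -> Timed<T> {
    let started = Instant::now();
    let value = op();
    let elapsed = started.elapsed();
    let slow = elapsed > threshold;
    if slow {
        eprintln!("slow operation {name}: took {elapsed:?} (threshold {threshold:?})");
    }
    Timed { value, elapsed, slow }
}

fn main() {
    // Stand-in for the shared state; the real type is an assumption here.
    let shared_state = Mutex::new(0u64);

    let timed = timed_op("update_state", Duration::from_millis(10), || {
        let mut guard = shared_state.lock().unwrap();
        // Simulate slow disk I/O performed while the lock is held.
        std::thread::sleep(Duration::from_millis(50));
        *guard += 1;
        *guard
    });

    println!(
        "value={} elapsed={:?} slow={}",
        timed.value, timed.elapsed, timed.slow
    );
}
```

Measuring inside the closure that takes the lock captures both the wait for the lock and the work done under it, which is the combination that blocks other users of the shared state.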

Non-observability changes:

  • TimelineState::finish_change now skips update if nothing has changed
  • wal_residence_guard() timeout is set to 30s
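The two non-observability changes can be sketched roughly as follows. This is a standard-library approximation under stated assumptions, not the actual implementation: `ToyState`, `ToyStore`, their fields, and `wait_with_timeout` are hypothetical names, and a channel receive stands in for waiting on the guard.

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Hypothetical stand-in for the persisted timeline state.
#[derive(Clone, Debug, PartialEq)]
struct ToyState {
    commit_lsn: u64,
    term: u64,
}

/// Hypothetical store that counts how many times state was written.
struct ToyStore {
    persisted: ToyState,
    writes: usize,
}

impl ToyStore {
    /// Applies `new` only if it differs from what is already persisted.
    /// Returns true if a write happened, false if the update was skipped.
    fn finish_change(&mut self, new: &ToyState) -> bool {
        if *new == self.persisted {
            return false; // nothing changed, skip the write
        }
        self.persisted = new.clone();
        self.writes += 1;
        true
    }
}

/// Waits for a value (standing in for a guard) for at most `timeout`.
fn wait_with_timeout<T>(
    rx: &mpsc::Receiver<T>,
    timeout: Duration,
) -> Result<T, mpsc::RecvTimeoutError> {
    rx.recv_timeout(timeout)
}

fn main() {
    let initial = ToyState { commit_lsn: 1, term: 1 };
    let mut store = ToyStore { persisted: initial.clone(), writes: 0 };

    let skipped = !store.finish_change(&initial);
    let changed = store.finish_change(&ToyState { commit_lsn: 2, term: 1 });
    println!("skipped={skipped} changed={changed} writes={}", store.writes);

    // The PR sets the real timeout to 30s; a short one keeps the demo fast.
    let (_tx, rx) = mpsc::channel::<u32>();
    let result = wait_with_timeout(&rx, Duration::from_millis(10));
    println!("guard wait result: {result:?}");
}
```

Skipping the no-op update avoids doing persistence work under the lock when there is nothing to persist, and a bounded wait turns an indefinite stall into an explicit error that can be logged.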

@petuhovskiy petuhovskiy requested a review from a team as a code owner June 27, 2024 15:58
@petuhovskiy petuhovskiy requested review from jcsp and arssher and removed request for jcsp June 27, 2024 15:58

2940 tests run: 2823 passed, 0 failed, 117 skipped (full report)


Flaky tests (1)

Postgres 14

  • test_lr_with_slow_safekeeper: release

Code coverage* (full report)

  • functions: 32.6% (6896 of 21128 functions)
  • lines: 50.0% (53957 of 107939 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
53c0746 at 2024-06-27T16:45:54.462Z

@petuhovskiy petuhovskiy merged commit 1d66ca7 into main Jun 27, 2024
64 checks passed
@petuhovskiy petuhovskiy deleted the sk-slow-ops branch June 27, 2024 17:39
petuhovskiy added a commit that referenced this pull request Jun 28, 2024
In #8188 I forgot to specify buckets for new operations metrics. This
commit fixes that.
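To show why bucket boundaries matter for latency metrics like these, here is a small standard-library sketch. The bucket bounds and sample durations are made up for illustration and are not the values chosen in the follow-up commit; `bucket_counts` is a hypothetical helper, not part of any metrics library.

```rust
/// Counts observations per bucket (non-cumulative). `bounds` are inclusive
/// upper bounds in ascending order; the last slot of the result is the
/// overflow bucket for values above every bound.
fn bucket_counts(bounds: &[f64], observations: &[f64]) -> Vec<u64> {
    let mut counts = vec![0u64; bounds.len() + 1];
    for &v in observations {
        let idx = bounds
            .iter()
            .position(|&b| v <= b)
            .unwrap_or(bounds.len());
        counts[idx] += 1;
    }
    counts
}

fn main() {
    // Made-up operation durations in seconds, including ~20s outliers.
    let durations = [0.2, 12.0, 19.5, 21.0];

    // Bounds that stop at 10s lump every slow operation into the overflow bucket.
    let narrow = bucket_counts(&[0.1, 1.0, 10.0], &durations);
    // Bounds that extend past 20s keep the slow operations distinguishable.
    let wide = bucket_counts(&[1.0, 10.0, 20.0, 30.0], &durations);

    println!("narrow: {narrow:?}");
    println!("wide:   {wide:?}");
}
```

With bounds that top out below the durations of interest, a 12-second and a 21-second operation look identical in the histogram, which defeats the purpose of a slow-operations metric.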