Add alerts for etcd fsync duration #12266

Merged: 1 commit, Oct 9, 2020

chaitanyaenr
Contributor

This commit adds a check on the 99th percentile of the etcd members'
fsync duration and fires a critical alert when it exceeds 1 sec. The
recommended fsync duration for etcd is 20 ms, but there are scenarios
where a user ends up running on slow disks. The alert lets the
user/admin know that disk latency has reached a level that is critical
for etcd performance.
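
To illustrate the check described above, here is a minimal Python sketch of how a 99th percentile can be estimated from cumulative histogram buckets and compared against the thresholds mentioned in this thread (0.5 sec for the existing warning, 1 sec for the new critical alert). The actual alert expression from the mixin is not shown in this conversation, so the function names, the bucket layout, and the sample counts below are all hypothetical. The interpolation is a simplified approximation of a histogram quantile, not the code the alert runs.

```python
from typing import List, Tuple

# Thresholds taken from the thread: warning above 0.5 s, critical above 1 s.
WARNING_SECONDS = 0.5
CRITICAL_SECONDS = 1.0


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound_seconds, count) buckets.

    Buckets must be sorted by upper bound; the last bound may be float("inf").
    Values are assumed to be spread linearly inside each bucket.
    """
    if not buckets:
        raise ValueError("at least one bucket is required")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # Cannot interpolate into the open-ended bucket.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


def classify(p99_seconds: float) -> str:
    """Map a p99 fsync duration to a hypothetical severity label."""
    if p99_seconds > CRITICAL_SECONDS:
        return "critical"
    if p99_seconds > WARNING_SECONDS:
        return "warning"
    return "ok"


# Hypothetical cumulative bucket counts for a member on a slow disk.
slow_disk = [(0.25, 90), (0.5, 95), (1.0, 98), (2.0, 100), (float("inf"), 100)]
# Hypothetical counts for a member close to the recommended 20 ms.
healthy_disk = [(0.01, 990), (0.02, 1000), (float("inf"), 1000)]

slow_p99 = histogram_quantile(0.99, slow_disk)
healthy_p99 = histogram_quantile(0.99, healthy_disk)
print(classify(slow_p99), classify(healthy_p99))
```

With these sample counts the slow member's p99 falls in the 1 to 2 sec bucket and is labeled critical, while the healthy member stays well under the warning threshold.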

@chaitanyaenr
Contributor Author

@hexfusion @jtaleric @mffiedler PTAL.

@jtaleric jtaleric left a comment

LGTM.

One suggestion, for this commit or a new one:
Do we also want a lower-severity alert for when etcd timings start increasing, rather than only alerting when the wheels fall off?

@xiang90
Contributor

xiang90 commented Sep 25, 2020

@chaitanyaenr

The commit message needs to be updated to Documentation/etcd-mixin/mixin.libsonnet: xxx

@chaitanyaenr
Contributor Author

@jtaleric There's already a warning alert that fires when the fsync duration exceeds 0.5 sec.

@xiang90 Updated the commit message as you suggested. PTAL.

@jtaleric

jtaleric commented Oct 8, 2020

@hexfusion Can you provide a review, please?

@hexfusion
Contributor

lgtm thanks @chaitanyaenr

@hexfusion hexfusion merged commit e1bf097 into etcd-io:master Oct 9, 2020
@chaitanyaenr chaitanyaenr deleted the etcd_fsync_alert branch October 13, 2020 14:28
4 participants