Bug 1827585: Add controller that watches leader changes and capture disk metrics #313

mfojtik · 2020-04-17T10:22:49Z

The controller added here watches the etcd_server_leader_changes_seen_total metric and reads its value over the last 5 minutes. If any leader changes are seen (value > 0), it queries Prometheus for etcd_disk_wal_fsync_duration_seconds_bucket over the last 5 minutes.

The result is then reported as a Warning event for the etcd operator. In the future this could become a degraded condition.

mfojtik · 2020-04-17T10:39:59Z

@deads2k @hexfusion let's finish this :-)

mfojtik · 2020-04-17T10:40:39Z

Only the last commit has the change.

mfojtik · 2020-04-17T14:16:28Z

/retest

mfojtik · 2020-04-17T15:49:45Z

@stevekuznetsov any clues about what is happening here?

EDIT: nvmd, image imports

hexfusion · 2020-04-17T19:46:15Z

/retest

hexfusion · 2020-04-17T22:03:44Z

/skip
/retest

hexfusion · 2020-04-21T09:43:54Z

/retest

mfojtik · 2020-04-23T09:09:16Z

@hexfusion @deads2k @alaypatel07 this is now ready for review.

mfojtik · 2020-04-23T12:27:53Z

/retest

openshift-bot · 2020-04-23T18:16:46Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-23T18:21:51Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-23T18:35:30Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-23T19:52:12Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-23T20:31:12Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-23T21:10:39Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-23T23:20:10Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T01:17:18Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T02:48:17Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T03:27:11Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T04:33:24Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T05:11:12Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T05:37:09Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T06:54:45Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T07:20:40Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T07:59:37Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T08:12:52Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-04-24T08:38:41Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot · 2020-04-24T09:06:37Z

@mfojtik: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-azure | `c578cb2` | link | `/test e2e-azure` |
| ci/prow/e2e-aws-disruptive | `c578cb2` | link | `/test e2e-aws-disruptive` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-bot · 2020-04-24T09:17:38Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot · 2020-04-24T09:21:46Z

@mfojtik: This pull request references Bugzilla bug 1827585, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.5.0) matches configured target release for branch (4.5.0)
bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1827585: Add controller that watches leader changes and capture disk metrics

openshift-ci-robot · 2020-04-24T09:24:55Z

@mfojtik: All pull requests linked via external trackers have merged: openshift/cluster-etcd-operator#313. Bugzilla bug 1827585 has been moved to the MODIFIED state.

In response to this:

Bug 1827585: Add controller that watches leader changes and capture disk metrics

retroflexer · 2020-05-26T13:28:29Z

/cherrypick release-4.4

openshift-cherrypick-robot · 2020-05-26T13:28:34Z

@retroflexer: #313 failed to apply on top of branch "release-4.4":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	go.mod
M	vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging vendor/modules.txt
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
Patch failed at 0001 bump(*): add prometheus api

In response to this:

/cherrypick release-4.4

openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 17, 2020

openshift-ci-robot requested review from hexfusion and soltysh April 17, 2020 10:23

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2020

mfojtik force-pushed the metrics-observe-controller branch 3 times, most recently from c94aa0f to 77adb97 Compare April 17, 2020 10:35

mfojtik mentioned this pull request Apr 17, 2020

WIP: Proof metric observer to observe fsync prometheus metrics #206

Closed

mfojtik force-pushed the metrics-observe-controller branch 3 times, most recently from 1e91117 to 5f0d5e9 Compare April 17, 2020 12:50

mfojtik force-pushed the metrics-observe-controller branch from 5f0d5e9 to 4cf5c00 Compare April 21, 2020 07:07

mfojtik force-pushed the metrics-observe-controller branch from 4cf5c00 to bf1b940 Compare April 21, 2020 10:04

bump(*): add prometheus api

147c078

mfojtik force-pushed the metrics-observe-controller branch 2 times, most recently from 0bdf9ef to 2905e02 Compare April 23, 2020 09:05

mfojtik changed the title ~~WIP: Add fsync metric controller~~ Add controller that watches leader changes and capture disk metrics Apr 23, 2020

openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2020

mfojtik force-pushed the metrics-observe-controller branch from 2905e02 to ebd5de9 Compare April 23, 2020 09:08

operator: add fsync controller

c578cb2

mfojtik force-pushed the metrics-observe-controller branch from ebd5de9 to c578cb2 Compare April 23, 2020 10:49

mfojtik changed the title ~~Add controller that watches leader changes and capture disk metrics~~ Bug 1827585: Add controller that watches leader changes and capture disk metrics Apr 24, 2020

openshift-ci-robot added bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 24, 2020

openshift-merge-robot merged commit 3d9d95a into openshift:master Apr 24, 2020

retroflexer mentioned this pull request May 26, 2020

Bug 1840150: operator: add fsync controller #362

Closed

retroflexer mentioned this pull request Jun 3, 2020

Bug 1840150: operator: add fsync controller #369

Merged
