Bug 1827585: Add controller that watches leader changes and capture disk metrics #313

Merged

mfojtik
Member

@mfojtik mfojtik commented Apr 17, 2020

The controller added here watches the etcd_server_leader_changes_seen_total metric and reads its value for the last 5 minutes. If any leader changes are seen (value > 0), it queries Prometheus for etcd_disk_wal_fsync_duration_seconds_bucket over the same 5-minute window.

The result is then reported as a Warning event for the etcd operator. In the future this can become a degraded condition.
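
The flow described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the PromQL expressions, the `queryFunc` type, and all helper names are assumptions, and the Prometheus client is replaced by a plain function so the sketch depends only on the standard library.

```go
package main

import (
	"fmt"
	"time"
)

// defaultWindow mirrors the "last 5 minutes" from the description.
const defaultWindow = 5 * time.Minute

// queryFunc is a hypothetical stand-in for a Prometheus client call that
// evaluates a PromQL expression and returns a single numeric result.
type queryFunc func(promQL string) (float64, error)

// promRange renders a duration as a PromQL range in whole minutes, e.g. "5m".
func promRange(window time.Duration) string {
	return fmt.Sprintf("%dm", int(window.Minutes()))
}

// leaderChangesQuery is an assumed expression for leader changes in the window.
func leaderChangesQuery(window time.Duration) string {
	return fmt.Sprintf("max(increase(etcd_server_leader_changes_seen_total[%s]))", promRange(window))
}

// fsyncQuery is an assumed expression for the p99 WAL fsync latency in the window.
func fsyncQuery(window time.Duration) string {
	return fmt.Sprintf(
		"histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[%s])) by (le))",
		promRange(window))
}

// observe returns a warning message when leader changes were seen, or an
// empty string when there is nothing to report.
func observe(query queryFunc, window time.Duration) (string, error) {
	changes, err := query(leaderChangesQuery(window))
	if err != nil {
		return "", fmt.Errorf("querying leader changes: %w", err)
	}
	if changes <= 0 {
		return "", nil // no leader changes, so skip the disk metrics query
	}
	fsync, err := query(fsyncQuery(window))
	if err != nil {
		return "", fmt.Errorf("querying WAL fsync duration: %w", err)
	}
	return fmt.Sprintf("etcd leader changes in the last %s: %.0f; p99 WAL fsync duration: %.3fs",
		promRange(window), changes, fsync), nil
}

func main() {
	// Fake query function with canned values, standing in for Prometheus.
	fake := func(promQL string) (float64, error) {
		if promQL == leaderChangesQuery(defaultWindow) {
			return 2, nil
		}
		return 0.25, nil
	}
	msg, err := observe(fake, defaultWindow)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(msg)
}
```

In a real controller the returned message would be emitted as a Warning event rather than printed. Keeping the query behind a function type makes the decision logic testable without a running Prometheus.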

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 17, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2020
@mfojtik mfojtik force-pushed the metrics-observe-controller branch 3 times, most recently from c94aa0f to 77adb97 Compare April 17, 2020 10:35
@mfojtik
Member Author

mfojtik commented Apr 17, 2020

@deads2k @hexfusion let's finish this :-)

@mfojtik
Member Author

mfojtik commented Apr 17, 2020

Only the last commit has the change.

@mfojtik mfojtik force-pushed the metrics-observe-controller branch 3 times, most recently from 1e91117 to 5f0d5e9 Compare April 17, 2020 12:50
@mfojtik
Member Author

mfojtik commented Apr 17, 2020

/retest

@mfojtik
Member Author

mfojtik commented Apr 17, 2020

@stevekuznetsov any clues about what is happening here?

EDIT: nvmd, image imports

@hexfusion
Contributor

/retest

@hexfusion
Contributor

/skip
/retest

@hexfusion
Contributor

/retest

@mfojtik mfojtik force-pushed the metrics-observe-controller branch 2 times, most recently from 0bdf9ef to 2905e02 Compare April 23, 2020 09:05
@mfojtik mfojtik changed the title from "WIP: Add fsync metric controller" to "Add controller that watches leader changes and capture disk metrics" Apr 23, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2020
@mfojtik
Member Author

mfojtik commented Apr 23, 2020

@hexfusion @deads2k @alaypatel07 this is now ready for review.

@mfojtik
Member Author

mfojtik commented Apr 23, 2020

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

17 similar comments
@openshift-ci-robot

@mfojtik: The following tests failed, say /retest to rerun all failed tests:

Test name                   Commit   Details  Rerun command
ci/prow/e2e-azure           c578cb2  link     /test e2e-azure
ci/prow/e2e-aws-disruptive  c578cb2  link     /test e2e-aws-disruptive

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@mfojtik mfojtik changed the title from "Add controller that watches leader changes and capture disk metrics" to "Bug 1827585: Add controller that watches leader changes and capture disk metrics" Apr 24, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 24, 2020
@openshift-ci-robot

@mfojtik: This pull request references Bugzilla bug 1827585, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1827585: Add controller that watches leader changes and capture disk metrics

@openshift-merge-robot openshift-merge-robot merged commit 3d9d95a into openshift:master Apr 24, 2020
@openshift-ci-robot

@mfojtik: All pull requests linked via external trackers have merged: openshift/cluster-etcd-operator#313. Bugzilla bug 1827585 has been moved to the MODIFIED state.

In response to this:

Bug 1827585: Add controller that watches leader changes and capture disk metrics

@retroflexer
Contributor

/cherrypick release-4.4

@openshift-cherrypick-robot

@retroflexer: #313 failed to apply on top of branch "release-4.4":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	go.mod
M	vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging vendor/modules.txt
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
Patch failed at 0001 bump(*): add prometheus api

In response to this:

/cherrypick release-4.4

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/severity-unspecified: Referenced Bugzilla bug's severity is unspecified for the PR.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
lgtm: Indicates that a PR is ready to be merged.
8 participants