Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alert if there are no available nodes to run VMs #8906

Conversation

machadovilaca
Copy link
Member

@machadovilaca machadovilaca commented Dec 5, 2022

What this PR does / why we need it:

  • Adds a new metric kubevirt_node_virtualization_status that represents if a node is available to run VMs or not

  • Adds a new alert to warn users when no node is available to run VMs

  • Adds and e2e test for the new alert

jira-ticket: CNV-23338

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Alert if there are no available nodes to run VMs

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Dec 5, 2022
@machadovilaca
Copy link
Member Author

/hold

@kubevirt-bot kubevirt-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. area/monitoring labels Dec 5, 2022
@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch 3 times, most recently from 7cef160 to d821560 Compare December 12, 2022 10:48
@machadovilaca
Copy link
Member Author

/retest

1 similar comment
@machadovilaca
Copy link
Member Author

/retest

@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from d821560 to d5c081c Compare December 19, 2022 11:45
@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from d5c081c to 6d2b1ae Compare January 4, 2023 00:36
@machadovilaca
Copy link
Member Author

/retest

@machadovilaca
Copy link
Member Author

/unhold

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 4, 2023
@machadovilaca
Copy link
Member Author

/retest

@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from add82fa to 5655745 Compare January 5, 2023 10:57
@machadovilaca
Copy link
Member Author

/retest

Copy link
Contributor

@enp0s3 enp0s3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@machadovilaca Hi. Node informers are expensive from performance POV, since they've being updated a lot. Please consider to change the logic to get the node list using direct API call. This call will be initiated each time the collector would be polled.

@xpivarc
Copy link
Member

xpivarc commented Jan 9, 2023

@machadovilaca Hi. Node informers are expensive from performance POV, since they've being updated a lot. Please consider to change the logic to get the node list using direct API call. This call will be initiated each time the collector would be polled.

+1 I would expect Kuberentes to expose some node metrics and you might want to build on it.

@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from 5655745 to dd4e434 Compare January 9, 2023 19:03
@machadovilaca
Copy link
Member Author

great idea @xpivarc. I updated the PR to work with Prometheus node metrics

@machadovilaca
Copy link
Member Author

/retest

Copy link
Member

@xpivarc xpivarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a nice generic alert. Left one comment

@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from dd4e434 to dc4a8f5 Compare January 17, 2023 11:55
@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 3, 2023
@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from e787ce3 to 48a36a4 Compare February 7, 2023 11:08
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2023
@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from 48a36a4 to fd2ce8b Compare February 7, 2023 11:37
Copy link
Member

@xpivarc xpivarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you squash fd2ce8b and 7fc30ae and a1fd330 respectively?
Furthermore, I left a comment about the logic.

tests/monitoring/monitoring.go Outdated Show resolved Hide resolved
tests/monitoring/monitoring.go Outdated Show resolved Hide resolved
@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch 2 times, most recently from 2e26cdb to 9abea30 Compare February 14, 2023 16:39
@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from e6c3b7d to 72564de Compare February 15, 2023 16:00
@machadovilaca
Copy link
Member Author

/retest

hack/prom-rule-ci/prom-rules-tests.yaml Show resolved Hide resolved
pkg/monitoring/configuration/prometheus.go Outdated Show resolved Hide resolved
tests/monitoring/monitoring.go Outdated Show resolved Hide resolved
tests/monitoring/monitoring.go Outdated Show resolved Hide resolved
@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch 2 times, most recently from fe5a07d to cbb6e01 Compare February 23, 2023 19:57
Signed-off-by: João Vilaça <jvilaca@redhat.com>
@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from cbb6e01 to d3b69da Compare March 1, 2023 11:03
Signed-off-by: João Vilaça <jvilaca@redhat.com>
@machadovilaca machadovilaca force-pushed the alert-if-worker-nodes-dont-have-virtualization-extensions branch from d3b69da to a17c576 Compare March 1, 2023 12:16
@machadovilaca
Copy link
Member Author

/retest

@xpivarc
Copy link
Member

xpivarc commented Mar 1, 2023

/lgtm
Thanks for bearing with me :)

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 1, 2023
@machadovilaca
Copy link
Member Author

/retest

1 similar comment
@machadovilaca
Copy link
Member Author

/retest

@kubevirt-commenter-bot
Copy link

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@kubevirt-commenter-bot
Copy link

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot
Copy link
Contributor

kubevirt-bot commented Mar 2, 2023

@machadovilaca: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubevirt-e2e-k8s-1.22-sig-monitoring 7cef160 link true /test pull-kubevirt-e2e-k8s-1.22-sig-monitoring
pull-kubevirt-e2e-k8s-1.23-operator 5655745 link true /test pull-kubevirt-e2e-k8s-1.23-operator
pull-kubevirt-fossa a17c576 link false /test pull-kubevirt-fossa

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kubevirt-bot kubevirt-bot merged commit f90dee8 into kubevirt:main Mar 2, 2023
@machadovilaca machadovilaca deleted the alert-if-worker-nodes-dont-have-virtualization-extensions branch March 2, 2023 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/monitoring dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants