
Fix VirtualMachineCRCErrors not stop firing #742


machadovilaca
Member

@machadovilaca machadovilaca commented Nov 28, 2023

What this PR does / why we need it:

The controller keeps reporting old metric values even after the VM is deleted. This PR changes the metric name and labels so that the metric value can be set to 0 when a VM is deleted, and the alert no longer fires after VM deletion.

Reduces the operator impact of the VirtualMachineCRCErrors alert to none.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Release note:

Fix VirtualMachineCRCErrors alert continuing to fire after the VM is deleted

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 28, 2023

openshift-ci bot commented Nov 28, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@machadovilaca
Member Author

/test all

@kubevirt-bot
Contributor

@machadovilaca: No presubmit jobs available for kubevirt/ssp-operator@main

In response to this:

/test all

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@machadovilaca machadovilaca force-pushed the fix-VirtualMachineCRCErrors-not-stop-firing branch from 0256308 to ea42bff Compare November 29, 2023 13:31

/cc sradco

@machadovilaca machadovilaca force-pushed the fix-VirtualMachineCRCErrors-not-stop-firing branch from ea42bff to 58ad69e Compare November 29, 2023 13:37

/cc sradco

@machadovilaca
Member Author

/test e2e-functests

@kubevirt-bot
Contributor

@machadovilaca: No presubmit jobs available for kubevirt/ssp-operator@main

In response to this:

/test e2e-functests


Member

@0xFelix 0xFelix left a comment


This PR changes quite a lot. Can you explain a bit how the fix works, and also add the explanation to the PR and commit descriptions?

tests/monitoring_test.go (outdated, resolved)
@machadovilaca machadovilaca force-pushed the fix-VirtualMachineCRCErrors-not-stop-firing branch from 58ad69e to 725d64c Compare December 4, 2023 10:53

github-actions bot commented Dec 4, 2023

/cc sradco

@machadovilaca machadovilaca force-pushed the fix-VirtualMachineCRCErrors-not-stop-firing branch from 725d64c to 2a7b2f4 Compare December 4, 2023 10:57

github-actions bot commented Dec 4, 2023

/cc sradco

@kubevirt-bot kubevirt-bot added size/L and removed size/M labels Dec 4, 2023
@machadovilaca
Member Author

This PR changes quite a lot. Can you explain a little how the fix works and add the explanation also to the PR and commit descriptions?

@0xFelix The problem was that the controller kept reporting the metric for old VMs even after they were deleted. The controller can detect when a VM is deleted, but at that point it no longer has access to the values of the old labels (PV name, PV type, etc.). This PR changes the metric to use only the VM name and namespace as labels, so that when a VM is deleted we can set its value to a no-error value. The alert expression is updated accordingly.
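To make the label argument concrete, here is a minimal stdlib-only sketch that models a gauge vector as a map from label values to a number, where each distinct combination of label values is its own series. The `gaugeVec` type, the example label sets, and the "non-zero means firing" alert condition are hypothetical stand-ins, not the ssp-operator code or the Prometheus client API. With the extra PV labels, the deletion-time write lands on a different series and the stale one keeps its value; with only name and namespace, the same series is overwritten with 0.

```go
package main

import (
	"fmt"
	"strings"
)

// gaugeVec is a hypothetical, stdlib-only stand-in for a labelled gauge:
// each unique combination of label values identifies its own series.
type gaugeVec struct {
	series map[string]float64
}

func newGaugeVec() *gaugeVec {
	return &gaugeVec{series: map[string]float64{}}
}

// set stores a value for the series identified by the given label values.
func (g *gaugeVec) set(value float64, labelValues ...string) {
	g.series[strings.Join(labelValues, "/")] = value
}

// firing counts the series with a non-zero value, i.e. the series an
// assumed alert condition of "value > 0" would fire on.
func (g *gaugeVec) firing() int {
	n := 0
	for _, v := range g.series {
		if v != 0 {
			n++
		}
	}
	return n
}

func main() {
	// Old design (hypothetical labels): name, namespace, pv name, volume mode.
	old := newGaugeVec()
	old.set(1, "vm1", "default", "pv-123", "Block")
	// On deletion only name and namespace are known, so this write creates a
	// different series and the stale one keeps its value of 1.
	old.set(0, "vm1", "default", "", "")
	fmt.Println("old design, firing series:", old.firing())

	// New design: name and namespace only.
	cur := newGaugeVec()
	cur.set(1, "vm1", "default")
	// On deletion the same series is overwritten with the no-error value.
	cur.set(0, "vm1", "default")
	fmt.Println("new design, firing series:", cur.firing())
}
```

Running it prints one firing series for the old label set and zero for the new one, which is the behaviour change the PR description is after.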

@machadovilaca machadovilaca marked this pull request as ready for review December 4, 2023 11:01
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 4, 2023
@machadovilaca
Member Author

/retest

@machadovilaca machadovilaca force-pushed the fix-VirtualMachineCRCErrors-not-stop-firing branch from 10b0b31 to 47709c4 Compare January 4, 2024 18:21

github-actions bot commented Jan 4, 2024

/cc sradco

@machadovilaca machadovilaca force-pushed the fix-VirtualMachineCRCErrors-not-stop-firing branch from 47709c4 to f74b983 Compare January 4, 2024 18:25

github-actions bot commented Jan 4, 2024

/cc sradco

@machadovilaca machadovilaca force-pushed the fix-VirtualMachineCRCErrors-not-stop-firing branch from f74b983 to e18adc2 Compare January 4, 2024 18:48

github-actions bot commented Jan 4, 2024

/cc sradco

@@ -58,7 +58,11 @@ func (r *VmReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Re
 	vm := kubevirtv1.VirtualMachine{}
 	if err := r.client.Get(ctx, req.NamespacedName, &vm); err != nil {
 		if errors.IsNotFound(err) {
-			// VM was deleted, so we can ignore it
+			// VM was deleted
+			vm.Name = req.Name
Member


If the VM was deleted, why do we add it to the metrics? Also, calling metrics.SetVmWithVolume with nil pointers looks error-prone since, IIUC, this func does not check for nil pointers sufficiently.

Member Author


  • The issue was that the controller kept exposing the metric saying that the VM, although already deleted, had the problematic volume configuration.
    With this change, when a VM is deleted, the controller no longer reports a true value for the metric.

  • I think the nil check in the function covers this case completely:

if pv == nil || pvc == nil {
  vmRbdVolume.WithLabelValues(vm.Name, vm.Namespace).Set(0)
  return
}
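The deleted-VM path can be sketched as follows. This is a self-contained approximation, not the actual controller: the stand-in types, `lookupFunc`, `errNotFound`, the map-based metric, and the `Problematic` flag are hypothetical, and only the nil check mirrors the snippet quoted above. The point it illustrates is that on a "not found" result the reconciler still knows the request's name and namespace, which is all a name/namespace-labelled series needs in order to be reset.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for kubevirtv1.VirtualMachine, k8sv1.PersistentVolumeClaim
// and k8sv1.PersistentVolume, reduced to the fields this sketch uses.
type VirtualMachine struct{ Name, Namespace string }
type PersistentVolumeClaim struct{ VolumeName string }
type PersistentVolume struct {
	Name string
	// Problematic is a hypothetical flag standing in for whatever volume
	// inspection decides the VM's storage configuration is at risk.
	Problematic bool
}

type vmKey struct{ name, namespace string }

// vmVolumeMetric stands in for a gauge labelled only by VM name and namespace.
var vmVolumeMetric = map[vmKey]float64{}

// setVmWithVolume mirrors the quoted nil check: with no PV or PVC (for
// example because the VM was deleted) the series is reset to 0.
func setVmWithVolume(vm *VirtualMachine, pvc *PersistentVolumeClaim, pv *PersistentVolume) {
	key := vmKey{vm.Name, vm.Namespace}
	if pv == nil || pvc == nil {
		vmVolumeMetric[key] = 0
		return
	}
	if pv.Problematic {
		vmVolumeMetric[key] = 1
	} else {
		vmVolumeMetric[key] = 0
	}
}

var errNotFound = errors.New("not found")

// lookupFunc is a hypothetical lookup of a VM and its volume objects.
type lookupFunc func(name, namespace string) (*VirtualMachine, *PersistentVolumeClaim, *PersistentVolume, error)

// reconcile resets the metric for deleted VMs and updates it otherwise.
func reconcile(name, namespace string, lookup lookupFunc) error {
	vm, pvc, pv, err := lookup(name, namespace)
	if errors.Is(err, errNotFound) {
		// VM was deleted: only the request's name and namespace are known.
		setVmWithVolume(&VirtualMachine{Name: name, Namespace: namespace}, nil, nil)
		return nil
	}
	if err != nil {
		return err
	}
	setVmWithVolume(vm, pvc, pv)
	return nil
}

func main() {
	exists := true
	lookup := func(name, namespace string) (*VirtualMachine, *PersistentVolumeClaim, *PersistentVolume, error) {
		if !exists {
			return nil, nil, nil, errNotFound
		}
		return &VirtualMachine{Name: name, Namespace: namespace},
			&PersistentVolumeClaim{VolumeName: "pv-1"},
			&PersistentVolume{Name: "pv-1", Problematic: true}, nil
	}
	_ = reconcile("vm1", "default", lookup)
	fmt.Println(vmVolumeMetric[vmKey{"vm1", "default"}]) // prints 1
	exists = false // the VM is deleted
	_ = reconcile("vm1", "default", lookup)
	fmt.Println(vmVolumeMetric[vmKey{"vm1", "default"}]) // prints 0
}
```

Passing nil PV/PVC pointers is safe here only because the setter checks them before dereferencing, which is the reviewer's concern in the thread.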

Member


func SetVmWithVolume(vm *kubevirtv1.VirtualMachine, pvc *k8sv1.PersistentVolumeClaim, pv *k8sv1.PersistentVolume) {

The function that is called looks different from what you quoted?

Member Author


Member


Right, my mistake.

tests/monitoring_test.go (resolved)
The controller keeps reporting old metric values
even after the VM is deleted. This commit updates
the metric name and labels so that we can set
the metric value to 0 and no longer trigger the
alert on VM deletion

Signed-off-by: João Vilaça <jvilaca@redhat.com>
@machadovilaca machadovilaca force-pushed the fix-VirtualMachineCRCErrors-not-stop-firing branch from e18adc2 to 7af6434 Compare January 8, 2024 14:31

github-actions bot commented Jan 8, 2024

/cc sradco


sonarcloud bot commented Jan 8, 2024

Quality Gate failed

Failed conditions

3.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

Member

@0xFelix 0xFelix left a comment


/approve


@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 0xFelix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 9, 2024
@sradco
Collaborator

sradco commented Jan 9, 2024

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 9, 2024
@0xFelix
Member

0xFelix commented Jan 9, 2024

/retest

@kubevirt-bot kubevirt-bot merged commit 4f1a32a into kubevirt:main Jan 9, 2024
13 checks passed
@machadovilaca machadovilaca deleted the fix-VirtualMachineCRCErrors-not-stop-firing branch January 9, 2024 14:29
@0xFelix
Member

0xFelix commented Jan 9, 2024

If you want this in 4.15, you need to cherry-pick it to release-v0.19.

@machadovilaca
Member Author

/cherry-pick release-v0.19

@kubevirt-bot
Contributor

@machadovilaca: new pull request created: #825

In response to this:

/cherry-pick release-v0.19


@machadovilaca
Member Author

/cherry-pick release-v0.18

@kubevirt-bot
Contributor

@machadovilaca: new pull request created: #906

In response to this:

/cherry-pick release-v0.18


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L

5 participants