Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update VM controller error handling #850

Merged

Conversation

machadovilaca
Copy link
Member

@machadovilaca machadovilaca commented Jan 22, 2024

What this PR does / why we need it:

Sometimes some resources might not exist yet and we see a huge amount of logs stating that an error created the VM metrics.

This PR:

  • adds 5 seconds backoff between failed reconcile loops of VM controller to reduce the number of logs
  • removes 'error' wording from logs when the VM controller does not create the VM-related metrics, since it might not represent more serious problems, and might deceive users
  • skips metric creation when VM-related resources are still in Provisioning status

Which issue(s) this PR fixes:

jira-ticket: https://issues.redhat.com/browse/CNV-35806

Special notes for your reviewer:

Release note:

Update VM controller error handling

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jan 22, 2024
@akrejcir
Copy link
Collaborator

akrejcir commented Jan 22, 2024

If the PVC or PV does not exist, do we want to try checking again? Will these resources be created eventually?

@machadovilaca
Copy link
Member Author

If the PVC or PV does not exist, do we want to try checking again? Will these resources be created eventually?

The PVC is associated to the VM, is just not created yet, the backoff ensure we will try the reconcile loop again in a few seconds

@akrejcir
Copy link
Collaborator

akrejcir commented Jan 22, 2024

Instead of retrying later, can you check the .status.printableStatus field?
The status Provisioning is set when not all resources are created yet:
https://github.com/kubevirt/kubevirt/blob/9d9885c655d465657f515c8f73c23b00c9f16898/staging/src/kubevirt.io/api/core/v1/types.go#L1511-L1513

If a VM has this printableStatus, it can be skipped. Later, if the .status filed changes, the VM will trigger reconciliation again.
This way we will avoid reconciliation looping while waiting for the resources to be created.

@machadovilaca
Copy link
Member Author

machadovilaca commented Jan 23, 2024

updated @akrejcir

Copy link
Collaborator

@akrejcir akrejcir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Can you add more text to the commit message?

controllers/vm_controller.go Outdated Show resolved Hide resolved
- adds 5 seconds backoff between failed reconcile loops of VM controller
- removes 'error' wording from logs when VM controller does not create the VM related metrics
- skips metric creation when VM related resources are still in Provisioning status

Signed-off-by: João Vilaça <jvilaca@redhat.com>
Copy link

sonarcloud bot commented Jan 23, 2024

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@machadovilaca
Copy link
Member Author

Thanks. Can you add more text to the commit message?

done!

@machadovilaca machadovilaca changed the title Add VM controller backoff and remove 'error' wording from logging Update VM controller error handling Jan 23, 2024
@machadovilaca
Copy link
Member Author

/retest

@akrejcir
Copy link
Collaborator

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akrejcir

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 24, 2024
@avlitman
Copy link
Contributor

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 24, 2024
@kubevirt-bot kubevirt-bot merged commit 37d5a36 into kubevirt:main Jan 24, 2024
11 of 12 checks passed
@machadovilaca machadovilaca deleted the add_backoff_for_vm_controller branch January 24, 2024 13:52
@machadovilaca
Copy link
Member Author

/cherry-pick release-v0.19

@kubevirt-bot
Copy link
Contributor

@machadovilaca: new pull request created: #857

In response to this:

/cherry-pick release-v0.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants