
Sync VMI conditions to VM #6575

Merged
merged 5 commits into kubevirt:main from sync-vmi-conditions on Oct 26, 2021
Conversation

davidvossel (Member)

Previously, we hand-picked a few VMI conditions, such as Ready and Paused, to sync to the VM. With this PR, all VMI conditions are synced to the VM. This is done to streamline debugging.

There is a new "printableStatus" field on the VM object that aggregates a VM's state across multiple objects (VMI, DV, PVC, Pod) into a single human-readable field. Syncing all VMI conditions to the VM further streamlines debugging by bubbling runtime conditions up to the VM.

This means the VM's "printableStatus" field gives a quick, high-level status, while the VM's conditions contain the more detailed reasoning behind that status.
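To make the mechanism concrete, here is a minimal sketch of what generic VMI-to-VM condition syncing could look like; the function name and the isVMICondition helper are hypothetical illustrations, not the PR's actual code (virtv1 refers to kubevirt.io/api/core/v1):

// Hypothetical sketch: mirror every VMI condition onto the VM and drop VM
// conditions whose VMI counterpart has disappeared. isVMICondition is an
// assumed helper that distinguishes mirrored conditions from VM-only ones
// (e.g. Failure).
func syncVMIConditionsToVM(vm *virtv1.VirtualMachine, vmi *virtv1.VirtualMachineInstance) {
	fromVMI := map[virtv1.VirtualMachineConditionType]bool{}
	var conditions []virtv1.VirtualMachineCondition

	for _, c := range vmi.Status.Conditions {
		t := virtv1.VirtualMachineConditionType(c.Type)
		fromVMI[t] = true
		conditions = append(conditions, virtv1.VirtualMachineCondition{
			Type:               t,
			Status:             c.Status,
			Reason:             c.Reason,
			Message:            c.Message,
			LastProbeTime:      c.LastProbeTime,
			LastTransitionTime: c.LastTransitionTime,
		})
	}

	// Keep VM-only conditions; drop stale mirrored ones the VMI no longer reports.
	for _, c := range vm.Status.Conditions {
		if !fromVMI[c.Type] && !isVMICondition(c.Type) {
			conditions = append(conditions, c)
		}
	}
	vm.Status.Conditions = conditions
}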

VM controller now syncs VMI conditions to corresponding VM object

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Oct 11, 2021
@kubevirt-bot kubevirt-bot added size/L kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Oct 11, 2021
davidvossel (Member, Author)

/cc @rmohr @zcahana

How do you all feel about this? The idea here is to begin consolidating debug info on the VM object by bubbling up conditions from the lower layers.

zcahana (Contributor) left a comment

Hi @davidvossel, I believe this is generally a good direction; however, there are two issues we need to consider:

  1. The VMI's Provisioning condition is set when the VMI is pending and a temp pod is created to schedule the WFFC volume. On the other hand, the VM's printableStatus is set to Provisioning as long as PVC or DV provisioning takes place (unless errors occur). This might cause some confusion, since the Provisioning condition will be set during only part of the time the Provisioning status is reported. We can address this by adding a first-class Provisioning condition to the VM, set in accordance with the printableStatus.

  2. With this change, we now have two VM conditions, Failure and Synchronized, which are both essentially used to report errors. Do you think there's justification to hold on to both, or should we try to unify them?

Comment on lines 1763 to 1779
if vmiCondManager.HasCondition(vmi, virtv1.VirtualMachineInstancePaused) {
	if !vmCondManager.HasCondition(vm, virtv1.VirtualMachinePaused) {
		log.Log.Object(vm).V(3).Info("Adding paused condition")
		now := v1.NewTime(time.Now())
		vm.Status.Conditions = append(vm.Status.Conditions, virtv1.VirtualMachineCondition{
			Type:               virtv1.VirtualMachinePaused,
			Status:             k8score.ConditionTrue,
			LastProbeTime:      now,
			LastTransitionTime: now,
			Reason:             "PausedByUser",
			Message:            "VMI was paused by user",
		})
	}
} else if vmCondManager.HasCondition(vm, virtv1.VirtualMachinePaused) {
	log.Log.Object(vm).V(3).Info("Removing paused condition")
	vmCondManager.RemoveCondition(vm, virtv1.VirtualMachinePaused)
}
Contributor:

The logic here (though not new in this PR) assumes that the Paused condition is either True or removed. I guess that's indeed the way it's currently implemented in the VMI controller, but for the VM controller I would just use the generic sync logic below.

Member Author:

Yep, I'll change this to have the generic sync logic handle it.

@@ -1369,7 +1369,7 @@ type VirtualMachineCondition struct {

 //
 // +k8s:openapi-gen=true
-type VirtualMachineConditionType string
+type VirtualMachineConditionType VirtualMachineInstanceConditionType
Contributor:

I'm not sure this change has any actual effect (the underlying type is still the same), but it kind of suggests that a VM condition is-a VMI condition, which isn't necessarily true (a VM may still have distinct conditions that a VMI doesn't).
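For illustration (my example, not from the PR): since both named types keep string as their underlying type, the explicit conversion compiles the same way under either definition, so the change is purely declarative:

// Compiles with either definition of VirtualMachineConditionType, because
// the underlying type is string in both cases.
vmiType := virtv1.VirtualMachineInstanceConditionType("Paused")
vmType := virtv1.VirtualMachineConditionType(vmiType)
_ = vmType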

Member Author:

Yeah, I think I'll just remove this change; it's not necessary.

davidvossel (Member, Author)

> With this change, we now have two VM conditions, Failure and Synchronized, which are both essentially used to report errors. Do you think there's justification to hold on to both, or should we try to unify them?

Yes, I think unifying them is the right thing. If there's a VM sync error, that should take precedence over a VMI sync error on the VM condition.

Looking closely at the VM failure condition, it is not accurate as written today and needs some major work. I'll tackle that in this PR.

davidvossel (Member, Author)

/hold

Need to unify the VM Failure condition and the VMI Synchronized condition.

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 12, 2021
@davidvossel davidvossel force-pushed the sync-vmi-conditions branch 2 times, most recently from e99e5cf to 606107f Compare October 13, 2021 17:16
davidvossel (Member, Author)

/hold cancel

The topic of how to handle the VM Failure and VMI Synchronized conditions was discussed in the community meeting today. The conclusion is that we can't effectively unify these two conditions. The result is that VM reconcile errors will be reported in the Failure condition, VMI reconcile errors will be reported in the Synchronized condition, and both conditions will exist on the VM as is.

At least for now... and here's the reasoning.

Our condition naming is part of our API and carries the same compatibility guarantees. This means that if someone builds a UI component that interprets a VM's Failure condition, we are guaranteeing that the interface will stay stable. So for now, we have to adhere to that.

While looking through all of this, I did see that the VM's Failure condition doesn't always report the proper reason and message, depending on what the run strategy is set to. This PR now addresses that inaccuracy.
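As a rough sketch of the kind of run-strategy-aware check this implies (hypothetical code and reasoning, not the merged implementation):

// Hypothetical sketch: only surface a Failure condition when the run
// strategy means the controller is actually trying to run a VMI, so a
// halted VM doesn't carry a misleading failure.
func shouldReportFailure(runStrategy virtv1.VirtualMachineRunStrategy, syncErr error, vmi *virtv1.VirtualMachineInstance) bool {
	if syncErr == nil {
		return false
	}
	if runStrategy == virtv1.RunStrategyHalted && vmi == nil {
		// Nothing should be running; a failure condition here would mislead.
		return false
	}
	return true
}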

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 13, 2021
	}
}

if createErr != nil {
-	logger.Reason(err).Error("Creating the VirtualMachine failed.")
+	logger.Reason(err).Error("Reconciling the VirtualMachine failed.")
Contributor:

That should probably be:

logger.Reason(createErr).Error("Reconciling the VirtualMachine failed.")

Since there's at least one case (L293) where err == nil.

Member Author:

Ugh... yeah, good catch.

	log.Log.Object(vm).Errorf("Error fetching RunStrategy: %v", err)
}

log.Log.Object(vm).V(4).Infof("Processing failure status:: runStrategy: %s; noErr: %t; noVm: %t", runStrategy, createErr != nil, vmi != nil)
func (c *VMController) processFailureCondition(vm *virtv1.VirtualMachine, vmi *virtv1.VirtualMachineInstance, createErr syncError) {
Contributor:

Just a nit, but createErr can also be a FailedDelete or HotPlugVolumeError. Maybe syncErr?

Member Author:

Yup, createErr is not accurate. I'll update this.

zcahana (Contributor) commented Oct 17, 2021

Looks good!

/lgtm
/retest

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2021
Signed-off-by: David Vossel <davidvossel@gmail.com> (all 5 commits)
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2021
davidvossel (Member, Author)

@zcahana I had to update some unit tests to pass after the flavor API merge. Can I get another lgtm when you get a chance?

zcahana (Contributor) commented Oct 18, 2021

Sure.
/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2021
davidvossel (Member, Author)

/retest

rmohr (Member) commented Oct 26, 2021

/approve

Looks great.

kubevirt-bot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rmohr

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2021
@kubevirt-bot kubevirt-bot merged commit ccca4d2 into kubevirt:main Oct 26, 2021