
Sync VMI conditions to VM #6575

Merged
merged 5 commits into kubevirt:main from sync-vmi-conditions on Oct 26, 2021
Conversation

davidvossel (Member)

Previously, we hand-picked a few VMI conditions, such as Ready and Paused, to sync to the VM. With this PR, all VMI conditions are synced to the VM. This is done to streamline debugging.

There is a new "printableStatus" field on the VM object that aggregates a VM's state across multiple objects (VMI, DV, PVC, Pod) into a single human-readable field. Syncing all VMI conditions to the VM further streamlines debugging by bubbling runtime conditions up to the VM.

This means the VM's "printableStatus" field gives a quick, high-level status, while the VM's conditions contain the more detailed reasoning behind that status.
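To make the mechanism concrete, here is a minimal sketch of what generic VMI-to-VM condition syncing could look like; the function name and the isVMICondition helper are hypothetical illustrations, not the PR's actual code (virtv1 refers to kubevirt.io/api/core/v1):

// Hypothetical sketch: mirror every VMI condition onto the VM and drop VM
// conditions whose VMI counterpart has disappeared. isVMICondition is an
// assumed helper that distinguishes mirrored conditions from VM-only ones
// (e.g. Failure).
func syncVMIConditionsToVM(vm *virtv1.VirtualMachine, vmi *virtv1.VirtualMachineInstance) {
	fromVMI := map[virtv1.VirtualMachineConditionType]bool{}
	var conditions []virtv1.VirtualMachineCondition

	for _, c := range vmi.Status.Conditions {
		t := virtv1.VirtualMachineConditionType(c.Type)
		fromVMI[t] = true
		conditions = append(conditions, virtv1.VirtualMachineCondition{
			Type:               t,
			Status:             c.Status,
			Reason:             c.Reason,
			Message:            c.Message,
			LastProbeTime:      c.LastProbeTime,
			LastTransitionTime: c.LastTransitionTime,
		})
	}

	// Keep VM-only conditions; drop stale mirrored ones the VMI no longer reports.
	for _, c := range vm.Status.Conditions {
		if !fromVMI[c.Type] && !isVMICondition(c.Type) {
			conditions = append(conditions, c)
		}
	}
	vm.Status.Conditions = conditions
}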

VM controller now syncs VMI conditions to corresponding VM object

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Oct 11, 2021
@kubevirt-bot kubevirt-bot added size/L kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Oct 11, 2021
davidvossel (Member, Author)

/cc @rmohr @zcahana

How do you all feel about this? The idea here is to begin consolidating debug info on the VM object by bubbling up conditions from the lower layers.

zcahana (Contributor) left a comment

Hi @davidvossel, I believe this is generally a good direction; however, there are two issues we need to consider:

  1. The VMI's Provisioning condition is set when the VMI is pending and a temp pod is created to schedule the WFFC volume. On the other hand, the VM's printableStatus is set to Provisioning as long as PVC or DV provisioning takes place (unless errors occur). This might cause some confusion, since the Provisioning condition will be set during only part of the time the Provisioning status is reported. We can address this by adding a first-class Provisioning condition to the VM, set in accordance with the printableStatus.

  2. With this change, we now have two VM conditions, Failure and Synchronized, which are both essentially used to report errors. Do you think there's justification to hold on to both, or should we try to unify them?

Comment on lines 1763 to 1779
if vmiCondManager.HasCondition(vmi, virtv1.VirtualMachineInstancePaused) {
	if !vmCondManager.HasCondition(vm, virtv1.VirtualMachinePaused) {
		log.Log.Object(vm).V(3).Info("Adding paused condition")
		now := v1.NewTime(time.Now())
		vm.Status.Conditions = append(vm.Status.Conditions, virtv1.VirtualMachineCondition{
			Type:               virtv1.VirtualMachinePaused,
			Status:             k8score.ConditionTrue,
			LastProbeTime:      now,
			LastTransitionTime: now,
			Reason:             "PausedByUser",
			Message:            "VMI was paused by user",
		})
	}
} else if vmCondManager.HasCondition(vm, virtv1.VirtualMachinePaused) {
	log.Log.Object(vm).V(3).Info("Removing paused condition")
	vmCondManager.RemoveCondition(vm, virtv1.VirtualMachinePaused)
}
Contributor:

The logic here (though not new in this PR) assumes that the Paused condition is either True or removed. I guess that's indeed the way it's currently implemented in the VMI controller, but for the VM controller I would just use the generic sync logic below.

Member Author:

Yep, I'll change this to have the generic sync logic handle it.

@@ -1369,7 +1369,7 @@ type VirtualMachineCondition struct {

 //
 // +k8s:openapi-gen=true
-type VirtualMachineConditionType string
+type VirtualMachineConditionType VirtualMachineInstanceConditionType
Contributor:

I'm not sure this change has any actual effect (the underlying type is still the same), but it kind of suggests that a VM condition is-a VMI condition, which isn't necessarily true (a VM may still have distinct conditions that a VMI doesn't).
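For illustration (my example, not from the PR): since both named types keep string as their underlying type, the explicit conversion compiles the same way under either definition, so the change is purely declarative:

// Compiles with either definition of VirtualMachineConditionType, because
// the underlying type is string in both cases.
vmiType := virtv1.VirtualMachineInstanceConditionType("Paused")
vmType := virtv1.VirtualMachineConditionType(vmiType)
_ = vmType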

Member Author:

Yeah, I think I'll just remove this change; it's not necessary.

davidvossel (Member, Author)

> With this change, we now have two VM conditions, Failure and Synchronized, which are both essentially used to report errors. Do you think there's justification to hold on to both, or should we try to unify them?

Yes, I think unifying them is the right thing. If there's a VM sync error, that should take precedence over a VMI sync error on the VM condition.

Looking closely at the VM failure condition, it is not accurate as written today and needs some major work. I'll tackle that in this PR.

davidvossel (Member, Author)

/hold

Need to unify the VM Failure condition and the VMI Synchronized condition.

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 12, 2021
@davidvossel davidvossel force-pushed the sync-vmi-conditions branch 2 times, most recently from e99e5cf to 606107f Compare October 13, 2021 17:16
davidvossel (Member, Author)

/hold cancel

The topic of how to handle the VM Failure and VMI Synchronized conditions was discussed in the community meeting today. The conclusion is that we can't effectively unify these two conditions. The result is that VM reconcile errors will be reported in the Failure condition, VMI reconcile errors will be reported in the Synchronized condition, and both conditions will exist on the VM as is.

At least for now... and here's the reasoning.

Our condition naming is part of our API and carries the same compatibility guarantees. This means that if someone builds a UI component that interprets a VM's Failure condition, we are guaranteeing that the interface will stay stable. So for now, we have to adhere to that.

While looking through all of this, I did see that the VM's Failure condition doesn't always report the proper reason and message, depending on what the run strategy is set to. This PR now addresses that inaccuracy.
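As a rough sketch of the kind of run-strategy-aware check this implies (hypothetical code and reasoning, not the merged implementation):

// Hypothetical sketch: only surface a Failure condition when the run
// strategy means the controller is actually trying to run a VMI, so a
// halted VM doesn't carry a misleading failure.
func shouldReportFailure(runStrategy virtv1.VirtualMachineRunStrategy, syncErr error, vmi *virtv1.VirtualMachineInstance) bool {
	if syncErr == nil {
		return false
	}
	if runStrategy == virtv1.RunStrategyHalted && vmi == nil {
		// Nothing should be running; a failure condition here would mislead.
		return false
	}
	return true
}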

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 13, 2021
	}
}

if createErr != nil {
-	logger.Reason(err).Error("Creating the VirtualMachine failed.")
+	logger.Reason(err).Error("Reconciling the VirtualMachine failed.")
Contributor:

That should probably be:

logger.Reason(createErr).Error("Reconciling the VirtualMachine failed.")

Since there's at least one case (L293) where err == nil.

Member Author:

Ugh... yeah, good catch.

	log.Log.Object(vm).Errorf("Error fetching RunStrategy: %v", err)
}

log.Log.Object(vm).V(4).Infof("Processing failure status:: runStrategy: %s; noErr: %t; noVm: %t", runStrategy, createErr != nil, vmi != nil)
func (c *VMController) processFailureCondition(vm *virtv1.VirtualMachine, vmi *virtv1.VirtualMachineInstance, createErr syncError) {
Contributor:

Just a nit, but createErr can also be a FailedDelete or HotPlugVolumeError. Maybe syncErr?

Member Author:

Yup, createErr is not accurate. I'll update this.

zcahana (Contributor) commented Oct 17, 2021

Looks good!

/lgtm
/retest

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2021
Signed-off-by: David Vossel <davidvossel@gmail.com> (all 5 commits)
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2021
davidvossel (Member, Author)

@zcahana I had to update some unit tests to pass after the flavor API merge. Can I get another lgtm when you get a chance?

zcahana (Contributor) commented Oct 18, 2021

Sure.
/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2021
davidvossel (Member, Author)

/retest

rmohr (Member) commented Oct 26, 2021

/approve

Looks great.

kubevirt-bot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rmohr

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2021
@kubevirt-bot kubevirt-bot merged commit ccca4d2 into kubevirt:main Oct 26, 2021