Sync VMI conditions to VM #6575
Conversation
Hi @davidvossel, I believe this is generally a good direction, however there are two issues we need to consider:

- The VMI's Provisioning condition is set when the VMI is pending and a temp pod is created to schedule the WFFC volume. On the other hand, the VM's printableStatus is set to Provisioning as long as PVC or DV provisioning takes place (unless errors occur). This might result in some confusion, since the Provisioning condition will be set only partially during the Provisioning status reporting. We can address it by adding a first-class Provisioning condition to the VM which is set in accordance with the printableStatus.
- With this change, we now have two VM conditions, Failure and Synchronized, which are basically used to report errors. Do you think there's justification to hold on to both, or should we try to unify them?
pkg/virt-controller/watch/vm.go
Outdated
if vmiCondManager.HasCondition(vmi, virtv1.VirtualMachineInstancePaused) {
	if !vmCondManager.HasCondition(vm, virtv1.VirtualMachinePaused) {
		log.Log.Object(vm).V(3).Info("Adding paused condition")
		now := v1.NewTime(time.Now())
		vm.Status.Conditions = append(vm.Status.Conditions, virtv1.VirtualMachineCondition{
			Type:               virtv1.VirtualMachinePaused,
			Status:             k8score.ConditionTrue,
			LastProbeTime:      now,
			LastTransitionTime: now,
			Reason:             "PausedByUser",
			Message:            "VMI was paused by user",
		})
	}
} else if vmCondManager.HasCondition(vm, virtv1.VirtualMachinePaused) {
	log.Log.Object(vm).V(3).Info("Removing paused condition")
	vmCondManager.RemoveCondition(vm, virtv1.VirtualMachinePaused)
}
The logic here (though not new in this PR) assumes that the Paused condition is either True or removed. I guess that's indeed the way it's currently implemented in the VMI controller, but for the VM controller I would just use the generic sync logic below.
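A generic sync along the lines the reviewer suggests could look roughly like this sketch. The `Condition` type and `syncConditions` function here are hypothetical, simplified stand-ins for KubeVirt's condition types and condition manager; this only illustrates the idea of mirroring conditions wholesale instead of hand-coding an add/remove block per condition type.

```go
package main

import "fmt"

// Condition is a simplified stand-in for KubeVirt's condition types.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// syncConditions mirrors every VMI condition onto the VM and drops VM
// conditions whose counterpart no longer exists on the VMI, so no
// per-condition logic (like the Paused block above) is needed.
func syncConditions(vmConds, vmiConds []Condition) []Condition {
	vmiByType := make(map[string]Condition, len(vmiConds))
	for _, c := range vmiConds {
		vmiByType[c.Type] = c
	}
	var out []Condition
	seen := make(map[string]bool)
	// Update VM conditions that still exist on the VMI; drop the rest.
	for _, c := range vmConds {
		if vc, ok := vmiByType[c.Type]; ok {
			out = append(out, vc)
			seen[c.Type] = true
		}
	}
	// Append VMI conditions the VM does not have yet.
	for _, c := range vmiConds {
		if !seen[c.Type] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	vm := []Condition{{Type: "Paused", Status: "True"}}
	vmi := []Condition{{Type: "Ready", Status: "True"}, {Type: "Paused", Status: "False"}}
	for _, c := range syncConditions(vm, vmi) {
		fmt.Printf("%s=%s\n", c.Type, c.Status)
	}
}
```

With this shape, a condition that flips to False on the VMI is reflected on the VM rather than only being removed, which also avoids the True-or-removed assumption flagged above.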
yep, i'll change this to have the generic sync logic handle it.
@@ -1369,7 +1369,7 @@ type VirtualMachineCondition struct {

 //
 // +k8s:openapi-gen=true
-type VirtualMachineConditionType string
+type VirtualMachineConditionType VirtualMachineInstanceConditionType
I'm not sure if this change has any actual effect (the underlying type is still the same), but it kinda suggests that a VM condition is-a VMI condition, which isn't necessarily true (a VM may still have distinct conditions that a VMI doesn't)
yeah, I think i'll just remove this change. It's not necessary
Yes, i think unifying them is the right thing. If there's a VM sync error, that should take precedence over a VMI sync error on the VM condition. looking closely at the VM failure condition, it is not accurate as it is written today, and needs some major work. I'll tackle that in this PR
/hold need to unify the VM failure condition and VMI synchronized condition.
Force-pushed from e99e5cf to 606107f.
/hold cancel

The topic of how to handle the VM Failure and VMI Synchronized conditions was discussed in the community meeting today. The conclusion is that we can't effectively unify these two conditions. The result will be that VM reconcile errors will be reported in the Failure condition, VMI reconcile errors in the Synchronized condition, and both conditions will exist on the VM as is, at least for now.

Here's the reasoning: our condition naming is a part of our API, and carries the same compatibility guarantees. This means that if someone builds a UI component that interprets a VM's Failure condition, we are guaranteeing that component that we will keep that interface stable. So for now, we have to adhere to that.

While looking through all of this, I did see that the VM's Failure condition isn't always reporting the proper reason and message, depending on what the run strategy is set to. This PR now addresses that inaccuracy as well.
pkg/virt-controller/watch/vm.go
Outdated
	}
}

if createErr != nil {
-	logger.Reason(err).Error("Creating the VirtualMachine failed.")
+	logger.Reason(err).Error("Reconciling the VirtualMachine failed.")
That should probably be:

logger.Reason(createErr).Error("Reconciling the VirtualMachine failed.")

since there's at least one case (L293) where err == nil.
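The hazard being flagged here is a generic Go pitfall: logging an outer error variable that may be nil while the actual failure lives in a different variable. A minimal, self-contained illustration (function and message names are hypothetical, loosely echoing the snippet above):

```go
package main

import (
	"errors"
	"fmt"
)

// reconcile models the situation flagged above: err belongs to an earlier
// step that succeeded (so it is nil), while the actual failure is carried
// by createErr.
func reconcile() (err error, createErr error) {
	err = nil                                   // earlier step succeeded
	createErr = errors.New("create VMI failed") // later step failed
	return err, createErr
}

func main() {
	err, createErr := reconcile()
	if createErr != nil {
		// Logging err here prints "<nil>" and loses the real cause:
		fmt.Printf("buggy:   Reconciling the VirtualMachine failed: %v\n", err)
		// Logging createErr preserves it:
		fmt.Printf("correct: Reconciling the VirtualMachine failed: %v\n", createErr)
	}
}
```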
ugh... yeah good catch.
pkg/virt-controller/watch/vm.go
Outdated
log.Log.Object(vm).Errorf("Error fetching RunStrategy: %v", err)
}
log.Log.Object(vm).V(4).Infof("Processing failure status:: runStrategy: %s; noErr: %t; noVm: %t", runStrategy, createErr != nil, vmi != nil)

func (c *VMController) processFailureCondition(vm *virtv1.VirtualMachine, vmi *virtv1.VirtualMachineInstance, createErr syncError) {
just a nit, but createErr can also be a FailedDelete or HotPlugVolumeError. Maybe syncErr?
yup, createErr is not accurate. I'll update this
Force-pushed from 606107f to 7794572.
Looks good! /lgtm
Signed-off-by: David Vossel <davidvossel@gmail.com>
Force-pushed from 7794572 to cb3824f.
@zcahana I had to update some unit tests to pass after the flavor API merger. Can I get another lgtm when you get a chance?
Sure.

/retest

/approve looks great.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rmohr

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Previously, we hand picked a few VMI conditions to sync to the VM such as ready and paused. In this PR, all VMI conditions are now synced to the VM. This is done in an effort to streamline debugging efforts.
There is a new "printableStatus" field on the VM object that aggregates a VM's state across multiple objects (VMI, DV, PVC, Pod) into a human readable field. By syncing all VMI conditions to the VM, we further streamline debugging by bubbling up runtime conditions on the VM.
This means that the VM's "printableStatus" field gives us a quick high level status, and the VM's conditions will contain more detailed reasoning behind why the status exists.
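As a rough model of the relationship described above, between one coarse printableStatus and the detailed conditions behind it, here is a hypothetical sketch. The `Condition` type, the condition names, and the `printableStatus` mapping are simplified assumptions; the real VM controller aggregates state across the VMI, DV, PVC, and Pod objects.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a VM status condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// printableStatus derives one coarse, human-readable status from the
// conditions; the conditions themselves carry the detailed "why".
// (Hypothetical mapping, for illustration only.)
func printableStatus(conds []Condition) string {
	byType := make(map[string]string)
	for _, c := range conds {
		byType[c.Type] = c.Status
	}
	switch {
	case byType["Failure"] == "True":
		return "Failed"
	case byType["Paused"] == "True":
		return "Paused"
	case byType["Ready"] == "True":
		return "Running"
	default:
		return "Provisioning"
	}
}

func main() {
	conds := []Condition{
		{Type: "Ready", Status: "True"},
		{Type: "Paused", Status: "True", Reason: "PausedByUser", Message: "VMI was paused by user"},
	}
	// The coarse status answers "what state is the VM in"; the synced
	// conditions answer "why is it in that state".
	fmt.Println(printableStatus(conds))
}
```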