Migrate dispatch #177
Conversation
Nice
migrationPod, exists, err := md.vmService.GetMigrationJob(migration)

if err != nil {
	logger.Error().Reason(err).Msg("Checking for an existing migration job failed.")
Minor point, but it causes cognitive dissonance for me when we refer to this as a "migration job". It's understandable why we call it that, but the word "job" has a specific meaning in Kubernetes that doesn't really apply here.
Agreed. I didn't want to change it yet, but I am starting to refer to this as the migration pod, and the others as the source and destination pods. I'd like to formalize these terms, and move the code for the pod handlers into the vm and migration.go files.
migrationKey interface{}
srcIp clientv1.NodeAddress
destIp kubev1.NodeAddress
srcNodeIp kubev1.Node
This variable name is a bit misleading. It's a node which has an IP defined.
Good point. I'll adjust the name.
Force-pushed from 1da58be to d1f8e25
	setMigrationPhase(migration, v1.MigrationInProgress)
	return
}
No matter whether you had to create the targetPod or it was left over from a previous failed attempt, you have to set the migration to InProgress here and then return.
Then on subsequent loop entries you can start right after this line:
switch migration.Status.Phase {
case "":
    do everything up to this line and set the state to MigrationInProgress
case MigrationInProgress:
    1) check if the target pod is running
    2) set the migration to failed if the pod is in a failed/succeeded state
    3) if it is not running, you are done, since the secondary pod watch loop has not informed us yet (it will come later, so no re-enqueue)
    4) if the pod is running, continue with the logic starting above this comment
}
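For illustration, a minimal Go sketch of that phase-driven re-entry, assuming the helpers quoted elsewhere in this review (setMigrationPhase, setMigrationFailed, investigateTargetPodSituation) and the k8sv1 pod phases; this is a sketch of the suggestion, not the merged code:

switch migration.Status.Phase {
case "":
	// First pass: create or adopt the target pod, record progress, and return.
	setMigrationPhase(migration, v1.MigrationInProgress)
	return
case v1.MigrationInProgress:
	_, targetPod := investigateTargetPodSituation(migration, podList)
	if targetPod == nil {
		// The pod watch loop has not reported anything yet; it will requeue us later.
		return
	}
	switch targetPod.Status.Phase {
	case k8sv1.PodFailed, k8sv1.PodSucceeded:
		setMigrationFailed(migration)
		return
	case k8sv1.PodRunning:
		// Target pod is up: continue with the migration logic below this comment.
	default:
		// Not running yet; no re-enqueue needed, the pod informer will wake us up.
		return
	}
}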
// Successful run, so forget the history
forget(key)
done(key)
Also think about the possibility to save conditions on a controller object, to have controller-internal substates ...
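Purely as a hypothetical illustration of such controller-internal substates (the field and names below are invented for the example, not part of this PR):

type migrationCondition string

const (
	conditionTargetPodRequested migrationCondition = "TargetPodRequested" // hypothetical
	conditionVMRedirected       migrationCondition = "VMRedirected"       // hypothetical
)

// Conditions kept on the dispatch object itself, keyed by the workqueue key,
// so substates survive between loop entries without bloating the API phase.
type MigrationDispatch struct {
	// ... existing fields from this PR ...
	conditions map[string]map[migrationCondition]bool
}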
		queue.AddRateLimited(key)
		return
	}
}
setMigrationPhase(migration, v1.MigrationInProgress)
Setting the state in the else branch is wrong, since we could hit an error before we reach this line. On a subsequent run the pod might not be scheduled yet, and below you are not testing whether the pod is running or scheduled, which can confuse components that rely on the correctness of that field.
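A sketch of what is being asked for here (again assuming the helpers quoted in this review), so the phase is only recorded once the pod is actually observed running:

if targetPod == nil || targetPod.Status.Phase != k8sv1.PodRunning {
	// Not scheduled/running yet (or an earlier step failed): do not claim
	// progress; the pod watch will trigger another pass when things change.
	return
}
setMigrationPhase(migration, v1.MigrationInProgress)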
switch migration.Status.Phase {
case v1.MigrationUnknown:
	// Fetch vm which we want to migrate
Leftover
_, targetPod := investigateTargetPodSituation(migration, podList)

if targetPod == nil {
	setMigrationFailed(migration)
In the case of a final return we always have to do a queue.Forget(key). Otherwise, if a migration with the same name is posted again, its processing might be delayed, and the controller would still see a failure history ...
Done here. What about elsewhere?
Everywhere you return without AddRateLimited. In the next refactoring iteration we can finally get rid of that duplication.
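A small sketch of the pattern being asked for; queue is the client-go workqueue.RateLimitingInterface already passed into Execute, while the helper name is only illustrative:

import "k8s.io/client-go/util/workqueue"

// Illustrative helper for terminal exits: clear the rate-limiter history so a
// later migration posted under the same key is not delayed by old failures.
func finish(queue workqueue.RateLimitingInterface, key interface{}) {
	queue.Forget(key)
	queue.Done(key)
}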
switch targetPod.Status.Phase {
case k8sv1.PodRunning:
	break
	//Figure out why. report.
The comment is confusing
Removed. Was a reminder to myself that should not have been left there.
if vm.Status.MigrationNodeName != targetPod.Spec.NodeName {
	vm.Status.Phase = v1.Migrating
	vm.Status.MigrationNodeName = targetPod.Spec.NodeName
	if err = md.updateVm(vm); err != nil {
		queue.AddRateLimited(key)
Is there a return missing?
logger.Error().Reason(err).Msgf("updating migration state failed : %v ", err)
//TODO add a new state that more accurately reflects the process up the this point
//and then use MigrationInProgress to accurately indicate the actual migration pod is running
//setMigrationPhase(migration, v1.MigrationInProgress)
Any suggestions already?
MigrationTargetRequested when we ask to start the target pod; MigrationRunning when we start the migration process itself.
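As a sketch of how those suggested phases might look (the MigrationPhase type name and the string values are assumptions; these constants are only the suggestion from this thread, not merged code):

const (
	// MigrationTargetRequested: the target pod has been requested, but the
	// migration itself has not started yet.
	MigrationTargetRequested MigrationPhase = "TargetRequested"
	// MigrationRunning: the actual migration process has been started.
	MigrationRunning MigrationPhase = "Running"
)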
switch migrationPod.Status.Phase {
case k8sv1.PodFailed:
	setMigrationPhase(migration, v1.MigrationFailed)
We have to first set the VM back and then update the migration state, otherwise we would not reach this point again in an error case.
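A sketch of that ordering, with helper names as quoted elsewhere in this review and the exact VM fields assumed: revert the VM first, then record the migration failure, so a failed VM update still brings us back to this branch on the next pass.

vm.Status.MigrationNodeName = "" // set the VM back first
if err := md.updateVm(vm); err != nil {
	queue.AddRateLimited(key)
	return
}
setMigrationPhase(migration, v1.MigrationFailed)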
Done
}
setMigrationPhase(migration, v1.MigrationSucceeded)
Here the order is good
Force-pushed from a5e8f8f to 87e76c3
pkg/virt-controller/watch/job.go (outdated)
jd.migrationQueue.Add(migrationKey)
} else {
	logger := logging.DefaultLogger().Object(migration)
	logger.Error().Reason(err).Msgf("Updating migration queue failed.", migrationLabel)
Presumably you meant to include the migrationLabel in the error message.
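Presumably something along these lines was intended (a guess at the format string, not the merged fix):

logger.Error().Reason(err).Msgf("Updating migration queue failed for %s.", migrationLabel)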
}

if !exists {
	logger.Info().Msgf("VM with name %s does not exist, marking migration as failed", migration.Spec.Selector.Name)
Did you mean to use logger.Info(), or should this condition also be a call to logger.Error()?
Yes. Yes indeed.
Force-pushed from b378053 to 0bbde71
I think some forget calls are missing and one state. The rest looks good.
func (md *MigrationDispatch) updateVm(vmCopy *v1.VM) error {
	if _, err := md.vmService.PutVm(vmCopy); err != nil {
		logger := logging.DefaultLogger().Object(vmCopy)
		logger.V(3).Info().Msg("Enqueuing VM again.")
Could you move that message out of the method? It is not requeued here.
Put a better message in its place.
@@ -219,8 +231,8 @@ func (md *MigrationDispatch) Execute(store cache.Store, queue workqueue.RateLimi
	return
}
//TODO add a new state that more accurately reflects the process up the this point
//and then use MigrationInProgress to accurately indicate the actual migration pod is running
//setMigrationPhase(migration, v1.MigrationInProgress)
//and then use MigrationScheduled to accurately indicate the actual migration pod is running
You are not introducing a new state here; so far you just renamed MigrationPending and MigrationInProgress.
At this point we have scheduled the pod which will perform the migration, and that is where the state transition is missing.
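A sketch of the missing transition; MigrationScheduled is the name floated in the quoted TODO, and its existence as a phase constant is assumed here:

// Once the pod that will perform the migration has been created and scheduled,
// record that step explicitly instead of jumping straight to the running state.
setMigrationPhase(migration, v1.MigrationScheduled)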
	queue.AddRateLimited(key)
	return
}
_, targetPod := investigateTargetPodSituation(migration, podList)
Here we are missing a check if targetPod is nil.
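A sketch of the missing guard, reusing the failure handling already quoted earlier in this review:

_, targetPod := investigateTargetPodSituation(migration, podList)
if targetPod == nil {
	setMigrationFailed(migration)
	queue.Forget(key)
	return
}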
Done
We need to fix our setup, affinity does not seem to work anymore ...
Force-pushed from 26b2984 to 9bcc124
Aside from the hardcoded 10 in the call (at least make it a symbolic constant), this looks good.
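Something along these lines would address it; the constant name is made up, and what the 10 actually controls is not quoted in this thread (a retry limit is assumed):

// Hypothetical: name the magic number so the call site documents itself.
const defaultMigrationRetryLimit = 10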
Instead of work being performed in the pod and job dispatches, the Migration is requeued, which triggers the Migration controller to reevaluate the migration. This unifies the migration logic in the migration controller's dispatch.
Force-pushed from 9bcc124 to f8b0019
Force-pushed from f8b0019 to 95cc007
Functional tests succeed on my workstation. Still some work to do to make Jenkins work. Looks good to me.
WIP: Unified dispatch function is written, but not yet triggered by queue update.