Ensure the latest schedulerObservedGeneration if do not need to schedule #3455

Merged
merged 1 commit into from
Apr 27, 2023
27 changes: 26 additions & 1 deletion pkg/scheduler/scheduler.go
@@ -334,6 +334,15 @@ func (s *Scheduler) doScheduleBinding(namespace, name string) (err error) {
 	}
 	// TODO(dddddai): reschedule bindings on cluster change
 	klog.V(3).Infof("Don't need to schedule ResourceBinding(%s/%s)", rb.Namespace, rb.Name)
+
+	// If no scheduling is required, we need to ensure that binding.Generation is equal to
+	// binding.Status.SchedulerObservedGeneration which means the current status of binding
+	// is the latest status of successful scheduling.
+	if rb.Generation != rb.Status.SchedulerObservedGeneration {
+		updateRB := rb.DeepCopy()
+		updateRB.Status.SchedulerObservedGeneration = updateRB.Generation
+		return patchBindingStatus(s.KarmadaClient, rb, updateRB)
+	}
 	return nil
 }

@@ -382,6 +391,15 @@ func (s *Scheduler) doScheduleClusterBinding(name string) (err error) {
 	}
 	// TODO(dddddai): reschedule bindings on cluster change
 	klog.Infof("Don't need to schedule ClusterResourceBinding(%s)", name)
+
+	// If no scheduling is required, we need to ensure that binding.Generation is equal to
+	// binding.Status.SchedulerObservedGeneration which means the current status of binding
+	// is the latest status of successful scheduling.
+	if crb.Generation != crb.Status.SchedulerObservedGeneration {
+		updateCRB := crb.DeepCopy()
+		updateCRB.Status.SchedulerObservedGeneration = updateCRB.Generation
+		return patchClusterResourceBindingStatus(s.KarmadaClient, crb, updateCRB)
+	}
 	return nil
 }
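
Both hunks above apply the same pattern: on the "no scheduling needed" path, the scheduler still patches status so that SchedulerObservedGeneration catches up with Generation. The sketch below isolates that pattern in a self-contained program. The `binding` type, the `statusPatcher` callback, and `ensureObservedGeneration` are hypothetical stand-ins introduced for illustration (for the binding types and for `patchBindingStatus`); they are not Karmada's API.

```go
package main

import "fmt"

// binding is a hypothetical stand-in for ResourceBinding / ClusterResourceBinding.
type binding struct {
	Name                        string
	Generation                  int64
	SchedulerObservedGeneration int64
}

// statusPatcher is a hypothetical stand-in for patchBindingStatus.
type statusPatcher func(old, updated *binding) error

// ensureObservedGeneration models the "no scheduling needed" path. It sends a
// status patch only when the observed generation lags behind the generation,
// and reports whether a patch was sent.
func ensureObservedGeneration(b *binding, patch statusPatcher) (bool, error) {
	if b.Generation == b.SchedulerObservedGeneration {
		return false, nil
	}
	updated := *b // the struct holds no pointers, so a value copy is a deep copy
	updated.SchedulerObservedGeneration = updated.Generation
	if err := patch(b, &updated); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	stored := &binding{Name: "nginx", Generation: 3, SchedulerObservedGeneration: 2}

	// Fake patcher: writes the new status back to the stored object.
	patch := func(old, updated *binding) error {
		old.SchedulerObservedGeneration = updated.SchedulerObservedGeneration
		return nil
	}

	patched, _ := ensureObservedGeneration(stored, patch)
	fmt.Println(patched, stored.SchedulerObservedGeneration) // true 3

	// A second pass finds nothing to do, so no patch is sent.
	patched, _ = ensureObservedGeneration(stored, patch)
	fmt.Println(patched, stored.SchedulerObservedGeneration) // false 3
}
```

The guard on equality keeps the path idempotent: repeated reconciles of an up-to-date binding send no extra patch requests.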

@@ -515,6 +533,7 @@ func (s *Scheduler) patchScheduleResultForResourceBinding(oldBinding *workv1alph
	_, err = s.KarmadaClient.WorkV1alpha2().ResourceBindings(newBinding.Namespace).Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
Member

Note: restarting karmada-controller-manager also increments the generation of the RB, but the RB does not enter the scheduling process at that point.

Member

> restarting karmada-controller-manager also increments the generation of the RB

Why?

Member Author

@chaunceyjiang is right. From my analysis, this is mainly caused by the detector. For add events, it enqueues the resource template directly. For update events, it enqueues the template only if its spec has changed, as a performance optimization.

func (d *ResourceDetector) OnAdd(obj interface{}) {
	runtimeObj, ok := obj.(runtime.Object)
	if !ok {
		return
	}
	d.Processor.Enqueue(runtimeObj)
}

// OnUpdate handles object update event and push the object to queue.
func (d *ResourceDetector) OnUpdate(oldObj, newObj interface{}) {
	unstructuredOldObj, err := helper.ToUnstructured(oldObj)
	if err != nil {
		klog.Errorf("Failed to transform oldObj, error: %v", err)
		return
	}
	unstructuredNewObj, err := helper.ToUnstructured(newObj)
	if err != nil {
		klog.Errorf("Failed to transform newObj, error: %v", err)
		return
	}
	if !SpecificationChanged(unstructuredOldObj, unstructuredNewObj) {
		klog.V(4).Infof("Ignore update event of object (kind=%s, %s/%s) as specification no change", unstructuredOldObj.GetKind(), unstructuredOldObj.GetNamespace(), unstructuredOldObj.GetName())
		return
	}
	d.OnAdd(newObj)
}

After the status is aggregated into the resource template, the resourceVersion recorded in rb.spec.resource no longer matches that of the actual resource template. This causes no problem during normal operation. However, when the controller-manager restarts, the detector enqueues all resource templates again, so rb.spec.resource.resourceVersion is updated to the latest value and the RB's generation is incremented by one. The scheduler, however, decides that no scheduling is needed and skips the scheduling process, so generation and schedulerObservedGeneration end up different.
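
The sequence described in this comment can be walked through with a small simulation. Everything in it (`template`, `binding`, `detector`, `schedule`, `run`) is a hypothetical model of the behaviour described here, not Karmada code. It assumes that writing a new resourceVersion into the binding's spec increments its generation by one, and it models the restart as a single add event for the template.

```go
package main

import "fmt"

// template is a hypothetical model of a resource template.
type template struct {
	Spec            string
	ResourceVersion int
}

// binding is a hypothetical model of a ResourceBinding.
type binding struct {
	RecordedResourceVersion int   // models rb.spec.resource.resourceVersion
	Generation              int64 // bumped on every spec change
	ObservedGeneration      int64 // models rb.status.schedulerObservedGeneration
}

type detector struct {
	tmpl *template
	rb   *binding
}

// sync models processing a queued template: the template's resourceVersion is
// copied into the binding spec, and a spec change bumps the generation.
func (d *detector) sync() {
	if d.rb.RecordedResourceVersion != d.tmpl.ResourceVersion {
		d.rb.RecordedResourceVersion = d.tmpl.ResourceVersion
		d.rb.Generation++
	}
}

// onUpdate models the update handler: the template is processed only when its
// spec changed, so status-only updates are ignored.
func (d *detector) onUpdate(oldT, newT template) {
	*d.tmpl = newT
	if oldT.Spec == newT.Spec {
		return
	}
	d.sync()
}

// onAdd models the add handler, which always processes the template.
func (d *detector) onAdd() { d.sync() }

// schedule models the scheduler's "no scheduling needed" path. With withFix
// set, it syncs the observed generation as this PR does.
func schedule(rb *binding, withFix bool) {
	if withFix && rb.Generation != rb.ObservedGeneration {
		rb.ObservedGeneration = rb.Generation
	}
}

func run(withFix bool) (gen, observed int64) {
	tmpl := &template{Spec: "replicas=2", ResourceVersion: 10}
	rb := &binding{RecordedResourceVersion: 10, Generation: 1, ObservedGeneration: 1}
	d := &detector{tmpl: tmpl, rb: rb}

	// Status aggregation: the spec is unchanged, the resourceVersion moves on.
	d.onUpdate(*tmpl, template{Spec: "replicas=2", ResourceVersion: 11})
	// Controller-manager restart: the template arrives again as an add event.
	d.onAdd()
	schedule(rb, withFix)
	return rb.Generation, rb.ObservedGeneration
}

func main() {
	g, o := run(false)
	fmt.Printf("without fix: generation=%d observed=%d\n", g, o) // generation=2 observed=1
	g, o = run(true)
	fmt.Printf("with fix:    generation=%d observed=%d\n", g, o) // generation=2 observed=2
}
```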

Member Author

In this case, returning the latest binding after patching the schedule result will not work, but the former redundancy design does.

Member

> but the former redundancy design does.

Yes.

Member

> After the status is aggregated into the resource template, the resourceVersion recorded in rb.spec.resource no longer matches that of the actual resource template. This causes no problem during normal operation. However, when the controller-manager restarts, the detector enqueues all resource templates again, so rb.spec.resource.resourceVersion is updated to the latest value and the RB's generation is incremented by one.

By the way, the graceful-eviction-controller currently relies too much on this mechanism. I think the graceful-eviction-controller should also handle create events.

#3475

@RainbowMango @Poor12 What do you think?

Member

> I think the graceful-eviction-controller should also handle create events.

What do you mean by handling create events?

Member

Do we need to do anything else in this PR?

Member

> Do we need to do anything else in this PR?

No, this PR solves my problem.

	if err != nil {
Member

(Two images were attached here; they are not available.)

Member Author

Does this phenomenon still occur with this fix? Would you like to share your steps to reproduce?

Member

You can reproduce it using the steps in the issue description.

I tried using your patch to fix the issue I encountered, but it doesn't seem to have solved it.

Member Author

I'm trying to reproduce it using the steps in the issue, but the generation and the schedulerObservedGeneration appear to be the same.

		klog.Errorf("Failed to patch schedule to ResourceBinding(%s/%s): %v", oldBinding.Namespace, oldBinding.Name, err)
		return err
	}

	klog.V(4).Infof("Patch schedule to ResourceBinding(%s/%s) succeed", oldBinding.Namespace, oldBinding.Name)
@@ -649,7 +668,13 @@ func (s *Scheduler) patchScheduleResultForClusterResourceBinding(oldBinding *wor
 	}

 	_, err = s.KarmadaClient.WorkV1alpha2().ClusterResourceBindings().Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
-	return err
+	if err != nil {
+		klog.Errorf("Failed to patch schedule to ClusterResourceBinding(%s): %v", oldBinding.Name, err)
+		return err
+	}
+
+	klog.V(4).Infof("Patch schedule to ClusterResourceBinding(%s) succeed", oldBinding.Name)
+	return nil
 }

 func (s *Scheduler) handleErr(err error, key interface{}) {