Ensure the latest schedulerObservedGeneration if do not need to schedule #3455
```diff
@@ -334,6 +334,15 @@ func (s *Scheduler) doScheduleBinding(namespace, name string) (err error) {
 	}
 	// TODO(dddddai): reschedule bindings on cluster change
 	klog.V(3).Infof("Don't need to schedule ResourceBinding(%s/%s)", rb.Namespace, rb.Name)
+
+	// If no scheduling is required, we need to ensure that binding.Generation is equal to
+	// binding.Status.SchedulerObservedGeneration which means the current status of binding
+	// is the latest status of successful scheduling.
+	if rb.Generation != rb.Status.SchedulerObservedGeneration {
+		updateRB := rb.DeepCopy()
+		updateRB.Status.SchedulerObservedGeneration = updateRB.Generation
+		return patchBindingStatus(s.KarmadaClient, rb, updateRB)
+	}
 	return nil
 }
```
```diff
@@ -382,6 +391,15 @@ func (s *Scheduler) doScheduleClusterBinding(name string) (err error) {
 	}
 	// TODO(dddddai): reschedule bindings on cluster change
 	klog.Infof("Don't need to schedule ClusterResourceBinding(%s)", name)
+
+	// If no scheduling is required, we need to ensure that binding.Generation is equal to
+	// binding.Status.SchedulerObservedGeneration which means the current status of binding
+	// is the latest status of successful scheduling.
+	if crb.Generation != crb.Status.SchedulerObservedGeneration {
+		updateCRB := crb.DeepCopy()
+		updateCRB.Status.SchedulerObservedGeneration = updateCRB.Generation
+		return patchClusterResourceBindingStatus(s.KarmadaClient, crb, updateCRB)
+	}
 	return nil
 }
```
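The two hunks above apply the same observed-generation reconciliation pattern. A minimal, self-contained sketch of the idea, using toy types rather than Karmada's actual ResourceBinding API:

```go
package main

import "fmt"

// Toy stand-ins for the binding's metadata and status; these are
// simplified placeholders, not Karmada's real ResourceBinding types.
type BindingStatus struct {
	SchedulerObservedGeneration int64
}

type Binding struct {
	Generation int64
	Status     BindingStatus
}

// ensureObservedGeneration mirrors the fast path added in this PR: when
// no scheduling is needed, sync Status.SchedulerObservedGeneration with
// Generation so consumers can tell the status reflects the latest spec.
// It reports whether a status patch would have been issued.
func ensureObservedGeneration(b *Binding) bool {
	if b.Generation != b.Status.SchedulerObservedGeneration {
		b.Status.SchedulerObservedGeneration = b.Generation
		return true
	}
	return false
}

func main() {
	b := &Binding{Generation: 3, Status: BindingStatus{SchedulerObservedGeneration: 2}}
	fmt.Println(ensureObservedGeneration(b), b.Status.SchedulerObservedGeneration) // true 3
	fmt.Println(ensureObservedGeneration(b))                                       // false: already in sync
}
```

The check is idempotent, so running it on every "don't need to schedule" pass only issues a patch when the observed generation has actually fallen behind.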
```diff
@@ -515,6 +533,7 @@ func (s *Scheduler) patchScheduleResultForResourceBinding(oldBinding *workv1alph
 	_, err = s.KarmadaClient.WorkV1alpha2().ResourceBindings(newBinding.Namespace).Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
 	if err != nil {
+		klog.Errorf("Failed to patch schedule to ResourceBinding(%s/%s): %v", oldBinding.Namespace, oldBinding.Name, err)
 		return err
 	}

 	klog.V(4).Infof("Patch schedule to ResourceBinding(%s/%s) succeed", oldBinding.Namespace, oldBinding.Name)
```

Comment: Does this phenomenon still occur with this fix? Would you like to share your steps to reproduce?

Comment: I'm trying to reproduce it using the steps in the issue, but it seems the generation and the schedulerObservedGeneration are the same.
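Both patch paths send their body with types.MergePatchType. As a hedged illustration of what a status-only merge-patch body looks like (the buildStatusPatch helper below is hypothetical; Karmada's patchBindingStatus may construct it differently):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// status models just the field this PR cares about; the json tag is an
// assumption about the wire name, matching schedulerObservedGeneration.
type status struct {
	SchedulerObservedGeneration int64 `json:"schedulerObservedGeneration"`
}

// buildStatusPatch (hypothetical helper) builds a JSON merge-patch body
// that updates only status.schedulerObservedGeneration, leaving every
// other field of the binding untouched on the server.
func buildStatusPatch(gen int64) ([]byte, error) {
	return json.Marshal(map[string]status{"status": {SchedulerObservedGeneration: gen}})
}

func main() {
	b, err := buildStatusPatch(5)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"status":{"schedulerObservedGeneration":5}}
}
```

A merge patch like this is why the scheduler can sync the observed generation cheaply without sending the whole binding object.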
```diff
@@ -649,7 +668,13 @@ func (s *Scheduler) patchScheduleResultForClusterResourceBinding(oldBinding *wor
 	}

 	_, err = s.KarmadaClient.WorkV1alpha2().ClusterResourceBindings().Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
-	return err
+	if err != nil {
+		klog.Errorf("Failed to patch schedule to ClusterResourceBinding(%s): %v", oldBinding.Name, err)
+		return err
+	}
+
+	klog.V(4).Infof("Patch schedule to ClusterResourceBinding(%s) succeed", oldBinding.Name)
+	return nil
 }

 func (s *Scheduler) handleErr(err error, key interface{}) {
```
Comment: Note: restarting karmada-controller-manager will also increment the generation of the RB, but it will not enter the scheduling process at this time.

Comment: Why?
Comment: @chaunceyjiang is right. From my analysis, it's mainly because of the detector. For resource templates, we add them to the queue directly on add events. However, for update events, we only enqueue them if the spec of the resource template changes, for performance.

karmada/pkg/detector/detector.go
Lines 280 to 308 in ec7b3b1

After the status is aggregated back into the resource template, the resourceVersion recorded in rb.spec.resource becomes inconsistent with that of the actual resource template. This currently works without issue. However, when the controller-manager restarts, the detector enqueues all resource templates again, so we update rb.spec.resource.resourceVersion to the latest value and the generation is incremented by one. But the scheduler decides it does not need to schedule and never enters the scheduling process, so generation and schedulerObservedGeneration end up different.
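The update-event filtering described above can be sketched as a predicate (the toy Template type and specChanged name are illustrative, not the detector's actual code):

```go
package main

import (
	"fmt"
	"reflect"
)

// Template is a toy stand-in for a resource template; only spec changes
// should trigger re-enqueueing, so resourceVersion- or status-only
// updates are dropped for performance.
type Template struct {
	ResourceVersion string
	Spec            map[string]string
}

// specChanged reports whether an update event carries a spec change and
// therefore needs to be re-enqueued for scheduling work.
func specChanged(oldT, newT Template) bool {
	return !reflect.DeepEqual(oldT.Spec, newT.Spec)
}

func main() {
	base := Template{ResourceVersion: "100", Spec: map[string]string{"replicas": "2"}}
	statusOnly := Template{ResourceVersion: "101", Spec: map[string]string{"replicas": "2"}}
	scaled := Template{ResourceVersion: "102", Spec: map[string]string{"replicas": "3"}}
	fmt.Println(specChanged(base, statusOnly), specChanged(base, scaled)) // false true
}
```

Under a filter like this, the restart scenario above bumps generation via the re-list without any spec change, so nothing is enqueued and the scheduler never runs.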
Comment: In this case, returning the latest binding after patching the schedule result will not work, but the former redundancy design does.
Comment: Yes.
Comment: By the way, the graceful-eviction-controller currently relies too much on this mechanism. I think the graceful-eviction-controller should also handle create events.
#3475
@RainbowMango @Poor12 What do you think?
Comment: What do you mean by "handle create event"?
Comment: Do we need to do anything else in this PR?
Comment: https://github.com/karmada-io/karmada/blob/master/pkg/controllers/gracefuleviction/rb_graceful_eviction_controller.go#L87
Comment: No, this PR solves my problem.