
Ensure the latest schedulerObservedGeneration when no scheduling is needed #3455

Merged
merged 1 commit into karmada-io:master from fix-generation-inconsistent on Apr 27, 2023

Conversation

Poor12
Member

@Poor12 Poor12 commented Apr 23, 2023

What type of PR is this?
/kind bug

What this PR does / why we need it:
Ensure the latest schedulerObservedGeneration is recorded even when no scheduling is needed.

Which issue(s) this PR fixes:
Fixes #3454
Fixes #3467

Special notes for your reviewer:
This PR needs to be cherry-picked.

Does this PR introduce a user-facing change?:

`karmada-scheduler`: Fixed the issue of inconsistent Generation and SchedulerObservedGeneration.

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 23, 2023
@karmada-bot karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 23, 2023
@Poor12
Member Author

Poor12 commented Apr 23, 2023

/assign @XiShanYongYe-Chang

@codecov-commenter

codecov-commenter commented Apr 23, 2023

Codecov Report

Merging #3455 (9ee205f) into master (48e93dc) will increase coverage by 1.65%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master    #3455      +/-   ##
==========================================
+ Coverage   51.94%   53.59%   +1.65%     
==========================================
  Files         210      210              
  Lines       19077    19176      +99     
==========================================
+ Hits         9910    10278     +368     
+ Misses       8618     8346     -272     
- Partials      549      552       +3     
Flag        Coverage Δ
unittests   53.59% <0.00%> (+1.65%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files               Coverage Δ
pkg/scheduler/scheduler.go   18.00% <0.00%> (-0.11%) ⬇️

... and 9 files with indirect coverage changes

Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Thanks~
/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 23, 2023
@XiShanYongYe-Chang
Member

Maybe we need to describe the components in the release-note.

@Poor12
Member Author

Poor12 commented Apr 23, 2023

Maybe we need to describe the components in the release-note.

Done.

@Poor12
Member Author

Poor12 commented Apr 24, 2023

/assign @RainbowMango

Member

@RainbowMango RainbowMango left a comment

The root cause of this issue is that we use a patch to update the status after we have updated the .spec part, which causes the generation to increase.

I believe this patch could work around the problem, but it looks like a redundant design. I wonder whether there is an alternative solution. cc @Garrybest for help.

@Poor12
Member Author

Poor12 commented Apr 25, 2023

The root cause of this issue is that we use a patch to update the status after we have updated the .spec part, which causes the generation to increase.

I believe this patch could work around the problem, but it looks like a redundant design. I wonder whether there is an alternative solution. cc @Garrybest for help.

Another possible idea is to distinguish the case where the scheduler is actually filling in the scheduling result.

@Garrybest
Member

Here is my thought:

  1. The scheduler patches a result here, so rb.generation increases.
  2. The scheduler then patches the status here, but the rb it holds is obsolete, so we can't record the latest observedGeneration.
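
A minimal sketch of these two steps (hypothetical helper name; the import paths and the status-subresource patch are assumptions, not the actual scheduler code):

package example

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
	karmadaclientset "github.com/karmada-io/karmada/pkg/generated/clientset/versioned"
)

// staleObservedGeneration is a hypothetical illustration of the race, not the
// real scheduler code.
func staleObservedGeneration(ctx context.Context, client karmadaclientset.Interface,
	rb *workv1alpha2.ResourceBinding, resultPatch []byte) error {
	// Step 1: patching the scheduling result into .spec bumps
	// metadata.generation on the API server (say, from 2 to 3).
	if _, err := client.WorkV1alpha2().ResourceBindings(rb.Namespace).Patch(
		ctx, rb.Name, types.MergePatchType, resultPatch, metav1.PatchOptions{}); err != nil {
		return err
	}

	// Step 2: the status patch is built from the rb still held in memory,
	// whose Generation is the pre-patch value (still 2), so
	// status.schedulerObservedGeneration ends up one behind.
	statusPatch := []byte(fmt.Sprintf(
		`{"status":{"schedulerObservedGeneration":%d}}`, rb.Generation))
	_, err := client.WorkV1alpha2().ResourceBindings(rb.Namespace).Patch(
		ctx, rb.Name, types.MergePatchType, statusPatch, metav1.PatchOptions{}, "status")
	return err
}

The key point is that rb.Generation in step 2 is read from the in-memory copy, which never sees the server-side bump from step 1.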

@Garrybest
Member

I guess this function could return the latest rb here, but we ignore the return value.

@Poor12
Member Author

Poor12 commented Apr 25, 2023

I guess this function could return the latest rb here, but we ignore the return value.

I agree with @Garrybest. Ensuring that we hold the latest binding before patching the status is a good solution.
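
A minimal sketch of that direction (hypothetical helper name and assumed import paths; the actual change is the diff below, which keeps the Patch return value):

package example

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
	karmadaclientset "github.com/karmada-io/karmada/pkg/generated/clientset/versioned"
)

// patchResultThenStatus is a hypothetical illustration of the agreed direction.
func patchResultThenStatus(ctx context.Context, client karmadaclientset.Interface,
	rb *workv1alpha2.ResourceBinding, resultPatch []byte) error {
	// Keep the object returned by the spec patch instead of discarding it;
	// it already carries the bumped metadata.generation.
	updated, err := client.WorkV1alpha2().ResourceBindings(rb.Namespace).Patch(
		ctx, rb.Name, types.MergePatchType, resultPatch, metav1.PatchOptions{})
	if err != nil {
		return err
	}

	// Derive the status patch from the returned binding, not the stale rb,
	// so schedulerObservedGeneration matches the current generation.
	statusPatch := []byte(fmt.Sprintf(
		`{"status":{"schedulerObservedGeneration":%d}}`, updated.Generation))
	_, err = client.WorkV1alpha2().ResourceBindings(updated.Namespace).Patch(
		ctx, updated.Name, types.MergePatchType, statusPatch, metav1.PatchOptions{}, "status")
	return err
}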

@Poor12 Poor12 force-pushed the fix-generation-inconsistent branch from 07ad4f3 to 4fcdf7b Compare April 26, 2023 03:24
@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 26, 2023
@Poor12 Poor12 force-pushed the fix-generation-inconsistent branch from 4fcdf7b to 3b80fe3 Compare April 26, 2023 03:29
@karmada-bot karmada-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 26, 2023
@@ -512,11 +512,13 @@ func (s *Scheduler) patchScheduleResultForResourceBinding(oldBinding *workv1alph
 		return nil
 	}
 
-	_, err = s.KarmadaClient.WorkV1alpha2().ResourceBindings(newBinding.Namespace).Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
+	result, err := s.KarmadaClient.WorkV1alpha2().ResourceBindings(newBinding.Namespace).Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
 	if err != nil {
Member

[screenshots attached]

Member Author

Does this phenomenon still occur with this fix? Would you like to share your steps to reproduce?

Member

You can reproduce it using the steps in the issue description.

I tried using your patch to fix the issue I encountered, but it doesn't seem to have solved it.

Member Author

I'm trying to reproduce it using the steps in the issue, but it seems the generation and the schedulerObservedGeneration are the same.

@@ -512,11 +512,13 @@ func (s *Scheduler) patchScheduleResultForResourceBinding(oldBinding *workv1alph
 		return nil
 	}
 
-	_, err = s.KarmadaClient.WorkV1alpha2().ResourceBindings(newBinding.Namespace).Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
Member

Note: restarting karmada-controller-manager will also increment the generation of the RB, but the scheduler will not enter the scheduling process in that case.

Member

restarting karmada-controller-manager will also increment the generation of the RB

Why?

Member Author

@chaunceyjiang is right. From my analysis, it's mainly because of the detector. For add events, we enqueue resource templates directly. For update events, however, we only enqueue them if the spec of the resource template changes, for performance reasons.

// OnAdd handles object add event and push the object to queue.
func (d *ResourceDetector) OnAdd(obj interface{}) {
	runtimeObj, ok := obj.(runtime.Object)
	if !ok {
		return
	}
	d.Processor.Enqueue(runtimeObj)
}

// OnUpdate handles object update event and push the object to queue.
func (d *ResourceDetector) OnUpdate(oldObj, newObj interface{}) {
	unstructuredOldObj, err := helper.ToUnstructured(oldObj)
	if err != nil {
		klog.Errorf("Failed to transform oldObj, error: %v", err)
		return
	}
	unstructuredNewObj, err := helper.ToUnstructured(newObj)
	if err != nil {
		klog.Errorf("Failed to transform newObj, error: %v", err)
		return
	}
	// Only enqueue the object when its specification changed, so that
	// status-only updates do not trigger needless reconciles.
	if !SpecificationChanged(unstructuredOldObj, unstructuredNewObj) {
		klog.V(4).Infof("Ignore update event of object (kind=%s, %s/%s) as specification no change", unstructuredOldObj.GetKind(), unstructuredOldObj.GetNamespace(), unstructuredOldObj.GetName())
		return
	}
	d.OnAdd(newObj)
}

After the status is aggregated into the resource template, the resourceVersion recorded in rb.spec.resource becomes inconsistent with that of the actual resource template. This normally causes no issue. However, when the controller-manager restarts, the detector re-adds all resource templates, so we update rb.spec.resource.resourceVersion to the latest value, which increments the generation by 1. But the scheduler thinks it does not need to schedule and never enters the scheduling process, so the generation and the schedulerObservedGeneration end up different.

Member Author

In this case, returning the latest binding after patching the scheduling result will not work, but the former 'redundant' design does.
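
A minimal sketch of that 'redundant' safeguard (hypothetical helper; the field names, import paths, and status-subresource call are assumptions, not necessarily the code in this PR): when the scheduler decides nothing needs to be rescheduled, it still brings schedulerObservedGeneration up to date.

package example

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
	karmadaclientset "github.com/karmada-io/karmada/pkg/generated/clientset/versioned"
)

// ensureLatestObservedGeneration is a hypothetical illustration of the
// "redundant" safeguard.
func ensureLatestObservedGeneration(ctx context.Context, client karmadaclientset.Interface,
	rb *workv1alpha2.ResourceBinding) error {
	// Already consistent, e.g. right after a normal scheduling round.
	if rb.Status.SchedulerObservedGeneration == rb.Generation {
		return nil
	}

	// The generation moved (for example because the detector refreshed
	// rb.spec.resource.resourceVersion after a restart) without a real
	// scheduling pass, so only the observed generation needs to be patched.
	patch := []byte(fmt.Sprintf(
		`{"status":{"schedulerObservedGeneration":%d}}`, rb.Generation))
	_, err := client.WorkV1alpha2().ResourceBindings(rb.Namespace).Patch(
		ctx, rb.Name, types.MergePatchType, patch, metav1.PatchOptions{}, "status")
	return err
}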

Member

but the former 'redundant' design does.

Yes.

Member

After the status is aggregated into the resource template, the resourceVersion recorded in rb.spec.resource becomes inconsistent with that of the actual resource template. This normally causes no issue. However, when the controller-manager restarts, the detector re-adds all resource templates, so we update rb.spec.resource.resourceVersion to the latest value, which increments the generation by 1.

By the way, the graceful-eviction-controller currently relies too heavily on this mechanism. I think the graceful-eviction-controller should also handle the create event (see the sketch below).

#3475

@RainbowMango @Poor12 What do you think?
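
For illustration, one possible reading of "handle the create event", sketched with generic controller-runtime primitives (this is an assumption about the intent; the function name is hypothetical and it is not the actual graceful-eviction-controller code):

package example

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// setupEvictionController sketches a controller setup whose event filter does
// not drop Create events, so bindings that already carry eviction tasks are
// reconciled right after the controller (re)starts, instead of waiting for a
// later spec update. Hypothetical illustration only.
func setupEvictionController(mgr ctrl.Manager, r reconcile.Reconciler) error {
	pred := predicate.Funcs{
		// React to create as well as update.
		CreateFunc: func(event.CreateEvent) bool { return true },
		UpdateFunc: func(event.UpdateEvent) bool { return true },
		DeleteFunc: func(event.DeleteEvent) bool { return false },
	}
	return ctrl.NewControllerManagedBy(mgr).
		For(&workv1alpha2.ResourceBinding{}).
		WithEventFilter(pred).
		Complete(r)
}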

Member

I think the graceful-eviction-controller should also handle the create event.

What do you mean by handling the create event?

Member

Do we need to do anything else in this PR?

Member

Do we need to do anything else in this PR?

No, this PR solves my problem.

Signed-off-by: Poor12 <shentiecheng@huawei.com>
@Poor12 Poor12 force-pushed the fix-generation-inconsistent branch from 3b80fe3 to 9ee205f Compare April 27, 2023 03:27
@Poor12
Member Author

Poor12 commented Apr 27, 2023

@chaunceyjiang, could you please help verify whether this fix solves your problem? Many thanks.

@chaunceyjiang
Member

@chaunceyjiang, could you please help verify whether this fix solves your problem? Many thanks.

Yes. I also used this patch in my local env.

Member

@RainbowMango RainbowMango left a comment

/lgtm
/approve

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 27, 2023
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 27, 2023
@karmada-bot karmada-bot merged commit 7adfdfa into karmada-io:master Apr 27, 2023
12 checks passed
@chaunceyjiang
Member

chaunceyjiang commented Apr 27, 2023

BTW, I think this patch should be placed in the binding-controller instead of the scheduler.

karmada-bot added a commit that referenced this pull request Apr 27, 2023
…pstream-release-1.5

Automated cherry pick of #3455: fix generation inconsistent