Reschedule bindings when unjoining a target cluster #1049
Conversation
We need to do some code refactoring here, which will fail for …

What do you mean by "fail for …"? For example, there is a propagation policy that propagates a workload to all clusters; should it perform a reschedule when unjoining a cluster? I guess the answer is YES, so is this the difference between …
Let me just switch to Chinese. FailoverSchedule was originally designed to be used together with the spread constraint, so its logic is:

Original scenario: … Problem scenario: …
Understood.
It's a bit complicated; we have to decide based on whether a spread constraint is configured. The previous implementation here was also incomplete: simply picking the same number of clusters can only satisfy …
How about deleting the scheduling results of the unhealthy clusters, keeping the results of the healthy clusters, and then letting the code go straight through the normal scheduling logic so the scheduler does a second round of scheduling? In #1051 I have already merged scale scheduling and normal scheduling; I'm wondering whether the failover logic could be merged as well.
When all types share the same scheduling logic, the spread constraint will be taken into account automatically. Failover currently has some defects, e.g., it does not take idle resources into consideration when rescheduling. I think we could merge all the types into one scheduling process.
Agree, I think this is the best way to solve the problem |
Good idea, I'll take a look at the merging PR. Failover indeed needs a major overhaul.
Please cc me after you have reached an agreement. :)
Hi @Garrybest, I committed a new patch to merge failover schedule with normal schedule.
/assign
@dddddai Let us discuss the PR in the weekly meeting tomorrow; the scenario "unjoining a target cluster" is a common user behavior.
There is nothing that requeues the RBs of the deleted cluster in the deleteCluster function; I think the requeue logic should be added there. (karmada/pkg/scheduler/scheduler.go, lines 639 to 660 in 8ceb9df)
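A rough sketch of the idea, with a hypothetical helper name (bindingsToRequeue) standing in for the actual scheduler internals, and assuming the usual workv1alpha2 API types:

```go
import (
	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// bindingsToRequeue is a hypothetical helper sketching the missing logic:
// it collects the keys of all ResourceBindings that still reference the
// deleted cluster so the caller can put them back on the scheduling queue.
func bindingsToRequeue(deletedCluster string, bindings []*workv1alpha2.ResourceBinding) []string {
	var keys []string
	for _, rb := range bindings {
		for _, target := range rb.Spec.Clusters {
			if target.Name == deletedCluster {
				keys = append(keys, rb.Namespace+"/"+rb.Name)
				break
			}
		}
	}
	return keys
}
```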
@@ -33,24 +33,29 @@ func (s *Snapshot) GetClusters() []*framework.ClusterInfo {
 func (s *Snapshot) GetReadyClusters() []*framework.ClusterInfo {
 	var readyClusterInfoList []*framework.ClusterInfo
 	for _, c := range s.clusterInfoList {
-		if util.IsClusterReady(&c.Cluster().Status) {
+		if util.IsClusterReady(&c.Cluster().Status) && c.Cluster().DeletionTimestamp.IsZero() {
The two judgment conditions can be combined into one: util.IsClusterReady(c.Cluster()); a cluster whose DeletionTimestamp is set is also a special kind of NotReady.
I thought so too, but util.IsClusterReady is also used in other places and I'm afraid updating the func could impact them.
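One option that keeps util.IsClusterReady untouched would be a small snapshot-local helper; this is only a sketch of that idea (isClusterAvailable is a made-up name, and the clusterv1alpha1 import path is assumed):

```go
import (
	clusterv1alpha1 "github.com/karmada-io/karmada/pkg/apis/cluster/v1alpha1"
	"github.com/karmada-io/karmada/pkg/util"
)

// isClusterAvailable is a hypothetical helper: a cluster is schedulable only
// if it is Ready and is not being deleted. Existing callers of
// util.IsClusterReady keep their current semantics.
func isClusterAvailable(cluster *clusterv1alpha1.Cluster) bool {
	return util.IsClusterReady(&cluster.Status) && cluster.DeletionTimestamp.IsZero()
}
```

GetReadyClusters could then call isClusterAvailable(c.Cluster()) with a single condition.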
// available clusters are ready clusters NOT in `binding.spec.clusters`
availableClusters := readyClusters.Difference(bindClusters)
// candidate clusters are available clusters that fit the placement
candidateClusters, err := g.findClustersThatFit(ctx, g.scheduleFramework, placement, &spec.Resource, clusterInfoSnapshot, availableClusters)
There are some problems with filtering based only on the availableClusters; a scenario is as follows:
- a deployment is applied in clusters A, B, and C
- cluster A adds a taint which the deployment doesn't tolerate
- the replicas scale up from 5 to 10
- new pods should not be scheduled to cluster A
Yeah, it sounds reasonable, but we cannot remove cluster A from binding.spec.clusters directly because it would break the original workloads in cluster A. It's kind of tricky and IMHO it's something about assignReplicas; we may implement this in a follow-up.
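To make the gap concrete, here is a toy, self-contained Go illustration (plain strings instead of real Cluster objects; the cluster names and the fitsPlacement map are invented): candidates are drawn only from clusters outside binding.spec.clusters, so an already-bound cluster that has become unsuitable is never re-checked.

```go
package main

import "fmt"

// difference returns the elements of a that are not present in b.
func difference(a, b []string) []string {
	inB := make(map[string]bool)
	for _, x := range b {
		inB[x] = true
	}
	var out []string
	for _, x := range a {
		if !inB[x] {
			out = append(out, x)
		}
	}
	return out
}

func main() {
	readyClusters := []string{"A", "B", "C", "D"}
	bindClusters := []string{"A", "B", "C"} // cluster A now carries an untolerated taint
	fitsPlacement := map[string]bool{"B": true, "C": true, "D": true}

	// Only clusters outside the binding are evaluated against the placement.
	available := difference(readyClusters, bindClusters) // {D}
	var candidates []string
	for _, c := range available {
		if fitsPlacement[c] {
			candidates = append(candidates, c)
		}
	}

	// Cluster A stays bound even though it no longer fits, so replicas scaled
	// up from 5 to 10 can still be assigned to it by assignReplicas.
	fmt.Println("candidates:", candidates, "kept bind clusters:", bindClusters)
}
```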
@@ -45,28 +45,52 @@ func NewGenericScheduler(
 	}
 }
 
-func (g *genericScheduler) Schedule(ctx context.Context, placement *policyv1alpha1.Placement, spec *workv1alpha2.ResourceBindingSpec) (result ScheduleResult, err error) {
+// evict indicates whether to remove reserved clusters that don't fit the placement
+func (g *genericScheduler) Schedule(ctx context.Context, placement *policyv1alpha1.Placement, spec *workv1alpha2.ResourceBindingSpec, evict bool) (result ScheduleResult, err error) {
The function description should introduce the behavior first, then the important parameters.
I will update it once I'm back from my trip.
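As a sketch only (not the committed wording), a behavior-first doc comment for the new signature could read like this:

```go
// Schedule selects suitable clusters for the given resource binding according
// to the placement and returns the schedule result.
// The evict parameter controls whether clusters already recorded in
// binding.spec.clusters that no longer fit the placement are removed from
// the result.
func (g *genericScheduler) Schedule(ctx context.Context, placement *policyv1alpha1.Placement,
	spec *workv1alpha2.ResourceBindingSpec, evict bool) (result ScheduleResult, err error) {
	// ...
}
```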
	// reserve unready clusters if failover is disabled
	reservedClusters = reservedClusters.Union(unreadyBindClusters)
} else if len(candidateClusters) < unreadyBindClusters.Len() {
	// *DON'T evict unready clusters in `binding.spec.clusters` if there are insufficient candidate clusters*
I think the number of bindClusters doesn't have to be the number of clusters for the next scheduling.
This is exactly the previous behavior:
karmada/pkg/scheduler/core/generic_scheduler.go
Lines 279 to 287 in 8ceb9df
// TODO: should schedule as much as possible?
deltaLen := len(spec.Clusters) - len(reservedClusters)
if len(candidateClusters) < deltaLen {
	// for ReplicaSchedulingTypeDivided, we will try to migrate replicas to the other health clusters
	if placement.ReplicaScheduling == nil || placement.ReplicaScheduling.ReplicaSchedulingType == policyv1alpha1.ReplicaSchedulingTypeDuplicated {
		klog.Warningf("ignore reschedule binding as insufficient available cluster")
		return ScheduleResult{}, nil
	}
}
Are we gonna change the old behavior? That would be a breaking change
There's no need to requeue RBs on delete since they are already requeued when the … (karmada/pkg/scheduler/scheduler.go, lines 620 to 624 in a08dd3d)
What type of PR is this?
/kind feature
What this PR does / why we need it:
Please refer to #829 (comment)
Which issue(s) this PR fixes:
Part of #829
Special notes for your reviewer:
Does this PR introduce a user-facing change?: