
Reschedule bindings when unjoining a target cluster #1049

Closed
wants to merge 3 commits

Conversation

dddddai
Member

@dddddai dddddai commented Nov 30, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:
Please refer to #829 (comment)

Which issue(s) this PR fixes:
Part of #829

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

karmada-scheduler: reschedule bindings when unjoining a target cluster.

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 30, 2021
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign rainbowmango after the PR has been reviewed.
You can assign the PR to them by writing /assign @rainbowmango in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 30, 2021
@mrlihanbo

We need to refactor this code, which fails for the Duplicated policy:

	// TODO: should schedule as much as possible?
	deltaLen := len(spec.Clusters) - len(reservedClusters)
	if len(candidateClusters) < deltaLen {
		// for ReplicaSchedulingTypeDivided, we will try to migrate replicas to the other health clusters
		if placement.ReplicaScheduling == nil || placement.ReplicaScheduling.ReplicaSchedulingType == policyv1alpha1.ReplicaSchedulingTypeDuplicated {
			klog.Warningf("ignore reschedule binding as insufficient available cluster")
			return ScheduleResult{}, nil
		}
	}

@dddddai
Member Author

dddddai commented Nov 30, 2021

We need to refactor this code, which fails for the Duplicated policy:

What do you mean by "fails for the Duplicated policy"?
If the number of available clusters is less than deltaLen, it will NOT reschedule, and I'm not sure whether that's expected.

For example, given a propagation policy that propagates a workload to all clusters, should it reschedule when a cluster is unjoined? I guess the answer is YES, so is this the difference between Reschedule and FailoverSchedule?

@mrlihanbo

We need to refactor this code, which fails for the Duplicated policy:

What do you mean by "fails for the Duplicated policy"? If the number of available clusters is less than deltaLen, it will NOT reschedule, and I'm not sure whether that's expected.

For example, given a propagation policy that propagates a workload to all clusters, should it reschedule when a cluster is unjoined? I guess the answer is YES, so is this the difference between Reschedule and FailoverSchedule?

FailoverSchedule was originally designed to be used together with spread constraints, so its logic is:

  1. When some clusters in the original scheduling result become faulty, it picks the same number of replacement clusters from the healthy ones. For the Divided case this also triggers a scale schedule. For the Duplicated case without a spread constraint, no new cluster can be selected, so the original result is kept.
  2. The problem this causes is that we need to remove the unjoined cluster from the scheduling result, but once candidateClusters is smaller than the number of faulty clusters, nothing is removed.

Original scenario:
The propagation policy specifies the three clusters [A, B, C] with a spread constraint of 2, so the scheduling result is [A, B]. If A fails, C is selected to replace it and the new result is [B, C].

Problem scenario:
The propagation policy specifies the three clusters [A, B, C] without a spread constraint, so the scheduling result is [A, B, C]. If A fails, no new cluster can be selected and the [A, B, C] result is kept, whereas what we expect is that the unjoined cluster is removed from the scheduling result.
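
To make the expected behavior concrete, here is a minimal standalone Go sketch (illustrative types and a hypothetical removeUnjoinedClusters helper, not code from this PR): the unjoining cluster is dropped from the existing result even when no replacement cluster is available.

package main

import "fmt"

// TargetCluster mirrors a target entry in a binding's scheduling result.
type TargetCluster struct {
	Name     string
	Replicas int32
}

// removeUnjoinedClusters keeps only the target clusters that are not in the
// unjoined set; it never tries to pick replacements.
func removeUnjoinedClusters(targets []TargetCluster, unjoined map[string]bool) []TargetCluster {
	kept := make([]TargetCluster, 0, len(targets))
	for _, t := range targets {
		if !unjoined[t.Name] {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	result := []TargetCluster{{Name: "A"}, {Name: "B"}, {Name: "C"}}
	// Cluster A is being unjoined and no candidate cluster is available:
	// the expected new result keeps only B and C, not the stale [A, B, C].
	fmt.Println(removeUnjoinedClusters(result, map[string]bool{"A": true}))
}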

@dddddai
Member Author

dddddai commented Dec 1, 2021

If A fails, no new cluster can be selected and the [A, B, C] result is kept, whereas what we expect is that the unjoined cluster is removed from the scheduling result.

Got it.
So for Reschedule we don't consider "picking the same number of replacement clusters from the healthy ones", right? And is that the only difference between Reschedule and FailoverSchedule?

@mrlihanbo

If A fails, no new cluster can be selected and the [A, B, C] result is kept, whereas what we expect is that the unjoined cluster is removed from the scheduling result.

Got it. So for Reschedule we don't consider "picking the same number of replacement clusters from the healthy ones", right? And is that the only difference between Reschedule and FailoverSchedule?

It's a bit more complicated; it depends on whether a spread constraint is configured. The previous implementation wasn't complete either: simply picking the same number of clusters can only satisfy the SpreadByFieldCluster type of spread constraint.

  1. Without a spread constraint, Reschedule should select all clusters that fit and remove the unjoined clusters from the scheduling result.
  2. With a spread constraint, the rescheduling result still has to satisfy the spread constraint (see the sketch after this list).
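
As a rough illustration of point 2, here is a small Go sketch. It only models the SpreadByFieldCluster case, where the constraint boils down to a minimum/maximum number of selected clusters; the names are illustrative, not the scheduler's real API.

package main

import "fmt"

// clusterSpreadConstraint is a simplified stand-in for a SpreadByFieldCluster constraint.
type clusterSpreadConstraint struct {
	MinGroups int
	MaxGroups int // 0 means "no upper bound" in this sketch
}

// satisfiesClusterSpread reports whether a rescheduling result still honors the constraint.
func satisfiesClusterSpread(selected []string, c clusterSpreadConstraint) bool {
	n := len(selected)
	if n < c.MinGroups {
		return false
	}
	if c.MaxGroups > 0 && n > c.MaxGroups {
		return false
	}
	return true
}

func main() {
	// After removing the unjoined cluster A, [B, C] still satisfies MinGroups=2.
	fmt.Println(satisfiesClusterSpread([]string{"B", "C"}, clusterSpreadConstraint{MinGroups: 2, MaxGroups: 3}))
}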

@Garrybest
Member

What if we drop the unhealthy clusters from the scheduling result, keep the results on the healthy clusters, and then go straight through the normal scheduling logic so the scheduler does a second scheduling pass? In #1051 I have already merged scale scheduling with normal scheduling, and I'm wondering whether the failover logic can be merged as well.
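
A sketch of that flow, assuming it can be expressed as "trim, then run the normal pass" (the names below are hypothetical and this is not the code from #1051):

package main

import "fmt"

type targetCluster struct {
	Name     string
	Replicas int32
}

// normalSchedule stands in for the scheduler's ordinary scheduling pass.
type normalSchedule func(reserved []targetCluster) []targetCluster

// rescheduleViaNormalPath keeps the results on healthy clusters and lets the
// regular pass add or rebalance clusters under the usual placement rules.
func rescheduleViaNormalPath(prev []targetCluster, healthy map[string]bool, schedule normalSchedule) []targetCluster {
	reserved := make([]targetCluster, 0, len(prev))
	for _, t := range prev {
		if healthy[t.Name] {
			reserved = append(reserved, t)
		}
	}
	return schedule(reserved)
}

func main() {
	prev := []targetCluster{{"A", 2}, {"B", 2}, {"C", 1}}
	healthy := map[string]bool{"B": true, "C": true} // A is unjoining
	// An identity "normal pass" is enough to show that A's result is gone.
	fmt.Println(rescheduleViaNormalPath(prev, healthy, func(r []targetCluster) []targetCluster { return r }))
}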

@Garrybest
Member

Once all types share the same scheduling logic, spread constraints will be taken into account automatically. Failover currently has some defects, e.g., it does not take idle resources into consideration when rescheduling. I think we could merge all types into one scheduling process.

@dddddai
Member Author

dddddai commented Dec 1, 2021

Once all types share the same scheduling logic, spread constraints will be taken into account automatically. Failover currently has some defects, e.g., it does not take idle resources into consideration when rescheduling. I think we could merge all types into one scheduling process.

Agree, I think this is the best way to solve the problem

@mrlihanbo

What if we drop the unhealthy clusters from the scheduling result, keep the results on the healthy clusters, and then go straight through the normal scheduling logic so the scheduler does a second scheduling pass? In #1051 I have already merged scale scheduling with normal scheduling, and I'm wondering whether the failover logic can be merged as well.

Good idea, I'll take a look at the merging PR. Failover indeed needs a major overhaul.

@RainbowMango
Member

Please cc me after you've reached an agreement. :)

@dddddai dddddai marked this pull request as draft December 2, 2021 02:40
@karmada-bot karmada-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 2, 2021
@dddddai dddddai force-pushed the reschedule branch 2 times, most recently from a13375d to c5ece75 Compare December 2, 2021 14:31
@dddddai dddddai marked this pull request as ready for review December 2, 2021 15:00
@karmada-bot karmada-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 2, 2021
@dddddai
Member Author

dddddai commented Dec 3, 2021

Hi @Garrybest, I committed a new patch that merges failover scheduling with normal scheduling.

@Garrybest
Member

Thanks @dddddai, I will check it after #1051 gets merged.

@Garrybest
Member

/assign

@dddddai dddddai force-pushed the reschedule branch 2 times, most recently from 660220b to e2f7f16 Compare December 4, 2021 07:59
@karmada-bot karmada-bot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 4, 2021
@dddddai dddddai force-pushed the reschedule branch 2 times, most recently from d65ab94 to 7abe58d Compare December 8, 2021 01:57
@dddddai dddddai marked this pull request as draft December 8, 2021 02:00
@karmada-bot karmada-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 8, 2021
@dddddai dddddai force-pushed the reschedule branch 2 times, most recently from ff11011 to c83dff6 Compare December 8, 2021 02:44
@dddddai dddddai force-pushed the reschedule branch 2 times, most recently from 12be21d to 2590917 Compare December 8, 2021 03:36
@dddddai dddddai marked this pull request as ready for review December 8, 2021 04:40
@karmada-bot karmada-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 8, 2021
@huone1
Contributor

huone1 commented Jan 17, 2022

@dddddai Let's discuss this PR in the weekly meeting tomorrow; the "unjoining a target cluster" scenario is a common user behavior.

@huone1
Contributor

huone1 commented Jan 17, 2022

There is nothing in the deleteCluster function that requeues the RBs related to the deleted cluster; I think we should add the requeue logic.

func (s *Scheduler) deleteCluster(obj interface{}) {
	var cluster *clusterv1alpha1.Cluster
	switch t := obj.(type) {
	case *clusterv1alpha1.Cluster:
		cluster = t
	case cache.DeletedFinalStateUnknown:
		var ok bool
		cluster, ok = t.Obj.(*clusterv1alpha1.Cluster)
		if !ok {
			klog.Errorf("cannot convert to clusterv1alpha1.Cluster: %v", t.Obj)
			return
		}
	default:
		klog.Errorf("cannot convert to clusterv1alpha1.Cluster: %v", t)
		return
	}
	klog.V(3).Infof("Delete event for cluster %s", cluster.Name)
	if s.enableSchedulerEstimator {
		s.schedulerEstimatorWorker.Add(cluster.Name)
	}
}
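
For illustration, here is a minimal, simplified sketch of the suggested addition (hypothetical simplified types; the real scheduler already has the enqueueAffectedBinding and enqueueAffectedClusterBinding helpers quoted later in this thread):

package main

import "fmt"

type scheduler struct{}

func (s *scheduler) enqueueAffectedBinding(cluster string)        { fmt.Println("requeue RBs for", cluster) }
func (s *scheduler) enqueueAffectedClusterBinding(cluster string) { fmt.Println("requeue CRBs for", cluster) }

// onClusterDelete mirrors the tail of deleteCluster with the suggested requeue calls added.
func (s *scheduler) onClusterDelete(clusterName string) {
	// ...existing estimator cleanup elided...
	s.enqueueAffectedBinding(clusterName)        // suggested: reschedule affected ResourceBindings
	s.enqueueAffectedClusterBinding(clusterName) // suggested: reschedule affected ClusterResourceBindings
}

func main() { (&scheduler{}).onClusterDelete("member1") }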

@@ -33,24 +33,29 @@ func (s *Snapshot) GetClusters() []*framework.ClusterInfo {
 func (s *Snapshot) GetReadyClusters() []*framework.ClusterInfo {
 	var readyClusterInfoList []*framework.ClusterInfo
 	for _, c := range s.clusterInfoList {
-		if util.IsClusterReady(&c.Cluster().Status) {
+		if util.IsClusterReady(&c.Cluster().Status) && c.Cluster().DeletionTimestamp.IsZero() {
Contributor

The two conditions could be combined into one: util.IsClusterReady(c.Cluster()); a cluster whose DeletionTimestamp is set is also a special kind of not-ready.

Member Author

@dddddai dddddai Jan 17, 2022

I thought so too, but util.IsClusterReady is also used in other places and I'm afraid updating the func could affect them.
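
One way to get both checks without touching util.IsClusterReady would be a small wrapper, sketched below; isClusterAvailableForScheduling is a hypothetical name (and package placement), not something this PR adds.

package core

import (
	clusterv1alpha1 "github.com/karmada-io/karmada/pkg/apis/cluster/v1alpha1"
	"github.com/karmada-io/karmada/pkg/util"
)

// isClusterAvailableForScheduling treats a cluster as a scheduling candidate
// only if it is Ready and is not being unjoined (DeletionTimestamp unset).
func isClusterAvailableForScheduling(cluster *clusterv1alpha1.Cluster) bool {
	return util.IsClusterReady(&cluster.Status) && cluster.DeletionTimestamp.IsZero()
}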

// available clusters are ready clusters NOT in `binding.spec.clusters`
availableClusters := readyClusters.Difference(bindClusters)
// candidate clusters are available clusters that fit the placement
candidateClusters, err := g.findClustersThatFit(ctx, g.scheduleFramework, placement, &spec.Resource, clusterInfoSnapshot, availableClusters)
Contributor

There are some problems with filtering based only on availableClusters; a scenario is as follows:

  1. a deployment is applied to clusters A, B, and C.
  2. cluster A adds a taint that the deployment doesn't tolerate
  3. the replicas are scaled up from 5 to 10
  4. the new pods should not be scheduled to cluster A

Member Author

Yeah, that sounds reasonable, but we cannot remove cluster A from binding.spec.clusters directly because that would break the workloads already running in cluster A.

It's kind of tricky, and IMHO it's really an assignReplicas concern; we may implement this in a follow-up.
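
Purely to illustrate that follow-up idea (all names below are hypothetical; this is not the assignReplicas change itself): existing replicas stay where they are, and only the scale-up delta is restricted to clusters that still fit the placement, e.g. whose taints the workload tolerates.

package main

import "fmt"

type assignment struct {
	Name     string
	Replicas int32
}

// scaleUp keeps current assignments untouched and spreads the extra replicas
// round-robin over the clusters that still fit the placement.
func scaleUp(current []assignment, delta int32, stillFits map[string]bool) []assignment {
	out := append([]assignment(nil), current...) // never evict existing replicas
	fitting := make([]int, 0, len(out))
	for i, a := range out {
		if stillFits[a.Name] {
			fitting = append(fitting, i)
		}
	}
	if len(fitting) == 0 {
		return out // nowhere to place the delta; leave the result unchanged
	}
	for n := int32(0); n < delta; n++ {
		out[fitting[int(n)%len(fitting)]].Replicas++
	}
	return out
}

func main() {
	current := []assignment{{"A", 2}, {"B", 2}, {"C", 1}} // 5 replicas in total
	// Cluster A now carries a taint the deployment does not tolerate.
	fmt.Println(scaleUp(current, 5, map[string]bool{"B": true, "C": true}))
}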

@@ -45,28 +45,52 @@ func NewGenericScheduler(
 	}
 }
 
-func (g *genericScheduler) Schedule(ctx context.Context, placement *policyv1alpha1.Placement, spec *workv1alpha2.ResourceBindingSpec) (result ScheduleResult, err error) {
+// evict indicates whether to remove reserved clusters that don't fit the placement
+func (g *genericScheduler) Schedule(ctx context.Context, placement *policyv1alpha1.Placement, spec *workv1alpha2.ResourceBindingSpec, evict bool) (result ScheduleResult, err error) {
Contributor

The function description should introduce the behavior first, then the important parameters.

Member Author

I will update it once I'm back from my trip.
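
A possible shape for the requested doc comment, describing the behavior first and the evict parameter second (the wording is only a suggestion, not what was eventually committed):

// Schedule selects target clusters for the resource referenced by spec
// according to the placement, reusing clusters already recorded in
// spec.Clusters where they still fit.
// evict indicates whether reserved clusters that no longer fit the placement
// (for example, clusters being unjoined) should be removed from the result.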

// reserve unready clusters if failover is disabled
reservedClusters = reservedClusters.Union(unreadyBindClusters)
} else if len(candidateClusters) < unreadyBindClusters.Len() {
// *DON'T evict unready clusters in `binding.spec.clusters` if there are insufficient candidate clusters*
Contributor

I think the number of bindClusters doesn't have to equal the number of clusters in the next scheduling result.

Member Author

This is exactly the previous behavior:

	// TODO: should schedule as much as possible?
	deltaLen := len(spec.Clusters) - len(reservedClusters)
	if len(candidateClusters) < deltaLen {
		// for ReplicaSchedulingTypeDivided, we will try to migrate replicas to the other health clusters
		if placement.ReplicaScheduling == nil || placement.ReplicaScheduling.ReplicaSchedulingType == policyv1alpha1.ReplicaSchedulingTypeDuplicated {
			klog.Warningf("ignore reschedule binding as insufficient available cluster")
			return ScheduleResult{}, nil
		}
	}

Are we gonna change the old behavior? That would be a breaking change

@dddddai
Member Author

dddddai commented Jan 17, 2022

@dddddai Let's discuss this PR in the weekly meeting tomorrow; the "unjoining a target cluster" scenario is a common user behavior.

@huone1 Sorry, I'll be travelling for the next three days; how about discussing it in the Slack channel :)

@dddddai
Member Author

dddddai commented Jan 17, 2022

There is nothing in the deleteCluster function that requeues the RBs related to the deleted cluster; I think we should add the requeue logic.

There's no need to requeue RBs on delete since they are already requeued when the DeletionTimestamp is set:

	// Check if cluster is unjoined
	if !newCluster.DeletionTimestamp.IsZero() {
		// Trigger reschedule when cluster is unjoined
		s.enqueueAffectedBinding(newCluster.Name)
		s.enqueueAffectedClusterBinding(newCluster.Name)

@huone1
Contributor

huone1 commented Jan 17, 2022

@dddddai Let's discuss this PR in the weekly meeting tomorrow; the "unjoining a target cluster" scenario is a common user behavior.

@huone1 Sorry, I'll be travelling for the next three days; how about discussing it in the Slack channel :)

OK, have fun!

Signed-off-by: dddddai <dddwq@foxmail.com>
Signed-off-by: dddddai <dddwq@foxmail.com>
Signed-off-by: dddddai <dddwq@foxmail.com>
@dddddai dddddai closed this Mar 4, 2022
Labels
kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.