
Reschedule bindings on cluster change #829

Closed
dddddai opened this issue Oct 18, 2021 · 39 comments · Fixed by #1383
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Comments

@dddddai
Member

dddddai commented Oct 18, 2021

What happened:
Unjoined clusters still remain in binding.spec.clusters

What you expected to happen:
Unjoined clusters should be deleted from binding.spec.clusters

How to reproduce it (as minimally and precisely as possible):
1. Set up the environment (script v0.8):

root@myserver:~/karmada# hack/local-up-karmada.sh

root@myserver:~/karmada# hack/create-cluster.sh member1 $HOME/.kube/karmada.config

root@myserver:~/karmada# kubectl config use-context karmada-apiserver

root@myserver:~/karmada# karmadactl join member1 --cluster-kubeconfig=$HOME/.kube/karmada.config

root@myserver:~/karmada# kubectl apply -f samples/nginx

root@myserver:~/karmada# kubectl get deploy
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           47h

2. Unjoin member1:

root@myserver:~/karmada# karmadactl unjoin member1

root@myserver:~/karmada# kubectl get clusters
No resources found

3. Check binding.spec.clusters:

root@myserver:~/karmada# kubectl describe rb
......
Spec:
  Clusters:
    Name:  member1
......

Anything else we need to know?:
Is this expected behavior? If not, who is supposed to take responsibility for deleting unjoined clusters from the binding? The scheduler, or other controllers (like the cluster controller)?

Environment:

  • Karmada version:v0.8.0
  • Others:
@dddddai dddddai added the kind/bug Categorizes issue or PR as related to a bug. label Oct 18, 2021
@RainbowMango
Member

Thanks for opening this; we can track the progress and discuss the solution here.
Any ideas?
My thought would be to re-schedule the bindings. Two scenarios should be considered:

  • a cluster should be removed from the RB after it is unjoined
  • a cluster should be added to the RB after it is joined

@dddddai
Member Author

dddddai commented Oct 18, 2021

@RainbowMango Thanks for your reply

  • a cluster should be removed from the RB after it is unjoined
  • a cluster should be added to the RB after it is joined

Yes, this makes sense, but it might introduce a new problem. Consider this case:

  1. RB is scheduled to cluster1 and cluster2
  2. Unjoin cluster1
  3. cluster1 is removed from binding.spec.clusters
  4. Join cluster1
  5. The scheduler should re-schedule the RB, but there is no valid schedule type for this case; it would be handled as AvoidSchedule, which doesn't make sense:

    if s.allClustersInReadyState(resourceBinding.Spec.Clusters) {
        return AvoidSchedule
    }

@dddddai
Member Author

dddddai commented Oct 19, 2021

I have another question: does the scheduler perform Failover after unjoining a target cluster?
I don't know because I haven't tried it.

@XiShanYongYe-Chang
Member

I have another question: does the scheduler perform Failover after unjoining a target cluster? I don't know because I haven't tried it.

No.

@dddddai
Member Author

dddddai commented Oct 20, 2021

@XiShanYongYe-Chang Thanks for your reply

No.

Is this expected behavior? Shouldn't Failover handle cluster delete events?

@XiShanYongYe-Chang
Member

When there is a cluster delete event, a reschedule may need to be triggered.

What do you think, @RainbowMango?

@RainbowMango
Member

Echoing #829 (comment):

Agree. But no idea how to solve that issue now.

@RainbowMango RainbowMango added this to the v0.10 milestone Oct 22, 2021
@RainbowMango
Member

/priority important-soon
@dddddai I added this issue to v0.10 milestone, let's fix this in this release.

@karmada-bot karmada-bot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 22, 2021
@dddddai
Member Author

dddddai commented Oct 22, 2021

Agree. But no idea how to solve that issue now.

How about removing the applied placement annotation from the binding on cluster add/delete? That would make the scheduler reschedule the binding as ReconcileSchedule:

    appliedPlacement := util.GetLabelValue(resourceBinding.Annotations, util.PolicyPlacementAnnotation)
    if policyPlacementStr != appliedPlacement {
        return ReconcileSchedule
    }

@RainbowMango
Member

Too tricky, I guess.

@dddddai
Member Author

dddddai commented Oct 24, 2021

IMHO, the scheduler should reschedule the bindings on cluster change (e.g. cluster joined, cluster unjoined, cluster label changed...).

I added a cluster queue to the scheduler to handle cluster events and fix this issue, and it works fine; please see dddddai@568b870
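The idea behind a cluster queue can be sketched as follows. This is a simplified illustration, not the code in that commit: the Binding type and the bindingsToRequeue helper are hypothetical. On a cluster delete, only bindings bound to that cluster need a reschedule; on a cluster add, every binding is requeued, since the new cluster may now fit its placement.

```go
package main

import "fmt"

// Binding pairs a binding's name with the clusters it is scheduled to
// (illustrative type, not Karmada's actual API).
type Binding struct {
	Name     string
	Clusters []string
}

// bindingsToRequeue decides, for one cluster event popped from the cluster
// queue, which bindings a worker should push back onto the scheduling queue.
func bindingsToRequeue(all []Binding, cluster string, deleted bool) []string {
	var out []string
	for _, b := range all {
		refs := false
		for _, c := range b.Clusters {
			if c == cluster {
				refs = true
				break
			}
		}
		// Deletes affect only bindings that reference the cluster;
		// adds may affect any binding, so requeue them all.
		if !deleted || refs {
			out = append(out, b.Name)
		}
	}
	return out
}

func main() {
	all := []Binding{
		{Name: "nginx-rb", Clusters: []string{"member1"}},
		{Name: "redis-rb", Clusters: []string{"member2"}},
	}
	// member1 was unjoined: only bindings bound to it need a reschedule.
	fmt.Println(bindingsToRequeue(all, "member1", true))
	// member3 joined: every binding may now fit it, so requeue all.
	fmt.Println(bindingsToRequeue(all, "member3", false))
}
```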

@RainbowMango
Member

IMHO, the scheduler should reschedule the bindings on cluster change (e.g. cluster joined, cluster unjoined, cluster label changed...).

+1, but I'm not sure about cluster label changes; being too sensitive isn't a good thing.

I'll take a look and comment on your commit.

@dddddai
Member Author

dddddai commented Oct 25, 2021

but I'm not sure about cluster label changes; being too sensitive isn't a good thing.

For example, suppose there is a propagation policy whose cluster affinity label selector is foo: bar.

Do you mean we should keep the binding scheduled to the cluster even though the label foo: bar has been removed from that cluster?

@RainbowMango
Member

I think the scenario might be handled by descheduler.

@dddddai
Member Author

dddddai commented Oct 25, 2021

Well, I'm not familiar with the descheduler, but from this picture it seems that the descheduler does not watch cluster events; it only watches workloads and may reschedule them to other clusters when the original cluster has insufficient resources. It focuses on ScaleSchedule rather than Reschedule. Please correct me if I'm wrong.

@RainbowMango
Member

We might discuss the descheduler further at the community meeting. Hope you can come; see you there.

@dddddai
Member Author

dddddai commented Oct 25, 2021

OK, I'll be there :)

@dddddai
Member Author

dddddai commented Oct 26, 2021

To be clear, I have four questions:

  1. Shall we reschedule bindings when a cluster field/label changes? (The updated cluster might no longer fit, or might newly fit, the propagation policy.)
  2. Should FailoverSchedule work when unjoining a cluster?
  3. What does SpreadConstraint.MinGroups mean? Does it mean we should not propagate the resource unless group count >= MinGroups? If yes, shall we delete all binding.spec.clusters when unjoining a cluster causes group count < MinGroups?
  4. The key question: shall we always keep binding.spec.clusters consistent with the propagation policy?

@dddddai dddddai changed the title Delete unjoined cluster from (Cluster)ResourceBinding Reschedule bindings on cluster change Nov 24, 2021
@mrlihanbo

Looking forward to seeing the progress of this issue

@RainbowMango
Member

@mrlihanbo The #967 is waiting for your review.

@RainbowMango RainbowMango removed this from the v0.10 milestone Nov 26, 2021
@mrlihanbo

@mrlihanbo The #967 is waiting for your review.

I will review the PR now.

@dddddai
Member Author

dddddai commented Nov 26, 2021

Looking forward to seeing the progress of this issue

Hi @mrlihanbo, before implementing this we have to answer the four questions above.

@dddddai
Member Author

dddddai commented Nov 30, 2021

Hello @RainbowMango, any ideas about these questions?

@RainbowMango
Member

Shall we reschedule bindings when cluster field/label changed? (because the updated cluster might (not) fit the propagation policy)

I think we should treat this scenario very carefully; it might bring drastic changes. Take Kubernetes as an example: after a pod has been scheduled to a node, it will not get re-scheduled even if the node's labels change.

@RainbowMango
Member

Should FailoverSchedule work when unjoining a cluster?

I think we should re-schedule the workload after one of the bound clusters is unjoined. It doesn't have to be FailoverSchedule (not sure).

@RainbowMango
Member

What does SpreadConstraint.MinGroups mean? Does it mean we should not propagate the resource unless group count >= MinGroups?

Yes, you are right. SpreadConstraint.MinGroups restricts the minimum number of cluster groups; if the scheduler cannot find enough groups, the schedule should fail.

If yes, shall we delete all binding.spec.clusters when unjoining a cluster causes group count < MinGroups?

I don't think so. Just like the answer above, it's too dangerous, at least for now.
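The MinGroups rule described above can be sketched as a simple guard. The enoughGroups helper is illustrative (not Karmada's actual code); it shows scheduling failing when fewer cluster groups are available than MinGroups requires:

```go
package main

import "fmt"

// enoughGroups returns an error when the available cluster groups fall short
// of SpreadConstraint.MinGroups, signalling that the schedule should fail.
func enoughGroups(available, minGroups int) error {
	if available < minGroups {
		return fmt.Errorf("only %d cluster groups available, need at least %d", available, minGroups)
	}
	return nil
}

func main() {
	fmt.Println(enoughGroups(1, 2)) // schedule fails: not enough groups
	fmt.Println(enoughGroups(3, 2)) // <nil>: constraint satisfied
}
```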

@dddddai
Member Author

dddddai commented Nov 30, 2021

I see, so I guess we should reschedule workloads only when a cluster is joined/unjoined, right?

@RainbowMango
Member

Let's focus on the scenario of cluster-unjoin for now. The workload should be re-scheduled after one of the bound clusters is unjoined. If no more clusters fit the propagation policy, just remove the unjoined cluster from the binding object.
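The fallback described above (no replacement cluster fits, so just drop the unjoined cluster from the binding) can be sketched as follows. The TargetCluster type and removeCluster helper are illustrative, not Karmada's actual code:

```go
package main

import "fmt"

// TargetCluster is a simplified stand-in for an entry of binding.spec.clusters.
type TargetCluster struct{ Name string }

// removeCluster drops the unjoined cluster from the scheduling result,
// leaving the remaining bound clusters untouched.
func removeCluster(targets []TargetCluster, unjoined string) []TargetCluster {
	var kept []TargetCluster
	for _, t := range targets {
		if t.Name != unjoined {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	targets := []TargetCluster{{Name: "member1"}, {Name: "member2"}}
	fmt.Println(removeCluster(targets, "member1")) // [{member2}]
}
```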

@mrlihanbo

The workload should be re-scheduled after one of the bound clusters is unjoined. If no more clusters fit the propagation policy, just remove the unjoined cluster from the binding object.

There is a scenario to consider if the workload is re-scheduled after one of the bound clusters is unjoined:

  • The user specifies clusters A, B, and C in the policy.
  • Only clusters A and B are joined at the time of the first schedule, so the schedule result is [A, B].
  • Then cluster C is joined.
  • If cluster A becomes not ready and triggers a reschedule, the new schedule result will be [B, C].

We should make sure this behavior is what we expect.

@mrlihanbo

I see, so I guess we should reschedule workloads only when a cluster is joined/unjoined, right?

@dddddai I took a glance at the FailoverSchedule func. Maybe we can do it in this func. Just a suggestion.

@dddddai
Member Author

dddddai commented Nov 30, 2021

@dddddai I took a glance at the FailoverSchedule func. Maybe we can do it in this func. Just a suggestion.

Thanks for digging into it; that's exactly what I did in #1049. The behavior of Reschedule is the same as FailoverSchedule.

@mrlihanbo

@dddddai I took a glance at the FailoverSchedule func. Maybe we can do it in this func. Just a suggestion.

Thanks for digging into it; that's exactly what I did in #1049. The behavior of Reschedule is the same as FailoverSchedule.

Good job, I will review the PR ASAP.

@RainbowMango RainbowMango added this to the v1.1 milestone Jan 12, 2022
@RainbowMango
Member

/assign @RainbowMango @huone1

@karmada-bot
Collaborator

@RainbowMango: GitHub didn't allow me to assign the following users: huone1.

Note that only karmada-io members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @RainbowMango @huone1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RainbowMango RainbowMango assigned dddddai and unassigned RainbowMango Jan 18, 2022
@huone1
Contributor

huone1 commented Jan 18, 2022

Let me work on it with you, @dddddai

@dddddai
Member Author

dddddai commented Jan 20, 2022

@huone1 Thank you!

@duanmengkk
Member

Just as issue #1644 mentioned: in the scenario of a new cluster joining, should the RB be rescheduled? @dddddai

@dddddai
Member Author

dddddai commented Apr 24, 2022

I was thinking so. I'd ask @RainbowMango for comments.

@duanmengkk
Member

I think it's similar to Kubernetes. As the descheduler (https://github.com/kubernetes-sigs/descheduler) suggests, the descheduler should be responsible for handling rescheduling on cluster status changes, cluster joins, and cluster unjoins.
