
Reschedule bindings on cluster change #829

Closed
dddddai opened this issue Oct 18, 2021 · 39 comments · Fixed by #1383
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Comments

@dddddai
Member

dddddai commented Oct 18, 2021

What happened:
Unjoined clusters still remain in binding.spec.clusters

What you expected to happen:
Unjoined clusters should be deleted from binding.spec.clusters

How to reproduce it (as minimally and precisely as possible):
1. Set up the environment (script v0.8):

root@myserver:~/karmada# hack/local-up-karmada.sh

root@myserver:~/karmada# hack/create-cluster.sh member1 $HOME/.kube/karmada.config

root@myserver:~/karmada# kubectl config use-context karmada-apiserver

root@myserver:~/karmada# karmadactl join member1 --cluster-kubeconfig=$HOME/.kube/karmada.config

root@myserver:~/karmada# kubectl apply -f samples/nginx

root@myserver:~/karmada# kubectl get deploy
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           47h

2. Unjoin member1:

root@myserver:~/karmada# karmadactl unjoin member1

root@myserver:~/karmada# kubectl get clusters
No resources found

3. Check binding.spec.clusters:

root@myserver:~/karmada# kubectl describe rb
......
Spec:
  Clusters:
    Name:  member1
......

Anything else we need to know?:
Is this expected behavior? If not, who is supposed to take responsibility for deleting unjoined clusters from the binding? The scheduler, or other controllers (like the cluster controller)?

Environment:

  • Karmada version:v0.8.0
  • Others:
@dddddai dddddai added the kind/bug Categorizes issue or PR as related to a bug. label Oct 18, 2021
@RainbowMango
Member

Thanks for opening this; we can track the progress and discuss the solution here.
Any ideas?
My thought would be to re-schedule the bindings. Two scenarios should be considered:

  • a cluster should be removed from the RB after it is unjoined
  • a cluster should be added to the RB after it is joined

@dddddai
Member Author

dddddai commented Oct 18, 2021

@RainbowMango Thanks for your reply

  • a cluster should be removed from the RB after it is unjoined
  • a cluster should be added to the RB after it is joined

Yes, this makes sense, but it might introduce a new problem. Consider this case:

  1. RB is scheduled to cluster1 and cluster2
  2. Unjoin cluster1
  3. cluster1 is removed from binding.spec.clusters
  4. Join cluster1
  5. The scheduler should re-schedule the RB, but there is no valid schedule type for this case; it would be handled as AvoidSchedule, which doesn't make sense:

    if s.allClustersInReadyState(resourceBinding.Spec.Clusters) {
        return AvoidSchedule
    }

@dddddai
Member Author

dddddai commented Oct 19, 2021

I have another question: does the scheduler perform Failover after unjoining a target cluster?
I don't know because I haven't tried it.

@XiShanYongYe-Chang
Member

I have another question: does the scheduler perform Failover after unjoining a target cluster? I don't know because I haven't tried it.

No.

@dddddai
Member Author

dddddai commented Oct 20, 2021

@XiShanYongYe-Chang Thanks for your reply

No.

Is this expected behavior? Shouldn't Failover handle cluster delete events?

@XiShanYongYe-Chang
Member

When there is a cluster delete event, a reschedule may need to be triggered.

What do you think, @RainbowMango?

@RainbowMango
Member

Echoing #829 (comment):

Agree. But no idea how to solve that issue now.

@RainbowMango RainbowMango added this to the v0.10 milestone Oct 22, 2021
@RainbowMango
Member

/priority important-soon
@dddddai I added this issue to v0.10 milestone, let's fix this in this release.

@karmada-bot karmada-bot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 22, 2021
@dddddai
Member Author

dddddai commented Oct 22, 2021

Agree. But no idea how to solve that issue now.

How about removing the applied placement annotation from the binding on cluster add/delete? That would make the scheduler reschedule the binding as ReconcileSchedule:

    appliedPlacement := util.GetLabelValue(resourceBinding.Annotations, util.PolicyPlacementAnnotation)
    if policyPlacementStr != appliedPlacement {
        return ReconcileSchedule
    }

@RainbowMango
Member

Too tricky, I guess.

@dddddai
Member Author

dddddai commented Oct 24, 2021

IMHO, the scheduler should reschedule the bindings on cluster change (e.g. cluster joined, cluster unjoined, cluster label changed...).

I added a cluster queue to the scheduler to handle cluster events and fix this issue, and it works fine; please see dddddai@568b870
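The idea behind a cluster queue can be sketched as follows. This is a simplified illustration, not the code in that commit: the Binding type and the bindingsToRequeue helper are hypothetical. On a cluster delete, only bindings bound to that cluster need a reschedule; on a cluster add, every binding is requeued, since the new cluster may now fit its placement.

```go
package main

import "fmt"

// Binding pairs a binding's name with the clusters it is scheduled to
// (illustrative type, not Karmada's actual API).
type Binding struct {
	Name     string
	Clusters []string
}

// bindingsToRequeue decides, for one cluster event popped from the cluster
// queue, which bindings a worker should push back onto the scheduling queue.
func bindingsToRequeue(all []Binding, cluster string, deleted bool) []string {
	var out []string
	for _, b := range all {
		refs := false
		for _, c := range b.Clusters {
			if c == cluster {
				refs = true
				break
			}
		}
		// Deletes affect only bindings that reference the cluster;
		// adds may affect any binding, so requeue them all.
		if !deleted || refs {
			out = append(out, b.Name)
		}
	}
	return out
}

func main() {
	all := []Binding{
		{Name: "nginx-rb", Clusters: []string{"member1"}},
		{Name: "redis-rb", Clusters: []string{"member2"}},
	}
	// member1 was unjoined: only bindings bound to it need a reschedule.
	fmt.Println(bindingsToRequeue(all, "member1", true))
	// member3 joined: every binding may now fit it, so requeue all.
	fmt.Println(bindingsToRequeue(all, "member3", false))
}
```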

@RainbowMango
Member

IMHO, the scheduler should reschedule the bindings on cluster change (e.g. cluster joined, cluster unjoined, cluster label changed...).

+1, but I'm not sure about cluster label changes; being too sensitive isn't a good thing.

I'll take a look and comment on your commit.

@dddddai
Member Author

dddddai commented Oct 25, 2021

but I'm not sure about cluster label changes; being too sensitive isn't a good thing.

For example, suppose there is a propagation policy whose cluster affinity label selector is foo: bar.

Do you mean we should keep the binding scheduled to the cluster even though the label foo: bar has been removed from that cluster?

@RainbowMango
Member

I think the scenario might be handled by descheduler.

@dddddai
Member Author

dddddai commented Oct 25, 2021

Well, I'm not familiar with the descheduler, but from this picture it seems that the descheduler does not watch cluster events; it only watches workloads and may reschedule them to other clusters when the original cluster has insufficient resources. It focuses on ScaleSchedule rather than Reschedule. Please correct me if I'm wrong.

@RainbowMango
Member

We might discuss the descheduler further at the community meeting. Hope you can come; see you there.

@dddddai
Member Author

dddddai commented Oct 25, 2021

OK, I'll be there :)

@dddddai
Member Author

dddddai commented Oct 26, 2021

To be clear, I have four questions:

  1. Shall we reschedule bindings when a cluster field/label changes? (The updated cluster might no longer fit, or might newly fit, the propagation policy.)
  2. Should FailoverSchedule work when unjoining a cluster?
  3. What does SpreadConstraint.MinGroups mean? Does it mean we should not propagate the resource unless group count >= MinGroups? If yes, shall we delete all binding.spec.clusters when unjoining a cluster causes group count < MinGroups?
  4. The key question: shall we always keep binding.spec.clusters consistent with the propagation policy?

@dddddai dddddai changed the title Delete unjoined cluster from (Cluster)ResourceBinding Reschedule bindings on cluster change Nov 24, 2021
@mrlihanbo

Looking forward to seeing the progress of this issue

@RainbowMango
Member

@mrlihanbo The #967 is waiting for your review.

@RainbowMango RainbowMango removed this from the v0.10 milestone Nov 26, 2021
@mrlihanbo

@mrlihanbo The #967 is waiting for your review.

I will review the PR now.

@dddddai
Member Author

dddddai commented Nov 26, 2021

Looking forward to seeing the progress of this issue

Hi @mrlihanbo, before implementing this we have to answer the four questions above.

@dddddai
Member Author

dddddai commented Nov 30, 2021

Hello @RainbowMango, any ideas about these questions?

@RainbowMango
Member

Shall we reschedule bindings when cluster field/label changed? (because the updated cluster might (not) fit the propagation policy)

I think we should treat this scenario very carefully; it might bring drastic changes. Take Kubernetes as an example: after a pod has been scheduled to a node, it will not get re-scheduled even if the node's labels change.

@RainbowMango
Member

Should FailoverSchedule work when unjoining a cluster?

I think we should re-schedule the workload after one of the bound clusters is unjoined. It doesn't have to be FailoverSchedule (not sure).

@RainbowMango
Member

What does SpreadConstraint.MinGroups mean? Does it mean we should not propagate the resource unless group count >= MinGroups?

Yes, you are right. SpreadConstraint.MinGroups restricts the minimum number of cluster groups; if the scheduler cannot find enough groups, the schedule should fail.

If yes, shall we delete all binding.spec.clusters when unjoining a cluster causes group count < MinGroups?

I don't think so. Just like the answer above, it's too dangerous, at least for now.
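The MinGroups rule described above can be sketched as a simple guard. The enoughGroups helper is illustrative (not Karmada's actual code); it shows scheduling failing when fewer cluster groups are available than MinGroups requires:

```go
package main

import "fmt"

// enoughGroups returns an error when the available cluster groups fall short
// of SpreadConstraint.MinGroups, signalling that the schedule should fail.
func enoughGroups(available, minGroups int) error {
	if available < minGroups {
		return fmt.Errorf("only %d cluster groups available, need at least %d", available, minGroups)
	}
	return nil
}

func main() {
	fmt.Println(enoughGroups(1, 2)) // schedule fails: not enough groups
	fmt.Println(enoughGroups(3, 2)) // <nil>: constraint satisfied
}
```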

@dddddai
Member Author

dddddai commented Nov 30, 2021

I see, so I guess we should reschedule workloads only when a cluster is joined/unjoined, right?

@RainbowMango
Member

Let's focus on the scenario of cluster-unjoin for now. The workload should be re-scheduled after one of the bound clusters is unjoined. If no more clusters fit the propagation policy, just remove the unjoined cluster from the binding object.
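The fallback described above (no replacement cluster fits, so just drop the unjoined cluster from the binding) can be sketched as follows. The TargetCluster type and removeCluster helper are illustrative, not Karmada's actual code:

```go
package main

import "fmt"

// TargetCluster is a simplified stand-in for an entry of binding.spec.clusters.
type TargetCluster struct{ Name string }

// removeCluster drops the unjoined cluster from the scheduling result,
// leaving the remaining bound clusters untouched.
func removeCluster(targets []TargetCluster, unjoined string) []TargetCluster {
	var kept []TargetCluster
	for _, t := range targets {
		if t.Name != unjoined {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	targets := []TargetCluster{{Name: "member1"}, {Name: "member2"}}
	fmt.Println(removeCluster(targets, "member1")) // [{member2}]
}
```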

@mrlihanbo

The workload should be re-scheduled after one of the bound clusters is unjoined. If no more clusters fit the propagation policy, just remove the unjoined cluster from the binding object.

There is a scenario to consider if the workload is re-scheduled after one of the bound clusters is unjoined:

  • The user specifies clusters A, B, and C in the policy.
  • Only clusters A and B are joined at the time of the first schedule, so the schedule result is [A, B].
  • Then cluster C is joined.
  • If cluster A becomes not ready and triggers a reschedule, the new schedule result will be [B, C].

We should make sure this behavior is what we expect.

@mrlihanbo

I see, so I guess we should reschedule workloads only when a cluster is joined/unjoined, right?

@dddddai I took a glance at the FailoverSchedule func. Maybe we can do it in this func. Just a suggestion.

@dddddai
Member Author

dddddai commented Nov 30, 2021

@dddddai I took a glance at the FailoverSchedule func. Maybe we can do it in this func. Just a suggestion.

Thanks for digging into it; that's exactly what I did in #1049. The behavior of Reschedule is the same as FailoverSchedule.

@mrlihanbo

@dddddai I took a glance at the FailoverSchedule func. Maybe we can do it in this func. Just a suggestion.

Thanks for digging into it; that's exactly what I did in #1049. The behavior of Reschedule is the same as FailoverSchedule.

Good job, I will review the PR ASAP.

@RainbowMango RainbowMango added this to the v1.1 milestone Jan 12, 2022
@RainbowMango
Member

/assign @RainbowMango @huone1

@karmada-bot
Collaborator

@RainbowMango: GitHub didn't allow me to assign the following users: huone1.

Note that only karmada-io members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @RainbowMango @huone1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RainbowMango RainbowMango assigned dddddai and unassigned RainbowMango Jan 18, 2022
@huone1
Contributor

huone1 commented Jan 18, 2022

Let me work on it with you, @dddddai

@dddddai
Member Author

dddddai commented Jan 20, 2022

@huone1 Thank you!

@duanmengkk
Member

Just as issue #1644 mentioned: in the scenario of a new cluster joining, should the RB be rescheduled? @dddddai

@dddddai
Member Author

dddddai commented Apr 24, 2022

I was thinking so. I'd ask @RainbowMango for comments.

@duanmengkk
Member

I think it's similar to Kubernetes. As the descheduler (https://github.com/kubernetes-sigs/descheduler) suggests, the descheduler should be responsible for handling rescheduling on cluster status changes, cluster joins, and cluster unjoins.
