
[feature] support rescheduling when deleting a cluster #1383

Merged
merged 1 commit into karmada-io:master from huone1:DeleteCluster on Mar 4, 2022

Conversation

huone1 (Contributor) commented Feb 20, 2022

Signed-off-by: huone1 <huwanxing@huawei.com>

What type of PR is this?

/kind feature

What this PR does / why we need it:
support rescheduling when deleting a cluster
Which issue(s) this PR fixes:
Fixes #829
Fixes #1411

Special notes for your reviewer:
This PR does not consider the spreadConstraints; they will be addressed in the scheduler refactoring.
Does this PR introduce a user-facing change?:

`karmada-scheduler`: Workloads can now be rescheduled after a cluster is unregistered.

@karmada-bot karmada-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Feb 20, 2022
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 20, 2022
huone1 (Contributor, Author) commented Feb 20, 2022

/cc @dddddai

huone1 (Contributor, Author) commented Feb 21, 2022

/cc @mrlihanbo

@mrlihanbo commented

Notes: this PR does not consider the spreadConstraints.
/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 22, 2022
@RainbowMango (Member) commented

Please solve the conflict. @huone1

huone1 (Contributor, Author) commented Mar 1, 2022

Issue #1411 is also fixed in this PR.

@RainbowMango RainbowMango linked an issue Mar 1, 2022 that may be closed by this pull request
@@ -48,7 +51,7 @@ func divideReplicasByResource(
 	} else if assignedReplicas < spec.Replicas {
 		// We need to enlarge the replicas in terms of the previous result (if exists).
 		// First scheduling is considered as a special kind of scaling up.
-		newTargetClusters, err := scaleUpScheduleByReplicaDivisionPreference(clusters, spec, preference)
+		newTargetClusters, err := scaleUpScheduleByReplicaDivisionPreference(clusters, spec, preference, scheduledClusters, assignedReplicas)
 		if err != nil {
 			return nil, fmt.Errorf("failed to scaleUp: %v", err)
 		}
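For readers following the diff, here is a self-contained sketch of the idea behind passing scheduledClusters and assignedReplicas into the scale-up path. The types and helpers (TargetCluster, keepScheduled, scaleUpSketch) are local stand-ins, and the round-robin split is only a placeholder for Karmada's resource/weight-based division; this is not the PR's actual implementation.

package main

import "fmt"

// TargetCluster is a local stand-in for a (cluster, replicas) assignment,
// not Karmada's actual API type.
type TargetCluster struct {
	Name     string
	Replicas int32
}

// keepScheduled keeps only the previous assignments whose cluster is still a
// ready candidate, mirroring the idea of filtering out deleted or unready clusters.
func keepScheduled(previous []TargetCluster, candidates []string) []TargetCluster {
	isCandidate := make(map[string]bool, len(candidates))
	for _, c := range candidates {
		isCandidate[c] = true
	}
	var kept []TargetCluster
	for _, t := range previous {
		if isCandidate[t.Name] {
			kept = append(kept, t)
		}
	}
	return kept
}

// scaleUpSketch assigns only the missing replicas (desired - assigned) among the
// candidates and leaves the surviving placement untouched; a naive round-robin
// split stands in for the real division logic.
func scaleUpSketch(candidates []string, desired int32, scheduled []TargetCluster, assigned int32) []TargetCluster {
	missing := desired - assigned
	if missing <= 0 {
		return scheduled
	}
	perCluster := map[string]int32{}
	for _, t := range scheduled {
		perCluster[t.Name] += t.Replicas
	}
	for i := 0; int32(i) < missing; i++ {
		perCluster[candidates[i%len(candidates)]]++
	}
	var result []TargetCluster
	for _, c := range candidates {
		if perCluster[c] > 0 {
			result = append(result, TargetCluster{Name: c, Replicas: perCluster[c]})
		}
	}
	return result
}

func main() {
	// Cluster C has been deleted; its 2 replicas need to be re-placed on A and B.
	previous := []TargetCluster{{"A", 2}, {"B", 1}, {"C", 2}}
	candidates := []string{"A", "B"}
	desired := int32(5)

	scheduled := keepScheduled(previous, candidates) // [{A 2} {B 1}]
	var assigned int32
	for _, t := range scheduled {
		assigned += t.Replicas // 3 in total
	}
	fmt.Println(scaleUpSketch(candidates, desired, scheduled, assigned)) // [{A 3} {B 2}]
}

In short, only the replicas that lost their cluster are re-divided; the replicas surviving on ready clusters are preserved, which is why the scale-up function now needs both extra arguments.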

A reviewer commented:

what if assignedReplicas == spec.Replicas, but len(scheduledClusters) < len(spec.clusters)

huone1 (Contributor, Author) replied:

> what if assignedReplicas == spec.Replicas, but len(scheduledClusters) < len(spec.clusters)

I hadn't thought of a rescheduling scenario like that.

A member replied:

Could you please give an example? It doesn't seem possible to happen "in one event".

In reply, an example was given:

Let's say we have 2 clusters [A, B], both with allocatable resources: 4C8G.
A deployment requests 2C4G with 3 replicas,
and the propagationPolicy is:

placement:
    clusterAffinity:
      clusterNames:
      - A
      - B
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
      weightPreference:
        dynamicWeight: AvailableReplicas

So the deployment was placed in both A and B; let's assume A has 2 replicas and B has 1.

  1. Update propagationPolicy placement.clusterAffinity.clusterNames=[A]; now there are not enough resources for this deployment to reschedule, so nothing happens.
  2. Update deployment.spec.replicas=2; with this PR, nothing happens again.

huone1 (Contributor, Author) replied:

You are right; it should return the scheduledClusters as the result.
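To make this concrete, here is a self-contained sketch of the scenario above (local stand-in types; not Karmada's actual code), showing why the equal-replicas case should fall back to the surviving placement:

package main

import "fmt"

// TargetCluster is a local stand-in for a (cluster, replicas) assignment.
type TargetCluster struct {
	Name     string
	Replicas int32
}

func main() {
	// The example above: A had 2 replicas and B had 1, then B was removed from
	// the cluster affinity and the deployment was scaled down to 2 replicas.
	previous := []TargetCluster{{"A", 2}, {"B", 1}}
	candidates := map[string]bool{"A": true}
	desired := int32(2)

	// Keep only the assignments whose cluster is still a candidate.
	var scheduled []TargetCluster
	var assigned int32
	for _, t := range previous {
		if candidates[t.Name] {
			scheduled = append(scheduled, t)
			assigned += t.Replicas
		}
	}

	switch {
	case assigned > desired:
		fmt.Println("scale-down path")
	case assigned < desired:
		fmt.Println("scale-up path")
	default:
		// assigned == desired even though the cluster set shrank: returning the
		// surviving placement (instead of the stale previous list) is what
		// actually removes the replica still recorded on B.
		fmt.Println("keep surviving placement:", scheduled) // [{A 2}]
	}
}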

Signed-off-by: huone1 <huwanxing@huawei.com>
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 3, 2022
huone1 (Contributor, Author) commented Mar 3, 2022

> Please solve the conflict. @huone1

OK, it is done.

@RainbowMango (Member) commented

cc @dddddai @Garrybest
Can you take a look?

@RainbowMango (Member) left a comment:

/approve

Leave LGTM to @Garrybest or @dddddai

@karmada-bot (Collaborator) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 3, 2022
@dddddai (Member) left a comment:

Just one concern; I would leave lgtm to @Garrybest.

Comment on lines -276 to -281
deltaLen := len(spec.Clusters) - len(reservedClusters)
if len(candidateClusters) < deltaLen {
// for ReplicaSchedulingTypeDivided, we will try to migrate replicas to the other health clusters
if placement.ReplicaScheduling == nil || placement.ReplicaScheduling.ReplicaSchedulingType == policyv1alpha1.ReplicaSchedulingTypeDuplicated {
klog.Warningf("ignore reschedule binding as insufficient available cluster")
return ScheduleResult{}, nil
dddddai (Member) commented on these lines:

Let's say we have 2 clusters [A, B] (with Failover enabled):

  1. an rb was scheduled to [A, B] (duplicated)
  2. cluster A becomes not ready
  3. the original behavior: scheduled to [A, B]; the current behavior: scheduled to [B]

I'm not sure if we want to keep the original behavior; if not, then lgtm.

huone1 (Contributor, Author) replied:

@mrlihanbo Please see if this change is reasonable.

@Garrybest (Member) commented

/assign

-	// Step 1: Get previous total sum of replicas.
-	assignedReplicas := util.GetSumOfReplicas(spec.Clusters)
+	// Step 1: Find the ready clusters that have old replicas
+	scheduledClusters := findOutScheduledCluster(spec.Clusters, clusters)
Garrybest (Member) commented on these lines:

What if we just delete the unjoined clusters from spec.clusters?

Then the replicas in unjoined clusters will be considered as a special kind of scaling up.

Garrybest (Member) added:

It seems we don't need more changes, am I right?

huone1 (Contributor, Author) replied:

It is a special kind of scaling up.

@Garrybest (Member) commented

/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 4, 2022
@karmada-bot karmada-bot merged commit b33bda9 into karmada-io:master Mar 4, 2022
@huone1 huone1 deleted the DeleteCluster branch March 7, 2022 02:31
@Garrybest (Member) commented

I think there is a bug here. Imagine:

  1. Failover is disabled.
  2. A cluster is not ready.

We should not erase the scheduled replicas of the not-ready cluster in the RB, because Failover is disabled. However, if a scale-up happens to occur, e.g. the user scales up the desired replicas, this PR will delete all replicas in that cluster, which is dangerous. PTAL, @huone1

huone1 (Contributor, Author) commented Apr 22, 2022

When Failover is disabled, rescheduling for a not-ready cluster is not triggered.
If a cluster is not ready and unhealthy, I think it is normal to delete all replicas in that cluster.

@Garrybest (Member) commented

I'm afraid not. When scaling up, rescheduling will be triggered.

huone1 (Contributor, Author) commented Apr 22, 2022

> I'm afraid not. When scaling up, rescheduling will be triggered.

What is wrong with that?

@RainbowMango (Member) commented

> However, if a scale-up happens to occur, e.g. the user scales up the desired replicas, this PR will delete all replicas in that cluster, which is dangerous.

Why is it dangerous?

What I'm thinking about is how to postpone the deletion operation until the desired replicas are all in the available state. That guarantees there are always sufficient replicas running at any time.

@Garrybest (Member) commented

Failover is disabled, but under this circumstance all replicas will be removed. It does not match the expectation.

@Garrybest (Member) commented

Failover is disabled because the user does not want to remove all replicas when a cluster is not ready. If the api-server of a member cluster is temporarily down and Failover is disabled, we potentially do not want the replicas in this member cluster to be evicted when the user triggers scaling up. However, now it does not match the expectation.

@RainbowMango (Member) commented Apr 22, 2022

Agree with @Garrybest.
We should take false-positive cluster failures seriously. Can we hold the replicas (for the unhealthy cluster) unchanged even in the case of scaling up, and decrease the replicas in the case of scaling down?

huone1 (Contributor, Author) commented Apr 22, 2022

Deleting a cluster is different from a cluster changing from healthy to unhealthy. It is reasonable to migrate the replicas to other clusters when deleting a cluster.

@RainbowMango (Member) commented

Yeah, deleting a cluster is another story.

@Garrybest (Member) commented Apr 22, 2022

Let me describe an example.

  1. We have a deployment with 5 desired replicas which are propagated to member cluster A.
  2. The cluster is a little bit busy, so the connection between karmada-agent and the kube-apiserver of member A is lost for about 1 minute. So the cluster status is unhealthy.
  3. It happens that the deployment is scaled up to 10 replicas at this time.
  4. Even if Failover is disabled, the scale-up triggers the scheduling procedure, and karmada-scheduler will remove the replicas in cluster A, which may be dangerous. However, it is possible that there is nothing wrong with the 5 replicas in member cluster A (see the sketch below).
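
A self-contained sketch of this interaction (local stand-in types and a simplified ready filter; not Karmada's actual code), to make the concern concrete:

package main

import "fmt"

// TargetCluster is a local stand-in for a (cluster, replicas) assignment.
type TargetCluster struct {
	Name     string
	Replicas int32
}

func main() {
	// Cluster A is temporarily unhealthy (agent connection lost) and Failover
	// is disabled, so nothing should be evicted from A.
	ready := map[string]bool{"A": false}
	previous := []TargetCluster{{"A", 5}}

	// Scaling the deployment from 5 to 10 replicas still triggers rescheduling.
	desired := int32(10)

	// A scheduler that keeps only assignments on ready clusters...
	var scheduled []TargetCluster
	var assigned int32
	for _, t := range previous {
		if ready[t.Name] {
			scheduled = append(scheduled, t)
			assigned += t.Replicas
		}
	}

	// ...drops A's 5 replicas and treats all 10 as missing, even though the
	// pods on A may be perfectly healthy.
	fmt.Printf("surviving placement: %v, replicas to (re)assign: %d\n", scheduled, desired-assigned)
}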

@RainbowMango (Member) commented

Thanks @Garrybest for the details. Can you help file an issue to track this?

@Garrybest (Member) commented

Sure.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cannot reschedule when propagationpolicy changed
Reschedule bindings on cluster change
7 participants