
add KEP-521: Cluster Accurate Replica Estimator #580

Merged
merged 1 commit into karmada-io:master from Garrybest:kep on Aug 17, 2021

Conversation

Garrybest
Member

Signed-off-by: Garrybest garrybest@foxmail.com

What type of PR is this?
/kind design

What this PR does / why we need it:
Add a new proposal, KEP-521: Cluster Accurate Replica Estimator.

Which issue(s) this PR fixes:
Fixes #521

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

KEP-521: Cluster Accurate Replica Estimator

@karmada-bot karmada-bot added kind/design Categorizes issue or PR as related to design. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 4, 2021
@RainbowMango
Member

/assign @kevin-wangzefeng
/cc @mrlihanbo @algebra2k

I'm very busy this week, and today I'm out of the office.

@karmada-bot
Collaborator

@RainbowMango: GitHub didn't allow me to request PR reviews from the following users: algebra2k.

Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/assign @kevin-wangzefeng
/cc @mrlihanbo @algebra2k

I'm very busy this week, and today I'm out of the office.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Based on this prefilter result, when assigning replicas, the Karmada Scheduler could calculate each cluster's maximum available replicas by issuing concurrent gRPC requests to the Cluster Accurate Replica Estimators. Each estimator quickly returns how many available replicas its cluster could produce, and the Karmada Scheduler then assigns replicas to the clusters according to the estimation results.

Furthermore, replica estimation can be treated as a new scheduler plugin.
We could implement this by turning the function calClusterAvailableReplicas into an interface. The previous estimation method, based on `ResourceSummary` in `Cluster.Status`, can serve as the default estimation approach.
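The plugin idea above could be sketched roughly as follows. All names here (`ReplicaEstimator`, `summaryEstimator`, `estimateConcurrently`) are hypothetical stand-ins, not the actual Karmada code, and a simple in-memory stub replaces the real gRPC call to the estimator:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ReplicaEstimator is a hypothetical interface abstracting how the scheduler
// estimates the maximum available replicas of a cluster.
type ReplicaEstimator interface {
	MaxAvailableReplicas(cluster string, cpuPerReplica int64) int32
}

// summaryEstimator mimics the default approach based on the ResourceSummary
// reported in Cluster.Status (here reduced to allocatable CPU millicores).
type summaryEstimator struct {
	allocatableCPU map[string]int64
}

func (e summaryEstimator) MaxAvailableReplicas(cluster string, cpuPerReplica int64) int32 {
	return int32(e.allocatableCPU[cluster] / cpuPerReplica)
}

// estimateConcurrently fans out one estimation request per cluster, analogous
// to the scheduler issuing concurrent gRPC calls to per-cluster estimators.
func estimateConcurrently(est ReplicaEstimator, clusters []string, cpuPerReplica int64) map[string]int32 {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		result = make(map[string]int32, len(clusters))
	)
	for _, c := range clusters {
		wg.Add(1)
		go func(c string) {
			defer wg.Done()
			r := est.MaxAvailableReplicas(c, cpuPerReplica)
			mu.Lock()
			result[c] = r
			mu.Unlock()
		}(c)
	}
	wg.Wait()
	return result
}

func main() {
	est := summaryEstimator{allocatableCPU: map[string]int64{
		"member1": 8000, "member2": 4000,
	}}
	got := estimateConcurrently(est, []string{"member1", "member2"}, 500)
	// Sort cluster names for deterministic output.
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d\n", k, got[k])
	}
}
```

An accurate, gRPC-backed implementation would satisfy the same interface, so the scheduler's fan-out logic would not need to change.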
Member

We could implement this by turning the function calClusterAvailableReplicas into an interface.

Do you mean we will have two implementations of this interface?

  • The default one still calculates replicas via ResourceSummary.
  • The accurate one calculates replicas via the Cluster Accurate Replica Estimator.

Member

As a user, how do I configure which of the two implementations is used?

Member Author

Do you mean we will have two implementations of this interface?

Right.

As a user, how do I configure which of the two implementations is used?

For now, we could just add a switch that determines whether the Cluster Accurate Replica Estimator is applied. The ResourceSummary-based estimation would be the default and cannot be disabled. In the future, once scheduler profiles are added, a user could customize this configuration through a profile.
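A minimal sketch of such a switch; the flag name `--enable-scheduler-estimator` and all identifiers here are hypothetical, not taken from the actual Karmada code:

```go
package main

import "fmt"

// chooseEstimators sketches the switch discussed above: the
// ResourceSummary-based estimation is always on and cannot be disabled,
// while a flag such as --enable-scheduler-estimator (hypothetical name)
// could additionally enable the Cluster Accurate Replica Estimator.
func chooseEstimators(enableAccurate bool) []string {
	estimators := []string{"ResourceSummary"} // default, always enabled
	if enableAccurate {
		estimators = append(estimators, "ClusterAccurateReplicaEstimator")
	}
	return estimators
}

func main() {
	fmt.Println(chooseEstimators(false)) // [ResourceSummary]
	fmt.Println(chooseEstimators(true))  // [ResourceSummary ClusterAccurateReplicaEstimator]
}
```

A scheduler profile, once available, could replace the boolean with a richer per-profile plugin configuration.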

Member

How about documenting this in the proposal?

Member Author

Let me make the change.

Member

@RainbowMango RainbowMango left a comment

Looks awesome to me.

Based on this prefilter result, when assigning replicas, the Karmada Scheduler could calculate each cluster's maximum available replicas by issuing concurrent gRPC requests to the Cluster Accurate Replica Estimators. Each estimator quickly returns how many available replicas its cluster could produce, and the Karmada Scheduler then assigns replicas to the clusters according to the estimation results.

Furthermore, replica estimation can be treated as a new scheduler plugin.
We could implement this by turning the function calClusterAvailableReplicas into an interface. The previous estimation method, based on `ResourceSummary` in `Cluster.Status`, can serve as the default estimation approach.
Member

How about documenting this in the proposal?

Furthermore, replica estimation can be treated as a new scheduler plugin.
We could implement this by turning the function calClusterAvailableReplicas into an interface. The previous estimation method, based on `ResourceSummary` in `Cluster.Status`, can serve as the default estimation approach.

### Karmada Cluster Accurate Replica Estimator
Member

How many clusters does each Cluster Accurate Replica Estimator serve? I mean, if there are a huge number of clusters, does that also mean a lot of Cluster Accurate Replica Estimators?

Member Author

For now, each Cluster Accurate Replica Estimator serves exactly one cluster, just like karmada-agent. If a Cluster Accurate Replica Estimator served multiple clusters, each replica would consume more resources because it would hold more pod and node informers. I think one-to-one serving has no obvious disadvantages and makes the module interaction simpler.

Member

OK. I didn't mean it's a flaw; it just needs to be documented.

@RainbowMango
Member

@irfanurrehman Are you interested in this proposal?

@irfanurrehman

@irfanurrehman Are you interested in this proposal?

Thanks @RainbowMango for pointing me to this. I cannot commit to this immediately, but I can take a look over the weekend. Meanwhile, what kind of urgency/priority have you attached to this proposal?

@RainbowMango
Member

My bad. I meant to invite you to review this proposal since you are a senior expert in the federation area.

Meanwhile, what kind of urgency/priority have you attached to this proposal?

There is no rush. You can post your comments at any time and I appreciate it.

Comment on lines 132 to 154
// ReplicaRequirements represents the requirements required by each replica.
// +optional
ReplicaRequirements *ReplicaRequirements `json:"replicaRequirements,omitempty"`

// Replicas represents the replica number of the referencing resource.
// +optional
Replicas int32 `json:"replicas,omitempty"`
Member

It would be better to add these fields to ResourceBinding.spec, keeping ObjectReference focused on indicating the relationship between the binding and the resource.

Member Author

Good idea. Right now we have ReplicaResourceRequirements in ObjectReference, which represents the resources required by each replica. Should we first move ReplicaResourceRequirements into ResourceBindingSpec in the next PR, and then change it to ReplicaRequirements?

Member

Should we first move ReplicaResourceRequirements into ResourceBindingSpec in the next PR

What do you mean by "in the next PR"? I think we can do it before this proposal is merged.

Both ReplicaResourceRequirements and Replicas should be moved out, right?

// ReplicaResourceRequirements represents the resources required by each replica.
// +optional
ReplicaResourceRequirements corev1.ResourceList `json:"resourcePerReplicas,omitempty"`
// Replicas represents the replica number of the referencing resource.
// +optional
Replicas int32 `json:"replicas,omitempty"`

Member Author

What do you mean by "in the next PR"? I think we can do it before this proposal is merged.

My bad. I meant before this proposal.

Both ReplicaResourceRequirements and Replicas should be moved out, right?

Yes.

@mrlihanbo

Just a question: ReplicaRequirements represents the requirements required by each replica. But the requests and replicas can be overridden by an OverridePolicy, which happens in the binding controller. So the replicas and requests may be incorrect when used by the Karmada Scheduler. How do we solve this problem?

@Garrybest
Member Author

Just a question: ReplicaRequirements represents the requirements required by each replica. But the requests and replicas can be overridden by an OverridePolicy, which happens in the binding controller. So the replicas and requests may be incorrect when used by the Karmada Scheduler. How do we solve this problem?

Well, it's a good question, but I haven't figured out a solution. I'm afraid the problem already exists with the current use of ReplicaResourceRequirements, since a user could disrupt the scheduling results via an OverridePolicy. Does anyone have an idea?

@RainbowMango
Member

I don't think the issue should be covered by this proposal, though I agree this is a good question.

Signed-off-by: Garrybest <garrybest@foxmail.com>
@RainbowMango
Member

As discussed at today's meeting, we are going to merge this proposal first.

@RainbowMango RainbowMango added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2021
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RainbowMango RainbowMango added the lgtm Indicates that a PR is ready to be merged. label Aug 17, 2021
@karmada-bot karmada-bot merged commit c4835e1 into karmada-io:master Aug 17, 2021
@Garrybest Garrybest deleted the kep branch August 27, 2021 09:28
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/design Categorizes issue or PR as related to design. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Scheduler: Cluster Accurate Replica Estimator For Adjusting Replicas
6 participants