Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add KEP-521: Cluster Accurate Replica Estimator #580

Merged
merged 1 commit into from
Aug 17, 2021

Conversation

Garrybest
Copy link
Member

Signed-off-by: Garrybest garrybest@foxmail.com

What type of PR is this?
/kind design

What this PR does / why we need it:
Add a new proposal, KEP-521: Cluster Accurate Replica Estimator.

Which issue(s) this PR fixes:
Fixes #521

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

KEP-521: Cluster Accurate Replica Estimator

@karmada-bot karmada-bot added kind/design Categorizes issue or PR as related to design. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 4, 2021
@RainbowMango
Copy link
Member

/assign @kevin-wangzefeng
/cc @mrlihanbo @algebra2k

I'm very busy this week, and today I'm out of the office.

@karmada-bot
Copy link
Collaborator

@RainbowMango: GitHub didn't allow me to request PR reviews from the following users: algebra2k.

Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/assign @kevin-wangzefeng
/cc @mrlihanbo @algebra2k

I'm very busy this week, and today I'm out of the office.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Based on this prefilter result, when assigning replicas, the Karmada Scheduler could try to calculate cluster max available replicas by starting gRPC requests concurrently to the Cluster Accurate Replica Estimator. At last, the Cluster Accurate Replica Estimator will soon return how many available replicas that the cluster could produce. Then the Karmada Scheduler assgin replicas into different clusters in terms of the estimation result.

Furthermore, replica estimation can be considered as a new scheduler plugin.
We could implement this by modifying function calClusterAvailableReplicas to a interface. The previous estimation method, based on `ResourceSummary` in `Cluster.Status`, is able to be a default normal estimation approach.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could implement this by modifying function calClusterAvailableReplicas to a interface.

Do you mean we will have two implementations of this interface?

  • The default one still calculate replicas via ResourceSummary
  • The accurate one calculate replicas via the Cluster Accurate Replica Estimator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user, how to config to select the two implementations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we will have two implementations of this interface?

Right.

As a user, how to config to select the two implementations?

Now we could just add a switch to determine whether Cluster Accurate Replica Estimator is applied. ResourceSummary interface could be a default one that does not support disabling. In the future, when scheduler profile is added, a user could custimize the config by using a profile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about document this in the proposal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a change.

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome to me.

Based on this prefilter result, when assigning replicas, the Karmada Scheduler could try to calculate cluster max available replicas by starting gRPC requests concurrently to the Cluster Accurate Replica Estimator. At last, the Cluster Accurate Replica Estimator will soon return how many available replicas that the cluster could produce. Then the Karmada Scheduler assgin replicas into different clusters in terms of the estimation result.

Furthermore, replica estimation can be considered as a new scheduler plugin.
We could implement this by modifying function calClusterAvailableReplicas to a interface. The previous estimation method, based on `ResourceSummary` in `Cluster.Status`, is able to be a default normal estimation approach.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about document this in the proposal?

Furthermore, replica estimation can be considered as a new scheduler plugin.
We could implement this by modifying function calClusterAvailableReplicas to a interface. The previous estimation method, based on `ResourceSummary` in `Cluster.Status`, is able to be a default normal estimation approach.

### Karmada Cluster Accurate Replica Estimator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many clusters serve for each Cluster Accurate Replica Estimator? I mean if there are a huge number of clusters, does it also mean a lot of Cluster Accurate Replica Estimator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now each Cluster Accurate Replica Estimator just serves for one cluster, just like karmada-agent. If Cluster Accurate Replica Estimator servers for multiple clusters, each replica will consume more resource because they have more pod and node informers. I thought one-for-one serving does not have obvious disadvantages and could make the module interaction easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I didn't mean it's a flaw, just needs to be annotated.

@RainbowMango
Copy link
Member

@irfanurrehman Are you interest in this proposal?

@irfanurrehman
Copy link

@irfanurrehman Are you interest in this proposal?

thanks @RainbowMango for pointing me to this. I cannot commit to this immediately. I can take a look at this over my weekend. Meanwhile, what is the kind of urgency/priority you have attached with this proposal?

@RainbowMango
Copy link
Member

My bad. I meant to invite you to review this proposal since you are a senior expert in the federation area.

Meanwhile, what is the kind of urgency/priority you have attached with this proposal?

There is no rush. You can post your comments at any time and I appreciate it.

Comment on lines 132 to 154
// ReplicaRequirements represents the requirements required by each replica.
// +optional
ReplicaRequirements *ReplicaRequirements `json:"replicaRequirements,omitempty"`

// Replicas represents the replica number of the referencing resource.
// +optional
Replicas int32 `json:"replicas,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to add this fields in ResourceBingding.spec, but keep ObjectReference focusing on indicating the relationship between binding and resource.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Now we got ReplicaResourceRequirements in ObjectReference, which represents the resources required by each replica. Should we move ReplicaResourceRequirements into ResourceBindingSpec first in next PR and then modify it to ReplicaRequirements?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move ReplicaResourceRequirements into ResourceBindingSpec first in next PR

What do you mean in next PR? I think we can do it before this proposal.

Both ReplicaResourceRequirements and Replicas should be moved out, right?

// ReplicaResourceRequirements represents the resources required by each replica.
// +optional
ReplicaResourceRequirements corev1.ResourceList `json:"resourcePerReplicas,omitempty"`
// Replicas represents the replica number of the referencing resource.
// +optional
Replicas int32 `json:"replicas,omitempty"`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean in next PR? I think we can do it before this proposal.

My bad. I mean before this proposal.

Both ReplicaResourceRequirements and Replicas should be moved out, right?

Yes.

@mrlihanbo
Copy link

Just a question, the ReplicaRequirements represents the requirements required by each replica. But the request and replica phase can be overridden by overidepolicy which happen in binding controller. So the replica and request may be incorrect when used by karmada schedule. How to solve this problem?

@Garrybest
Copy link
Member Author

Just a question, the ReplicaRequirements represents the requirements required by each replica. But the request and replica phase can be overridden by overidepolicy which happen in binding controller. So the replica and request may be incorrect when used by karmada schedule. How to solve this problem?

Well, it's a good question but I haven't figured out a solution. I'm afraid that the problem already occurs when using ReplicaResourceRequirements now since user could disrupt scheduling results it by OverridePolicy. Does anyone have an idea?

@RainbowMango
Copy link
Member

I don't think the issue should be covered by this proposal, though I agree this is a good question.

Signed-off-by: Garrybest <garrybest@foxmail.com>
@RainbowMango
Copy link
Member

As discussed at today's meeting, we are going to merge this proposal first.

@RainbowMango RainbowMango added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2021
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RainbowMango RainbowMango added the lgtm Indicates that a PR is ready to be merged. label Aug 17, 2021
@karmada-bot karmada-bot merged commit c4835e1 into karmada-io:master Aug 17, 2021
@Garrybest Garrybest deleted the kep branch August 27, 2021 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/design Categorizes issue or PR as related to design. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scheduler: Cluster Accurate Replica Estimator For Adjusting Replicas
6 participants