schedule placement based on resource usage/capacity #16
Conversation
/assign @qiujian16
#### Story 3: User is able to create a placement to select clusters based on resource usage, and then keep placement decisions updated according to the changes on cluster resource usage.
- A CPU-intensive application would like to be deployed on the cluster with least CPU utilization rate.

#### Story 4: User is able to create a placement to select clusters based on resource usage, and then ignore any resource usage change afterwards to keep the placement decisions pinning.
Suggestion: User is able to select clusters based on some resource(s) usage once, without considering the usage changes afterwards, keeping the decisions pinned.
Hi @elgnay, I have some comments; most of them are rewording. Please consider them. I hope they help make this doc more readable.
@dongwangdw Thanks for the comments. I'll update the proposal accordingly.
In order to support other resources, like GPU, `allocatable` and `capacity` should also be included in the status of the managed cluster.
### Plugin `mostallocatabletocapacityratio`
I may not have the background for this scenario, but from my point of view, in this scenario the user is responsible for the resource capacity level. For example, some clusters have 20 CPUs and some have 200 CPUs, so this ratio may not make sense.
Yes, you are right. That's the reason why we need another type `MostAllocatable` besides `MostAllocatableToCapacityRatio`. Actually, neither of them works perfectly in all scenarios. For example, suppose there are two clusters:
- cluster1 with 15/20 CPUs allocatable;
- cluster2 with 20/200 CPUs allocatable;

With type `MostAllocatableToCapacityRatio`, cluster1 will have a higher score than cluster2, while with type `MostAllocatable`, cluster2 will have a higher score. Which one makes more sense? It depends on the user's requirements.
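To make that comparison concrete, here is a tiny sketch (my own illustration, not part of the proposal; the struct and field names are made up) that computes both views for the two example clusters:

```go
package main

import "fmt"

func main() {
	// Hypothetical figures from the example above: allocatable/capacity CPUs.
	clusters := []struct {
		name        string
		allocatable float64
		capacity    float64
	}{
		{"cluster1", 15, 20},
		{"cluster2", 20, 200},
	}

	for _, c := range clusters {
		// MostAllocatableToCapacityRatio ranks by allocatable/capacity,
		// while MostAllocatable ranks by the absolute allocatable amount.
		fmt.Printf("%s: ratio=%.2f, allocatable=%.0f\n", c.name, c.allocatable/c.capacity, c.allocatable)
	}
	// cluster1: ratio=0.75, allocatable=15
	// cluster2: ratio=0.10, allocatable=20
	// cluster1 wins on ratio; cluster2 wins on absolute allocatable.
}
```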
// the placement.
// This field is ignored when NumberOfClusters in spec is not specified.
// +optional
ClusterResourcePreference *ClusterResourcePreference `json:"clusterResourcePreference,omitempty"`
What is the difference between nil and empty?
Here is the difference:
- If nil, it means the placement has no `ClusterResourcePreference` at all;
- If empty, it means the placement has an empty `ClusterResourcePreference`, whose type is `MostAllocatableToCapacityRatio`. Since `ClusterResourcePreference.ClusterResources` must have at least one item, an empty `ClusterResourcePreference` is invalid.
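For illustration only, here is a minimal Go sketch of that distinction, reusing the type names from this proposal (the field details, such as `ResourceName`, are my own guesses rather than the actual API):

```go
package example

// Types below follow the names used in this proposal; details are assumptions.
type ClusterResourcePreferenceType string

const MostAllocatableToCapacityRatio ClusterResourcePreferenceType = "MostAllocatableToCapacityRatio"

type ClusterResource struct {
	// ResourceName is a guessed field, e.g. "cpu" or "memory".
	ResourceName string `json:"resourceName"`
}

type ClusterResourcePreference struct {
	// Type defaults to MostAllocatableToCapacityRatio when unset.
	Type ClusterResourcePreferenceType `json:"type,omitempty"`
	// ClusterResources must contain at least one item, so an empty
	// ClusterResourcePreference{} fails validation.
	ClusterResources []ClusterResource `json:"clusterResources"`
}

// In the placement spec:
//   ClusterResourcePreference == nil -> no resource-based preference at all.
//   &ClusterResourcePreference{}     -> non-nil but empty; rejected because
//                                       ClusterResources has no items.
```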
// +kubebuilder:validation:MinItems:=1
// +required
ClusterResources []ClusterResource `json:"clusterResources"`
Given steady and balance weighting, it seems like we may want a more generic prioritizing configuration.
A generic prioritizing configuration is provided instead of `ClusterResourcePreference`.
- Otherwise, the score for each managed cluster is 0.

Before returning the scores to the scheduler, the data should be normalized to ensure the value falls in the range between 0 and 100:
`normalized = (score - min(score)) * 100 / (max(score) - min(score))`
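As a sanity check of that formula, here is a rough Go sketch of the normalization step (my own sketch, not the plugin's actual code; it also guards the max == min case, which the formula above leaves undefined):

```go
package example

// normalize maps raw cluster scores onto [0, 100] using
// normalized = (score - min) * 100 / (max - min).
func normalize(scores map[string]float64) map[string]int64 {
	first := true
	var min, max float64
	for _, s := range scores {
		if first {
			min, max = s, s
			first = false
			continue
		}
		if s < min {
			min = s
		}
		if s > max {
			max = s
		}
	}

	normalized := make(map[string]int64, len(scores))
	for cluster, s := range scores {
		if max == min {
			// All clusters scored the same; give them all the top score.
			normalized[cluster] = 100
			continue
		}
		normalized[cluster] = int64((s - min) * 100 / (max - min))
	}
	return normalized
}
```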
I think the test cases/version upgrade etc. are required. You can set them to N/A if they are not required. In addition, I would like to see some examples, and a discussion on how it works with the steady/balance plugins today.
Added.
### Plugin `mostallocatabletocapacityratio`
It is a prioritizer and scores feasible managed clusters with the process below.
- If the placement has `ClusterResourcePreference` specified in the spec and its `Type` is `MostAllocatableToCapacityRatio`, the score of a managed cluster is the sum of the scores for each resource.
`score = sum(resource_x_allocatable / resource_x_capacity)`
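Reading that formula literally, the per-cluster score is the sum of per-resource ratios with every resource weighted equally, as the review comment below also notes. A hedged sketch under that reading, not the plugin's real implementation:

```go
package example

// scoreCluster sums allocatable/capacity ratios over the resources listed in
// the placement's ClusterResourcePreference, giving each resource equal weight.
func scoreCluster(allocatable, capacity map[string]float64, resources []string) float64 {
	total := 0.0
	for _, r := range resources {
		if capacity[r] > 0 {
			total += allocatable[r] / capacity[r]
		}
	}
	return total
}
```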
It is worth mentioning that we treat each resource dimension with equal weight, and the reasoning behind that.
Updated.
### Goals

- Sotry 1 and 2, allow user to select managed clusters based on resource usage/capability with the Placement APIs;
Typo: story.
Corrected.
### Non-Goals
- Balance workload across the fleet based on cluster resource usage;
- Story 3 and 4 are related to the churning policy of placement, and will be covered in a separate enhancement proposal;
Is this different from the steady weighting?
Yes, they are related. The weight of the `steady` plugin should be adjusted automatically according to the churning policy of a placement. We may also support some advanced features, like `churningSeconds`, to describe to what extent cluster churning can be tolerated.
Link to the feature request: https://github.com/open-cluster-management-io/community/issues/52

### User Stories
is there a way to express: "I must have at least this much space"? If not, why was that excluded?
how does a user handle it when their clusters are autoscaling? What do they expect to happen? This could be presented as there not being capacity available I think. Or perhaps that capacity is tight, but will expand.
> is there a way to express: "I must have at least this much space"? If not, why was that excluded?

No, because we cannot support it. Suppose we were able to create a placement which matches all clusters with at least 10G allocatable memory. In the first scheduling cycle, cluster1 is selected by this placement because it has 12G allocatable memory. After several minutes, the allocatable memory of cluster1 is reduced to 8G. We don't know whether the reduced 4G of memory was consumed by workload associated with this placement. So in the next scheduling cycle, should cluster1 be selected by this placement or not?
> how does a user handle it when their clusters are autoscaling? What do they expect to happen? This could be presented as there not being capacity available I think. Or perhaps that capacity is tight, but will expand.

Here is what will happen if managed clusters are autoscaling.

Scale up
- User creates a placement with `ClusterResourcePreference`;
- The cluster with the highest allocatable-to-capacity ratio or the most allocatable resources will be selected;
- Workload will then be deployed on the selected cluster;
- If there are not enough resources available or the capacity is tight, a new node will be added;
- Capacity/allocatable of this cluster changes. That will trigger a new scheduling cycle for placements with `ClusterResourcePreference`;

Scale down
- An underutilized node in a managed cluster is removed;
- Capacity/allocatable of this cluster changes. That will trigger a new scheduling cycle for placements with `ClusterResourcePreference`;
- Some Placements may no longer select this cluster because of the resource change;
Thanks, that explanation makes sense. I might suggest an appendix with your example of scale up and scale down and your example of why you cannot express "at least this much space".
## Proposal

### 1. Changes on Placement APIs
I think the combination of this API with the prioritizers in item 2 is going to age pretty well.
4. `ResourceAllocatableMemory`: it scores managed clusters according to allocatable memory.

Depending on the name it is registered with, the `resource` plugin uses different formulas to calculate the score of a managed cluster; the value falls in the range between -100 and 100.
| Prioritizer | Formula |
Thanks for getting very specific here. I think these make sense to me.
This design looks good and I think it integrates well into the prioritizer work already completed. I see how weighting against churn is a different feature. Once this feature is created, is another possible consideration one that tries to assess whether a given placement of a resource appears to be permafailing? That is, "the work was assigned to cluster/A and cluster/B, but on cluster/A it is consistently failing". /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: deads2k, elgnay.
Yes, I think this is one of the concerns raised by @mdelder also.
/lgtm
The related feature request: open-cluster-management-io/community#52