
schedule placement based on resource usage/capacity #16

Conversation

elgnay
Contributor

@elgnay elgnay commented Aug 9, 2021

The related feature request: open-cluster-management-io/community#52
Signed-off-by: Yang Le <yangle@redhat.com>

@elgnay
Contributor Author

elgnay commented Aug 9, 2021

/assign @qiujian16
/assign @deads2k

@elgnay elgnay force-pushed the resource_based_scheduling branch 4 times, most recently from 1770be4 to 0cceaae on August 10, 2021 07:31

@nobody4t nobody4t left a comment


Hi @elgnay,

I have some comments. Most of them are rewording suggestions. Please consider them; I hope they help make this doc more readable.

#### Story 3: User is able to create a placement to select clusters based on resource usage, and then keep placement decisions updated according to changes in cluster resource usage.
- A CPU-intensive application would like to be deployed on the cluster with the least CPU utilization.

#### Story 4: User is able to create a placement to select clusters based on resource usage, and then ignore any resource usage change afterwards to keep the placement decisions pinned.


User is able to select clusters based on some resource(s) usage once, without considering the usage changes afterwards, keeping the decisions pinned.

Contributor Author


> Hi @elgnay,
>
> I have some comments. Most of them are rewording suggestions. Please consider them; I hope they help make this doc more readable.

@dongwangdw Thanks for the comments. I'll update the proposal accordingly.

In order to support other resources, like GPU, the `allocatable` and `capacity` should be included in the status of managed cluster as well.
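To make that concrete, here is a minimal sketch (an assumption of mine, not the actual ManagedCluster API definition) of how `capacity` and `allocatable` could be surfaced in the cluster status so that extended resources such as GPU are reported too:

```go
package v1

// ResourceList maps a resource name (e.g. "cpu", "memory",
// "nvidia.com/gpu") to its quantity; a plain string is used here for brevity.
type ResourceList map[string]string

// ManagedClusterStatus carries the observed resource totals of a cluster,
// covering extended resources such as GPU alongside cpu and memory.
type ManagedClusterStatus struct {
	// Capacity is the total amount of each resource on the cluster.
	Capacity ResourceList `json:"capacity,omitempty"`
	// Allocatable is the amount of each resource available for scheduling.
	Allocatable ResourceList `json:"allocatable,omitempty"`
}
```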

### Plugin `mostallocatabletocapacityratio`


I may not have the full background on this scenario. But from my point of view, the user will be aware of the resource capacity levels; for example, some clusters have 20 CPUs while others have 200. In that case this ratio may not make sense.

Contributor Author


Yes, you are right. That's the reason why we need another type MostAllocatable besides MostAllocatableToCapacityRatio.

Actually, none of them works perfectly in all scenarios. For example, suppose there are two clusters:

  • cluster1 with 15/20 cpus allocatable;
  • cluster2 with 20/200 cpus allocatable;

With type MostAllocatableToCapacityRatio, cluster1 will have a higher score than cluster2, while with type MostAllocatable, cluster2 will have a higher score. Which one makes more sense? It depends on the user's requirements.
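To make the trade-off concrete, here is a small self-contained sketch (the helper below is purely illustrative, not part of the proposal) that computes both rankings for the two example clusters:

```go
package main

import "fmt"

type cluster struct {
	name                  string
	allocatable, capacity float64 // CPUs
}

func main() {
	clusters := []cluster{
		{"cluster1", 15, 20},  // 15/20 cpus allocatable
		{"cluster2", 20, 200}, // 20/200 cpus allocatable
	}
	for _, c := range clusters {
		// MostAllocatableToCapacityRatio ranks by this ratio:
		// cluster1 wins (0.75 vs 0.10).
		ratio := c.allocatable / c.capacity
		// MostAllocatable ranks by the raw allocatable amount:
		// cluster2 wins (20 vs 15).
		fmt.Printf("%s: ratio=%.2f allocatable=%.0f\n", c.name, ratio, c.allocatable)
	}
}
```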

// the placement.
// This field is ignored when NumberOfClusters in spec is not specified.
// +optional
ClusterResourcePreference *ClusterResourcePreference `json:"clusterResourcePreference,omitempty"`
Member


What is the difference between nil and empty?

Contributor Author


Here is the difference:

  • If nil, it means the placement has no ClusterResourcePreference at all;
  • If empty, it means the placement has an empty ClusterResourcePreference, whose type is MostAllocatableToCapacityRatio. Considering that ClusterResourcePreference.ClusterResources must have at least one item, an empty ClusterResourcePreference is invalid.

	// +kubebuilder:validation:MinItems:=1
	// +required
	ClusterResources []ClusterResource `json:"clusterResources"`
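Putting the two snippets together, here is a hedged, self-contained sketch of the nil-vs-empty distinction; the type names follow the snippets above but the details are illustrative rather than the final API:

```go
package main

import "fmt"

type ClusterResource struct {
	Name string `json:"name"`
}

type ClusterResourcePreference struct {
	// Defaults to MostAllocatableToCapacityRatio when unset.
	Type string `json:"type,omitempty"`
	// +kubebuilder:validation:MinItems:=1
	// +required
	ClusterResources []ClusterResource `json:"clusterResources"`
}

type PlacementSpec struct {
	// +optional
	ClusterResourcePreference *ClusterResourcePreference `json:"clusterResourcePreference,omitempty"`
}

func main() {
	var noPreference PlacementSpec // nil: no resource preference at all
	empty := PlacementSpec{ClusterResourcePreference: &ClusterResourcePreference{}}

	fmt.Println(noPreference.ClusterResourcePreference == nil) // true: plugin is skipped
	// empty is non-nil but has no ClusterResources, so the MinItems
	// validation rejects it at admission time.
	fmt.Println(len(empty.ClusterResourcePreference.ClusterResources) == 0) // true: invalid
}
```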


Given steady and balance weighting, it seems like we may want a more generic prioritizing configuration.

Contributor Author


A generic prioritizing configuration is provided instead of ClusterResourcePreference.

- Otherwise, the score for each managed cluster is 0.

Before returning the scores to the scheduler, the data should be normalized to ensure the value falls in the range between 0 and 100.
`normalized = (score - min(score)) * 100 / (max(score) - min(score))`
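A minimal sketch of that min-max normalization (the function name, map-based signature, and the equal-scores guard are my assumptions):

```go
package main

import (
	"fmt"
	"math"
)

// normalizeScores maps raw per-cluster scores into [0, 100] using
// normalized = (score - min(score)) * 100 / (max(score) - min(score)).
func normalizeScores(scores map[string]float64) map[string]float64 {
	min, max := math.Inf(1), math.Inf(-1)
	for _, s := range scores {
		min = math.Min(min, s)
		max = math.Max(max, s)
	}
	normalized := make(map[string]float64, len(scores))
	for cluster, s := range scores {
		if max == min {
			// All clusters scored equally; assumption: give them all 100.
			normalized[cluster] = 100
			continue
		}
		normalized[cluster] = (s - min) * 100 / (max - min)
	}
	return normalized
}

func main() {
	fmt.Println(normalizeScores(map[string]float64{"cluster1": 0.75, "cluster2": 0.10}))
	// map[cluster1:100 cluster2:0]
}
```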
Member


I think the test cases/version upgrade sections etc. are required. You can set them to N/A if not required.

In addition, I would like to see some examples, and a discussion on how this works with the steady/balance plugins today.

Contributor Author


Added.

### Plugin `mostallocatabletocapacityratio`
It is a prioritizer and scores feasible managed clusters with the process below.
- If the placement has `ClusterResourcePreference` specified in the spec and its `Type` is `MostAllocatableToCapacityRatio`, the score of a managed cluster is the sum of the score for each resource.
`score = sum(resource_x_allocatable / resource_x_capacity)`
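A sketch of that per-cluster sum, with every resource dimension weighted equally (types and names are illustrative, not from the proposal):

```go
package main

import "fmt"

// ratioScore sums allocatable/capacity over the requested resources, each
// dimension contributing with equal weight.
func ratioScore(allocatable, capacity map[string]float64, resources []string) float64 {
	score := 0.0
	for _, r := range resources {
		if capacity[r] > 0 { // skip resources the cluster does not report
			score += allocatable[r] / capacity[r]
		}
	}
	return score
}

func main() {
	allocatable := map[string]float64{"cpu": 15, "memory": 48}
	capacity := map[string]float64{"cpu": 20, "memory": 64}
	// 15/20 + 48/64 = 0.75 + 0.75 = 1.5
	fmt.Println(ratioScore(allocatable, capacity, []string{"cpu", "memory"}))
}
```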
Member


It is worth mentioning that we treat each resource dimension with equal weight, and the reasoning behind that.

Contributor Author


Updated.

@elgnay elgnay force-pushed the resource_based_scheduling branch 2 times, most recently from cf7cd97 to 011050f on August 17, 2021 09:54

### Goals

- Sotry 1 and 2, allow user to select managed clusters based on resource usage/capability with the Placement APIs;

typo story

Contributor Author


Corrected.


### Non-Goals
- Balance workload across the fleet based on cluster resource usage;
- Story 3 and 4 are related to the churning policy of placement, and will be covered in a separate enhancement proposal;

is this different from the steady weighting?

Contributor Author


Yes, they are related. The weight of the steady plugin should be adjusted automatically according to the churning policy of a placement. And we may also support some advanced features, like churningSeconds, to describe to what extent we can tolerate cluster churning.


Link to the feature request: https://github.com/open-cluster-management-io/community/issues/52

### User Stories

is there a way to express: "I must have at least this much space"? If not, why was that excluded?


how does a user handle it when their clusters are autoscaling? What do they expect to happen? This could be presented as there not being capacity available, I think. Or perhaps that capacity is tight, but will expand.

Contributor Author


> is there a way to express: "I must have at least this much space"? If not, why was that excluded?

No, because we cannot support it. Suppose we were able to create a placement which matches all clusters with at least 10G allocatable memory. In the first scheduling cycle, cluster1 is selected by this placement because it has 12G allocatable memory. After several minutes, the allocatable memory of cluster1 is reduced to 8G. We don't know if the reduced 4G of memory was consumed by workload associated with this placement. So in the next scheduling cycle, should cluster1 be selected by this placement or not?

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> how does a user handle it when their clusters are autoscaling? What do they expect to happen? This could be presented as there not being capacity available, I think. Or perhaps that capacity is tight, but will expand.

Here is what will happen if managed clusters are autoscaling.

Scale up

  1. User creates a placement with ClusterResourcePreference;
  2. The cluster with the highest allocatable-to-capacity ratio or the most allocatable resources will be selected;
  3. Workload will then be deployed on the selected cluster;
  4. If there is not enough resource available or the capacity is tight, a new node will be added;
  5. Capacity/allocatable of this cluster changes. That will trigger a new scheduling cycle for placements with ClusterResourcePreference;

Scale down

  1. An underutilized node in a managed cluster is removed;
  2. Capacity/allocatable of this cluster changes. That will trigger a new scheduling cycle for placements with ClusterResourcePreference;
  3. Some Placements may no longer select this cluster because of resource change;


Thanks, that explanation makes sense. I might suggest an appendix with your example of scale up and scale down and your example of why you cannot express "at least this much space".

@elgnay elgnay force-pushed the resource_based_scheduling branch 3 times, most recently from 5241923 to cb0cf57 on August 23, 2021 15:49
Signed-off-by: Yang Le <yangle@redhat.com>

## Proposal

### 1. Changes on Placement APIs

I think the combination of this API with the prioritizers in item 2 is going to age pretty well.

4. `ResourceAllocatableMemory`, it scores managed clusters according to allocatable memory.

According to the name it is registered with, the `resource` plugin uses different formulas to calculate the score of a managed cluster; the value falls in the range between -100 and 100.
| Prioritizer | Formula |
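The formula table is truncated here. Purely as a hypothetical illustration of a score landing in [-100, 100] (this exact affine mapping is an assumption, not necessarily the proposal's formulas):

```go
package main

import "fmt"

// toSignedScore is hypothetical only: an affine mapping that takes an
// allocatable/capacity ratio in [0, 1] to a score in [-100, 100]. The
// proposal's actual per-prioritizer formulas are in the (truncated) table above.
func toSignedScore(allocatable, capacity float64) float64 {
	if capacity <= 0 {
		return 0 // guard against division by zero
	}
	ratio := allocatable / capacity
	return 2*ratio*100 - 100 // 0 -> -100, 0.5 -> 0, 1 -> 100
}

func main() {
	fmt.Println(toSignedScore(15, 20)) // 50
}
```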

Thanks for getting very specific here. I think these make sense to me.

@deads2k

deads2k commented Aug 25, 2021

This design looks good and I think it integrates well into the prioritizer work already completed. I see how weighting against churn is a different feature.

Once this feature is created, is another possible consideration one that tries to assess whether a given placement of a resource appears to be permafailing? That is, "the work was assigned to cluster/A and cluster/B, but on cluster/A it is consistently failing".

/approve
/assign @qiujian16

@openshift-ci

openshift-ci bot commented Aug 25, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, elgnay

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@qiujian16
Member

> Once this feature is created, is another possible consideration one that tries to assess whether a given placement of a resource appears to be permafailing? That is, "the work was assigned to cluster/A and cluster/B, but on cluster/A it is consistently failing".

Yes, I think this is one of the concerns raised by @mdelder as well.

@qiujian16
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Aug 26, 2021
@openshift-merge-robot openshift-merge-robot merged commit 70704aa into open-cluster-management-io:main Aug 26, 2021