Replace borrowing ceiling with weight #62

Closed
alculquicondor opened this issue Feb 24, 2022 · 8 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. kind/grand-feature lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@alculquicondor
Contributor

bit.ly/kueue-apis defined a weight to dynamically set a borrowing ceiling for each Capacity, based on the total resources in the Cohort and the capacities that have pending workloads.

We need to implement such behavior and remove the ceiling.
The weights and unused resources should lead to a dynamic ceiling that is calculated in every scheduling cycle. The exact semantics of this calculation are not fully understood.
In a given scheduling cycle, which capacities are considered for splitting the unused resources? Only the ones with pending jobs? What about the ones that are already borrowing but have no more pending jobs? What counts as unused resources once some resources have already been borrowed?

There are probably a few interpretations of these questions that lead to slightly different results. We need to explore them and pick the one that sounds most reasonable or is based on existing systems.
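To make the open questions concrete, here is a minimal sketch of one possible interpretation, not a proposal for the actual implementation. It assumes that only capacities with pending workloads split the unused resources, that a capacity already borrowing keeps what it has borrowed, and that "unused" means the Cohort's total quota minus everything currently in use (so borrowed resources no longer count as unused). The `Capacity` fields and the `dynamic_ceilings` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    # All fields are hypothetical, chosen only for this illustration.
    name: str
    weight: int    # relative weight within the Cohort
    nominal: int   # quota owned by this capacity
    used: int      # resources currently admitted (may exceed nominal)
    pending: bool  # True if the capacity has pending workloads


def dynamic_ceilings(cohort):
    """Return {name: borrowing ceiling} for one scheduling cycle.

    Assumed interpretation: unused resources are split by weight among
    capacities with pending workloads; existing borrowing is preserved.
    """
    # Borrowed resources are counted as used, so they shrink the pool.
    unused = max(0, sum(c.nominal for c in cohort) - sum(c.used for c in cohort))
    contenders = [c for c in cohort if c.pending]
    total_weight = sum(c.weight for c in contenders)

    ceilings = {}
    for c in cohort:
        already_borrowed = max(0, c.used - c.nominal)
        share = 0
        if c.pending and total_weight > 0:
            # Integer division: leftover units stay unassigned in this sketch.
            share = unused * c.weight // total_weight
        ceilings[c.name] = already_borrowed + share
    return ceilings


cohort = [
    Capacity("a", weight=2, nominal=10, used=10, pending=True),
    Capacity("b", weight=1, nominal=10, used=10, pending=True),
    Capacity("c", weight=1, nominal=10, used=4, pending=False),
]
print(dynamic_ceilings(cohort))
```

Other answers to the questions above (for example, also counting borrowers without pending jobs as contenders) would change only the `contenders` filter or the definition of `unused`, which is why the interpretations give slightly different results.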

@alculquicondor
Contributor Author

/kind feature
/size L
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Feb 24, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2022
@alculquicondor
Contributor Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 14, 2022
@alculquicondor
Contributor Author

Alternatively, the weights could be added to the namespace, so that fairness can be evaluated for the cluster by tenant, rather than per ClusterQueue.

This calls for a proper design doc.

@ahg-g
Contributor

ahg-g commented Sep 6, 2022

Alternatively, the weights could be added to the namespace, so that fairness can be evaluated for the cluster by tenant, rather than per ClusterQueue.

Where would those weights be stored? I think setting a weight per namespace is probably going to be difficult to maintain, since there will be far more namespaces than CQs.

@alculquicondor
Contributor Author

From @denkensk in https://github.com/kubernetes-sigs/kueue/pull/410/files#r998066062:

I think we can probably add some algorithms in the future to ensure fairness between the cluster queues. Even if we don't violate the min, maybe we should avoid preempting workloads that all come from the same CQ? We generally don't want to hurt a single user too much, even if they borrowed someone else's resources.

@alculquicondor
Contributor Author

/close
in favor of #1714

@k8s-ci-robot
Contributor

@alculquicondor: Closing this issue.


4 participants