Considering PVC as a resource for balanced resource utilization in the scheduler #58232
Comments
/sig scheduling
But a PVC is not bound to a node, so I'm not very sure how the spreading here would work. @abhgupta Mind explaining a little bit?
@abhgupta are you just looking for count-leveling, so that distributions of PVCs are not lumpy? If so, that seems pretty trivial.
@abhgupta, any more detail about the PVC limits? My understanding is that PVCs are NOT ALWAYS related to a node.
@k82 - IIUC, what @abhgupta wants is a node which has balanced CPU, memory, and number of volumes attached.
Is there any impact if we have a bigger number, e.g. on performance or max mount points? If it's max mount points, that's an interesting point, since it affects how many pods can run on each node :) Another case in my mind is a DFS (distributed file system) client issue: if many Pods read/write data through a DFS mount point, the connection may become a bottleneck. No objection, I'd just like to understand the details of the use case before we build a priority for it :).
/cc @bsalamat @kubernetes/sig-storage-feature-requests
@ravisantoshgudimetla Thanks! That exactly captures the gist of my proposal. We could ignore PVs (for storage classes) that do not have an upper limit; such PVs (with no upper limits) could simply be handled by the "least requested" priority function, if I understand correctly what @k82cn was highlighting.
This is also related to #24317. Bigger nodes can have higher limits on the max number of volumes attached. Even count spreading assumes that all nodes are the same size.
I think as of now we are limited to 39 volumes for AWS and 16 for GCE; these limits are hardcoded from the scheduling perspective, though users can override them through an environment variable, so the numbers aren't that high. I can create a benchmark to test this. @msau42 - Thanks for pointing to the issue where node sizes vary. I think as long as I rely on the variable that uses the configMap (#53461 (comment)), we should be good to go.
I think this statement holds for memory and CPU as well. AFAIU, a balanced node is one whose memory utilization (requested/capacity) and CPU utilization are close to each other, meaning the variance is small irrespective of node size. Won't the same be true for attached volumes? Or is your statement based on not using the configMap, which carries the cloud-provider-specific value?
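For illustration only, here is a minimal Go sketch of how a per-cloud attach limit could be resolved: a hardcoded default (39 for AWS, 16 for GCE, per the numbers above) that an operator can override via the `KUBE_MAX_PD_VOLS` environment variable. The function name and structure are assumptions, not the scheduler's actual code path.

```go
// maxVolumesPerNode sketches resolving a per-cloud volume attach limit:
// a hardcoded default that can be overridden with KUBE_MAX_PD_VOLS.
// Simplified illustration only, not the real scheduler implementation.
package main

import (
	"fmt"
	"os"
	"strconv"
)

func maxVolumesPerNode(cloudProvider string) int {
	// Operator override takes precedence when set to a positive integer.
	if v := os.Getenv("KUBE_MAX_PD_VOLS"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	switch cloudProvider {
	case "gce":
		return 16 // GCE PD default mentioned in the discussion
	default:
		return 39 // AWS EBS default mentioned in the discussion
	}
}

func main() {
	fmt.Println(maxVolumesPerNode("aws")) // 39 unless KUBE_MAX_PD_VOLS is set
}
```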
Are you using PVCs and attached volumes interchangeably here? AFAIK, PVCs should first be processed and result in an attached volume before the Pod can be scheduled.
Right now the scheduler does not cache the number of volumes attached per node. The predicate just walks through all Pods and calculates the volume counts per node. In general, there is also a lot of churn in this area.
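For context, a minimal sketch (with simplified stand-in types, not the real scheduler structs) of what "walking all Pods and counting volumes per node" could look like:

```go
// countVolumesPerNode tallies how many unique PVC-backed volumes each node
// carries by walking already-scheduled pods. Pod is a trimmed stand-in type.
package main

import "fmt"

type Pod struct {
	NodeName string
	PVCNames []string // claims referenced in the pod spec
}

func countVolumesPerNode(pods []Pod) map[string]int {
	counts := map[string]int{}
	seen := map[string]map[string]bool{} // node -> claim -> already counted
	for _, p := range pods {
		if p.NodeName == "" {
			continue // unscheduled pods don't contribute
		}
		if seen[p.NodeName] == nil {
			seen[p.NodeName] = map[string]bool{}
		}
		for _, claim := range p.PVCNames {
			if !seen[p.NodeName][claim] {
				seen[p.NodeName][claim] = true
				counts[p.NodeName]++
			}
		}
	}
	return counts
}

func main() {
	pods := []Pod{
		{NodeName: "node-a", PVCNames: []string{"data-0"}},
		{NodeName: "node-a", PVCNames: []string{"data-1"}},
		{NodeName: "node-b", PVCNames: []string{"data-2"}},
	}
	fmt.Println(countVolumesPerNode(pods)) // map[node-a:2 node-b:1]
}
```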
+1, it's also better to enhance the scheduler for general purposes, e.g. CSI, instead of hardcoding :). Overall, that's a reasonable feature to me :).
Automatic merge from submit-queue (batch tested with PRs 54997, 61869, 61816, 61909, 60525). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Balanced resource allocation priority to include volume count on nodes. /cc @aveshagarwal @abhgupta

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes #58232

**Release note**:
```release-note
Balanced resource allocation priority in scheduler to include volume count on node
```
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened:
The scheduler currently does not have a priority function to properly spread out pods that request PVCs. In the absence of such a priority function, the scheduling of pods with PVCs can become skewed, with some nodes accumulating a disproportionate share of attached volumes while others carry few. If that happens, we get into a state of inefficient node utilization.
What you expected to happen:
PVCs should be considered as a resource within the balanced resource utilization (BRU) priority function. Instead of the current algorithm in the BRU priority function, we could consider using standard deviation, which allows more than two resources to be balanced across nodes. The inputs to the standard deviation calculation would be the requested-to-capacity fractions for each resource (memory, CPU, and PVC/volume count); see the sketch below.
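A hedged sketch of the standard-deviation idea, assuming requested/capacity fractions as inputs and an illustrative 0–10 score; this is not the actual priority function, and the names and scoring scale are assumptions.

```go
// balancedScore illustrates the proposal: take the requested/capacity fraction
// for each resource (CPU, memory, volume count), compute the standard deviation
// of those fractions, and map a lower deviation to a higher node score.
// The 0-10 range and function name are illustrative assumptions.
package main

import (
	"fmt"
	"math"
)

func balancedScore(fractions []float64) float64 {
	if len(fractions) == 0 {
		return 0
	}
	mean := 0.0
	for _, f := range fractions {
		mean += f
	}
	mean /= float64(len(fractions))

	variance := 0.0
	for _, f := range fractions {
		variance += (f - mean) * (f - mean)
	}
	variance /= float64(len(fractions))
	stdDev := math.Sqrt(variance)

	// A perfectly balanced node (stdDev == 0) scores 10; more skew scores lower.
	return (1 - stdDev) * 10
}

func main() {
	// CPU 60% requested, memory 55% requested, 7 of 16 volume slots in use.
	fmt.Printf("%.2f\n", balancedScore([]float64{0.60, 0.55, 7.0 / 16.0}))
}
```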
Alternatively, a separate priority function could be considered for just PVCs, whichever makes more sense.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
The PVC resource should also be considered in the "least requested" priority function.
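Similarly, a rough sketch of what a "least requested" style score extended with a volume-count fraction might look like; the averaging and the 0–10 scale are assumptions for illustration, not the scheduler's actual implementation.

```go
// leastRequestedScore sketches extending a "least requested" style score to
// include volumes: each resource contributes its unused fraction
// (capacity - requested) / capacity, and the node score is the average of
// those fractions scaled to 0-10. Names and scale are illustrative assumptions.
package main

import "fmt"

func leastRequestedScore(requested, capacity []float64) float64 {
	if len(requested) != len(capacity) || len(capacity) == 0 {
		return 0
	}
	total := 0.0
	for i := range capacity {
		if capacity[i] <= 0 {
			continue
		}
		unused := (capacity[i] - requested[i]) / capacity[i]
		if unused < 0 {
			unused = 0
		}
		total += unused
	}
	return total / float64(len(capacity)) * 10
}

func main() {
	// CPU, memory, and volume-count values for a hypothetical node.
	fmt.Printf("%.2f\n", leastRequestedScore(
		[]float64{2.0, 4.0, 7.0},   // requested: cores, GiB, volumes
		[]float64{8.0, 16.0, 39.0}, // capacity:  cores, GiB, volume limit
	))
}
```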
Environment:
- Kubernetes version (use `kubectl version`): Any
- Kernel (e.g. `uname -a`): Any