scheduler: support scheduling profile-level configuration parameters #93270
Comments
/sig scheduling
/assign @Huang-Wei
This requirement looks reasonable to me. /cc @alculquicondor
Moving
Thanks, @Huang-Wei @ahg-g. How should we proceed with it?
This should probably include the related parameters such as "minFeasibleNodesToFind". The current global setting is 100.
The problem with "moving" is that it increases maintenance, as we have to support 2 API versions for some time. Should we keep the global as well?
Are you sure 100 is too big for your use case? I prefer we don't add parameters if we don't need them.
The moving in this case shouldn't be difficult (we keep both and the per-profile value takes precedence), and we need to support it for two releases only, right?
For Beta it will be 9 months or 3 releases, whichever is longer.
If we keep both, we don't need a new API version.
Yeah, we could keep both in Beta and remove the global one in GA.
It is my understanding that the strong preference is that there are no changes between Beta and GA, apart from new fields.
True, but I think in this case it is fine because we are not removing functionality and not changing the config significantly.
There are use cases where the number of feasible nodes to find is known a priori. For example, when scheduling a Pod on a specific node (as with the NodeName plugin) or on any feasible node, the number of nodes to find is 1. A scalable scheduling algorithm that finds two feasible nodes and then chooses the better one (Sparrow, https://cs.stanford.edu/~matei/papers/2013/sosp_sparrow.pdf) needs 2. In these cases, the number of feasible nodes to find has a specific small value, and filtering can stop once that many nodes are found. An (optional) per-profile parameter like numFeasibleNodesToFind would be useful (better than minFeasibleNodesToFind).
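To make this concrete, here is a purely hypothetical sketch of such a per-profile knob; numFeasibleNodesToFind is not an existing KubeSchedulerConfiguration field, and the API version and profile names are assumptions for illustration only:

```yaml
# Hypothetical sketch only: numFeasibleNodesToFind is not an existing field.
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: sparrow-style-scheduler
    # Stop filtering as soon as two feasible nodes are found,
    # then score only those two (power-of-two-choices style).
    numFeasibleNodesToFind: 2
  - schedulerName: default-scheduler
    # No per-profile value: fall back to the global defaults.
```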
Keeping the global one makes sense. Local ones override the global one.
I feel keeping the global one would work better. The global one is the default for all profiles and a local one overrides the global one. It makes a lot of sense and the compatibility is maintained.
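A minimal sketch of that precedence, assuming the proposed per-profile field (the field placement, profile names, and values are illustrative, not an existing API):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
# Global value: the default for every profile that does not set its own.
percentageOfNodesToScore: 50
profiles:
  - schedulerName: default-scheduler
    # No local value: inherits the global 50.
  - schedulerName: batch-scheduler
    # Local value overrides the global one for this profile only.
    percentageOfNodesToScore: 10
```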
We can continue the discussion in #86630.
I know this is true, but the question is whether it's worth adding such an optimization. The current scheduler can already handle 100 pods/s in clusters with 5k nodes. What is your target and how far are we from it?
A salient feature of the scheduling framework and plugins is the ability to customize scheduling for specific workloads and use cases. Providing flexible mechanisms is therefore important for developing customizations that meet the needs of diverse workloads. For scheduling a huge number of small batch jobs/tasks in ultra-large clusters, 100 pods/second may not be good enough. Hadoop YARN's throughput is in the high hundreds of tasks/sec, and we are aware of batch systems that can schedule more than 1000 tasks per second. Scheduling scalability can become (if it is not already) a roadblock to running batch jobs on Kubernetes at scale. For (sub-second) tiny tasks as described in the Sparrow paper, even 1000 pods/second might not be adequate. Adding an additional parameter like numFeasibleNodes to each profile is straightforward, and it would be worthwhile if it can facilitate the development of advanced scheduling such as scalable scheduling. What optimization or algorithms should be used is a separate problem.
If there's still a concern, we can just add
@SataQiu are you no longer working on this?
Is there anyone still working on this issue? I may help.
/remove-lifecycle rotten
It doesn't look like there is any interest in this anymore.
/close
@alculquicondor: Closing this issue.
Hi @ahg-g @alculquicondor, can we reopen this issue? We discussed the proposed change and everyone agreed with it last year. An implementation PR was filed by @SataQiu, but it was incomplete and was closed for some reason (#97263). We'd like to submit a PR for it. A use case is to better support workload-specific scheduling with soft
/cc @Huang-Wei
/reopen
It looks like it was closed by the author because they couldn't continue working on it. I think the agreement was to remove the original percentageOfNodesToScore, but we can no longer do that because the configuration API is now stable. In any case, we can still add the new field, but we need to document properly what happens when the field outside of the profile is also set.
@alculquicondor: Reopened this issue.
/triage accepted
Once #112521 is merged, are there any other configurations requested for now?
We can close it now.
/close
@Huang-Wei: Closing this issue.
What would you like to be added:
The current scheduler parameters are set in the scheduler configuration file as global settings. Now that the scheduling framework and multiple scheduling profiles have been introduced, it would be useful to support scheduling profile-level parameters, e.g., percentageOfNodesToScore and the related parameter minFeasibleNodesToFind.
Global configuration
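A minimal sketch of the existing global setting, assuming a v1beta1 KubeSchedulerConfiguration (API version and value are illustrative):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
# Applies to all profiles served by this scheduler instance.
percentageOfNodesToScore: 70
```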
Per-profile configuration
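A minimal sketch of the proposed per-profile form; the profile-level percentageOfNodesToScore field is the change being requested here, so its placement is illustrative:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    # Proposed: set the threshold for this profile only.
    percentageOfNodesToScore: 70
```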
Why is this needed:
Scheduling profile-level configuration parameters would provide a simple way to customize scheduling behavior across different scheduling profiles. For example, different percentageOfNodesToScore thresholds can be used for performance tuning, achieving a better balance between scheduling performance and scheduling quality per workload: a long-running service typically cares more about scheduling quality and can use a profile with a high threshold, while a large batch job wants a quick turnaround and can use a profile with a lower threshold for faster scheduling.
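For illustration, a sketch of that kind of tuning assuming the proposed per-profile field (the profile names and threshold values are made up for the example):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: service-scheduler
    # Long-running services: score most nodes for better placement quality.
    percentageOfNodesToScore: 100
  - schedulerName: batch-scheduler
    # Large batch jobs: score fewer nodes for faster scheduling decisions.
    percentageOfNodesToScore: 5
```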