New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-23900: config.openshift.io/v1/scheduler: allow profile customizations for DRA #1738
OCPBUGS-23900: config.openshift.io/v1/scheduler: allow profile customizations for DRA #1738
Conversation
@ingvagabund: This pull request references Jira Issue OCPBUGS-23900, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Hello @ingvagabund! Some important instructions when contributing to openshift/api: |
/jira refresh |
@ingvagabund: This pull request references Jira Issue OCPBUGS-23900, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, do we have to use the acronym DRA? Why not spell it out ExperimentalDynamicResourceAllocation
?
Also, given you've included Experimental
in the name, what's the plan for promoting this? How will we transition from ExperimentalDRA
to regular DRA
?
Is this a tech preview feature?
9d81824
to
49940a7
Compare
It's true
Yes. It's currently upstream alpha feature. Normally, lifting a feature gate is sufficient for enabling it. However, in this case an additional scheduling plugin needs to be enabled alongside. So even when a customer/user enables the feature (through TPNoUpgrade) the scheduler operator still needs to make an extra step. |
That's still to be debated. Either the new scheduling plugin will be enabled by default after GA eventually. At which point the experimental profile/field gets removed. If not enabled by default after GA, we will promote the profile/field and remove the word experimental. |
Too long by what measure?
Have you considered that the scheduler operator could observe the feature gate being present in the What do you expect the feature enablement to look like when you GA the feature? Do you expect that users will always opt-in? If that's the case, it's not the first time we've added a "You must set this feature gate, then enable the feature by this field that is only enabled in tech preview clusters".
So if it's likely to be enabled by default in the future, perhaps just relying on the feature gate and enabling the feature is acceptable? |
Too long for typing by hand. Although, that's probably not a blocking issue here. Lemme do some renaming.
Auto-enabling based on TPNoUpgrade was discussed before. The decision behind going with an extra step of enabling the feature for kube-scheduler was to allow to enable dynamic resource allocation as a set of functionalities. The DRA implementation itself is an upstream feature that requires additional components deployed. E.g. vendor plugins for NVDIA, Intel, etc. running on each node. So enabling dynamicresource scheduling plugin without these additional components may lead to pods staying in pending state indefinitely.
Users need to deploy third-party plugins before DRA functionality is enabled in the kube-scheduler. Which will always be the case for GA until OpenShift start shipping a generic plugin by default. Which might not happen at all. So this extra step of configuration KSO to enable the plugin will always be required.
The way the plugin is enabled might change in the future. We might actually advocate for always disabling the plugin by default (even after GA) to enable it only when corresponding vendor plugins are installed. Also, the new field might be removed in favor of automatically detecting vendor plugins. Something to be decided in time. |
2e10503
to
8a69ff1
Compare
2567a6d
to
c6c4d46
Compare
Co-authored-by: Joel Speed <Joel.speed@hotmail.co.uk>
Co-authored-by: Joel Speed <Joel.speed@hotmail.co.uk>
Co-authored-by: Joel Speed <Joel.speed@hotmail.co.uk>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ingvagabund, JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/override ci/prow/verify-crd-schema Failures are from pre-existing errors, future us will not be using booleans |
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/verify-crd-schema In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/hold cancel |
/retest-required |
@ingvagabund: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
f8cee3e
into
openshift:master
@ingvagabund: Jira Issue OCPBUGS-23900: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged: These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-23900 has not been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[ART PR BUILD NOTIFIER] This PR has been included in build ose-cluster-config-api-container-v4.16.0-202402090739.p0.gf8cee3e.assembly.stream.el9 for distgit ose-cluster-config-api. |
/cherry-pick release-4.15 |
@ingvagabund: new pull request created: #1763 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Fix included in accepted release 4.16.0-0.nightly-2024-02-17-013806 |
Fix included in accepted release 4.16.0-0.nightly-2024-03-05-105513 |
For more info: https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/
/hold