Prioritizing nodes based on volume capacity: API changes #99594
/retest
I guess you didn't need this because the feature is still disabled.
/approve this is good from the scheduler's POV.
@@ -211,6 +211,11 @@ type VolumeBindingArgs struct {
	// Value must be non-negative integer. The value zero indicates no waiting.
	// If this value is nil, the default value will be used.
	BindTimeoutSeconds int64

	// Shape specifies the points defining the score function shape.
Can you add more description to this field? What does each point represent and how does it impact scoring?
It would also be good to indicate that this field requires the alpha VolumeCapacityPriority feature gate to be enabled.
> It would also be good to indicate that this field requires the alpha VolumeCapacityPriority feature gate to be enabled.
Is the +featureGate=VolumeCapacityPriority for this purpose?
> Can you add more description to this field? What does each point represent and how does it impact scoring?
Tried my best, but I'm not good at this; suggestions are welcome!
Thanks, this is helpful.
Can you also describe how volume capacity maps to utilization? Is it based on #pvs, total capacity available to the node?
> Can you also describe how volume capacity maps to utilization? Is it based on #pvs, total capacity available to the node?
Done in 5d9baa9.
Existing bound PVs on the node are excluded because, unlike CPU or memory, they cannot be shared with other pods.
@@ -273,6 +273,18 @@ func SetDefaults_VolumeBindingArgs(obj *v1beta1.VolumeBindingArgs) {
	if obj.BindTimeoutSeconds == nil {
		obj.BindTimeoutSeconds = pointer.Int64Ptr(600)
	}
	if len(obj.Shape) == 0 && feature.DefaultFeatureGate.Enabled(features.VolumeCapacityPriority) {
Can we add unit tests for this?
done
	}

	return err
	if utilfeature.DefaultFeatureGate.Enabled(features.VolumeCapacityPriority) {
		allErrs = append(allErrs, validateFunctionShape(args.Shape, path.Child("shape"))...)
Should we validate the argument even if the feature gate is off? Otherwise someone could pass an invalid argument and we ignore it, and then in the future when we turn on the feature, their config no longer works.
@liggitt we don't have "field dropping" for config APIs right? In lieu of that, I think validating the field in all cases and ignoring it at runtime is the best approach.
Another approach of returning an error if the feature is disabled will break rollback/disable scenarios.
> In lieu of that, I think validating the field in all cases and ignoring it at runtime is the best approach.
+1 and implemented
I'd go a step further and forbid values in this config field if the feature is off... otherwise someone can specify a shape that is syntactically valid but does not behave at all like they want, successfully run the scheduler with this feature off, and then break themselves with the same config when the feature turns on in the future.
If we return an error, then someone who has the feature on and sets the config, but later rolls back or disables the feature, would also have to update their config at that time; otherwise the config will fail validation. Is that the expected approach for config APIs?
hmm... I don't love either direction, but in the rollback case, since they're already modifying the invocation, I think making them comment out the config for the disabled feature is better than letting people put timebomb inert configuration in the file and then activating it on upgrade when the feature becomes active
it seems a safer choice, done in 3e788a3
@@ -273,6 +273,18 @@ func SetDefaults_VolumeBindingArgs(obj *v1beta1.VolumeBindingArgs) {
	if obj.BindTimeoutSeconds == nil {
		obj.BindTimeoutSeconds = pointer.Int64Ptr(600)
	}
I see there is a v1 type. Do we need to add defaults there? I notice there are no defaults at all for v1. cc @ahg-g
The v1 type is not for component config; it is for the old Policy API, which we will be deprecating in the next release.
api review
one comment about forbidding this config value in validation if the feature is off, then lgtm
/approve
/hold for final scheduler lgtm and approval
can you please squash the commits.
done
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ahg-g, cofyc, liggitt
/retest
/hold cancel
What type of PR is this?
/kind api-change
What this PR does / why we need it:
Based on #96347. This adds api changes.
#96347 must be merged first.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: