-
Notifications
You must be signed in to change notification settings - Fork 39k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add validation for kube-scheduler configuration options #66799
Conversation
/assign @bsalamat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @noqcks!
Some early comments.
var errs []string | ||
ip, port, err := net.SplitHostPort(value) | ||
if err != nil { | ||
return append(errs, "must be a valid socket address format, (e.g. 0.0.0.0:10254)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should support IPv6 as well. So, you may want to augment your example.
@@ -511,3 +511,28 @@ func TestIsFullyQualifiedName(t *testing.T) { | |||
} | |||
} | |||
} | |||
|
|||
func TestIsValidSocketAddr(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add IPv6 examples as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
"time" | ||
) | ||
|
||
func TestValidateKubeSchedulerConfiguration(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add a test that the old style (deprecated) options still work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some validations for deprecated options @bsalamat, but for some reason the function ListAlgorithmProviders
was not returning the proper values I would expect "ClusterAutoscalerProvider | DefaultProvider" and was instead returning an empty string 🤔 .
I'm not sure exactly how I could get this to work. I was planning on using factory.ListAlgorithmProviders()
in the Validate
function. If you have any idea of a way I could get the list of valid algorithm providers I would love to understand. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of different algorithm providers are not very important in my opinion. Actually, I think it is not used anymore (I am not sure though).
@@ -0,0 +1,109 @@ | |||
/* | |||
Copyright 2017 The Kubernetes Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2018
if len(cc.SchedulerName) == 0 { | ||
allErrs = append(allErrs, field.Required(field.NewPath("schedulerName"), "")) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the empty lines bellow {
s can be removed here in ValidateKubeSchedulerConfiguration()
.
allErrs = append(allErrs, field.Invalid(fldPath.Child("qps"), cc.QPS, "must be non-negative value")) | ||
} | ||
return allErrs | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please, remove this empty line.
allErrs = append(allErrs, field.Required(fldPath.Child("kubeConfigFile"), "")) | ||
} | ||
if cc.QPS < 0 { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("qps"), cc.QPS, "must be non-negative value")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must be
-> must be a
func ValidateSchedulerPolicySource(cc *componentconfig.SchedulerPolicySource, fldPath *field.Path) field.ErrorList { | ||
allErrs := field.ErrorList{} | ||
if cc.File == nil && cc.ConfigMap == nil { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("file"), nil, "need to set atleast one of File or ConfigMap")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to set atleast one of File or ConfigMap
->
needs to set either File or ConfigMap
?
allErrs = append(allErrs, field.Invalid(fldPath.Child("leaseDuration"), cc.LeaseDuration, "must be greater than or equal to 0")) | ||
} | ||
if cc.RenewDeadline.Duration < 0 { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("renewDeadline"), cc.LeaseDuration, "must be positive integer")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a positive
allErrs = append(allErrs, field.Invalid(fldPath.Child("renewDeadline"), cc.RenewDeadline, "RenewDeadline must be less than or equal to LeaseDuration")) | ||
} | ||
if cc.RetryPeriod.Duration <= 0 { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("retryPeriod"), cc.RetryPeriod, "must be positive integer")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a positive
@@ -0,0 +1,199 @@ | |||
/* | |||
Copyright 2014 The Kubernetes Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2018
@noqcks thanks for the PR. |
/ok-to-test |
@noqcks
edit: have a look at the failing tests logs by clicking their |
the verify test indicates some small problems:
|
@noqcks the e2e tests flake from time to time. |
@neolit123 not sure how I can debug these failed e2e tests 🤔I can't find anything in the logs that shows an error on my side. lmk if you have any ideas. |
possibly flakes, still. /test pull-kubernetes-e2e-gce |
@noqcks |
I may try again a little later tonight. I checked a couple other PRs and they also seem to be failing e2e tests in the same manner. Thank you for the help |
/retest |
if cc.File == nil && cc.ConfigMap == nil { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("file"), nil, "needs to set either File or ConfigMap")) | ||
if cc == nil { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("policy"), nil, "need to set Policy")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be refactored as:
allErrs := field.ErrorList{}
if cc == nil {
return append(allErrs, field.Invalid(fldPath.Child("policy"), nil, "need to set Policy"))
}
if cc.File == nil && cc.ConfigMap == nil {
return append(allErrs, field.Invalid(fldPath.Child("file"), nil, "need to set either File or ConfigMap"))
}
return allErrs
to avoid indenting the } else
part.
@neolit123 |
|
||
func TestValidateDeprecatedKubeSchedulerConfiguration(t *testing.T) { | ||
scenarios := map[string]struct { | ||
isExpectedFailure bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional: s/isExpectedFailure/expectedToFail/
"time" | ||
) | ||
|
||
func TestValidateKubeSchedulerConfiguration(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of different algorithm providers are not very important in my opinion. Actually, I think it is not used anymore (I am not sure though).
// ValidateSchedulerPolicySource ensures validation of the SchedulerPolicySource struct | ||
func ValidateSchedulerPolicySource(cc *componentconfig.SchedulerPolicySource, fldPath *field.Path) field.ErrorList { | ||
allErrs := field.ErrorList{} | ||
if cc.File == nil && cc.ConfigMap == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scheduler policy is not mandatory. If no policy is provided, the scheduler uses its own default config.
// ValidateClientConnectionConfiguration ensures validation of the ClientConnectionConfiguration struct | ||
func ValidateClientConnectionConfiguration(cc *componentconfig.ClientConnectionConfiguration, fldPath *field.Path) field.ErrorList { | ||
allErrs := field.ErrorList{} | ||
if len(cc.KubeConfigFile) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think KubeConfigFile is mandatory either.
// ValidateKubeSchedulerLeaderElectionConfiguration ensures validation of the KubeSchedulerLeaderElectionConfiguration struct | ||
func ValidateKubeSchedulerLeaderElectionConfiguration(cc *componentconfig.KubeSchedulerLeaderElectionConfiguration, fldPath *field.Path) field.ErrorList { | ||
allErrs := field.ErrorList{} | ||
allErrs = append(allErrs, ValidateLeaderElectionConfiguration(&cc.LeaderElectionConfiguration, field.NewPath("leaderElectionConfiguration"))...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should validate leader election config only when leader election is enabled.
// ValidateLeaderElectionConfiguration ensures validation of the LeaderElectionConfiguration struct | ||
func ValidateLeaderElectionConfiguration(cc *componentconfig.LeaderElectionConfiguration, fldPath *field.Path) field.ErrorList { | ||
allErrs := field.ErrorList{} | ||
if cc.LeaseDuration.Duration <= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if 0 is a valid Duration. If it is not, the error message at the next line should be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think no as well: https://github.com/kubernetes/kubernetes/pull/64218/files
will change
if cc.RenewDeadline.Duration > cc.LeaseDuration.Duration { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("renewDeadline"), cc.RenewDeadline, "RenewDeadline must be less than or equal to LeaseDuration")) | ||
} | ||
if cc.RetryPeriod.Duration <= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to the previous comment.
if !cc.LeaderElect { | ||
return allErrs | ||
} | ||
if cc.LeaseDuration.Duration < 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the error condition should remain as cc.LeaseDuration.Duration <= 0
here and below.
ready for another review on this I believe @bsalamat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @noqcks! Mostly minor comments.
for _, msg := range validation.IsValidSocketAddr(cc.MetricsBindAddress) { | ||
allErrs = append(allErrs, field.Invalid(field.NewPath("metricsBindAddress"), cc.MetricsBindAddress, msg)) | ||
} | ||
if 0 <= cc.HardPodAffinitySymmetricWeight && cc.HardPodAffinitySymmetricWeight >= 100 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 and 100 are valid. So, you should replace <= and >= with < and >.
return allErrs | ||
} | ||
if cc.LeaseDuration.Duration <= 0 { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("leaseDuration"), cc.LeaseDuration, "must be non-negative")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error message states that the value "must be non-negative". Non-negative includes zero, but zero causes an error. So, the error message should be changed to "must be positive" or "must be larger than zero". The same problem exists in the following checks as well.
func ValidateKubeSchedulerLeaderElectionConfiguration(cc *componentconfig.KubeSchedulerLeaderElectionConfiguration, fldPath *field.Path) field.ErrorList { | ||
allErrs := field.ErrorList{} | ||
allErrs = append(allErrs, ValidateLeaderElectionConfiguration(&cc.LeaderElectionConfiguration, field.NewPath("leaderElectionConfiguration"))...) | ||
if len(cc.LockObjectNamespace) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When cc.LeaderElect
is false, we should skip these two checks too.
if cc.RetryPeriod.Duration <= 0 { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("retryPeriod"), cc.RetryPeriod, "must be non-negative")) | ||
} | ||
if cc.LeaseDuration.Duration <= cc.RenewDeadline.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LeaseDuration
and RenewDeadline
can be equal and that's not an error. Please replace <= by <.
}, | ||
} | ||
|
||
renewDeadlineExceedsLeaseDuration := validConfig.DeepCopy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add test cases for renewDeadline and LeaseDuration being zero as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
unsubsribe
…On Wed, Aug 8, 2018 at 9:42 PM k8s-ci-robot ***@***.***> wrote:
@noqcks <https://github.com/noqcks>: The following test *failed*, say
/retest to rerun them all:
Test name Commit Details Rerun command
pull-kubernetes-e2e-gce-100-performance 743a77e
<743a77e>
link
<https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/66799/pull-kubernetes-e2e-gce-100-performance/15524/> /test
pull-kubernetes-e2e-gce-100-performance
Full PR test history <https://k8s-gubernator.appspot.com/pr/66799>. Your
PR dashboard <https://k8s-gubernator.appspot.com/pr/noqcks>. Please help
us cut down on flakes by linking to
<https://git.k8s.io/community/contributors/devel/flaky-tests.md#filing-issues-for-flaky-tests>
an open issue
<https://github.com/kubernetes/kubernetes/issues?q=is:issue+is:open> when
you hit one in your PR.
Instructions for interacting with me using PR comments are available here
<https://git.k8s.io/community/contributors/guide/pull-requests.md>. If
you have questions or suggestions related to my behavior, please file an
issue against the kubernetes/test-infra
<https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:>
repository. I understand the commands that are listed here
<https://go.k8s.io/bot-commands>.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#66799 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ALS8Mh0N3HngqJT23oWT35LIpEHT2Yhjks5uO72lgaJpZM4VnInb>
.
|
@luxas ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks!
allErrs = append(allErrs, field.Invalid(field.NewPath("metricsBindAddress"), cc.MetricsBindAddress, msg)) | ||
} | ||
if cc.HardPodAffinitySymmetricWeight < 0 || cc.HardPodAffinitySymmetricWeight > 100 { | ||
allErrs = append(allErrs, field.Invalid(field.NewPath("hardPodAffinitySymmetricWeight"), cc.HardPodAffinitySymmetricWeight, "not in valid range 0-100")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this range documented or enforced today? It is described in the API type field godoc, but not in the CLI flag help, which is where most users are currently setting it.
- what is the current behavior if a negative number is set via the CLI?
- what is the current behavior if a number over 100 is set via the CLI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also add that to the flag help. if setting this to a value less than 0 or greater than 100 does not prevent the scheduler from working today, then I wouldn't make this a fatal error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in 3f7830f6826b0ac8b5170f00014d4aaee3fb621f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if setting this to a value less than 0 or greater than 100 does not prevent the scheduler from working today, then I wouldn't make this a fatal error.
can you comment on this? @bsalamat, do you know what the effect of setting this to a value outside the 0-100 range is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liggitt Scheduler has a check to ensure that this value is between 1 and 100. This is done at the scheduler creation. So, it was not possible to provide a value out of this range before this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean the allowed range is 1-100 or 0-100? Or is 0 a marker value that means to disable that weight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 is changed to the default weight in our defaulting logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am referring to this piece of code:
obj.HardPodAffinitySymmetricWeight = api.DefaultHardPodAffinitySymmetricWeight |
} | ||
if cc.LeaseDuration.Duration < cc.RenewDeadline.Duration { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("leaseDuration"), cc.RenewDeadline, "LeaseDuration must be greater than RenewDeadline")) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is ResourceLock
not required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in a5554d8dc7b46622ed8bb3c9fffe21a0ea247f94
func ValidateClientConnectionConfiguration(cc *config.ClientConnectionConfiguration, fldPath *field.Path) field.ErrorList { | ||
allErrs := field.ErrorList{} | ||
if cc.QPS < 0 { | ||
allErrs = append(allErrs, field.Invalid(fldPath.Child("qps"), cc.QPS, "must be non-negative")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not accurate... a negative number disables QPS rate limiting today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the only validation I see around the client config struct today:
kubernetes/pkg/proxy/apis/config/validation/validation.go
Lines 190 to 194 in 1138727
func validateClientConnectionConfiguration(config apimachineryconfig.ClientConnectionConfiguration, fldPath *field.Path) field.ErrorList { | |
allErrs := field.ErrorList{} | |
allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(int64(config.Burst), fldPath.Child("Burst"))...) | |
return allErrs | |
} |
that could be useful to promote to a reusable method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll fix.
For Burst
, is there any range that's not accepted?
} | ||
|
||
qpsLessThanZero := validConfig.DeepCopy() | ||
qpsLessThanZero.QPS = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
negative numbers are accepted today and disable client-side rate limiting. please add a test ensuring this continues to be accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in 125f64b
@liggitt ready for another review |
One last question on the 0-100 range, might need a doc update depending on what is actually allowed. Lgtm otherwise after a squash |
/approve will leave lgtm to @bsalamat after squash |
adding validation for componentconfig adding validation to cmd kube-scheduler Add support for ipv6 in IsValidSocketAddr function updating copyright date in componentconfig/validation/validation.go updating copyright date in componentconfig/validation/validation_test.go adding validation for cli options adding BUILD files updating validate function to return []errors in cmd/kube-scheduler ok, really returning []error this time adding comments for exported componentconfig Validation functions silly me, not checking structs along the way :'( refactor to avoid else statement moving policy nil check up one function rejigging some deprecated cmd validations stumbling my way around validation slowly but surely updating according to review from @bsalamat - not validating leader election config unless leader election is enabled - leader election time values cannot be zero - removing validation for KubeConfigFile - removing validation for scheduler policy leader elect options should be non-negative adding test cases for renewDeadline and leaseDuration being zero fixing logic in componentconfig validation 😅 removing KubeConfigFile reference from tests as it was removed in master kubernetes@2ff9bd6 removing bogus space after var assignment adding more tests for componentconfig based on feedback making updates to validation because types were moved on master update bazel build adding validation for staging/apimachinery adding validation for staging/apiserver adding fieldPaths for staging validations moving staging validations out of componentconfig updating test case scenario for staging/apimachinery ./hack/update-bazel.sh moving kube-scheduler validations from componentconfig ./hack/update-bazel.sh removing non-negative check for QPS resourceLock required adding HardPodAffinitySymmetricWeight 0-100 range to cmd flag help section
squashed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bsalamat, liggitt, luxas, noqcks The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Review the full test history for this PR. Silence the bot with an |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. |
What this PR does / why we need it: This adds validation to the kube-scheduler so that we're not accepting bogus values to the kube-scheduler. As requested by @bsalamat in issue #66743
Which issue(s) this PR fixes:
Fixes #66743
Special notes for your reviewer:
Release note: