-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sched: ensure --leader-elect* CLI args are honored #105712
Conversation
@Huang-Wei: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Huang-Wei The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
sound like the issue is only occurred with this case, so is that better to compare the CLI args with the ComponentConfig and throw some warning if it it not consistent? |
if leaderelection.Changed("leader-elect") { | ||
// Only honor other leader elect arguments when --leader-elect is set to true. | ||
// Otherwise, invalidate the entire CC object. | ||
if o.ComponentConfig.LeaderElection.LeaderElect { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why isn't this handled by the function that processes LeaderElection
? Wouldn't that function break early when LeaderElect
is false?
I think we shouldn't have "business-logic" in the arguments parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question.
The old logic checks if leaderlection
is nil, and compose a leader election obj as the final step to scheduler:
kubernetes/cmd/kube-scheduler/app/server.go
Lines 199 to 200 in 4c9c761
// If leader election is enabled, runCommand via LeaderElector until done and exit. | |
if cc.LeaderElection != nil { |
But this is problematic if the passed-in leaderlection
is already incorrect, which is exactly the case of this issue, triggered by providing a CC config file:
kubernetes/cmd/kube-scheduler/app/options/options.go
Lines 181 to 191 in a78e313
} else { | |
cfg, err := loadConfigFromFile(o.ConfigFile) | |
if err != nil { | |
return err | |
} | |
if err := validation.ValidateKubeSchedulerConfiguration(cfg); err != nil { | |
return err | |
} | |
c.ComponentConfig = *cfg | |
} |
The above logic discarded all pre-parsed leaderlection
object, just use the loaded leaderlection
from the CC config file.
So in the PR, I delayed the "correction" step (Complete()) after loadConfigFromFile()
, to build up an accurate leaderelection
obj.
I think we shouldn't have "business-logic" in the arguments parsing.
Is that a practice commonly adopted? If yes, I can adapt to it. In that case, the tests (we'd expect semi-correct obj) need to be updated as well. Let me know your thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we make makeLeaderElectionConfig
in options.go just check the boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we make
makeLeaderElectionConfig
in options.go just check the boolean?
That doesn't make a difference. With this PR, the boolean (LeaderElection.LeaderElect
) value is already set correctly. It's just we should compose other portions strickly following the user's input even if it's legally inaccurate, or do some tailoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
If you have delayed the call to Complete
, isn't that all we need?
What I'm saying is that we can keep this function simple, always override all the flags if they are present.
Although, I thought we had other flags that took precedence over component config. If we don't anymore, perhaps this is working as expected and we just need to clarify in the flags that they are ignored if component config is specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have delayed the call to
Complete
, isn't that all we need?
yes.
Also, isn't this change also making the "deprecated" flags take precedence over component config?
You're right, deprecated flags should be ignored if --config is provided. I will make some changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: the doc says "This parameter is ignored if a config file is specified in --config.". It's more accurate to be worded as "This parameter is ignored if a config file is specified" because a bare config file would carry default values and those values will be favored over deprecated values, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will carry default values. I don't see a significant difference between the wordings. It talks about the config file, not necessarily that the field is explicitly enabled.
I'm wondering: is this a behavior that actually changed in 1.22? Or were leader elect flags already ignored if the config file was specified? If that's the case, the correct action would be to update the help texts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering: is this a behavior that actually changed in 1.22?
I can confirm leader elect was honored in 1.21.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code. PTAL.
Now we only need to particularly honor the CLI args that may conflict with CC args - i.e., just leader election args. For deprecated args, if --config file is not specified, they should be already filled automatically in cobra command parsing, which happens after instantiating a defaulting object. The current args evaluation flow is like this:
- step 1: instantiate a latest default CC obj
- step 2: (auto) override both deprecated and non-deprecated args to the aforementioned CC obj
- step 3.1: if --config is not specified, do nothing but keep the CC obj
- step 3.2: if --config is specified, compose a temp CC' obj by loading the config file. After that, apply the non-deprecated args and assign CC' to CC.
- step 4: use the CC obj to build scheduler instance
if leaderelection.Changed("leader-elect-resource-name") { | ||
cfg.LeaderElection.ResourceName = o.ComponentConfig.LeaderElection.ResourceName | ||
obtainLeaderElectionFn := func() { | ||
if leaderelection.Changed("leader-elect-lease-duration") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remind me why we need to use Changed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's b/c when doing CLI parsing, without Changed
, we don't know if a false
leaderlection value is specified by the user, or a CLI defaulted value:
kubernetes/staging/src/k8s.io/component-base/config/options/leaderelectionconfig.go
Lines 24 to 29 in a78e313
// BindLeaderElectionFlags binds the LeaderElectionConfiguration struct fields to a flagset | |
func BindLeaderElectionFlags(l *config.LeaderElectionConfiguration, fs *pflag.FlagSet) { | |
fs.BoolVar(&l.LeaderElect, "leader-elect", l.LeaderElect, ""+ | |
"Start a leader election client and gain leadership before "+ | |
"executing the main loop. Enable this when running replicated "+ | |
"components for high availability.") |
Not quite. Some potential issues (like by specifying --leader-elect=false only) are just not reported yet. I can add a sub-test without specifying --config. |
// Obtain CLI args related with leaderelection. Set them to cfg if specified in command line. | ||
leaderelection := nfs.FlagSet("leader election") | ||
func (o *Options) Complete(cfg *kubeschedulerconfig.KubeSchedulerConfiguration) { | ||
// Obtain non-deprecated CLI args that may conflict with ComponentConfig fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more precisely, the non-deprecated CLI here just means leaderelection
, the original comments looks more accurate.
If only the leaderelection
is the special one, it should be able to check is there any conflict between the CLI and config file, and show us something in the log or the console, not sure if it's worth it as we should also consider the defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more precisely, the non-deprecated CLI here just means
leaderelection
, the original comments looks more accurate.
I can use the original comment.
If only the
leaderelection
is the special one, it should be able to check is there any conflict between the CLI and config file, and show us something in the log or the console, not sure if it's worth it as we should also consider the defaults.
I don't see a big value here. We can evaluate whether the introduced complicity (you'll have to do comparings) outweighs the benefits later. But in any way, we shouldn't include it in this PR which is a bug fix and will be back-ported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Can you confirm if LogOrWriteConfig
includes the changes overridden by the flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alculquicondor Yes, I can confirm that.
@@ -80,9 +79,6 @@ kube-scheduler is the reference implementation. | |||
See [scheduling](https://kubernetes.io/docs/concepts/scheduling-eviction/) | |||
for more information about scheduling and the kube-scheduler component.`, | |||
RunE: func(cmd *cobra.Command, args []string) error { | |||
if err := opts.Complete(&namedFlagSets); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we don't need to call this for runs without component config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not any more. Now NewOptions() will obtain a latest defaulting obj, and the logic of honoring CLI args get placed after loading --config
.
4ce2e17
to
56d098c
Compare
/retest |
@@ -193,6 +222,124 @@ profiles: | |||
}, | |||
}, | |||
}, | |||
{ | |||
name: "leader election arg set to false, along with --config arg", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can have a single test case will all the flags that should override values. Well, one with --config
and one without.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. sub-tests are merged now.
}, | ||
}, | ||
{ | ||
name: "leader election settings specified by ComponentConfig only", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case and the one below are good. They should stay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, you can squash
936a9fa
to
3c230af
Compare
/lgtm |
What type of PR is this?
/kind bug
/sig scheduling
What this PR does / why we need it:
--leader-elect*
consists of a collection of CLI arguments. They were handled as individuals which ends up with buggy behavior like what #105704 described - CLI arg gets overwritten by internal defaulted ComponentConfig.LeaderElection. Moreover, some other potential bugs can also occur.Basically, there are 3 scenarios to specify leader elect related options:
This PR unifies the behavior to always honor CLI arguments over ComponentConfig.
Which issue(s) this PR fixes:
Fixes #105704
Special notes for your reviewer:
The fix should be backported to v1.22.
Does this PR introduce a user-facing change?