Bug 2090816: Make bootstrap timeout configurable #5979
We add a new `bootstrap-timeout` flag to the `create cluster` command.
@honza: This pull request references Bugzilla bug 2090816, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (augol@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
As someone who's mostly used the installer for baremetal (where environment-related delays can vary a lot), this makes sense to me, although I suspect we'd ideally want to expose some way to control all of the timeouts, not only the bootstrap-complete one. However, previous feedback from @patrickdillon indicated there may be an installer team preference for just increasing the hard-coded timeout (for baremetal only), similar to #2979.
Personally I prefer the configurable timeout approach, since it avoids making installs fail slow when there are no specific issues (such as slow POST times or network bandwidth constraints). It's also hard to choose any one hard-coded value that will always work on-prem, because the delays can vary so much, unlike in typical cloud scenarios where things are more deterministic.
I could be wrong but do we have a default value for this configurable bootstrap-timeout flag? |
Cool! I missed that. |
Thanks for putting this together with the alternative #5981. In general, this timeout should only be for baremetal (or perhaps other on-prem environs too)--not cloud. Hopefully the reason is clear, but I can explain further if needed. That cross-platform approach is one issue with this implementation. I'm also trying to check whether this flag applies to all
$ ./openshift-install create cluster --help
Create an OpenShift cluster
Usage:
  openshift-install create cluster [flags]
Flags:
  -h, --help   help for cluster
Global Flags:
      --dir string         assets directory (default ".")
      --log-level string   log level (e.g. "debug | info | warn | error") (default "info")
$ ./openshift-install create cluster --dir gcp-test --bootstrap-timeout 30
FATA[0000] Error executing openshift-install: unknown flag: --bootstrap-timeout
These reasons suggest to me that a flag is not the right approach for this.
I take @hardys point here, and I agree that we don't want to degrade the general experience to enable potentially a handful of environments. I will point out that the downside for a configurable timeout is that it would still require configuration for those problematic environments to work, rather than working out of the box. But with the understanding those troublesome environments are the exception and not the norm, we agree it is better to require the extra configuration for those failing environments rather than potentially fail slow in all cases. If the flag solution is not going to work, I see two potential solutions for allowing this to be configurable:
|
I'm not sure what happened there.
The new flag is specific to the |
@honza: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Yes, we maintain a firm stance that environment variables are only suitable for installer development uses. Having read the discussion in this PR I guess a flag scoped to |
Definitely a mistake on my end!
Thanks Scott. As environment variables are out of the question, I think the install config seems like the only possible option for providing configurable timeouts. Even if we do allow the timeout to be configurable across platforms (which I don't think is a good idea), we don't generally have install configuration through flags. At a minimum this would create a documentation problem.
Most configuration is done via the install-config, which describes the desired state of the cluster, but a flag seems like a reasonable approach for something which modifies the default behavior of the installer? I guess we could make this platform-specific (e.g. only enabled via the baremetal tag for the openshift-baremetal-install binary?), but are we sure users will never need this escape-hatch option on other platforms? (For example, I can imagine vsphere/openstack and any on-prem platforms having less deterministic performance than most clouds.)
/lgtm |
@honza Can we update documentation within this PR? |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sdodson The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/label backport-risk-assessed |
/hold |
|
// commandF can be used to alter the `command` above; this is necessary for
// things like persistent flags
commandF func(cmd *cobra.Command)
Not sure why we need this here? It doesn't look like we need this for all commands so why don't we add it only in the newCreateCmd() function?
func newCreateCmd() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "create",
		Short: "Create part of an OpenShift cluster",
		RunE: func(cmd *cobra.Command, args []string) error {
			return cmd.Help()
		},
	}
	cmd.PersistentFlags().IntVar(&createOpts.bootstrapTimeout, "bootstrap-timeout", 20, "Bootstrap timeout in minutes")
This is a general solution: each subcommand can now modify the cobra.Command instance independently. Setting the bootstrap timeout only makes sense when creating the cluster.
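The hook idea can be sketched without cobra. The example below swaps in the standard library's `flag` package so it stays self-contained, so it is an analogy rather than the PR's code: `target`, `flagsF` (standing in for `commandF`), `newFlagSet`, and the target names are all illustrative assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// target loosely mirrors a per-subcommand description. flagsF plays the role
// of commandF: an optional hook that lets one subcommand customize its own
// flag set without affecting the others.
type target struct {
	name   string
	flagsF func(fs *flag.FlagSet)
}

// createOpts holds options that only the "cluster" target cares about.
var createOpts struct {
	bootstrapTimeout int
}

// newFlagSet builds the flag set for a target and applies its hook, if any.
func newFlagSet(t target) *flag.FlagSet {
	fs := flag.NewFlagSet(t.name, flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	fs.String("dir", ".", "assets directory")
	if t.flagsF != nil {
		t.flagsF(fs)
	}
	return fs
}

var targets = []target{
	{name: "install-config"},
	{name: "cluster", flagsF: func(fs *flag.FlagSet) {
		fs.IntVar(&createOpts.bootstrapTimeout, "bootstrap-timeout", 20, "bootstrap timeout in minutes")
	}},
}

func main() {
	args := []string{"--bootstrap-timeout", "30"}
	for _, t := range targets {
		// Only the target whose hook registered the flag accepts it.
		err := newFlagSet(t).Parse(args)
		fmt.Printf("%s: err=%v timeout=%d\n", t.name, err, createOpts.bootstrapTimeout)
	}
}
```

The trade-off raised in the review still applies: a hook field on every target is more machinery than a single `PersistentFlags()` call on the parent command, but it keeps the flag off subcommands where it has no meaning.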
Got it, my bad. I was hoping to find a way to avoid adding commandF for just one use case, but I guess we can't.
Closing this as #5981 has merged |
@honza: This pull request references Bugzilla bug 2090816. The bug has been updated to no longer refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |