Bump OVN Bootstrap timeout #690

Merged

@alexanderConstantinescu alexanderConstantinescu commented Jul 2, 2020

Today we wait 1 minute for all nodes to be up before booting OVN. With "OVS on the system", NetworkManager, the MCO networking startup scripts, and the like perform additional configuration, which might require a longer timeout than we currently allow. So, let's bump the timeout.

Reconciliation happens every 5 minutes, so I am setting the timeout to 280 seconds so that the wait does not block reconciliation during bring-up.

/assign @trozet

@alexanderConstantinescu
Contributor Author

/assign @danwinship

@trozet
Contributor

trozet commented Jul 3, 2020

/lgtm

@trozet
Contributor

trozet commented Jul 3, 2020

/retest

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 3, 2020
@trozet
Contributor

trozet commented Jul 3, 2020

@abhat @dcbw @danwinship can one of you please approve?

@trozet
Contributor

trozet commented Jul 3, 2020

/retest

@@ -232,7 +232,7 @@ func boostrapOVN(kubeClient client.Client) (*bootstrap.BootstrapResult, error) {
 	controlPlaneReplicaCount, _ := strconv.Atoi(rcD.ControlPlane.Replicas)

-	err := wait.PollImmediate(2*time.Second, 60*time.Second, func() (bool, error) {
+	err := wait.PollImmediate(5*time.Second, 280*time.Second, func() (bool, error) {

Out of curiosity:

  1. Is there any particular reason why we also bumped the polling interval to 5 seconds? Is it just to keep it proportional to the longer timeout?
  2. Do we know approximately how long it takes with the new model?

@alexanderConstantinescu
Contributor Author

  1. Yes.
  2. No, neither I nor @trozet performed a real study of how long it normally takes or what these values should be set to. But then again, we did not do that when we first implemented this mechanism either. With OVS running on the node and attached to the primary NIC, however, nodes have been seen to boot a bit slower due to DHCP issues: Configure OVS NIC for OVN machine-config-operator#1860 (comment).


Understood, thanks for clarifying, Alex.

@alexanderConstantinescu
Contributor Author

/test e2e-gcp-ovn

@danwinship
Contributor

/lgtm

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexanderConstantinescu, danwinship, trozet, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 5, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

8 similar comments

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jul 5, 2020

@alexanderConstantinescu: The following tests failed, say /retest to rerun all failed tests:

Test name                           Commit   Details  Rerun command
ci/prow/e2e-metal-ipi               6a672d0  link     /test e2e-metal-ipi
ci/prow/e2e-windows-hybrid-network  6a672d0  link     /test e2e-windows-hybrid-network
ci/prow/e2e-azure                   6a672d0  link     /test e2e-azure
ci/prow/e2e-vsphere                 6a672d0  link     /test e2e-vsphere

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 33e5e38 into openshift:master Jul 5, 2020