Bug 2100249: Revert "Bug 2082599: add upper bound to number of failed attempts" #1161
@ricky-rav FYI
@trozet: This pull request references Bugzilla bug 2100249, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
lgtm
/lgtm Surya just explained to me what happened with the pods being retried several times at each pod update event and getting all retried upon node updates at startup. Ugh! Sorry for not catching this!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ricky-rav, trozet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
/retest-required
Looking at the failing job makes me think we may need an @dcbw could you please help us out with that? This is a blocker for 4.11, so we should not hold this up for a known broken job.
/override ci/prow/e2e-metal-ipi-ovn-ipv6
@knobunc: Overrode contexts on behalf of knobunc: ci/prow/e2e-metal-ipi-ovn-ipv6 In response to this:
@trozet: The following tests failed, say
@trozet: All pull requests linked via external trackers have merged: Bugzilla bug 2100249 has been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.11
@tssurya: new pull request created: #1165 In response to this:
Reverts #1147
This fix introduced an upper limit on the number of retries in ovnk for failed attempts. However, due to some compounding issues it causes pods to never come up. The scenario is like this:
These steps happen over a period of 30 seconds, just before the node is finally ready. We need a few adjustments to the logic in ovnk before we can introduce an upper limit.
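The failure mode described here and in the review comments (pods retried several times per pod update event, and all pods retried on node updates at startup) can be illustrated with a toy model. This is a minimal Python sketch, not ovnk's actual Go code: `RetryQueue`, `MAX_FAILED_ATTEMPTS`, the cap of 15, and the three retries per pod update are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical cap; the real value used by the reverted change is not stated here.
MAX_FAILED_ATTEMPTS = 15


@dataclass
class RetryQueue:
    """Toy model of a retry queue with an upper bound on failed attempts."""
    node_ready: bool = False
    failed_attempts: dict = field(default_factory=dict)  # pod name -> failures so far
    dropped: set = field(default_factory=set)            # pods that hit the cap

    def add(self, pod: str) -> None:
        self.failed_attempts.setdefault(pod, 0)

    def retry(self, pod: str) -> bool:
        """Attempt to set up one pod; return True on success."""
        if pod not in self.failed_attempts:
            return False  # already set up, or already given up on
        if self.node_ready:
            del self.failed_attempts[pod]
            return True
        # The node is not ready yet, so the attempt fails and counts against the cap.
        self.failed_attempts[pod] += 1
        if self.failed_attempts[pod] >= MAX_FAILED_ATTEMPTS:
            del self.failed_attempts[pod]
            self.dropped.add(pod)
        return False

    def on_pod_update(self, pod: str, retries_per_event: int = 3) -> None:
        # Assumption: one pod update event triggers several retries.
        for _ in range(retries_per_event):
            self.retry(pod)

    def on_node_update(self) -> None:
        # Assumption: a node update retries every pending pod.
        for pod in list(self.failed_attempts):
            self.retry(pod)


def simulate() -> RetryQueue:
    """Burn through the cap before the node becomes ready."""
    queue = RetryQueue()
    queue.add("pod-a")
    for _ in range(3):
        queue.on_pod_update("pod-a")  # 3 events x 3 retries = 9 failures
    for _ in range(6):
        queue.on_node_update()        # 6 more failures reach the cap of 15
    queue.node_ready = True           # the node is finally ready
    queue.on_node_update()            # nothing is left to retry
    return queue


if __name__ == "__main__":
    q = simulate()
    print("dropped:", sorted(q.dropped))
```

In this model the pod exhausts its retry budget while the node is still starting up, so once the node becomes ready there is no entry left to retry. Without the cap, the same final node update would have succeeded.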