Bug 2014360: [4.7z] Scale fixes for pods/exgws #799
Address set operations such as add and remove are idempotent, so we can get away with only RLocking there, which will greatly improve pod add performance. There is also no need to store the IPs in the addressSet struct. Signed-off-by: Tim Rozet <trozet@redhat.com>
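For readers following the locking argument, here is a minimal sketch of the idea, not the code from this PR. It assumes a backing store that does its own locking and treats repeated adds and removes as no-ops; `fakeDB`, `addrSet`, and every method name below are hypothetical. Add and remove take only the read lock, so they can run concurrently, while destroy takes the write lock.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fakeDB is a hypothetical stand-in for the store that really holds the
// address set contents. It locks internally, and adding an IP twice or
// removing a missing IP is harmless.
type fakeDB struct {
	mu   sync.Mutex
	sets map[string]map[string]struct{}
}

func newFakeDB() *fakeDB {
	return &fakeDB{sets: make(map[string]map[string]struct{})}
}

func (db *fakeDB) add(set, ip string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.sets[set] == nil {
		db.sets[set] = make(map[string]struct{})
	}
	db.sets[set][ip] = struct{}{}
}

func (db *fakeDB) remove(set, ip string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	delete(db.sets[set], ip) // deleting from a missing set is a no-op
}

func (db *fakeDB) destroy(set string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	delete(db.sets, set)
}

func (db *fakeDB) list(set string) []string {
	db.mu.Lock()
	defer db.mu.Unlock()
	ips := make([]string, 0, len(db.sets[set]))
	for ip := range db.sets[set] {
		ips = append(ips, ip)
	}
	sort.Strings(ips)
	return ips
}

// addrSet keeps no local copy of the IPs; the store is the source of truth.
type addrSet struct {
	sync.RWMutex
	name      string
	destroyed bool
	db        *fakeDB
}

// AddIP only needs the read lock: concurrent adds cannot conflict because
// the store operation is idempotent.
func (as *addrSet) AddIP(ip string) error {
	as.RLock()
	defer as.RUnlock()
	if as.destroyed {
		return fmt.Errorf("address set %s already destroyed", as.name)
	}
	as.db.add(as.name, ip)
	return nil
}

// RemoveIP mirrors AddIP and also takes only the read lock.
func (as *addrSet) RemoveIP(ip string) error {
	as.RLock()
	defer as.RUnlock()
	if as.destroyed {
		return fmt.Errorf("address set %s already destroyed", as.name)
	}
	as.db.remove(as.name, ip)
	return nil
}

// Destroy takes the write lock so it cannot interleave with adds or removes.
func (as *addrSet) Destroy() {
	as.Lock()
	defer as.Unlock()
	as.db.destroy(as.name)
	as.destroyed = true
}

func main() {
	as := &addrSet{name: "ns1", db: newFakeDB()}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = as.AddIP(fmt.Sprintf("10.0.0.%d", i%5))
		}(i)
	}
	wg.Wait()
	fmt.Println(as.db.list("ns1")) // five distinct IPs despite 50 adds
}
```

The write lock is reserved for the one operation that changes the lifecycle of the set, which is why pod adds no longer serialize on each other in this sketch.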
This happens when the pod was already created but a new event for the pod is generated. I managed to see it after a manual ovnkube-master restart. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com> (cherry picked from commit 7828dff)
When a gw pod gets the external gateway annotation, it adds the specific routes to the external gateway for existing pods, but it does not remove the SNAT that was added when the pod was created. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com> (cherry picked from commit 8783628)
When adding routes to a pod, we fail inside the inner loop and return an error. If the pod has two IP addresses and the gw is set only for the second address, the function mistakenly returns an error. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com> (cherry picked from commit d2e0593) (cherry picked from commit fae2540)
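The failure mode described above can be illustrated with a small sketch. This is an assumption about the shape of the bug, not the function from this PR: it supposes the pod IPs and gateways are matched by IP family, and the names `routesBuggy`, `routesFixed`, `route`, and `parseIPs` are hypothetical. The buggy variant returns on the first non-matching pod IP; the fixed variant skips it and only fails if no pod IP matches the gateway at all.

```go
package main

import (
	"fmt"
	"net"
)

// route pairs a pod IP with the gateway it should be routed to.
type route struct{ podIP, gw string }

func isIPv6(ip net.IP) bool { return ip.To4() == nil }

// parseIPs is a small helper for the example; inputs are assumed valid.
func parseIPs(addrs ...string) []net.IP {
	ips := make([]net.IP, 0, len(addrs))
	for _, a := range addrs {
		ips = append(ips, net.ParseIP(a))
	}
	return ips
}

// routesBuggy returns an error from inside the inner loop, so a pod whose
// first IP does not match the gateway family never gets to its second IP.
func routesBuggy(podIPs, gws []net.IP) ([]route, error) {
	var routes []route
	for _, gw := range gws {
		for _, podIP := range podIPs {
			if isIPv6(gw) != isIPv6(podIP) {
				return nil, fmt.Errorf("pod IP %s does not match gateway %s", podIP, gw)
			}
			routes = append(routes, route{podIP.String(), gw.String()})
		}
	}
	return routes, nil
}

// routesFixed skips non-matching pod IPs and reports an error only when a
// gateway matched none of them.
func routesFixed(podIPs, gws []net.IP) ([]route, error) {
	var routes []route
	for _, gw := range gws {
		matched := false
		for _, podIP := range podIPs {
			if isIPv6(gw) != isIPv6(podIP) {
				continue
			}
			matched = true
			routes = append(routes, route{podIP.String(), gw.String()})
		}
		if !matched {
			return nil, fmt.Errorf("no pod IP matches gateway %s", gw)
		}
	}
	return routes, nil
}

func main() {
	podIPs := parseIPs("10.128.0.5", "fd00::5")
	gws := parseIPs("fd00::1")

	_, err := routesBuggy(podIPs, gws)
	fmt.Println("buggy:", err)

	routes, err := routesFixed(podIPs, gws)
	fmt.Println("fixed:", routes, err)
}
```

With a dual-stack pod and a gateway set only for the second address, the buggy variant errors out on the first address while the fixed variant yields a single route for the second.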
Previously nsInfo held not only a map of gateways per namespace, but also all of the routes per pod in an external gateway enabled namespace. This meant that nsInfo needed to be locked during all external gateway route adds/deletes, which creates heavy contention, specifically in clusters using external gateway functionality. This commit breaks out the pod routes portion into its own cache, which has individual locks on a per pod basis. This allows exgw routes to be added and removed without needing the nsInfo lock. Additionally, since locks are on a per pod basis, there is less overall contention across the cache. Signed-off-by: Tim Rozet <trozet@redhat.com> (cherry picked from commit c6db422)
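A rough sketch of a cache with per pod locks, for illustration only. The type and method names (`routeCache`, `podRoutes`, `AddRoute`, `DeletePod`, `RouteCount`) are hypothetical and the real cache in this PR may be organized differently. The cache-wide lock is held only long enough to find or create an entry; route changes then take just that pod's lock.

```go
package main

import (
	"fmt"
	"sync"
)

// podRoutes holds the routes of a single pod behind its own lock.
type podRoutes struct {
	sync.Mutex
	deleted bool
	routes  map[string]string // pod IP -> gateway IP
}

// routeCache maps a pod key to its entry. The cache-wide lock protects only
// the map itself, never the routes inside an entry.
type routeCache struct {
	mu   sync.RWMutex
	pods map[string]*podRoutes
}

func newRouteCache() *routeCache {
	return &routeCache{pods: make(map[string]*podRoutes)}
}

// entry returns the pod's entry, creating it if needed.
func (c *routeCache) entry(pod string) *podRoutes {
	c.mu.RLock()
	pr, ok := c.pods[pod]
	c.mu.RUnlock()
	if ok {
		return pr
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if pr, ok = c.pods[pod]; ok { // another goroutine created it first
		return pr
	}
	pr = &podRoutes{routes: make(map[string]string)}
	c.pods[pod] = pr
	return pr
}

// AddRoute locks only the affected pod's entry.
func (c *routeCache) AddRoute(pod, podIP, gw string) {
	for {
		pr := c.entry(pod)
		pr.Lock()
		if !pr.deleted {
			pr.routes[podIP] = gw
			pr.Unlock()
			return
		}
		pr.Unlock() // raced with DeletePod; retry with a fresh entry
	}
}

// DeletePod removes the entry from the map, then marks it deleted so that a
// goroutine still holding the old pointer does not write into it.
func (c *routeCache) DeletePod(pod string) {
	c.mu.Lock()
	pr := c.pods[pod]
	delete(c.pods, pod)
	c.mu.Unlock()
	if pr != nil {
		pr.Lock()
		pr.deleted = true
		pr.Unlock()
	}
}

// RouteCount reports how many routes are cached for a pod.
func (c *routeCache) RouteCount(pod string) int {
	c.mu.RLock()
	pr := c.pods[pod]
	c.mu.RUnlock()
	if pr == nil {
		return 0
	}
	pr.Lock()
	defer pr.Unlock()
	return len(pr.routes)
}

func main() {
	c := newRouteCache()
	var wg sync.WaitGroup
	for i := 0; i < 40; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			pod := fmt.Sprintf("ns1/pod-%d", i%4)
			c.AddRoute(pod, fmt.Sprintf("10.0.%d.%d", i%4, i), "172.16.0.1")
		}(i)
	}
	wg.Wait()
	for p := 0; p < 4; p++ {
		pod := fmt.Sprintf("ns1/pod-%d", p)
		fmt.Println(pod, c.RouteCount(pod))
	}
}
```

In this sketch, route updates for different pods contend only briefly on the map lookup and never on a namespace-wide lock.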
When a pod has n gateways, there are n calls to create the same 501 policy. This commit reduces it to a single call. Signed-off-by: Tim Rozet <trozet@redhat.com> (cherry picked from commit dd836a7)
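The change amounts to hoisting a loop-invariant call out of the per gateway loop. A minimal sketch follows, with a counting fake in place of the real database client; `fakeNB`, `ensurePolicy`, `addRoute`, and both function names are hypothetical, and the value 501 is taken from the commit message without assuming anything further about what the policy contains.

```go
package main

import "fmt"

// fakeNB is a hypothetical stand-in that only counts the calls it receives.
type fakeNB struct {
	policyCalls int
	routeCalls  int
}

func (nb *fakeNB) ensurePolicy(id int, podIP string) { nb.policyCalls++ }
func (nb *fakeNB) addRoute(podIP, gw string)         { nb.routeCalls++ }

// addRoutesPerGateway shows the old shape: the same policy is requested once
// per gateway, so n gateways cost n identical policy calls.
func addRoutesPerGateway(nb *fakeNB, podIP string, gws []string) {
	for _, gw := range gws {
		nb.ensurePolicy(501, podIP)
		nb.addRoute(podIP, gw)
	}
}

// addRoutesSinglePolicy shows the new shape: the policy does not depend on
// the gateway, so it is requested once before the loop.
func addRoutesSinglePolicy(nb *fakeNB, podIP string, gws []string) {
	if len(gws) == 0 {
		return
	}
	nb.ensurePolicy(501, podIP)
	for _, gw := range gws {
		nb.addRoute(podIP, gw)
	}
}

func main() {
	gws := []string{"172.16.0.1", "172.16.0.2", "172.16.0.3"}

	before := &fakeNB{}
	addRoutesPerGateway(before, "10.128.0.5", gws)

	after := &fakeNB{}
	addRoutesSinglePolicy(after, "10.128.0.5", gws)

	fmt.Println("policy calls before:", before.policyCalls, "after:", after.policyCalls)
}
```

Both variants add the same number of routes; only the number of policy calls differs.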
@trozet: This pull request references Bugzilla bug 2014360, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @fedepaol
/assign @dcbw
/retest
/lgtm
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed.
@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:
Failed verification on a cluster-bot cluster built from this PR (similar to the 4.9 bug 1997072; see that bz for must-gather). The cluster was a 120-node OVN cluster on AWS and the workload was node-density light. Many FailedCreatePodSandBox events with reason "timed out waiting for annotations" are seen, and it takes a long time for all pods to reach Running. On the latest 4.10 nightly the issue cannot be reproduced: there are no annotation timeout events for node-density light in the same cluster configuration.
Verified on a cluster-bot AWS cluster built from this PR using the workload described in https://bugzilla.redhat.com/show_bug.cgi?id=2014360#c4. No annotation timeout errors, and all pods reached the Running state.
/label qe-approved
/label backport-risk-assessed
/bugzilla refresh
@mffiedler: This pull request references Bugzilla bug 2014360, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/label cherry-pick-approved
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fedepaol, mffiedler, trozet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest-required Please review the full test history for this PR and help us cut down flakes.
@trozet: All pull requests linked via external trackers have merged: Bugzilla bug 2014360 has been moved to the MODIFIED state. In response to this:
Also includes missing exgw fixes.