Bug 2014332: [4.8z] Scale fixes for pods/exgws #798
Address set operations like add and remove are idempotent. We can get away with only RLocking there, which will greatly improve pod add performance. There is also no need to store the ips in the addressSet struct. Signed-off-by: Tim Rozet <trozet@redhat.com>
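A minimal sketch of the locking idea in this commit, not the actual ovn-kubernetes code. It assumes the backing store does its own locking and that add and remove are idempotent there; `fakeDB`, `addressSet`, `errDestroyed`, and `addConcurrently` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeDB is a hypothetical stand-in for the database behind an address set.
// It does its own locking, and add is idempotent.
type fakeDB struct {
	mu  sync.Mutex
	ips map[string]struct{}
}

func (db *fakeDB) add(ip string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.ips[ip] = struct{}{} // adding an existing IP is a no-op
}

func (db *fakeDB) size() int {
	db.mu.Lock()
	defer db.mu.Unlock()
	return len(db.ips)
}

// addressSet keeps no local copy of the IPs. The RWMutex only protects
// the destroyed flag.
type addressSet struct {
	sync.RWMutex
	destroyed bool
	db        *fakeDB
}

var errDestroyed = errors.New("address set destroyed")

func newAddressSet() *addressSet {
	return &addressSet{db: &fakeDB{ips: map[string]struct{}{}}}
}

// AddIP takes only the read lock, so many pod adds can run in parallel.
func (as *addressSet) AddIP(ip string) error {
	as.RLock()
	defer as.RUnlock()
	if as.destroyed {
		return errDestroyed
	}
	as.db.add(ip)
	return nil
}

// Destroy needs exclusive access, so it takes the write lock.
func (as *addressSet) Destroy() {
	as.Lock()
	defer as.Unlock()
	as.destroyed = true
}

// addConcurrently adds every IP from its own goroutine and waits for all of them.
func addConcurrently(as *addressSet, ips []string) {
	var wg sync.WaitGroup
	for _, ip := range ips {
		wg.Add(1)
		go func(ip string) {
			defer wg.Done()
			_ = as.AddIP(ip)
		}(ip)
	}
	wg.Wait()
}

func main() {
	as := newAddressSet()
	addConcurrently(as, []string{"10.0.0.1", "10.0.0.2", "10.0.0.1"})
	fmt.Println(as.db.size()) // prints 2: the duplicate add changes nothing
}
```

Readers do not block each other, so concurrent pod adds only serialize on whatever the backing store requires. Only teardown, which must exclude every other operation, pays for the write lock.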
This happens when the pod was already created but a new event for the pod is generated. I managed to see it after a manual ovnkube-master restart. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com> (cherry picked from commit 7828dff)
@trozet: This pull request references Bugzilla bug 2014332, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @dcbw Dan please review this closely, there were quite a few merge conflicts. |
When a gw pod gets the external gateway annotation, it adds the specific routes to the external gateway for existing pods, but it does not remove the SNAT that was added when the pod was created. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com> (cherry picked from commit 8783628)
Previously nsInfo held not only a map of gateways per namespace, but also all of the routes per pod in an external-gateway-enabled namespace. This meant that nsInfo had to be locked during every external gateway route add or delete, which creates heavy contention in clusters that use external gateway functionality. This commit breaks the pod routes portion out into its own cache, which has individual locks on a per-pod basis. This allows exgw routes to be added and removed without taking the nsInfo lock. Additionally, since locks are per pod, there is less overall contention across the cache. Signed-off-by: Tim Rozet <trozet@redhat.com> (cherry picked from commit c6db422)
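The per-pod locking described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `podRouteCache`, `podEntry`, `lockPod`, and the method names are hypothetical, and entries are never deleted here because safe deletion needs extra care.

```go
package main

import (
	"fmt"
	"sync"
)

// podEntry holds the external gateway routes of one pod behind its own lock.
type podEntry struct {
	sync.Mutex
	routes map[string]struct{} // gateway IP -> present
}

// podRouteCache is a hypothetical per-pod cache, kept separate from nsInfo.
type podRouteCache struct {
	mu   sync.Mutex // guards only the pods map and is held briefly
	pods map[string]*podEntry
}

func newPodRouteCache() *podRouteCache {
	return &podRouteCache{pods: map[string]*podEntry{}}
}

// lockPod returns the locked entry for a pod, creating it if needed.
func (c *podRouteCache) lockPod(pod string) *podEntry {
	c.mu.Lock()
	e, ok := c.pods[pod]
	if !ok {
		e = &podEntry{routes: map[string]struct{}{}}
		c.pods[pod] = e
	}
	c.mu.Unlock()
	e.Lock() // only callers working on the same pod wait here
	return e
}

func (c *podRouteCache) AddRoute(pod, gw string) {
	e := c.lockPod(pod)
	defer e.Unlock()
	e.routes[gw] = struct{}{}
}

func (c *podRouteCache) DelRoute(pod, gw string) {
	e := c.lockPod(pod)
	defer e.Unlock()
	delete(e.routes, gw)
}

func (c *podRouteCache) RouteCount(pod string) int {
	e := c.lockPod(pod)
	defer e.Unlock()
	return len(e.routes)
}

func main() {
	c := newPodRouteCache()
	var wg sync.WaitGroup
	for _, pod := range []string{"ns1/pod-a", "ns1/pod-b"} {
		wg.Add(1)
		go func(pod string) {
			defer wg.Done()
			c.AddRoute(pod, "172.16.0.1")
			c.AddRoute(pod, "172.16.0.2")
		}(pod)
	}
	wg.Wait()
	fmt.Println(c.RouteCount("ns1/pod-a"), c.RouteCount("ns1/pod-b")) // prints 2 2
}
```

The map lock is released before the pod lock is taken, so a slow route update for one pod does not hold up lookups or updates for any other pod.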
When a pod has n gateways, there are n calls to create the same 501 policy. This commit reduces that to a single call. Signed-off-by: Tim Rozet <trozet@redhat.com> (cherry picked from commit dd836a7)
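A rough sketch of the call reduction, assuming that 501 is the priority of the policy and that the policy depends only on the pod, not on the gateway. `fakeNB`, `createPolicy`, `addRoute`, and `addGatewayRoutes` are hypothetical names; the fake client only counts calls.

```go
package main

import "fmt"

// fakeNB is a hypothetical northbound client that records what it is asked to do.
type fakeNB struct {
	policyCalls int
	routes      []string
}

// createPolicy stands in for the call that creates the pod's policy.
func (nb *fakeNB) createPolicy(podIP string, priority int) {
	nb.policyCalls++
}

// addRoute stands in for adding one route to one external gateway.
func (nb *fakeNB) addRoute(podIP, gw string) {
	nb.routes = append(nb.routes, podIP+" via "+gw)
}

// addGatewayRoutes creates the policy once per pod, then adds one route per gateway.
func addGatewayRoutes(nb *fakeNB, podIP string, gateways []string) {
	if len(gateways) == 0 {
		return
	}
	nb.createPolicy(podIP, 501) // hoisted out of the loop: one call, not n
	for _, gw := range gateways {
		nb.addRoute(podIP, gw)
	}
}

func main() {
	nb := &fakeNB{}
	addGatewayRoutes(nb, "10.128.0.5", []string{"172.16.0.1", "172.16.0.2", "172.16.0.3"})
	fmt.Println(nb.policyCalls, len(nb.routes)) // prints 1 3
}
```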
/assign @fedepaol |
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fedepaol, trozet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/bugzilla refresh Recalculating validity in case the underlying Bugzilla bug has changed. |
@openshift-bot: This pull request references Bugzilla bug 2014332, which is invalid:
/bugzilla cc-qa |
@anuragthehatter: This pull request references Bugzilla bug 2014332, which is invalid:
Failed verification on a cluster-bot cluster built from this PR (similar to the 4.9 bug 1997072; see that bz for must-gather). The cluster was a 120-node OVN cluster on AWS and the workload was node-density light. Many FailedCreatePodSandBox events with reason "timed out waiting for annotations" are seen, and it takes a long time for all pods to reach Running. On the latest 4.10 nightly the issue cannot be reproduced: there are no annotation timeout events for node-density light in the same cluster configuration. |
Tested on 4.8 cluster-bot cluster using the workload from https://bugzilla.redhat.com/show_bug.cgi?id=2014332#c6 |
/bugzilla refresh |
@dcbw: This pull request references Bugzilla bug 2014332, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/label backport-risk-assessed |
/label cherry-pick-approved |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
@trozet: All pull requests linked via external trackers have merged: Bugzilla bug 2014332 has been moved to the MODIFIED state. In response to this:
Performance and scale fixes with pods and multiple external gateways.