Bug 2014360: [4.7z] Scale fixes for pods/exgws #799

Merged
merged 6 commits into from Nov 6, 2021

Conversation

trozet
Contributor

@trozet trozet commented Oct 15, 2021

Also includes missing exgw fixes.

trozet and others added 6 commits October 14, 2021 22:38
Address set operations like add and remove are idempotent, so we can get
away with only RLocking there, which will greatly improve pod add
performance. There is also no need to store the IPs in the addressSet
struct.

Signed-off-by: Tim Rozet <trozet@redhat.com>
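
The read-lock idea in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `fakeBackend`, `errDestroyed`, and the `destroyed` flag are hypothetical names, and the sketch assumes the backing store is itself safe for concurrent, idempotent writes.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeBackend is a hypothetical stand-in for the store that holds address
// set members. Add and Remove are idempotent and safe for concurrent use.
type fakeBackend struct {
	mu  sync.Mutex
	ips map[string]struct{}
}

func (b *fakeBackend) Add(ip string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.ips[ip] = struct{}{} // adding twice leaves the same state
}

func (b *fakeBackend) Remove(ip string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.ips, ip) // removing a missing entry is a no-op
}

func (b *fakeBackend) Len() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.ips)
}

var errDestroyed = errors.New("address set destroyed")

// addressSet keeps no local copy of the IPs; the backend is the only store.
type addressSet struct {
	sync.RWMutex
	destroyed bool
	backend   *fakeBackend
}

// AddIPs takes only the read lock: it reads the lifecycle flag and issues
// idempotent backend writes, so many callers can run at the same time.
func (as *addressSet) AddIPs(ips ...string) error {
	as.RLock()
	defer as.RUnlock()
	if as.destroyed {
		return errDestroyed
	}
	for _, ip := range ips {
		as.backend.Add(ip)
	}
	return nil
}

// Destroy changes the lifecycle state, so it needs the write lock.
func (as *addressSet) Destroy() {
	as.Lock()
	defer as.Unlock()
	as.destroyed = true
}

func main() {
	as := &addressSet{backend: &fakeBackend{ips: map[string]struct{}{}}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = as.AddIPs("10.0.0.1", "10.0.0.2")
		}()
	}
	wg.Wait()
	fmt.Println("members:", as.backend.Len())
}
```

In this sketch the write lock is reserved for the one operation that changes the set's lifecycle, so concurrent pod adds no longer serialize on the address set itself.
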
This happens when the pod was already created but a new event for the pod
is generated. I managed to see it after a manual ovnkube-master restart.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
(cherry picked from commit 7828dff)
When a gw pod gets the external gateway annotation, it adds the specific
routes to the external gateway for existing pods, but it does not remove
the SNAT that was added when the pod was created.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
(cherry picked from commit 8783628)
When adding routes to a pod, we fail inside the inner loop and return an
error. If the pod has two IP addresses and the gateway is set only for
the second address, the function mistakenly returns an error.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
(cherry picked from commit d2e0593)
(cherry picked from commit fae2540)
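
The dual-stack case described in the commit message above can be illustrated with a small sketch. It is not the function changed in this PR: `planRoutes`, `matchGateways`, and the `ips` helper are hypothetical, and returning an error only when no pod IP matches any gateway is an assumption about the intended behavior.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// ips parses address strings; a helper for this example only.
func ips(addrs ...string) []net.IP {
	out := make([]net.IP, 0, len(addrs))
	for _, a := range addrs {
		out = append(out, net.ParseIP(a))
	}
	return out
}

// isIPv6 reports whether ip is an IPv6 address.
func isIPv6(ip net.IP) bool { return ip.To4() == nil }

// matchGateways returns the gateways in the same address family as podIP.
func matchGateways(podIP net.IP, gws []net.IP) []net.IP {
	var out []net.IP
	for _, gw := range gws {
		if isIPv6(gw) == isIPv6(podIP) {
			out = append(out, gw)
		}
	}
	return out
}

// planRoutes pairs each pod IP with the gateways of its family. A pod IP
// with no matching gateway is skipped instead of failing the whole call.
func planRoutes(podIPs, gws []net.IP) ([]string, error) {
	var routes []string
	for _, podIP := range podIPs {
		matched := matchGateways(podIP, gws)
		if len(matched) == 0 {
			continue // not an error: another pod IP may still match
		}
		for _, gw := range matched {
			routes = append(routes, podIP.String()+" via "+gw.String())
		}
	}
	// Assumption: only a complete mismatch is worth reporting.
	if len(routes) == 0 {
		return nil, errors.New("no gateway matches any pod IP family")
	}
	return routes, nil
}

func main() {
	// The gateway is set only for the pod's second (IPv6) address.
	routes, err := planRoutes(ips("10.128.0.5", "fd00::5"), ips("fd00::1"))
	fmt.Println(routes, err)
}
```

The key point is that the "no match" decision moves out of the inner loop, so the first address cannot fail the call before the second one is examined.
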
Previously nsInfo held not only a map of gateways per namespace, but
also all of the routes per pod in an external gateway enabled namespace.
This means that nsInfo had to be locked during every external gateway
route add or delete, which creates heavy contention in clusters that use
external gateway functionality.

This breaks out the pod routes portion into its own cache, which has
individual locks on a per pod basis. This allows exgw routes to be added
and removed without needing nsInfo lock. Additionally, since locks are
on a per pod basis, it provides less overall contention across the
cache.

Signed-off-by: Tim Rozet <trozet@redhat.com>
(cherry picked from commit c6db422)
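
A cache with per-pod locks, as described in the commit message above, might look roughly like the sketch below. The type and method names (`routeCache`, `podRoutes`, `AddRoute`, `RemoveRoute`, `Count`) are hypothetical and do not come from this PR; the sketch also leaves out removal of whole pod entries, which needs extra care.

```go
package main

import (
	"fmt"
	"sync"
)

// podRoutes holds the external gateway routes of one pod behind its own lock.
type podRoutes struct {
	mu     sync.Mutex
	routes map[string]struct{} // keyed by gateway IP
}

// routeCache maps "namespace/name" to per-pod entries. The outer lock only
// guards the map and is held just long enough to look up or create an entry.
type routeCache struct {
	mu   sync.Mutex
	pods map[string]*podRoutes
}

func newRouteCache() *routeCache {
	return &routeCache{pods: map[string]*podRoutes{}}
}

// entry returns the pod's entry, creating it if it does not exist yet.
func (c *routeCache) entry(pod string) *podRoutes {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.pods[pod]
	if !ok {
		e = &podRoutes{routes: map[string]struct{}{}}
		c.pods[pod] = e
	}
	return e
}

// AddRoute records a route for one pod while holding only that pod's lock.
func (c *routeCache) AddRoute(pod, gw string) {
	e := c.entry(pod)
	e.mu.Lock()
	defer e.mu.Unlock()
	e.routes[gw] = struct{}{}
}

// RemoveRoute drops a route for one pod; other pods are not blocked.
func (c *routeCache) RemoveRoute(pod, gw string) {
	e := c.entry(pod)
	e.mu.Lock()
	defer e.mu.Unlock()
	delete(e.routes, gw)
}

// Count returns how many routes are cached for the pod.
func (c *routeCache) Count(pod string) int {
	e := c.entry(pod)
	e.mu.Lock()
	defer e.mu.Unlock()
	return len(e.routes)
}

func main() {
	c := newRouteCache()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		pod := fmt.Sprintf("ns1/pod-%d", i)
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.AddRoute(pod, "172.18.0.2")
			c.AddRoute(pod, "172.18.0.3")
		}()
	}
	wg.Wait()
	fmt.Println(c.Count("ns1/pod-0"))
}
```

Work on different pods only meets at the short map lookup, which is where the reduced contention comes from in this sketch.
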
When a pod has n gateways, there are n calls to create the same 501
policy. This commit reduces it to a single call.

Signed-off-by: Tim Rozet <trozet@redhat.com>
(cherry picked from commit dd836a7)
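
The change described in the commit message above amounts to hoisting a per-pod call out of a per-gateway loop. The sketch below shows only that shape; `policyClient`, `ensurePolicy`, and `addRoute` are hypothetical, and the "501 policy" is treated as an opaque per-pod policy whose creation is idempotent.

```go
package main

import "fmt"

// policyClient is a hypothetical stand-in that counts the calls it receives.
type policyClient struct {
	policyCalls int
	routes      []string
}

// ensurePolicy creates the per-pod policy if it is missing.
func (p *policyClient) ensurePolicy(podIP string) { p.policyCalls++ }

// addRoute adds one route from the pod to one gateway.
func (p *policyClient) addRoute(podIP, gw string) {
	p.routes = append(p.routes, podIP+" via "+gw)
}

// addGatewayRoutes creates the policy once per pod and then adds one route
// per gateway, instead of re-creating the same policy for every gateway.
func addGatewayRoutes(c *policyClient, podIP string, gws []string) {
	if len(gws) == 0 {
		return
	}
	c.ensurePolicy(podIP) // once per pod, not once per gateway
	for _, gw := range gws {
		c.addRoute(podIP, gw)
	}
}

func main() {
	c := &policyClient{}
	addGatewayRoutes(c, "10.128.0.5", []string{"172.18.0.2", "172.18.0.3", "172.18.0.4"})
	fmt.Println("policy calls:", c.policyCalls, "routes:", len(c.routes))
}
```
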
@openshift-ci openshift-ci bot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Oct 15, 2021
@openshift-ci
Contributor

openshift-ci bot commented Oct 15, 2021

@trozet: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 2014360: Scale fixes for pod and external gateways

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Oct 15, 2021
@trozet
Contributor Author

trozet commented Oct 15, 2021

/assign @fedepaol

@trozet
Contributor Author

trozet commented Oct 15, 2021

/assign @dcbw

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 15, 2021
@dcbw dcbw changed the title Bug 2014360: Scale fixes for pod and external gateways Bug 2014360: [4.7z] Scale fixes for pods/exgws Oct 15, 2021
@trozet
Contributor Author

trozet commented Oct 16, 2021

/retest

@fedepaol
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2021
@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Oct 19, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@mffiedler

Failed verification on a cluster-bot cluster built from this PR (similar to the 4.9 bug 1997072 - see that bz for must-gather).

The cluster was a 120-node OVN cluster on AWS and the workload was node-density light. Many FailedCreatePodSandBox events with reason "timed out waiting for annotations" are seen, and it takes a long time for all pods to reach Running.

On the latest 4.10 nightly, the issue cannot be reproduced: there are no annotation timeout events for node-density light in the same cluster configuration.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Oct 20, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Oct 22, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Nov 1, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Nov 2, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Nov 3, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Nov 4, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Nov 5, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@mffiedler

Verified on a cluster-bot AWS cluster built from this PR using the workload described in https://bugzilla.redhat.com/show_bug.cgi?id=2014360#c4. No annotation timeout errors, and all pods reached the Running state.

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Nov 5, 2021
@trozet
Contributor Author

trozet commented Nov 5, 2021

/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Nov 5, 2021
@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Nov 6, 2021

@openshift-bot: This pull request references Bugzilla bug 2014360, which is invalid:

  • expected dependent Bugzilla bug 2014332 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@mffiedler

/bugzilla refresh

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Nov 6, 2021
@openshift-ci
Contributor

openshift-ci bot commented Nov 6, 2021

@mffiedler: This pull request references Bugzilla bug 2014360, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.z) matches configured target release for branch (4.7.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 2014332 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 2014332 targets the "4.8.z" release, which is one of the valid target releases: 4.8.0, 4.8.z
  • bug has dependents

Requesting review from QA contact:
/cc @mffiedler

In response to this:

/bugzilla refresh

@openshift-ci openshift-ci bot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Nov 6, 2021
@openshift-ci openshift-ci bot requested a review from mffiedler November 6, 2021 18:36
@mffiedler

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Nov 6, 2021
@openshift-ci
Contributor

openshift-ci bot commented Nov 6, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fedepaol, mffiedler, trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-merge-robot openshift-merge-robot merged commit 8ca2b7c into openshift:release-4.7 Nov 6, 2021
@openshift-ci
Contributor

openshift-ci bot commented Nov 6, 2021

@trozet: All pull requests linked via external trackers have merged:

Bugzilla bug 2014360 has been moved to the MODIFIED state.

In response to this:

Bug 2014360: [4.7z] Scale fixes for pods/exgws

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR

6 participants