Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 2005480: [4.8z] Remove waiting for namespace and namespace lock contention #760

Merged
merged 3 commits into from Oct 1, 2021

Conversation

trozet
Copy link
Contributor

@trozet trozet commented Sep 17, 2021

Removes waiting for namespace event to show up. Now it is just ensured when needed. Also changes nsinfo to use a RWMutex so that pod handlers can all simultaneously RLock and not be blocked trying to add many pods for a single namespace.

@openshift-ci openshift-ci bot added the bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. label Sep 17, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 17, 2021

@trozet: This pull request references Bugzilla bug 2005480, which is invalid:

  • expected dependent Bugzilla bug 2005464 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead
  • expected dependent Bugzilla bug 2005476 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead
  • expected dependent Bugzilla bug 2005464 to target a release in 4.9.0, but it targets "4.8.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 2005480: [4.8z] Remove waiting for namespace and namespace lock contention

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Sep 17, 2021
@openshift-ci openshift-ci bot requested review from dcbw and squeed September 17, 2021 21:01
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 17, 2021
@trozet
Copy link
Contributor Author

trozet commented Sep 17, 2021

First 3 commits are part of #743 and this change depends on them merging first.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 17, 2021
@dcbw
Copy link
Contributor

dcbw commented Sep 18, 2021

/retest
#743 merged

The pod handlers and egressgw code would wait for the namespace add
event before being able to actually create things. This wait could last
up to 10 seconds before failure, and possibly even longer depending on
how backed up the namespace handler is.

This patch removes waiting where it is unnecessary. Those cases are the
following:

 - Pod add only cares about the address sets for the namespace being
   available. We already create the address set if we detect it is nil
   during pod add so there is nothing we are really waiting for the ns
   event.
 - Egress GW waits for namespace to see what routes the annotation may
   have on the namespace for gateways. There is really no reason to wait
   as we can add the routes we know about, and then let the namespace
   add event cover any routes that were missing when it arrives.
 - Multicast enablement. This is configured via the namespace
   annotation. If it is missing because we get the pod add event first,
   the add namespace event will cover enabling multicast for all pods in
   the namespace.

Patch does not cover one case of waiting for namespace lock:

 - Network Policy ACL logging. The ACL logging levels for deny and allow
   are configured in the namespace. If the network policy does not wait,
   and the namespace event comes later, we have no method to iterate
   through all affected policies for that namespace and set there
   correct logging levels.

Signed-off-by: Tim Rozet <trozet@redhat.com>
(cherry picked from commit ea5a757)
nsInfo is one of the 2 main sources of contention during heavy pod add
operations. Moving this to an RW Mutex greatly improves performance
because there in most places we are able to only use an RLock for
nsInfo, which allows the multiple parallel pod handlers to be blocked
less often.

Signed-off-by: Tim Rozet <trozet@redhat.com>
(cherry picked from commit 98e02b2)
Egress firewall gets namespace lock still in 4.8z. Updated those calls
to aquire the lock appropriately.

In pods.go, exgw calls and hybrid overlay exgw calls also need to get
the lock. For hybrid overlay exgw, we still use waitForNamespace because
we cannot update a pod's routes from the namespace watcher, so we must
wait for the namespace in the pod handler.

Signed-off-by: Tim Rozet <trozet@redhat.com>
@trozet
Copy link
Contributor Author

trozet commented Sep 20, 2021

/hold cancel

@@ -179,12 +179,12 @@ func (oc *Controller) syncEgressFirewall(egressFirwalls []interface{}) {

func (oc *Controller) addEgressFirewall(egressFirewall *egressfirewallapi.EgressFirewall) error {
klog.Infof("Adding egressFirewall %s in namespace %s", egressFirewall.Name, egressFirewall.Namespace)
nsInfo, err := oc.waitForNamespaceLocked(egressFirewall.Namespace)
nsInfo, nsUnlock, err := oc.ensureNamespaceLocked(egressFirewall.Namespace, false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacobTanenbaum PTAL at all of the changes in this commit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trozet looks good to me for the changes to egressFirewall

@trozet
Copy link
Contributor Author

trozet commented Sep 20, 2021

@dcbw PTAL at the CARRY commit

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 20, 2021
@trozet
Copy link
Contributor Author

trozet commented Sep 21, 2021

/retest

@trozet
Copy link
Contributor Author

trozet commented Sep 21, 2021

/retest-required

@trozet
Copy link
Contributor Author

trozet commented Sep 22, 2021

/bugzilla refresh

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 22, 2021

@trozet: This pull request references Bugzilla bug 2005480, which is invalid:

  • expected dependent Bugzilla bug 2005464 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead
  • expected dependent Bugzilla bug 2005464 to target a release in 4.9.0, but it targets "4.8.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@trozet
Copy link
Contributor Author

trozet commented Sep 23, 2021

/retest

@trozet
Copy link
Contributor Author

trozet commented Sep 24, 2021

/bugzilla refresh

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 24, 2021

@trozet: This pull request references Bugzilla bug 2005480, which is invalid:

  • expected dependent Bugzilla bug 2005464 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is ON_QA instead
  • expected dependent Bugzilla bug 2005464 to target a release in 4.9.0, but it targets "4.8.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@trozet
Copy link
Contributor Author

trozet commented Sep 24, 2021

/bugzilla refresh

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 24, 2021

@trozet: This pull request references Bugzilla bug 2005480, which is invalid:

  • expected dependent Bugzilla bug 2005464 to target a release in 4.9.0, but it targets "4.8.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@trozet
Copy link
Contributor Author

trozet commented Sep 24, 2021

/bugzilla refresh

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 24, 2021

@trozet: This pull request references Bugzilla bug 2005480, which is invalid:

  • expected dependent Bugzilla bug 2005464 to target a release in 4.9.0, but it targets "4.8.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@trozet
Copy link
Contributor Author

trozet commented Sep 24, 2021

/bugzilla refresh

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Sep 24, 2021
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

19 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@dhellmann
Copy link

/hold

Let's give the CI system a break while someone looks into these test failures.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 29, 2021
@dcbw
Copy link
Contributor

dcbw commented Sep 30, 2021

/test e2e-vsphere-ovn

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 30, 2021

@trozet: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-ovn fde73c1 link false /test e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dcbw
Copy link
Contributor

dcbw commented Sep 30, 2021

/retest
to pick up new vsphere CDN certificates and fix RPM pull issues

@dcbw
Copy link
Contributor

dcbw commented Sep 30, 2021

/hold cancel
now that vsphere should be fixed

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 30, 2021
@knobunc knobunc added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Oct 1, 2021
@openshift-merge-robot openshift-merge-robot merged commit a0f8cfe into openshift:release-4.8 Oct 1, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 1, 2021

@trozet: All pull requests linked via external trackers have merged:

Bugzilla bug 2005480 has been moved to the MODIFIED state.

In response to this:

Bug 2005480: [4.8z] Remove waiting for namespace and namespace lock contention

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants