Bug 1896958: NetworkPolicy performance (pod caching) #226

Merged

Conversation

@danwinship (Contributor) commented Dec 1, 2020:

Trying to improve NetworkPolicy performance/memory usage in large clusters with lots of policies and lots of changes...

/cc @squeed @juanluisvaladas @JacobTanenbaum

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. label Dec 1, 2020
@openshift-ci-robot (Contributor) commented:

@danwinship: This pull request references Bugzilla bug 1896958, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validations were run on this bug:
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

In response to this:

WIP Bug 1896958: NetworkPolicy performance (pod caching)


@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 1, 2020
```diff
 np.lock.Lock()
 defer np.lock.Unlock()

 delete(np.pods, pod.UID)
-np.refreshNetworkPolicies(refreshForPods)
+np.refreshPodNetworkPolicies(pod)
```
Contributor:

Does this need to be synchronous?

@danwinship (author):

do you have any more specific concern than that?

Note that this is necessarily completely asynchronous with respect to CNI pod creation/deletion anyway.

Contributor:

Right, but it blocks the pod informer from processing deltas.

Contributor:

oh, I see, refreshPodNetworkPolicies is already asynchronous - didn't quite grok it.
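
Stepping back, the pattern this thread converges on can be shown as a small self-contained sketch (stdlib only; names are illustrative, not the PR's actual code): the handler updates cached state under the lock and signals a background worker, so the informer goroutine never blocks on flow recalculation.

```go
package main

import "sync"

// npController is a stand-in for the plugin: handlers run on the informer
// goroutine and must stay cheap; refresh work happens on a separate worker.
type npController struct {
	lock    sync.Mutex
	pods    map[string]struct{} // keyed by UID in the real code
	refresh chan struct{}       // buffered: a pending signal coalesces bursts
}

func newNPController() *npController {
	c := &npController{
		pods:    make(map[string]struct{}),
		refresh: make(chan struct{}, 1),
	}
	go c.worker()
	return c
}

// handleDeletePod is synchronous but cheap: update the cache, poke the worker.
func (c *npController) handleDeletePod(uid string) {
	c.lock.Lock()
	delete(c.pods, uid)
	c.lock.Unlock()

	select {
	case c.refresh <- struct{}{}: // wake the worker
	default: // a refresh is already pending; coalesce
	}
}

// worker performs the expensive recalculation off the informer goroutine.
func (c *npController) worker() {
	for range c.refresh {
		// recalculate NetworkPolicy flows and push them to OVS here
	}
}

func main() {
	c := newNPController()
	c.handleDeletePod("example-uid")
}
```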

@juanluisvaladas (Contributor) left a comment:

Looks really good; I have a few nitpicks, but nothing really relevant given the urgency.

pkg/network/node/networkpolicy.go (resolved)
```go
	ips = append(ips, pod.Status.PodIP)
}

pods, err := np.node.kubeInformers.Core().V1().Pods().Lister().Pods(npns.name).List(sel)
```
Contributor:

Huh, this was bad. I wonder if the namespaceIndexer alone might reduce the processing time enough to prevent the memory increase, or at least make things significantly better.
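
For reference, here is the shape of that lister call as a self-contained sketch against the standard client-go API (the function name and wrapper are illustrative): Pods(namespace) consults the informer cache's namespace index, so the selector is evaluated only against that namespace's pods, and the results are shared cache pointers rather than copies.

```go
import (
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
)

func podIPsMatching(factory informers.SharedInformerFactory, namespace string, sel labels.Selector) ([]string, error) {
	// Pods(namespace) uses the cache's namespace index; List only filters
	// pods in that one namespace and returns pointers into the shared cache.
	pods, err := factory.Core().V1().Pods().Lister().Pods(namespace).List(sel)
	if err != nil {
		return nil, err
	}
	var ips []string
	for _, pod := range pods {
		if pod.Status.PodIP != "" { // read-only: never mutate cache objects
			ips = append(ips, pod.Status.PodIP)
		}
	}
	return ips, nil
}
```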

pkg/network/node/networkpolicy.go (outdated, resolved)
Rather than invoking the informer handlers directly, use a fake client
and actually create/delete objects and let the informers be invoked
normally. (In preparation for making use of the informer caches from
the handlers.)

Additionally, use a dummied-out BoundedFrequencyRunner to verify that
syncs occur as expected.
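
A minimal sketch of that test strategy using standard client-go fakes (this is the general pattern, not the PR's actual test code, and it does not show the dummied-out BoundedFrequencyRunner): objects are created through a fake clientset, and the informer delivers the resulting events to the same handlers production uses.

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/tools/cache"
)

func main() {
	client := fake.NewSimpleClientset()
	factory := informers.NewSharedInformerFactory(client, 0)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		// In the real tests these would be the plugin's own handlers.
		AddFunc:    func(obj interface{}) { fmt.Println("add delivered") },
		DeleteFunc: func(obj interface{}) { fmt.Println("delete delivered") },
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, podInformer.HasSynced)

	// Create and delete through the client; handlers fire via the informer.
	pod := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "p", Namespace: "ns"}}
	client.CoreV1().Pods("ns").Create(context.TODO(), pod, metav1.CreateOptions{})
	client.CoreV1().Pods("ns").Delete(context.TODO(), "p", metav1.DeleteOptions{})
	time.Sleep(100 * time.Millisecond) // crude wait; real tests poll or use channels
}
```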
In particular, we were previously copying all of the pods rather than
just keeping pointers to the objects in the cache (probably a leftover
from very old pre-shared-informer code).

This may also fix leaks when pods are deleted and recreated, since
informers apparently compress events based on namespace+name, not UID,
so a delete+recreate would be compressed to an update, and we'd never
get a delete for the old UID.
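
A sketch of the two fixes this commit describes (field and method names are illustrative, not the PR's code): hold pointers into the informer's shared cache, key them by UID, and explicitly drop the old UID when an "update" is really a delete+recreate.

```go
import (
	corev1 "k8s.io/api/core/v1"
	ktypes "k8s.io/apimachinery/pkg/types"
)

type podCache struct {
	// Pointers into the informer cache: no copying, and never mutated.
	pods map[ktypes.UID]*corev1.Pod
}

func (c *podCache) handleUpdatePod(oldPod, newPod *corev1.Pod) {
	if oldPod.UID != newPod.UID {
		// Informers key events by namespace/name, so a delete+recreate can
		// arrive as a single update; without this, the old UID would leak.
		delete(c.pods, oldPod.UID)
	}
	c.pods[newPod.UID] = newPod // store the pointer, not a copy of the Pod
}
```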
When syncing multiple namespaces, do them all in a single OVS
transaction rather than a transaction per namespace.
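
A sketch of that batching. The Transaction interface below is a hypothetical stand-in for openshift-sdn's OVS transaction type in pkg/util/ovs (real method names may differ); the point is structural: one Commit covers every namespace instead of one per namespace.

```go
package main

// Transaction is a stand-in for an OVS flow transaction.
type Transaction interface {
	AddFlow(flow string)
	Commit() error
}

// syncNamespaces batches all namespaces' flow changes into one transaction,
// so OVS is invoked once rather than once per namespace.
func syncNamespaces(otx Transaction, flowsByNamespace map[string][]string) error {
	for _, flows := range flowsByNamespace {
		for _, flow := range flows {
			otx.AddFlow(flow)
		}
	}
	return otx.Commit() // a single OVS transaction for every namespace
}
```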
…cking

In large clusters, recalculating networkpolicies after pod/namespace
changes may take a lot of effort. Additionally, in some cases we may
end up unnecessarily recalculating multiple times before pushing
changes to OVS. Fix this by moving the recalculating step into the
BoundedFrequencyRunner's thread, doing it just before we push the
updates to OVS.
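
A sketch of that wiring, assuming the BoundedFrequencyRunner from k8s.io/kubernetes/pkg/util/async (the same utility kube-proxy uses); the interval values here are made up, not the PR's. Event handlers just mark state dirty and call runner.Run(); syncFlows executes on the runner's own goroutine, recalculating right before pushing to OVS.

```go
import (
	"time"

	"k8s.io/kubernetes/pkg/util/async"
)

func startSyncRunner(syncFlows func(), stopCh <-chan struct{}) *async.BoundedFrequencyRunner {
	// Run syncFlows at most once per second (coalescing bursts of Run()
	// calls) and at least once per hour even if nothing triggers it.
	runner := async.NewBoundedFrequencyRunner("networkpolicy-sync", syncFlows, time.Second, time.Hour, 1)
	go runner.Loop(stopCh)
	return runner
}
```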
@danwinship (author) commented:

/test help

@openshift-ci-robot (Contributor) commented:

@danwinship: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test e2e-aws
  • /test e2e-aws-multitenant
  • /test e2e-aws-upgrade
  • /test e2e-gcp
  • /test images
  • /test unit
  • /test verify
  • /test verify-deps

Use /test all to run the following jobs:

  • pull-ci-openshift-sdn-master-e2e-aws
  • pull-ci-openshift-sdn-master-e2e-aws-upgrade
  • pull-ci-openshift-sdn-master-e2e-gcp
  • pull-ci-openshift-sdn-master-images
  • pull-ci-openshift-sdn-master-unit
  • pull-ci-openshift-sdn-master-verify
  • pull-ci-openshift-sdn-master-verify-deps

In response to this:

/test help


@danwinship danwinship changed the title WIP Bug 1896958: NetworkPolicy performance (pod caching) Bug 1896958: NetworkPolicy performance (pod caching) Dec 1, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2020
@danwinship (author) commented:

CI is a dumpster fire but I brought up a cluster with this PR and ran openshift-tests by hand. All the NetworkPolicy tests passed.

@openshift-ci-robot (Contributor) commented:

@danwinship: This pull request references Bugzilla bug 1896958, which is valid.

3 validations were run on this bug:
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

In response to this:

Bug 1896958: NetworkPolicy performance (pod caching)


@juanluisvaladas (Contributor) commented:

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2020
@openshift-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, juanluisvaladas

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

@danwinship (author) commented:

/retest

```diff
@@ -65,7 +68,8 @@ type npNamespace struct {
 type npPolicy struct {
 	policy            networkingv1.NetworkPolicy
 	watchesNamespaces bool
 	watchesPods       bool
+	watchesAllPods    bool
```
Contributor:

Want to leave a docblock indicating what the watchesFoo variables mean?
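
For example, the requested docblock might read roughly as follows; the comment text is guesswork from how the flags appear to be used in this PR, so the real semantics should come from the code.

```go
import networkingv1 "k8s.io/api/networking/v1"

type npPolicy struct {
	policy networkingv1.NetworkPolicy

	// watchesNamespaces: the policy has namespaceSelector-based peers, so
	// namespace (label) changes may require recalculating it.
	watchesNamespaces bool
	// watchesPods: the policy has podSelector-based peers in some known
	// set of namespaces, so pod changes there may require recalculating it.
	watchesPods bool
	// watchesAllPods: the policy's peers can match pods in namespaces not
	// known in advance, so any pod change may require recalculating it.
	watchesAllPods bool
}
```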

```diff
 	}
 }
-	if changed && npns.inUse {
+	if npns.mustRecalculate && npns.inUse {
 		np.syncNamespace(npns)
```
Contributor:

Does it make sense to just set npns.mustSync on all relevant namespaces, and trigger the runner once? I don't see much advantage to "splitting out" the transaction.
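
For concreteness, the reviewer's suggestion might look roughly like this (types and names are illustrative, not the PR's code): mark every relevant namespace, then trigger the runner once so a single sync pass, and a single OVS transaction, covers all of them.

```go
package main

type npNamespace struct {
	inUse           bool
	mustRecalculate bool
	mustSync        bool
}

// syncTrigger is a stand-in for the BoundedFrequencyRunner's Run method.
type syncTrigger interface{ Run() }

// markAndTriggerOnce marks all namespaces needing work, then fires the
// runner a single time; the sync pass can then handle them together
// instead of splitting the work per namespace.
func markAndTriggerOnce(namespaces []*npNamespace, runner syncTrigger) {
	for _, npns := range namespaces {
		if npns.mustRecalculate && npns.inUse {
			npns.mustSync = true
		}
	}
	runner.Run()
}
```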

@openshift-bot (Contributor) commented:

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot (Contributor) commented:

/retest

Please review the full test history for this PR and help us cut down flakes.

10 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 9075977 into openshift:master Dec 3, 2020
@openshift-ci-robot (Contributor) commented:

@danwinship: All pull requests linked via external trackers have merged:

Bugzilla bug 1896958 has been moved to the MODIFIED state.

In response to this:

Bug 1896958: NetworkPolicy performance (pod caching)


@cuppett (Member) commented Dec 4, 2020:

/cherry-pick 4.6

@openshift-cherrypick-robot

@cuppett: cannot checkout 4.6: error checking out 4.6: exit status 1. output: error: pathspec '4.6' did not match any file(s) known to git

In response to this:

/cherry-pick 4.6


@danwinship danwinship deleted the networkpolicy-perf branch December 4, 2020 14:04
@danwinship (author) commented:

/cherry-pick release-4.6

@danwinship (author) commented:

/cherry-pick release-4.5

@openshift-cherrypick-robot

@danwinship: new pull request created: #228

In response to this:

/cherry-pick release-4.6


@openshift-cherrypick-robot

@danwinship: new pull request created: #229

In response to this:

/cherry-pick release-4.5


Labels: approved, bugzilla/severity-urgent, bugzilla/valid-bug, lgtm

8 participants