Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network policy: use shared pod selector address sets #3329

Merged
merged 5 commits into from Mar 17, 2023

Conversation

npinaeva
Copy link
Member

@npinaeva npinaeva commented Dec 21, 2022

Add PodSelectorAddressSet object, that should be used to manage network
policy peers with pod selector (network policy object is only
responsible for local pods and peer namespace-only handlers).

This should improve performance, since every peer selector will have only
1 address set to update (and 1 handler), instead of 1 address set per
NetworkPolicy[In/E]gressRule, and 1 handler per every pod selector.

Update syncNetworkPolicies to cleanup policies based on acls, and not
address sets, since address sets are not created for policies without
peers with selectors, and it doesn't cleanup default deny port groups.
New sync is based on acls, it will only skip empty policies without
any gress rules. This should be fixed later with proper ownership
indexing for port groups.
Note: default deny port groups not being deleted for non-existing namespace was recently noticed by RH customer during migration (case 03389949)

@coveralls
Copy link

coveralls commented Dec 21, 2022

Coverage Status

Changes Unknown when pulling 6fd7d9e on npinaeva:netpol-perf-updates into ** on ovn-org:master**.

@npinaeva npinaeva force-pushed the netpol-perf-updates branch 3 times, most recently from 6fd7d9e to 2b76f19 Compare December 22, 2022 12:16
@npinaeva npinaeva mentioned this pull request Jan 3, 2023
@npinaeva npinaeva force-pushed the netpol-perf-updates branch 3 times, most recently from e186b8a to 6685fe8 Compare March 3, 2023 16:14
@npinaeva npinaeva changed the title Network policy: use shared peer address sets Network policy: use shared pod selector address sets Mar 3, 2023
Copy link
Contributor

@trozet trozet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: i've only reviewed first 2 commits

// pod selector address sets
err = oc.cleanupPodSelectorAddressSets()
if err != nil {
return fmt.Errorf("default network controller start failed: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the caller should wrap an error with this kind of message since they are calling start on DNC. I think this error message should say "failed to clean up pod selector address sets: %w, err"

go-controller/pkg/ovn/pod_selector_address_set.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/pod_selector_address_set.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/pod_selector_address_set.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/policy.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/policy.go Show resolved Hide resolved

// backRefs is a map of objects that use this address set.
// keys must be unique for all possible users, e.g. for NetworkPolicy use (np *networkPolicy) getKeyWithKind()
backRefs map[string]bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backRefs looks like it is accessed without locking the struct itself. Should this be a syncmap? Should it be Rlocked before it is read?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it is always accessed with oc.podSelectorAddressSets.DoWithLock, since it is only changed on external calls, and handling may include create/delete the address set. I will add a comment for this field

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that coupling is not enforced outside of how you are using it today and it would be more prudent to protect the struct within itself rather than relying on an external entity.

go-controller/pkg/ovn/pod_selector_address_set.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/pod_selector_address_set.go Outdated Show resolved Hide resolved
Copy link
Contributor

@trozet trozet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename GetPodSelectorAddressSet -> EnsurePodSelectorAddressSet?

It will create the thing if missing, otherwise will just get so ensure seems more appropriate.

Also can we change the error message to not be the function names:

	if podSelector == nil {
		err = fmt.Errorf("GetPodSelectorAddressSet failed: pod selector is nil")
		return
	}
	if namespaceSelector == nil && namespace == "" {
		err = fmt.Errorf("GetPodSelectorAddressSet failed: namespace selector is nil and namespace is empty")
		return
	}

You can just return "namespace seleector is nil and namespace is empty". Then the caller should wrap the error and return "Failed to ensure pod selector address set: %w".

go-controller/pkg/ovn/default_network_controller.go Outdated Show resolved Hide resolved

// backRefs is a map of objects that use this address set.
// keys must be unique for all possible users, e.g. for NetworkPolicy use (np *networkPolicy) getKeyWithKind()
backRefs map[string]bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that coupling is not enforced outside of how you are using it today and it would be more prudent to protect the struct within itself rather than relying on an external entity.

v4AddressSetStr := strings.Join(v4AddressSets, ", ")
v4MatchStr = fmt.Sprintf("%s.%s == {%s}", "ip4", direction, v4AddressSetStr)
v4Match = fmt.Sprintf("%s.%s == {%s}", "ip4", direction, v4AddressSetStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Previously l3Match for gress with namespace selector that doesn't select
anything looked like "ip4.src == {}", now it will be
"ip4.src == {}"."

Do we really want to program ACLs that wont match on anything? In the previous behavior it made sense because address sets are populated without touching ACL again. But ACL will have to be updated to have the right address set once the selector finally matches something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the previous behaviour for namespace selector was

  1. create gress address set that will always be empty, use it in match
  2. add namespace address sets to that list on update

We can delete this empty-match acl, but that will complicate the code, since we need to delete the acl on update when no namespaces are matched, and we will need to track which acls got deleted. Now we just update all existing acls, and on namespace delete acl will be updated to empty match

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior here of ovn-controller when the match is an empty brace? Does it throw any error or warning, does it assume it does not match anything? @dceara ?

Copy link
Contributor

@dceara dceara Mar 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ovn-controller complains, e.g.:

2023-03-15T11:06:40.219Z|00030|lflow|WARN|error parsing match "ip4.src == {}": Syntax error at `}' expecting constant.

So the flow will not be installed (and will obviously never match).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i will use the update instead of mutate for port group acl updates then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of extra clarification: the "ip4.src == {}" match can be replaced by "0" with the exact same effect - nothing will match.

go-controller/pkg/ovn/pod_selector_address_set_test.go Outdated Show resolved Hide resolved
@npinaeva
Copy link
Member Author

Can't reply on some of the comments again, so here are some quoted replies

Can we rename GetPodSelectorAddressSet -> EnsurePodSelectorAddressSet?

I didn't so that initially, because there is no other method like Get vs Ensure and the caller shouldn't care what happens on that call, but I guess it doesn't hurt to rename, since Ensure has some context in the ovn-k code

so you put thumbs up here but I'm not sure what you changed. I would have expected namespaceSelector to become a pointer.

labels.Selector is an interface, the change I did is leave nsSel nil if namespaceSelector is nil

var nsSel, podSel labels.Selector
	if namespaceSelector != nil {
		nsSel, err = metav1.LabelSelectorAsSelector(namespaceSelector)
		if err != nil {
			err = fmt.Errorf("can't parse namespace selector %v: %w", namespaceSelector, err)
			return
		}
	}

I think that coupling is not enforced outside of how you are using it today and it would be more prudent to protect the struct within itself rather than relying on an external entity.

not sure I understand, internal fields changes (together with creating and deleting the object itself) are only happening inside of oc.podSelectorAddressSets.DoWithLock, and that ensures the correct usage in general, so it is very tightly coupled, if that changes at some point, the whole locking around that object needs to change.

@trozet
Copy link
Contributor

trozet commented Mar 14, 2023

"not sure I understand, internal fields changes (together with creating and deleting the object itself) are only happening inside of oc.podSelectorAddressSets.DoWithLock, and that ensures the correct usage in general, so it is very tightly coupled, if that changes at some point, the whole locking around that object needs to change."

I think that DoWithLock is not always used, and that is the reason that you need a Lock on the struct itself. My issue is some fields inside of the struct you lock for changing, and others you do not because you rely on DoWithLock. This is confusing and I don't think we should assume some fields need lock and others do not based on who calls DoWithLock.

@npinaeva
Copy link
Member Author

the locking-related comments are hopefully addressed with the new PodSelectorAddrSetHandlerInfo struct
for the empty address set match I added the last commit to fix that, since it is a significant change compared to how network policy worked before

required, to unlock other handlers from the same namespace.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
address sets for pod selector (network policy object is only
responsible for local pods and peer namespace-only handlers).
The locking mechanism is copied from networkPolicy.

Update deletre logic for completed pods: match collided pod
not only by podSeelctor, but also by namespace selector.

PeerPod functions were moved from policy.go and gress_policy.go.

Update syncNetworkPolicies to cleanup policies based on acls, and not
address sets, since address sets are not created for policies without
peers with selectors, and it doesn't cleanup default deny port groups.
New sync is based on acls, it will only skip empty policies without
any gress rules. This should be fixed later with proper ownership
indexing for port groups.

Rename metrics from peer/network_policy to pod_selector_address_set.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
least one selector had peerAddressSet, and empty gress (allow all)
was identified by "gp.sizeOfAddressSet() > 0". Now we don't create
address sets like that, therefore a new hasPeerSelector field was added
to distinguish empty gress from the one that just doesn't have any
address sets added yet.

Previously l3Match for gress with namespace selector that doesn't select
anything looked like "ip4.src == {<empty address set>}", now it will be
"ip4.src == {}".

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>

update 2

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Add new functions to FakeAddressSetFactory for more insight into
existing address sets.
Add tests for PodSelectorAddressSet

Update Netpol-owned address sets to be shared.
Add netpol test that verifies that default deny port groups and port
groups for policies without peer selectors (IpBlock) are cleaned up
on sync.

Move completed pod test from policy_test to pod_selector_address_set_test,
simplify the test and make sure ip will be removed when collided pod
is not selected by the address set.

Update policy sync tests with existing policy in every namespace
to make sure port groups won't be deleted as stale before being updated.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Previously "ip4.dst == {}" match was created, and ovn-controller
throws an error on such acl

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
@@ -1327,11 +1327,11 @@ func (oc *DefaultNetworkController) addNetworkPolicy(policy *knet.NetworkPolicy)
func (oc *DefaultNetworkController) buildNetworkPolicyACLs(np *networkPolicy, aclLogging *ACLLoggingLevels) []*nbdb.ACL {
acls := []*nbdb.ACL{}
for _, gp := range np.ingressPolicies {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: in the future we should make network policy actually have an update instead of delete+add

@trozet
Copy link
Contributor

trozet commented Mar 17, 2023

hmm this test failed in 2 different local gw jobs:

[Fail] Load Balancer Service Tests with MetalLB [It] Should ensure load balancer service works with 0 node ports when ETP=local 
/home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/service.go:831

[Fail] Load Balancer Service Tests with MetalLB [It] Should ensure load balancer service works with 0 node ports when ETP=local 
/home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/service.go:831

https://github.com/ovn-org/ovn-kubernetes/actions/runs/4438092578/jobs/7789311029?pr=3329
https://github.com/ovn-org/ovn-kubernetes/actions/runs/4438092578/jobs/7789311227?pr=3329

Not sure how this test would have anything to do with this PR, but going to re-run the failures to be safe.

@trozet trozet merged commit 43da7b1 into ovn-org:master Mar 17, 2023
20 of 21 checks passed
@npinaeva npinaeva deleted the netpol-perf-updates branch March 20, 2023 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants