-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Egressfirewall use port groups #3968
Conversation
0ae5503
to
4f77aa9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@npinaeva could you please add a brief explanation or comparison to the PR message on how this reduces the number of OVS flows, for posterity.
go-controller/pkg/ovn/namespace.go
Outdated
// to add pod's IP to the namespace's address set. | ||
func (oc *DefaultNetworkController) addPodToNamespace(ns string, ips []*net.IPNet) (*gatewayInfo, map[string]gatewayInfo, []ovsdb.Operation, error) { | ||
// to add pod's IP to the namespace's address set and port group. | ||
func (oc *DefaultNetworkController) addPodToNamespace(ns string, ips []*net.IPNet, portUUID string) (*gatewayInfo, map[string]gatewayInfo, []ovsdb.Operation, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I would like to avoid caching and moving around the port UUID. I would like for libovsdbops to be able to take the port name instead. But this is more a note to self than a comment for this PR.
@@ -247,6 +277,15 @@ func (bnc *BaseNetworkController) ensureNamespaceLockedCommon(ns string, readOnl | |||
if err != nil { | |||
return nil, nil, fmt.Errorf("failed to create address set for namespace: %s, error: %v", ns, err) | |||
} | |||
|
|||
// namespace port groups are only used by egress firewall and multicast for now | |||
if bnc.multicastSupport || config.OVNKubernetesFeature.EnableEgressFirewall { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want a second opinion, I wouldn't bother with this and just have the port at all times. But maybe you have your reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, I thought about microshift to avoid extra db load (and I don't even know if those have multicast and EF disabled). But then I realized that this helps me to not rewrite all unit tests, because namespace handler would create a port group for every namespace, and all our tests check the db (might be a good reason to improve the unit test framework of course, but for now it requires a lot of changes). So I thought it doesn't make anything worse, but not a strong opinion :)
4f77aa9
to
ed02df8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the review @jcaamano , I resolved what was obvious and left some comments to discuss more complicated cases, ptal
@@ -247,6 +277,15 @@ func (bnc *BaseNetworkController) ensureNamespaceLockedCommon(ns string, readOnl | |||
if err != nil { | |||
return nil, nil, fmt.Errorf("failed to create address set for namespace: %s, error: %v", ns, err) | |||
} | |||
|
|||
// namespace port groups are only used by egress firewall and multicast for now | |||
if bnc.multicastSupport || config.OVNKubernetesFeature.EnableEgressFirewall { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, I thought about microshift to avoid extra db load (and I don't even know if those have multicast and EF disabled). But then I realized that this helps me to not rewrite all unit tests, because namespace handler would create a port group for every namespace, and all our tests check the db (might be a good reason to improve the unit test framework of course, but for now it requires a lot of changes). So I thought it doesn't make anything worse, but not a strong opinion :)
4a75f10
to
3a655ec
Compare
caught a new unit test failure :( #3977 |
dcee783
to
5027f7a
Compare
reminder that this might have been missed: |
Oh I did update the commit message here 5027f7a with |
5027f7a
to
71b4c1b
Compare
Ah, sorry I was just looking and the PR message. I think this explanation is enough to justify the change. If you have numbers for an example, that would satisfy my curiosity. But if you don't have them, that's ok. I guess it would depend on the content of the address set. |
9299909
to
d772347
Compare
for namespace, acls := range batchNsACLs { | ||
if namespace != "" { | ||
pgName := oc.getNamespacePortGroupName(namespace) | ||
// delete stale ACLs from namespaced port group | ||
if !existingEFNamespaces[namespace] { | ||
ops, err = libovsdbops.DeleteACLsFromPortGroupOps(oc.nbClient, ops, pgName, acls...) | ||
} else { | ||
// re-attach from ClusterPortGroupNameBase to namespaced port group. | ||
// port group should exist, because namespace handler will create it. | ||
ops, err = libovsdbops.AddACLsToPortGroupOps(oc.nbClient, ops, pgName, acls...) | ||
} | ||
if err != nil { | ||
return fmt.Errorf("failed to build cleanup ops: %w", err) | ||
} | ||
} | ||
// delete all EF ACLs from ClusterPortGroupNameBase | ||
ops, err = libovsdbops.DeleteACLsFromPortGroupOps(oc.nbClient, ops, | ||
oc.getClusterPortGroupName(types.ClusterPortGroupNameBase), acls...) | ||
if err != nil { | ||
return fmt.Errorf("failed to build cleanup from ClusterPortGroup ops: %w", err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have one PG for the namespace, and if that PG becomes stale because the namespace was deleted, then the PG will be deleted by the namespace sync function and thus the ACLs will be garbage collected with it, we wouldn't need to explicitly delete those ACLs from here.
You might still be doing this for the UTs, but when talking about potential massive ops I would rather handle it specifically in the UTs than in the real code. In any case it might be appropriate to add a comment.
So overall I am looking at the added startup time here because of unconditional massive ops that we might end up doing. Would it make sense to read the cluster PG, and just move the EF ACLs that are still attached to it? Ideally once this thing, that is costing us time, has been done once, we wouldn't need to do it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's almost right, but the problem is that a namespace may not only be deleted, but also just get an egress firewall deleted, in that case it won't get to existingEFNamespaces
and a cleanup is needed. That is why the most part will have to stay, but I added an extra condition for clusterPG to reduce the number of ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! My final nit would be to reorganize the method a bit. It contains several migrations from several different versions and the thing is getting big and confusing. I would probably split them out to their own methods for clarity. But I know I am pushing it now, so up to you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense :) hope I didn't break anything
3710ab2
to
b4d82c2
Compare
b4d82c2
to
ee5aab5
Compare
This flake might be new
https://github.com/ovn-org/ovn-kubernetes/actions/runs/6731387973/job/18297017440?pr=3968 |
namespace address set. Existing port groups don't need any change, since port group name always was a namespace name. Set ErrNotFound to false for delete acls and ports from port group ops. Namespace handler will create, update and delete namespace port group similar to the namespace address set. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
cluster port group + address set match to reduce the number of OVS flows. Previously, all EF ACLs were applied to the clusterPortGroup, and matched source on address set. In that case, all node will have the same set of OVS flows, matching sources that are not scheduled on that node at all. When using port group as a source match, only flow for local ports will be present on every node. The port group is created, updated and deleted by namespace handler similar to the namespace address set. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
ee5aab5
to
8b61856
Compare
We already have namespace port groups that are created by multicast and updated by pod handlers. That implementation has some races because port group is created conditionally. First commit changes that to create namespace port groups similar to namespace address sets on namespace initialization and update multicast handler to use that.
Second commit updates egress firewall to use namespaced port group instead of cluster port group + address set match to reduce the number of OVS flows.
Namespace port groups can be created conditionally, but that would require namespace locking for every port group-related operation, and may be worse for performance than 1 extra port group per namespace