Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi-network policy support #3382

Merged
merged 3 commits into from Apr 26, 2023
Merged

Conversation

cathy-zhou
Copy link
Contributor

- What this PR does and why is it needed

- Special notes for reviewers

- How to verify it

- Description for the changelog

@coveralls
Copy link

coveralls commented Jan 31, 2023

Coverage Status

Coverage: 53.335% (+0.5%) from 52.869% when pulling 8c10787 on cathy-zhou:multi-network-mr6 into 9e21b3a on ovn-org:master.

@cathy-zhou cathy-zhou force-pushed the multi-network-mr6 branch 3 times, most recently from baa390f to 9c8db39 Compare February 7, 2023 01:04
@martinkennelly martinkennelly self-assigned this Feb 13, 2023
@cathy-zhou cathy-zhou force-pushed the multi-network-mr6 branch 2 times, most recently from 8ded4d4 to c022ab6 Compare March 15, 2023 16:45
Copy link
Contributor

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits I've found until now (I couldn't finish reviewing the entire PR).

Would it make sense to split the last commit in two ? One adding the "feature", and another adding the ACL logging ? I kept getting distracted by that while reviewing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change.

Does this DeleteAddressSetsWithPredicateOps function have another caller ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is used by cleanupPolicyLogicalEntities()

go-controller/pkg/ovn/base_network_controller_acl.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/base_network_controller_namespace.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jcaamano jcaamano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just reviewed the first few files.
@cathy-zhou is #2740 is pre-requisite to this PR?

@npinaeva
Copy link
Member

Some notes: there is an ACL ExternalIDs change that should be merged before this pr #3334, it will make sure the indexing of the acls is correct between controllers

I think we need to also implement new indices for port groups before this pr, because otherwise syncing port groups for different controllers will be very difficult (and I guess initially we wanted to make sure that all secondary-controller owned objects will have a new set of ids) @jcaamano

I also see multicast is enabled for secondary networks in the same pr, probably it makes sense to state that in the name, and maybe even split it to at least another commit, if not another pr?

A question about controller names: I see all secondary controllers are named netInfo.GetNetworkName() + "-network-controller", is it guaranteed that these names will always be unique?

One concern I have: it is extremely simple in the current version of the code to mix the objects used by the default and secondary network controllers, which may break the functionality for both. A simple example is createMulticastAllowPolicy. If you look at the implementation, it will reference the default network controller port group from
portGroupName := hashedPortGroup(ns) in its acls, and the real port group with another network-scoped name will be created in the end with bnc.buildPortGroup(portGroupName, ns, ports, acls).
I guess there are more examples of that, and maybe if we get the new IDs for all the used objects it will help somewhat, but we need to be extra careful about properly indexing db objects.

Copy link
Contributor

@jcaamano jcaamano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not synchronizing multi-network policies on NAD events?

Should the peers of a multinet policy be those on the same network or on the same NAD or all? On pod_selector_addres_set.go we are always using the default network for pod peers. For namespace peers we always look in the same network. Something doesn't add up or I am a bit confused.

go-controller/pkg/config/config.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/base_network_controller.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/base_network_controller_acl.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/gress_policy.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/gress_policy.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/base_network_controller_policy.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/base_network_controller_policy.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/base_network_controller_policy.go Outdated Show resolved Hide resolved
@cathy-zhou
Copy link
Contributor Author

cathy-zhou commented Mar 29, 2023

@npinaeva

Some notes: there is an ACL ExternalIDs change that should be merged before this pr #3334, it will make sure the indexing of the acls is correct between controllers

sure, please let me know when your PR is merged

I think we need to also implement new indices for port groups before this pr, because otherwise syncing port groups for different controllers will be very difficult (and I guess initially we wanted to make sure that all secondary-controller owned objects will have a new set of ids) @jcaamano

why it is needed? we have network external-ids to differenciate them. it is like logical_routers/logical_switches/logical_ports

I also see multicast is enabled for secondary networks in the same pr, probably it makes sense to state that in the name, and maybe even split it to at least another commit, if not another pr?

it is not enabled for secondary networks yet.

A question about controller names: I see all secondary controllers are named netInfo.GetNetworkName() + "-network-controller", is it guaranteed that these names will always be unique?

I would think so. do you have any specific concerns for this?

One concern I have: it is extremely simple in the current version of the code to mix the objects used by the default and secondary network controllers, which may break the functionality for both. A simple example is createMulticastAllowPolicy. If you look at the implementation, it will reference the default network controller port group from
portGroupName := hashedPortGroup(ns) in its acls, and the real port group with another network-scoped name will be created in the end with bnc.buildPortGroup(portGroupName, ns, ports, acls).
I guess there are more examples of that, and maybe if we get the new IDs for all the used objects it will help somewhat, but we need to be extra careful about properly indexing db objects.

for this particular case, note that bnc.buildPortGroup() and bnc.getACLMatch takes the portgroupname without network scope prefix as the input argument, so the final port group name would be corrrect. I probably missed some place in multicast code as multicast for secondary network is not enabled yet. I will fix it to not confuse the reviewer.

as I understand, we need to index the db object only if there is no index for this object type, for example, acls. but for port groups and address-set, name is unique within the ovndb table, which is index. So I am not sure I understand the needs to introduce yet another index for these kind of objects.

@jcaamano
Copy link
Contributor

Could we add some basic e2e tests?

@npinaeva
Copy link
Member

A question about controller names: I see all secondary controllers are named netInfo.GetNetworkName() + "-network-controller", is it guaranteed that these names will always be unique?

I would think so. do you have any specific concerns for this?

Yes, we need to be absolutely sure that the names of controllers are different, because it is a part of objects db index. So let's figure out if/how it is guaranteed.

I also see multicast is enabled for secondary networks in the same pr, probably it makes sense to state that in the name, and maybe even split it to at least another commit, if not another pr?

it is not enabled for secondary networks yet.

I mean it may not be enabled as a flag, but the code for is there, e.g. this pr adds updateNamespaceForSecondaryNetwork functino, that calls multicastUpdateNamespace. Also the fact that configureNamespaceCommon calls multicastUpdateNamespace is also confusing

About referencing port groups, looks like it is correct now, I will suggest some changes to make the code more clear about that.

go-controller/pkg/ovn/base_network_controller_acl.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/base_network_controller_acl.go Outdated Show resolved Hide resolved
@@ -106,8 +116,8 @@ func getACLMatch(portGroupName, match string, aclT aclType) string {

// GetNamespaceACLLogging retrieves ACLLoggingLevels for the Namespace.
// nsInfo will be locked (and unlocked at the end) for given namespace if it exists.
func (oc *DefaultNetworkController) GetNamespaceACLLogging(ns string) *ACLLoggingLevels {
nsInfo, nsUnlock := oc.getNamespaceLocked(ns, true)
func (bnc *BaseNetworkController) GetNamespaceACLLogging(ns string) *ACLLoggingLevels {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we will be able to leave this file just named acl without base_network_controller prefix, if we move this function to base_network_controller-namespace, which I guess makes sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we now have a getACLMatch() which is also receiver function of BaseNetworkController

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but getACLMatch doesn't use bnc in any way, I expected it to be reverted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so can we now do what is suggested in first comment?

@@ -87,60 +87,63 @@ func getMcastACLName(nsORpg, mcastSuffix string) string {
// - one "to-lport" ACL allowing ingress multicast traffic to pods in 'ns'.
// This matches only traffic originated by pods in 'ns' (based on the
// namespace address set).
func (oc *DefaultNetworkController) createMulticastAllowPolicy(ns string, nsInfo *namespaceInfo) error {
func (bnc *BaseNetworkController) createMulticastAllowPolicy(ns string, nsInfo *namespaceInfo) error {
portGroupName := hashedPortGroup(ns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the port group name is not that simple now, let's move building the port group here, and then just use pg.Name to build acl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand. The portgroup name would follow the same convention as the default network port group name, why it is no longer simple now?

@@ -477,32 +488,44 @@ func getARPAllowACLName(ns string) string {
return joinACLName(ns, arpAllowPolicySuffix)
}

// Note that the hashName argument is the portGroup's Name without network scope prefix
func (ncni *NetworkControllerNetInfo) buildPortGroup(hashName, name string, ports []*nbdb.LogicalSwitchPort, acls []*nbdb.ACL) *nbdb.PortGroup {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very important part, let's make sure we have a unified set of functions that all controllers will use.
So I suggest to use ncni.GetNetworkScopedName not on the hash, but before the hash, so it has a similar format with already exisiting port groups

func (ncni *NetworkControllerNetInfo) getPortGroupName(name string) string {
	networkScopedName := ncni.GetNetworkScopedName(name)
	return hashedPortGroup(networkScopedName)
}

// Note that the hashName argument is the portGroup's Name without network scope prefix
func (ncni *NetworkControllerNetInfo) buildPortGroup(name string, ports []*nbdb.LogicalSwitchPort, acls []*nbdb.ACL) *nbdb.PortGroup {
	var externalIds map[string]string
	pgName := ncni.getPortGroupName(name)
	externalIds = map[string]string{"name": name}
	if ncni.IsSecondary() {
		externalIds[types.NetworkExternalID] = ncni.GetNetworkName()
	}
	return libovsdbops.BuildPortGroup(pgName, name, ports, acls, externalIds)
}

then controller should never hash or GetNetworkScopedName themselves, they should call getPortGroupName to get the Name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand: this is afunction that all controller use. I don't see issue with prefix after hash, but I don't have strong opinions one way or another.

The problem for us is that our downstream zone would have issues when upgrading to the latest upstream code if we change the port group name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly, spreading the details about how the port group name is setup may be error-prone, so I am suggesting to replace GetNetworkScopedName calls with getPortGroupName calls
about the hash name - I see Jaime had a similar concern #3382 (comment), we can discuss what is the easiest solution for this situation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done the same way when we determine the logical router name/port name etc. for secondary networks. Note that I am getting rid of NetworkControllerNetInfo structure now and in both gressPolicy and BaseNetworkController's receiver functions needs to call the same function to get port group name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I am saying is that calling GetNetworkScopedName every time someone needs a real port group name, and adding comments like Note that the hashName argument is the portGroup's Name without network scope prefix is error-prone
So I suggest to always build network-prefixed port group name as early as possible by using getPortGroupName function, that would be the only function that knows how a port group name is built, and free other functions from that logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we can remove all hashedPortGroup and GetNetworkScopedName for port groups, and replace them with getPortGroupName and buildPortGroup functions that only take name as an argument, without any extra hashes or prefixes
then you can also remove all comments about if portGroup name is prefixed or not, because it will always be prefixed, e.g. getACLMatch won't need to prefix port group name, it will get the final port group name version

peerV6AddressSets: &sync.Map{},
portPolicies: make([]*portPolicy, 0),
NetworkControllerNetInfo: NetworkControllerNetInfo{NetInfo: netInfo},
controllerName: controllerName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controllerName should be enough here with the new acl indexes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only when acls are also indexed with controllerName. This will be changed after your acl index PR merged.

@@ -246,9 +249,11 @@ func (gp *gressPolicy) isEmpty() bool {
// by the parent NetworkPolicy)
// buildLocalPodACLs is safe for concurrent use, since it only uses gressPolicy fields that don't change
// since creation, or are safe for concurrent use like peerVXAddressSets
// Note that portGroupName is portGroup's name without network scope prefix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make sure portGroupName is the pg.Name

Copy link
Contributor Author

@cathy-zhou cathy-zhou Apr 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's wait for the decision if we are going to use which form of portgroup name, then we decide what needs to be changed. if it is still in the form of networkPrefix+hash, I'll keep it as is.

"k8s.io/klog/v2"
)

func (bnc *BaseNetworkController) initRetryFramework() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be 2 functions for default and secondary controller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same logic I explained above: this function is directly called from the default and secondary controllers, so you can just call the right function from there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}

// event handlers handles policy related events
type networkControllerPolicyEventHandler struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

baseNetworkPolicyEventHandler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

// newRetryFrameworkWithParameters is instead called directly by the watchers that are
// dynamically created when a network policy is added: AddressSetNamespaceAndPodSelectorType,
// PeerNamespaceSelectorType, AddressSetPodSelectorType.
func (bnc *BaseNetworkController) newRetryFrameworkWithParameters(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newNetpolRetryFramework? also comments to this struct need to be updated, let's say which events are handled by this framework


// event handlers handles policy related events
type networkControllerPolicyEventHandler struct {
baseHandler baseNetworkControllerEventHandler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess baseHandler is not really required here, because networkControllerPolicyEventHandler handles all netpol-related events, so you can just move these parts frof baseNetworkControllerEventHandler?

Copy link
Contributor

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more questions.

klog.Infof("[%s] adding namespace took %v for network %s", ns.Name, time.Since(start), bsnc.GetNetworkName())
}()

_, nsUnlock, err := bsnc.ensureNamespaceLockedForSecondaryNetwork(ns.Name, false, ns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this ensureNamespaceLockedForSecondaryNetwork method is poorly named (I would expect it hint it creates the address set for this particular namespace), and I also think it does too much.

A method should do one thing this one, at least according to its docs does:

  • locks namespacesMutex
  • gets/creates an entry for ns
  • configures OVN nsInfo (which actually means creating the address set encapsulating the ns info), and returns it with its mutex locked.

Would it be possible to break this method down into smaller blocks ?

I'm afraid it'll be hard to unit test as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a variants of ensureNamespaceLocked() function for default networks. It does the same thing except the things that does not exist in secondary networks.

}
// we are creating nsInfo and going to set it in namespaces map
// so safe to hold the lock while we create and add it
defer bsnc.namespacesMutex.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we just move this immediately below the bsnc.namespacesMutex.Lock() ?

Why have it in both the if and else clauses ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because depends on if nsInfoExisted, the unlock happens in different place. This is the same as default network ensureNamespaceLocked().

Comment on lines +54 to +62
case factory.NamespaceType:
ns, ok := obj.(*kapi.Namespace)
if !ok {
return fmt.Errorf("could not cast %T object to *kapi.Namespace", obj)
}
return bsnc.AddNamespaceForSecondaryNetwork(ns)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: in the future, for when I introduce the IPAMless ipBlock policies, we won't have to do any of this right ? i.e. there won't be any IPs to keep track of in these address sets.

I could just wrap this in an if !bsnc.doesNetworkRequireIPAM() right ?

... or I could introduce an interface, and provide a separate implementation for the ipamless scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't think this through yet. But I don't think you need namespace related handling if policy is not based on namespace/pod selector. If so, you don't need to watch for namespace at all. you could simply direct return in WatchNamespaces() without calling WatchResource()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed yesterday, I haven't considered the non-ipam case in thsi PR yet. I am adding it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cathy-zhou feel free to keep working exclusively the IPAM case in this PR.

I can tackle the non-IPAM case as a follow-up.

I think this would help keep this PR "small", otherwise, it'll bloat even more.

Comment on lines +79 to +101
case factory.NamespaceType:
oldNs, newNs := oldObj.(*kapi.Namespace), newObj.(*kapi.Namespace)
return bsnc.updateNamespaceForSecondaryNetwork(oldNs, newNs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: this will not be needed for IPAMless scenario, right?

Comment on lines +103 to +157
case factory.NamespaceType:
ns := obj.(*kapi.Namespace)
return bsnc.deleteNamespace4SecondaryNetwork(ns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines 183 to 257
// Ensure the namespace/nsInfo exists
addOps, err := bsnc.addPodToNamespaceForSecondaryNetwork(pod.Namespace, podAnnotation.IPs)
if err != nil {
return err
}
ops = append(ops, addOps...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, this is not needed for the ipamless scenario.

go-controller/pkg/ovn/base_network_controller_secondary.go Outdated Show resolved Hide resolved
@cathy-zhou cathy-zhou force-pushed the multi-network-mr6 branch 3 times, most recently from 3ea669b to 7e82300 Compare April 6, 2023 04:03
@@ -69,6 +69,11 @@ func joinACLName(substrings ...string) string {
return strings.Join(substrings, "_")
}

func buildPortGroup(hashName, name string, ports []*nbdb.LogicalSwitchPort, acls []*nbdb.ACL) *nbdb.PortGroup {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sync should never be updated, because the code only works with the previous version of ids, so I think you can just revert changes of this file and it should work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change because it does not compile

Copy link
Member

@npinaeva npinaeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another round :)

@@ -45,7 +45,7 @@ func hasResourceAnUpdateFunc(objType reflect.Type) bool {
func (h *baseNetworkControllerEventHandler) areResourcesEqual(objType reflect.Type, obj1, obj2 interface{}) (bool, error) {
// switch based on type
switch objType {
case factory.PolicyType:
case factory.PolicyType: //
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

@@ -123,6 +115,17 @@ func (h *baseNetworkControllerEventHandler) areResourcesEqual(objType reflect.Ty
case factory.NamespaceType:
// force update path for Namespace resource.
return false, nil

case factory.MultiNetworkPolicyType: //
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove //?

@@ -106,8 +116,8 @@ func getACLMatch(portGroupName, match string, aclT aclType) string {

// GetNamespaceACLLogging retrieves ACLLoggingLevels for the Namespace.
// nsInfo will be locked (and unlocked at the end) for given namespace if it exists.
func (oc *DefaultNetworkController) GetNamespaceACLLogging(ns string) *ACLLoggingLevels {
nsInfo, nsUnlock := oc.getNamespaceLocked(ns, true)
func (bnc *BaseNetworkController) GetNamespaceACLLogging(ns string) *ACLLoggingLevels {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but getACLMatch doesn't use bnc in any way, I expected it to be reverted

if err != nil {
klog.Warningf("Failed to get pods for namespace %q: %v", ns, err)
}
for _, pod := range pods {
if util.PodCompleted(pod) {
continue
}
if portInfo, err := oc.logicalPortCache.get(pod, types.DefaultNetworkName); err != nil {
portInfoMap, err := bnc.logicalPortCache.getAll(pod)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should only add logical ports for a specific nad, not all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should for all NADs belong to this network controller

if err != nil {
return err
}

ops, err = libovsdbops.AddACLsToPortGroupOps(oc.nbClient, ops, types.ClusterPortGroupName, acls...)
ops, err = libovsdbops.AddACLsToPortGroupOps(bnc.nbClient, ops, bnc.getPortGroupName(types.ClusterPortGroupName), acls...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPortGroupName(...PortGroupName) doesn't read well, because it means that the argument is not a PortGroupName. So I suggest to update variable names ClusterPortGroupName and ClusterRtrPortGroupName to something like ClusterPortGroupNameBase and ClusterRtrPortGroupNameBase

Comment on lines 1040 to 1042
case factory.PolicyType:
syncFunc = h.oc.syncNetworkPolicies

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert?

return &gressPolicy{
NetInfo: netInfo,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetInfo is not used anymore?

@@ -295,7 +295,6 @@ var _ = ginkgo.Describe("OVN Namespace Operations", func() {

err = fakeOvn.controller.WatchNamespaces()
gomega.Expect(err).NotTo(gomega.HaveOccurred())
fakeOvn.asf.EventuallyExpectEmptyAddressSetExist(hostNetworkNamespace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please comment on what changed here

@@ -494,3 +429,73 @@ func newEgressServiceController(client clientset.Interface, nbClient libovsdbcli
svcFactory.Discovery().V1().EndpointSlices(),
svcFactory.Core().V1().Nodes())
}

func (oc *DefaultNetworkController) syncNetworkPolicies(networkPolicies []interface{}) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what is our common strategy, but I think it may be better to leave these netpol-specific function under base_network_controller_policy, otherwise all the default controller specific functions will flood this file, and then it is difficult to find them
wdyt @cathy-zhou @jcaamano ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I am all for moving stuff out of ovn.go :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably to default_network_controller.go? this is not for base_network_controller, so it should not belong to base_network_controller_policy.go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we had proper packaging, every file would be split into bnc/<feature_name>, default_network_controller/<feature_name> and where needed same for the secondary controllers. Now when we put the default network controller parts to the same file (ovn or default_network_controller) it may difficult to split them back when/if we change the packages structure.
I agree that it doesn't make a lot of sense to put these function under bnc_ files, so we can make default_network_controller_policy file for that purpose, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a file only for one function? I am not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it should be based on if we are planning to have the same separate file in the future under default network controller package, or if we want to leave all functions in one big file . For now e.g. I am not sure what is the difference between master.go ovn.go and default_network_controller.go, so what is the right place for default network controller netpol functions (that also includes addHairpinAllowACL)
@jcaamano any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make default_network_controller_policy with WatchNetwoerkPolicies, addHairpinAllowACL and syncNetworkPolicies.

ipv4Mode: ipv4Mode,
ipv6Mode: ipv6Mode,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is different now since we have netInfo also. Are we going to be able to get ipMode from it as @jcaamano suggested?

@cathy-zhou
Copy link
Contributor Author

@npinaeva @jcaamano again, please take a look.

continue
nadNames := bnc.getPodNADNames(pod)
for _, nadName := range nadNames {
portInfo, err := bnc.getNewLocalPolicyPortsForNAD(np, pod, nadName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess at this point it makes sense to inline getNewLocalPolicyPortsForNAD back to this function, since it is not called from anywhere else?

for _, obj := range objs {
pod := obj.(*kapi.Pod)

logicalPortName := util.GetLogicalPortName(pod.Namespace, pod.Name)
nadNames := bnc.getPodNADNames(pod)
if !bnc.IsSecondary() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just do

for _, nadName := range bnc.getPodNADNames(pod) {
			logicalPortName := bnc.GetLogicalPortName(pod, nadName)

then the rest of this function can stay as it was

if err != nil {
return err
}

// Remove old multicastDefaultDeny port group now that all ports
// have been added to the clusterPortGroup by WatchPods()
ops, err = libovsdbops.DeletePortGroupsOps(oc.nbClient, ops, legacyMulticastDefaultDenyPortGroup)
ops, err = libovsdbops.DeletePortGroupsOps(bnc.nbClient, ops, bnc.getPortGroupName(legacyMulticastDefaultDenyPortGroup))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to prefix stale port group right? since it was deleted before the networkPrefix is defined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case, it is no-op. Otherwise, secondary network controllers would be trying to delete the legacy port groups of default network, which I don't want to happen. Another way is to do it only when if !bnc.IsSecondary() , but seems something we want to avoid as much as possible?

I personally is fine with either way.

@@ -58,6 +58,10 @@ type defaultDenyPortGroups struct {
policies map[string]bool
}

func (bnc *BaseNetworkController) getPortGroupName(readableGroupName string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess at this point this function is not needed, we will have a separate function for every type of port group to get its name?

@cathy-zhou
Copy link
Contributor Author

@npinaeva I made the change, please take a look.

@@ -30,10 +29,17 @@ const (
legacyMulticastDefaultDenyPortGroup = "mcastPortGroupDeny"
)

func getACLMatchAF(ipv4Match, ipv6Match string) string {
if config.IPv4Mode && config.IPv6Mode {
func (bnc *BaseNetworkController) GetMulticastPortGroupName(base string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that should be getClusterPortGroupName? Because cluster port groups are not only used by multicast, but e.g. by egress firewall
If so, I would suggest to move it to base_network_controller, update the name and add a comment for which base names it should be used

Copy link
Member

@npinaeva npinaeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress, we are getting there :)
Added some new comments, and here are some comments I don't see addressed:
#3382 (comment)
#3382 (comment)

}
defer oc.namespacesMutex.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think releasing the lock that is acquired in another function may be error-prone, so maybe we can add configureNamespace func(nsInfo *namespaceInfo, ns *kapi.Namespace) as an argument to ensureNamespaceLockedCommon, and leave the internal logic the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how, note that in the two functions for default network controllers, configureNamespace() is receiver function of DefaultNetworkController. We need to somehow pass that to ensureNamespaceLockedCommon(), which is strange to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand, being a method doesn't change function signature, so you still can do

func (bnc *BaseNetworkController) ensureNamespaceLockedCommon(ns string, readOnly bool, namespace *kapi.Namespace,
	ips []net.IP, configureNamespace func(nsInfo *namespaceInfo, ns *kapi.Namespace) error) (*kapi.Namespace, *namespaceInfo, func(), error) {...}

namespace, nsInfo, unlockFunc, err := oc.ensureNamespaceLockedCommon(ns, readOnly, namespace, ips, oc.configureNamespace)

namespace, nsInfo, unlockFunc, err := bsnc.ensureNamespaceLockedCommon(ns, readOnly, namespace, ips, bsnc.configureNamespaceCommon)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right. fixed. Please take a look

@@ -476,57 +284,36 @@ func (oc *DefaultNetworkController) ensureNamespaceLocked(ns string, readOnly bo
return nsInfo, unlockFunc, nil
}

func (oc *DefaultNetworkController) createNamespaceAddrSetAllPods(ns string) (addressset.AddressSet, error) {
//nolint:unused
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this lint command required?

}
nadNames := bnc.getPodNADNames(pod)
for _, nadName := range nadNames {
var logicalPortName string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logicalPortName := bnc.GetLogicalPortName(pod, nadName) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you suggesting the change the function signature of GetSecondaryNetworkLogicalPortName(), or you meant to have one single function for both default and secondary network?

for latter, the problem is that we'd still need to check if it is default network, to decide weather or not prefix the logical port name. I believe that is the reason why we agreed to have two different functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this function already exists

func (bnc *BaseNetworkController) GetLogicalPortName(pod *kapi.Pod, nadName string) string {
	if !bnc.IsSecondary() {
		return util.GetLogicalPortName(pod.Namespace, pod.Name)
	} else {
		return util.GetSecondaryNetworkLogicalPortName(pod.Namespace, pod.Name, nadName)
	}
}

and seems to do exactly the same thing, so I just suggest using it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I didn't even remember we have this function. thanks!

// if there are problems with fetching port info from logicalPortCache, pod will be added to errObjs.
func (oc *DefaultNetworkController) getExistingLocalPolicyPorts(np *networkPolicy,
objs ...interface{}) (policyPortsToUUIDs map[string]string, policyPortUUIDs []string, errObjs []interface{}) {
// if there are problems with fetching port info from logicalPortCache, error will be added to return error array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like errors are not returned anymore, probably we can delete that comment line

Comment on lines 583 to 587
var logicalPortName string
if !bnc.IsSecondary() {
logicalPortName = util.GetLogicalPortName(pod.Namespace, pod.Name)
} else {
logicalPortName = util.GetSecondaryNetworkLogicalPortName(pod.Namespace, pod.Name, nadName)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logicalPortName := bnc.GetLogicalPortName(pod, nadName) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -52,7 +52,7 @@ func setIpMode(m ipMode) {
config.IPv6Mode = m.IPv6Mode
}

func getMulticastDefaultExpectedData(clusterPortGroup, clusterRtrPortGroup *nbdb.PortGroup) []libovsdb.TestData {
func getMulticastDefaultExpectedData(fakeController *DefaultNetworkController, clusterPortGroup, clusterRtrPortGroup *nbdb.PortGroup) []libovsdb.TestData {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fakeController unused?

@@ -135,25 +135,25 @@ func getMulticastDefaultExpectedData(clusterPortGroup, clusterRtrPortGroup *nbdb
}
}

func getMulticastDefaultStaleData(clusterPortGroup, clusterRtrPortGroup *nbdb.PortGroup) []libovsdb.TestData {
testData := getMulticastDefaultExpectedData(clusterPortGroup, clusterRtrPortGroup)
func getMulticastDefaultStaleData(fakeController *DefaultNetworkController, clusterPortGroup, clusterRtrPortGroup *nbdb.PortGroup) []libovsdb.TestData {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fakeController will be unused, if getMulticastDefaultExpectedData is updated?

pg_hash := getMulticastPortGroupName(ns)
egressMatch := getACLMatch(pg_hash, getMulticastACLEgrMatch(), aclEgress)
func getMulticastPolicyExpectedData(fakeController *DefaultNetworkController, ns string, ports []string) []libovsdb.TestData {
pg_hash := fakeController.getMulticastPortGroupName(ns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually use
fakeController = getFakeController(DefaultNetworkControllerName) right here in the function to avoid adding arguments (of course in case the controller type is predefined)

@cathy-zhou cathy-zhou force-pushed the multi-network-mr6 branch 2 times, most recently from 47a46d9 to 0697af0 Compare April 20, 2023 17:38
Copy link
Member

@npinaeva npinaeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the previous comments look addressed, just a couple more for the unit test commit.
It's great to have a unit test with both multi and regular netpols, thanks for implementing that @cathy-zhou !

@@ -40,17 +40,6 @@ const (
arpAllowPolicySuffix = "ARPallowPolicy"
)

func getFakeController(controllerName string) *DefaultNetworkController {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say we can leave both getFakeController and getFakeBaseController, so that function that used DefaultNetworkController can keep using it, and multinet-tests can use getFakeBaseController?

@@ -101,7 +98,7 @@ func getDefaultDenyData(networkPolicy *knet.NetworkPolicy, ports []string,
"apply-after-lb": "true",
},
)
egressDenyACL.UUID = *egressDenyACL.Name + "-egressDenyACL-UUID"
egressDenyACL.UUID = netInfo.GetNetworkScopedName(*egressDenyACL.Name + "-egressDenyACL-UUID")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be a good time to switch to
UUID = aclIDs.String()
it should be unique, so hopefully we won't need to change it again

@@ -67,8 +67,8 @@ func newNamespace(namespace string) *v1.Namespace {
}
}

func getNsAddrSetHashNames(ns string) (string, string) {
return addressset.GetHashNamesForAS(getNamespaceAddrSetDbIDs(ns, DefaultNetworkControllerName))
func getNsAddrSetHashNames(ns, controllerName string) (string, string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say let's leave this function as is, and make a new one like getMultinetNsAddrSetHashNames(ns, controllerName string), and it will have only 1 usage from getGressACLs, and will reduce the number of required changes

@@ -186,11 +186,11 @@ func getDefaultPortGroups() (clusterPortGroup, clusterRtrPortGroup *nbdb.PortGro
}

func getMulticastPolicyExpectedData(ns string, ports []string) []libovsdb.TestData {
fakeController := getFakeController(DefaultNetworkControllerName)
fakeController := getFakeBaseController(&util.DefaultNetInfo{}, &util.DefaultNetConfInfo{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we get getFakeController back, this may be reverted

startOvn(initialDB, []v1.Namespace{namespace1, namespace2}, nil, nil,
[]nettypes.NetworkAttachmentDefinition{*nad}, nil, nil)

ocInfo, ok := fakeOvn.secondaryControllers["network1"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make "network1" constant and use in nad definition too?

Comment on lines 365 to 367
netconf, err := util.ParseNetConf(nad)
gomega.Expect(err).NotTo(gomega.HaveOccurred())
netInfo := util.NewNetInfo(netconf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can make netconf and netInfo variables, since they only depend on nad and used in all tests?

Comment on lines 372 to 376
ocInfo, ok := fakeOvn.secondaryControllers["network1"]
gomega.Expect(ok).To(gomega.Equal(true))
asf := ocInfo.asf
gomega.Expect(asf).NotTo(gomega.Equal(nil))
gomega.Expect(asf.ControllerName).To(gomega.Equal("network1-network-controller"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can make it a part of startOvn, so it checks expected secondary controllers are created based on given nads?

NetConfInfo: netConfInfo,
}
}

func getDefaultDenyData(networkPolicy *knet.NetworkPolicy, ports []string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we merge #3540 soon, we will be able to change the signature of this function, then pass default network args for existing wrappers, and create new functions like getDefaultDenyDataMultinet and getPolicyDataMultinet.
In that case calls in this file won't need to be changed

if err != nil {
return err
}

// Remove old multicastDefaultDeny port group now that all ports
// have been added to the clusterPortGroup by WatchPods()
ops, err = libovsdbops.DeletePortGroupsOps(oc.nbClient, ops, legacyMulticastDefaultDenyPortGroup)
ops, err = libovsdbops.DeletePortGroupsOps(bnc.nbClient, ops, bnc.getClusterPortGroupName(legacyMulticastDefaultDenyPortGroup))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe in that case it makes sense to do

if !bnc.IsSecondary() {
		ops, err = libovsdbops.DeletePortGroupsOps(bnc.nbClient, ops, legacyMulticastDefaultDenyPortGroup)
		if err != nil {
			return err
		}
	}

to avoid confusion with prefixing

@cathy-zhou
Copy link
Contributor Author

/retest

…oning

Also updated the OVNK Kube client to feature a multi-policy client.

Signed-off-by: Yun Zhou <yunz@nvidia.com>
Signed-off-by: Yun Zhou <yunz@nvidia.com>
Copy link
Member

@npinaeva npinaeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one last nit, otherwise
/lgtm
thanks @cathy-zhou!

@@ -130,7 +130,7 @@ var _ = ginkgo.Describe("OVN EgressFirewall Operations", func() {
config.Gateway.Mode = gwMode
app.Action = func(ctx *cli.Context) error {
// owned by non-existing namespace
fakeController := getFakeController(DefaultNetworkControllerName)
fakeController := &DefaultNetworkController{BaseNetworkController: *getFakeBaseController(&util.DefaultNetInfo{}, &util.DefaultNetConfInfo{})}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. thanks!

Signed-off-by: Yun Zhou <yunz@nvidia.com>
@jcaamano jcaamano merged commit b24e304 into ovn-org:master Apr 26, 2023
22 checks passed
@tssurya tssurya added this to the v1.0.0 milestone Mar 11, 2024
@tssurya tssurya added network-policy Issues related to network policies (primary and secondary networks) multi-networks Issues related to secondary networks, L3, L2, localnet kind/feature All issues/PRs that are new features labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature All issues/PRs that are new features multi-networks Issues related to secondary networks, L3, L2, localnet network-policy Issues related to network policies (primary and secondary networks)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants