ANP: Add Support for Nodes as Egress Peers #4164

tssurya · 2024-02-17T23:37:08Z

- What this PR does and why is it needed
This PR adds support for node Egress peer in ANP and BANP.
API changes added as part of: kubernetes-sigs/network-policy-api@23d3882

- Special notes for reviewers

- How to verify it
unit tests and e2e's have been added (e2e's will run and merge as part of #4235)

- Description for the changelog

allow support for selecting nodes as egress peers

coveralls · 2024-02-18T15:27:03Z

coverage: 52.247% (-0.2%) from 52.442%
when pulling a06afda on tssurya:anp-node-selectors
into 7fed976 on ovn-org:master.

netlify · 2024-03-15T20:08:04Z

✅ Deploy Preview for subtle-torrone-bb0c84 ready!

Name	Link
🔨 Latest commit	`a06afda`
🔍 Latest deploy log	https://app.netlify.com/sites/subtle-torrone-bb0c84/deploys/660bd1bcbf348300086cc353
😎 Deploy Preview	https://deploy-preview-4164--subtle-torrone-bb0c84.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

jcaamano · 2024-03-25T15:23:36Z

@tssurya looks good to me.

Couple of things:
a) any criteria that I might be missing about using logging level 4 vs for example 5? default is 3 in our downstream, right?
b) wanted to bring to your attention this generic controller implementation @npinaeva worked on: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/controller/controller.go
c) should we remove the TODOs unless they are real TODOs? They might get flagged by IDEs and be distracting.

tssurya · 2024-03-26T15:26:12Z

@tssurya looks good to me.

Couple of things: a) any criteria that I might be missing about using logging level 4 vs for example 5? default is 3 in our downstream, right?

good question, default is 4 actually in OCP. so 5 is not printed, 4 is, do you think I move some of the anp_controller logging to 5 to avoid noise?

b) wanted to bring to your attention this generic controller implementation @npinaeva worked on: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/controller/controller.go

ah thanks Jaime for pointing me to this (I admit I got to know of that code only like 2 weeks ago when I was reviewing Peri's PR on QoS), (self-note-for-cncf-developer-docs: so these things should be brought up in upstream/team meetings more around the "templates and refactor changes that sneak in to reuse them" ; perhaps even developer docs :P "how to add a controller" section for someone starting new in this project). I am not against moving to this, its the right thing to do, but it will be a FUP if you are ok with it.

c) should we remove the TODOs unless they are real TODOs? They might get flagged by IDEs and be distracting.

yes agreed, I am starting scale testing so let me leave some of the relevant ones but delete the rest of them, I will do a new push with new commit to remove the TODOs.

jcaamano · 2024-03-27T15:02:23Z

@tssurya looks good to me.
Couple of things: a) any criteria that I might be missing about using logging level 4 vs for example 5? default is 3 in our downstream, right?

good question, default is 4 actually in OCP. so 5 is not printed, 4 is, do you think I move some of the anp_controller logging to 5 to avoid noise?

I see things like this that could potentially be level 5

klog.Infof("Deleting peerIPs %+v from address-set %s for ANP %s", ipsToRemove, as.GetName(), anpName)
...
klog.Infof("Adding peerIPs %+v to address-set %s for ANP %s", ipsToAdd, as.GetName(), anpName)
...
klog.V(4).Infof("Node %s used to match ANP %s egress rule %d peer, requeuing...", node.Name, anpCache.name, rule.priority)
...
klog.V(4).Infof("Node %s started to match ANP %s egress rule %d peer, requeuing...", node.Name, anpCache.name, rule.priority)

A lot of potential level 5 logging in the run method as well, there we would just need a start and shutdown message on level 4.

I am certainly more mindful about this things now but I am ok if you want to do a FUP

b) wanted to bring to your attention this generic controller implementation @npinaeva worked on: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/controller/controller.go

ah thanks Jaime for pointing me to this (I admit I got to know of that code only like 2 weeks ago when I was reviewing Peri's PR on QoS), (self-note-for-cncf-developer-docs: so these things should be brought up in upstream/team meetings more around the "templates and refactor changes that sneak in to reuse them" ; perhaps even developer docs :P "how to add a controller" section for someone starting new in this project). I am not against moving to this, its the right thing to do, but it will be a FUP if you are ok with it.

that's ok

tssurya · 2024-03-27T15:39:08Z

@tssurya looks good to me.
Couple of things: a) any criteria that I might be missing about using logging level 4 vs for example 5? default is 3 in our downstream, right?

good question, default is 4 actually in OCP. so 5 is not printed, 4 is, do you think I move some of the anp_controller logging to 5 to avoid noise?

I see things like this that could potentially be level 5
klog.Infof("Deleting peerIPs %+v from address-set %s for ANP %s", ipsToRemove, as.GetName(), anpName)
...
klog.Infof("Adding peerIPs %+v to address-set %s for ANP %s", ipsToAdd, as.GetName(), anpName)
...
klog.V(4).Infof("Node %s used to match ANP %s egress rule %d peer, requeuing...", node.Name, anpCache.name, rule.priority)
...
klog.V(4).Infof("Node %s started to match ANP %s egress rule %d peer, requeuing...", node.Name, anpCache.name, rule.priority)
A lot of potential level 5 logging in the run method as well, there we would just need a start and shutdown message on level 4.

I am certainly more mindful about this things now but I am ok if you want to do a FUP

so out of all the things above I felt it might be good to keep those logs to debug things but I think you are absolutely right that this will cause a lot noise. Let me fix this in this PR itself. I will do a new commit on top so that is easier to review where I move a bunch of these to level5. I will also do a TODO-Removal commit :) ok let me repush with these two things and then we are good.

b) wanted to bring to your attention this generic controller implementation @npinaeva worked on: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/controller/controller.go

ah thanks Jaime for pointing me to this (I admit I got to know of that code only like 2 weeks ago when I was reviewing Peri's PR on QoS), (self-note-for-cncf-developer-docs: so these things should be brought up in upstream/team meetings more around the "templates and refactor changes that sneak in to reuse them" ; perhaps even developer docs :P "how to add a controller" section for someone starting new in this project). I am not against moving to this, its the right thing to do, but it will be a FUP if you are ok with it.

that's ok

+1 for FUP here.

tssurya · 2024-03-27T16:35:32Z

the last push adds the last 2 commits, first commits have no difference.

jcaamano · 2024-03-27T18:33:29Z

I was torn about the sync logs. I would probably have left them at 4 if you have some update filtering in place before hand, but up to you.

klog.V(5).Infof("Processing sync for Admin Network Policy %s", anpName)
defer func() {
klog.V(5).Infof("Finished syncing Admin Network Policy %s : %v", anpName, time.Since(startTime))
}()

What about the intial ones

klog.Info("Setting up event handlers for Admin Network Policy")
klog.Info("Setting up event handlers for Baseline Admin Network Policy")
klog.Info("Setting up event handlers for Namespaces in Admin Network Policy controller")
klog.Info("Setting up event handlers for Pods in Admin Network Policy controller")
klog.Info("Setting up event handlers for Nodes in Admin Network Policy controller")
klog.Info("Waiting for informer caches to sync")
klog.Infof("Repairing Admin Network Policies")
klog.Info("Starting Admin Network Policy workers")
klog.Info("Starting Baseline Admin Network Policy workers")
klog.Info("Starting Namespace Admin Network Policy workers")
klog.Info("Starting Pod Admin Network Policy workers")
klog.Info("Starting Node Admin Network Policy workers")

I know they only happen once but I think they are excessive compared to other controllers and distracting towards quickly determining if everything started successfully or not. But I will leave it up to you as well.

tssurya · 2024-03-27T22:32:37Z

I was torn about the sync logs. I would probably have left them at 4 if you have some update filtering in place before hand, but up to you.
klog.V(5).Infof("Processing sync for Admin Network Policy %s", anpName)
defer func() {
klog.V(5).Infof("Finished syncing Admin Network Policy %s : %v", anpName, time.Since(startTime))
}()

yeah I thought about this and I was extremely torn about the levels, so I do have update filtering in place actually but the problem is we have a centralized queuing system. So ultimately syncAdminNetworkPolicy is called whenever relevant info for a pod (ips, labels), namespace (labels), nodes (ips,labels) + anp changes => which is not a whole lot unless the ENV is heavy pod/add/delete so that's why I moved it to 5 because I imagine ANP to span across all pods in a cluster and if a pod add/delete triggers this then we would flood too much during churn.
So these logs might just tell us which anp was getting reconciled which i guess ovsdb transact logs should help us with that info. So leaning towards keeping them at 5 itself.

What about the intial ones

klog.Info("Setting up event handlers for Admin Network Policy")
klog.Info("Setting up event handlers for Baseline Admin Network Policy")
klog.Info("Setting up event handlers for Namespaces in Admin Network Policy controller")
klog.Info("Setting up event handlers for Pods in Admin Network Policy controller")
klog.Info("Setting up event handlers for Nodes in Admin Network Policy controller")
klog.Info("Waiting for informer caches to sync")
klog.Infof("Repairing Admin Network Policies")
klog.Info("Starting Admin Network Policy workers")
klog.Info("Starting Baseline Admin Network Policy workers")
klog.Info("Starting Namespace Admin Network Policy workers")
klog.Info("Starting Pod Admin Network Policy workers")
klog.Info("Starting Node Admin Network Policy workers")

I know they only happen once but I think they are excessive compared to other controllers and distracting towards quickly determining if everything started successfully or not. But I will leave it up to you as well.

yeah i left them since they only happen once during startup, but you have a great point. I think just 1 or 2 that ensure all started up correctly can be kept at Info, rest of them I will move to level5.

tssurya · 2024-03-27T22:37:15Z

saving link for @kyrtapz : https://github.com/ovn-org/ovn-kubernetes/actions/runs/8455510894/job/23164322209?pr=4164 I see Load Balancer Service Tests with MetalLB [It] Should ensure load balancer service works with 0 node ports when ETP=local flaking but let's see if its a one-off or not

tssurya · 2024-03-27T22:38:16Z

unrelated flake: https://github.com/ovn-org/ovn-kubernetes/actions/runs/8455510894/job/23164321973?pr=4164 : #4144

tssurya · 2024-03-27T22:50:18Z

done, last push was fixing the previous comment (only last commit changed, rest was rebased on master)

jcaamano · 2024-04-01T09:54:55Z

done, last push was fixing the previous comment (only last commit changed, rest was rebased on master)

Sorry for being so nitpicky, I can't resist :P

I would keep this one at five as well

klog.Info("Setting up event handlers for Admin Network Policy")

It can be even more distracting to have one log for one type of handler and not for the rest. Any potential problems here are perfectly traceable since we return a specific error for any problem that can happen here.

tssurya · 2024-04-02T09:35:14Z

done, last push was fixing the previous comment (only last commit changed, rest was rebased on master)

Sorry for being so nitpicky, I can't resist :P

I would keep this one at five as well
klog.Info("Setting up event handlers for Admin Network Policy")
It can be even more distracting to have one log for one type of handler and not for the rest. Any potential problems here are perfectly traceable since we return a specific error for any problem that can happen here.

haha nw, aye aye captain, let me push with this fixed.

This commit adds logic for processing node to the ANP controller Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

This commit adds the plumbing required to queue the anp key when a node get's added, updated or deleted to handle node type peer. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

This commit adds support to also insert nodeIPs as peerIPs of a specific rule. The podIPs peer has been renamed to be peerIPs which is more accurate moving forward. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

1. creating an admin network policy with 3 egress rules that have node peers 2. creating another node2 that will act as peer of admin network policy; check if IP is added to address-set 3. update the labels of node2 such that it starts to match the DENY RULE peer selector; check if IPs are updated in address-sets 4. update the hostCIDR annotation of node2; check if IPs are updated in address-sets 5. update the peer selector of rule3 in admin network policy so that node1 stops matching; check if address-set is updated 6. delete node matching peer selector; check if IPs are updated in address-sets 7. update the ANP by deleting all rules; check if all objects are deleted correctly 8. delete the ANP; check if all objects are deleted correctly Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

1. creating a baseline admin network policy with 2 egress rules that have node peers 2. creating another node2 that will act as peer of baseline admin network policy; check if IP is added to address-set 3. update the labels of node2 that matches the DENY RULE peer selector; check if IPs are updated in address-sets 4. update the hostCIDR annotation of node2 that matches the peer selector; check if IPs are updated in address-sets 5. update the peer selector of rule1 in admin network policy so that node1 stops matching; check if address-set is updated 6. delete node matching peer selector; check if IPs are updated in address-sets 7. update the BANP by deleting all rules; check if all objects are deleted correctly 8. delete the BANP; check if all objects are deleted correctly Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

I was being over generous, many of this can be determined from the ovsdb transact logs so we can move most of them to level5. I moved all logs in the update path to level5, kept add/delete at level4 since those should not happen frequently. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

tssurya · 2024-04-02T09:37:16Z

I have pushed now, only change is the last commit + rebased on master

tssurya requested review from trozet, dcbw, girishmg and jcaamano as code owners February 17, 2024 23:37

tssurya added do-not-merge/hold do-not-merge/wip labels Feb 17, 2024

tssurya force-pushed the anp-node-selectors branch from 11ff8d4 to e041b7b Compare February 18, 2024 15:02

tssurya mentioned this pull request Mar 3, 2024

Add conformance tests for node and cidr selector peers kubernetes-sigs/network-policy-api#193

Merged

tssurya force-pushed the anp-node-selectors branch 2 times, most recently from d625f21 to 2bd065b Compare March 8, 2024 13:09

tssurya mentioned this pull request Mar 8, 2024

Bump network policy api version + Kubernetes libraries to 1.29 #4207

Merged

tssurya force-pushed the anp-node-selectors branch from 2bd065b to 9ca20d1 Compare March 9, 2024 16:29

tssurya added this to the v1.0.0 milestone Mar 11, 2024

tssurya mentioned this pull request Mar 12, 2024

Implement Admin Network Policy API in OVNK #4214

Closed

8 tasks

tssurya added kind/feature All issues/PRs that are new features feature/admin-network-policy labels Mar 12, 2024

tssurya mentioned this pull request Mar 13, 2024

OCPBUGS-27821: [DownstreamMerge] 3-8-24 openshift/ovn-kubernetes#2093

Merged

tssurya force-pushed the anp-node-selectors branch from 9ca20d1 to 6a41512 Compare March 15, 2024 20:07

tssurya removed do-not-merge/hold do-not-merge/wip labels Mar 15, 2024

tssurya linked an issue Mar 15, 2024 that may be closed by this pull request

Implement Admin Network Policy API in OVNK #4214

Closed

8 tasks

tssurya mentioned this pull request Mar 15, 2024

ANP: Add Support for Networks as Egress Peers #4235

Merged

tssurya force-pushed the anp-node-selectors branch 2 times, most recently from 1e7d1f1 to c011c03 Compare March 25, 2024 09:31

tssurya assigned jcaamano Mar 26, 2024

tssurya force-pushed the anp-node-selectors branch from c011c03 to 2031c6d Compare March 27, 2024 16:35

tssurya force-pushed the anp-node-selectors branch from 2031c6d to fdd1f8d Compare March 27, 2024 16:48

tssurya force-pushed the anp-node-selectors branch from fdd1f8d to 3eebbde Compare March 27, 2024 22:49

tssurya added 7 commits April 2, 2024 11:36

Add Node processing to ANP controller

b2c465a

This commit adds logic for processing node to the ANP controller Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

Add syncAdminNetworkPolicyNode

81ef0a2

This commit adds the plumbing required to queue the anp key when a node get's added, updated or deleted to handle node type peer. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

Add support for nodeIPs as egress peers

4788e3e

This commit adds support to also insert nodeIPs as peerIPs of a specific rule. The podIPs peer has been renamed to be peerIPs which is more accurate moving forward. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

Remove unwanted TODO's

6e2dc95

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

tssurya force-pushed the anp-node-selectors branch from 3eebbde to a06afda Compare April 2, 2024 09:36

jcaamano approved these changes Apr 2, 2024

View reviewed changes

jcaamano merged commit 8a5e83b into ovn-org:master Apr 2, 2024
37 of 38 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ANP: Add Support for Nodes as Egress Peers #4164

ANP: Add Support for Nodes as Egress Peers #4164

tssurya commented Feb 17, 2024 •

edited

coveralls commented Feb 18, 2024 •

edited

netlify bot commented Mar 15, 2024 •

edited

jcaamano commented Mar 25, 2024

tssurya commented Mar 26, 2024 •

edited

jcaamano commented Mar 27, 2024

tssurya commented Mar 27, 2024

tssurya commented Mar 27, 2024

jcaamano commented Mar 27, 2024

tssurya commented Mar 27, 2024 •

edited

tssurya commented Mar 27, 2024

tssurya commented Mar 27, 2024

tssurya commented Mar 27, 2024

jcaamano commented Apr 1, 2024

tssurya commented Apr 2, 2024

tssurya commented Apr 2, 2024

ANP: Add Support for Nodes as Egress Peers #4164

ANP: Add Support for Nodes as Egress Peers #4164

Conversation

tssurya commented Feb 17, 2024 • edited

coveralls commented Feb 18, 2024 • edited

netlify bot commented Mar 15, 2024 • edited

✅ Deploy Preview for subtle-torrone-bb0c84 ready!

jcaamano commented Mar 25, 2024

tssurya commented Mar 26, 2024 • edited

jcaamano commented Mar 27, 2024

tssurya commented Mar 27, 2024

tssurya commented Mar 27, 2024

jcaamano commented Mar 27, 2024

tssurya commented Mar 27, 2024 • edited

tssurya commented Mar 27, 2024

tssurya commented Mar 27, 2024

tssurya commented Mar 27, 2024

jcaamano commented Apr 1, 2024

tssurya commented Apr 2, 2024

tssurya commented Apr 2, 2024

tssurya commented Feb 17, 2024 •

edited

coveralls commented Feb 18, 2024 •

edited

netlify bot commented Mar 15, 2024 •

edited

tssurya commented Mar 26, 2024 •

edited

tssurya commented Mar 27, 2024 •

edited