Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ANP: Add Support for Nodes as Egress Peers #4164

Merged
merged 7 commits into from
Apr 2, 2024

Conversation

tssurya
Copy link
Member

@tssurya tssurya commented Feb 17, 2024

- What this PR does and why is it needed
This PR adds support for node Egress peer in ANP and BANP.
API changes added as part of: kubernetes-sigs/network-policy-api@23d3882

- Special notes for reviewers

- How to verify it
unit tests and e2e's have been added (e2e's will run and merge as part of #4235)

- Description for the changelog

allow support for selecting nodes as egress peers

@coveralls
Copy link

coveralls commented Feb 18, 2024

Coverage Status

coverage: 52.247% (-0.2%) from 52.442%
when pulling a06afda on tssurya:anp-node-selectors
into 7fed976 on ovn-org:master.

Copy link

netlify bot commented Mar 15, 2024

Deploy Preview for subtle-torrone-bb0c84 ready!

Name Link
🔨 Latest commit a06afda
🔍 Latest deploy log https://app.netlify.com/sites/subtle-torrone-bb0c84/deploys/660bd1bcbf348300086cc353
😎 Deploy Preview https://deploy-preview-4164--subtle-torrone-bb0c84.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@tssurya tssurya linked an issue Mar 15, 2024 that may be closed by this pull request
8 tasks
@tssurya tssurya force-pushed the anp-node-selectors branch 2 times, most recently from 1e7d1f1 to c011c03 Compare March 25, 2024 09:31
@jcaamano
Copy link
Contributor

@tssurya looks good to me.

Couple of things:
a) any criteria that I might be missing about using logging level 4 vs for example 5? default is 3 in our downstream, right?
b) wanted to bring to your attention this generic controller implementation @npinaeva worked on: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/controller/controller.go
c) should we remove the TODOs unless they are real TODOs? They might get flagged by IDEs and be distracting.

@tssurya
Copy link
Member Author

tssurya commented Mar 26, 2024

@tssurya looks good to me.

Couple of things: a) any criteria that I might be missing about using logging level 4 vs for example 5? default is 3 in our downstream, right?

good question, default is 4 actually in OCP. so 5 is not printed, 4 is, do you think I move some of the anp_controller logging to 5 to avoid noise?

b) wanted to bring to your attention this generic controller implementation @npinaeva worked on: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/controller/controller.go

ah thanks Jaime for pointing me to this (I admit I got to know of that code only like 2 weeks ago when I was reviewing Peri's PR on QoS), (self-note-for-cncf-developer-docs: so these things should be brought up in upstream/team meetings more around the "templates and refactor changes that sneak in to reuse them" ; perhaps even developer docs :P "how to add a controller" section for someone starting new in this project). I am not against moving to this, its the right thing to do, but it will be a FUP if you are ok with it.

c) should we remove the TODOs unless they are real TODOs? They might get flagged by IDEs and be distracting.

yes agreed, I am starting scale testing so let me leave some of the relevant ones but delete the rest of them, I will do a new push with new commit to remove the TODOs.

@jcaamano
Copy link
Contributor

@tssurya looks good to me.
Couple of things: a) any criteria that I might be missing about using logging level 4 vs for example 5? default is 3 in our downstream, right?

good question, default is 4 actually in OCP. so 5 is not printed, 4 is, do you think I move some of the anp_controller logging to 5 to avoid noise?

I see things like this that could potentially be level 5

klog.Infof("Deleting peerIPs %+v from address-set %s for ANP %s", ipsToRemove, as.GetName(), anpName)
...
klog.Infof("Adding peerIPs %+v to address-set %s for ANP %s", ipsToAdd, as.GetName(), anpName)
...
klog.V(4).Infof("Node %s used to match ANP %s egress rule %d peer, requeuing...", node.Name, anpCache.name, rule.priority)
...
klog.V(4).Infof("Node %s started to match ANP %s egress rule %d peer, requeuing...", node.Name, anpCache.name, rule.priority)

A lot of potential level 5 logging in the run method as well, there we would just need a start and shutdown message on level 4.

I am certainly more mindful about this things now but I am ok if you want to do a FUP

b) wanted to bring to your attention this generic controller implementation @npinaeva worked on: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/controller/controller.go

ah thanks Jaime for pointing me to this (I admit I got to know of that code only like 2 weeks ago when I was reviewing Peri's PR on QoS), (self-note-for-cncf-developer-docs: so these things should be brought up in upstream/team meetings more around the "templates and refactor changes that sneak in to reuse them" ; perhaps even developer docs :P "how to add a controller" section for someone starting new in this project). I am not against moving to this, its the right thing to do, but it will be a FUP if you are ok with it.

that's ok

@tssurya
Copy link
Member Author

tssurya commented Mar 27, 2024

@tssurya looks good to me.
Couple of things: a) any criteria that I might be missing about using logging level 4 vs for example 5? default is 3 in our downstream, right?

good question, default is 4 actually in OCP. so 5 is not printed, 4 is, do you think I move some of the anp_controller logging to 5 to avoid noise?

I see things like this that could potentially be level 5

klog.Infof("Deleting peerIPs %+v from address-set %s for ANP %s", ipsToRemove, as.GetName(), anpName)
...
klog.Infof("Adding peerIPs %+v to address-set %s for ANP %s", ipsToAdd, as.GetName(), anpName)
...
klog.V(4).Infof("Node %s used to match ANP %s egress rule %d peer, requeuing...", node.Name, anpCache.name, rule.priority)
...
klog.V(4).Infof("Node %s started to match ANP %s egress rule %d peer, requeuing...", node.Name, anpCache.name, rule.priority)

A lot of potential level 5 logging in the run method as well, there we would just need a start and shutdown message on level 4.

I am certainly more mindful about this things now but I am ok if you want to do a FUP

so out of all the things above I felt it might be good to keep those logs to debug things but I think you are absolutely right that this will cause a lot noise. Let me fix this in this PR itself. I will do a new commit on top so that is easier to review where I move a bunch of these to level5. I will also do a TODO-Removal commit :) ok let me repush with these two things and then we are good.

b) wanted to bring to your attention this generic controller implementation @npinaeva worked on: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/controller/controller.go

ah thanks Jaime for pointing me to this (I admit I got to know of that code only like 2 weeks ago when I was reviewing Peri's PR on QoS), (self-note-for-cncf-developer-docs: so these things should be brought up in upstream/team meetings more around the "templates and refactor changes that sneak in to reuse them" ; perhaps even developer docs :P "how to add a controller" section for someone starting new in this project). I am not against moving to this, its the right thing to do, but it will be a FUP if you are ok with it.

that's ok

+1 for FUP here.

@tssurya
Copy link
Member Author

tssurya commented Mar 27, 2024

the last push adds the last 2 commits, first commits have no difference.

@jcaamano
Copy link
Contributor

I was torn about the sync logs. I would probably have left them at 4 if you have some update filtering in place before hand, but up to you.

klog.V(5).Infof("Processing sync for Admin Network Policy %s", anpName)
defer func() {
klog.V(5).Infof("Finished syncing Admin Network Policy %s : %v", anpName, time.Since(startTime))
}()

What about the intial ones

klog.Info("Setting up event handlers for Admin Network Policy")
klog.Info("Setting up event handlers for Baseline Admin Network Policy")
klog.Info("Setting up event handlers for Namespaces in Admin Network Policy controller")
klog.Info("Setting up event handlers for Pods in Admin Network Policy controller")
klog.Info("Setting up event handlers for Nodes in Admin Network Policy controller")
klog.Info("Waiting for informer caches to sync")
klog.Infof("Repairing Admin Network Policies")
klog.Info("Starting Admin Network Policy workers")
klog.Info("Starting Baseline Admin Network Policy workers")
klog.Info("Starting Namespace Admin Network Policy workers")
klog.Info("Starting Pod Admin Network Policy workers")
klog.Info("Starting Node Admin Network Policy workers")

I know they only happen once but I think they are excessive compared to other controllers and distracting towards quickly determining if everything started successfully or not. But I will leave it up to you as well.

@tssurya
Copy link
Member Author

tssurya commented Mar 27, 2024

I was torn about the sync logs. I would probably have left them at 4 if you have some update filtering in place before hand, but up to you.

klog.V(5).Infof("Processing sync for Admin Network Policy %s", anpName)
defer func() {
klog.V(5).Infof("Finished syncing Admin Network Policy %s : %v", anpName, time.Since(startTime))
}()

yeah I thought about this and I was extremely torn about the levels, so I do have update filtering in place actually but the problem is we have a centralized queuing system. So ultimately syncAdminNetworkPolicy is called whenever relevant info for a pod (ips, labels), namespace (labels), nodes (ips,labels) + anp changes => which is not a whole lot unless the ENV is heavy pod/add/delete so that's why I moved it to 5 because I imagine ANP to span across all pods in a cluster and if a pod add/delete triggers this then we would flood too much during churn.
So these logs might just tell us which anp was getting reconciled which i guess ovsdb transact logs should help us with that info. So leaning towards keeping them at 5 itself.

What about the intial ones

klog.Info("Setting up event handlers for Admin Network Policy")
klog.Info("Setting up event handlers for Baseline Admin Network Policy")
klog.Info("Setting up event handlers for Namespaces in Admin Network Policy controller")
klog.Info("Setting up event handlers for Pods in Admin Network Policy controller")
klog.Info("Setting up event handlers for Nodes in Admin Network Policy controller")
klog.Info("Waiting for informer caches to sync")
klog.Infof("Repairing Admin Network Policies")
klog.Info("Starting Admin Network Policy workers")
klog.Info("Starting Baseline Admin Network Policy workers")
klog.Info("Starting Namespace Admin Network Policy workers")
klog.Info("Starting Pod Admin Network Policy workers")
klog.Info("Starting Node Admin Network Policy workers")

I know they only happen once but I think they are excessive compared to other controllers and distracting towards quickly determining if everything started successfully or not. But I will leave it up to you as well.

yeah i left them since they only happen once during startup, but you have a great point. I think just 1 or 2 that ensure all started up correctly can be kept at Info, rest of them I will move to level5.

@tssurya
Copy link
Member Author

tssurya commented Mar 27, 2024

saving link for @kyrtapz : https://github.com/ovn-org/ovn-kubernetes/actions/runs/8455510894/job/23164322209?pr=4164 I see Load Balancer Service Tests with MetalLB [It] Should ensure load balancer service works with 0 node ports when ETP=local flaking but let's see if its a one-off or not

@tssurya
Copy link
Member Author

tssurya commented Mar 27, 2024

@tssurya
Copy link
Member Author

tssurya commented Mar 27, 2024

done, last push was fixing the previous comment (only last commit changed, rest was rebased on master)

@jcaamano
Copy link
Contributor

jcaamano commented Apr 1, 2024

done, last push was fixing the previous comment (only last commit changed, rest was rebased on master)

Sorry for being so nitpicky, I can't resist :P

I would keep this one at five as well

klog.Info("Setting up event handlers for Admin Network Policy")

It can be even more distracting to have one log for one type of handler and not for the rest. Any potential problems here are perfectly traceable since we return a specific error for any problem that can happen here.

@tssurya
Copy link
Member Author

tssurya commented Apr 2, 2024

done, last push was fixing the previous comment (only last commit changed, rest was rebased on master)

Sorry for being so nitpicky, I can't resist :P

I would keep this one at five as well

klog.Info("Setting up event handlers for Admin Network Policy")

It can be even more distracting to have one log for one type of handler and not for the rest. Any potential problems here are perfectly traceable since we return a specific error for any problem that can happen here.

haha nw, aye aye captain, let me push with this fixed.

This commit adds logic for processing
node to the ANP controller

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds the plumbing required
to queue the anp key when a node get's
added, updated or deleted to handle node
type peer.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds support to also insert nodeIPs
as peerIPs of a specific rule. The podIPs peer has
been renamed to be peerIPs which is more accurate
moving forward.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
1. creating an admin network policy with 3 egress rules that have node peers
2. creating another node2 that will act as peer of admin network policy; check if IP is added to address-set
3. update the labels of node2 such that it starts to match the DENY RULE peer selector; check if IPs are updated in address-sets
4. update the hostCIDR annotation of node2; check if IPs are updated in address-sets
5. update the peer selector of rule3 in admin network policy so that node1 stops matching; check if address-set is updated
6. delete node matching peer selector; check if IPs are updated in address-sets
7. update the ANP by deleting all rules; check if all objects are deleted correctly
8. delete the ANP; check if all objects are deleted correctly

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
1. creating a baseline admin network policy with 2 egress rules that have node peers
2. creating another node2 that will act as peer of baseline admin network policy; check if IP is added to address-set
3. update the labels of node2 that matches the DENY RULE peer selector; check if IPs are updated in address-sets
4. update the hostCIDR annotation of node2 that matches the peer selector; check if IPs are updated in address-sets
5. update the peer selector of rule1 in admin network policy so that node1 stops matching; check if address-set is updated
6. delete node matching peer selector; check if IPs are updated in address-sets
7. update the BANP by deleting all rules; check if all objects are deleted correctly
8. delete the BANP; check if all objects are deleted correctly

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
I was being over generous, many of this can be determined
from the ovsdb transact logs so we can move most of them
to level5.
I moved all logs in the update path to level5, kept
add/delete at level4 since those should not happen frequently.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
@tssurya
Copy link
Member Author

tssurya commented Apr 2, 2024

I have pushed now, only change is the last commit + rebased on master

@jcaamano jcaamano merged commit 8a5e83b into ovn-org:master Apr 2, 2024
37 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/admin-network-policy kind/feature All issues/PRs that are new features
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Implement Admin Network Policy API in OVNK
3 participants