Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] ANP (Pre)Merge Testing #1761

Closed
wants to merge 24 commits into from
Closed

Conversation

tssurya
Copy link
Contributor

@tssurya tssurya commented Jul 15, 2023

- What this PR does and why is it needed

- Special notes for reviewers

- How to verify it

- Description for the changelog

@tssurya tssurya requested a review from dcbw as a code owner July 15, 2023 22:00
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 15, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tssurya
Once this PR has been reviewed and has the lgtm label, please assign danwinship for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2023
This commit adds a new flag called
`admin-network-policy-enable` which can be used
to feature gate Admin Network Policy feature.
This will be enabled by default on KIND clusters.
If that is not desired then we might want to
expose this option also via CLI in kind to help
users turn it off. But as of now, like other
features this knob is not user facing.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This PR adds the plumbing required to curl the CRDs
from a released upstream version of ANP and BANP.
It also installs these into the KIND cluster.
We also provide the necessary permissions that are
required to access the new CRDs in ovn-setup.yaml.j2.

NOTE: We curl the CRDs into the dist/templates
directory and maintain a copy of the version OVNK
uses so that it is consistent with what the other
features do. It will also help us see the version
OVNK is using versus the upstream version

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds the preparation bits needed to be
used by the ANP controller in the following commit.
It also provides sufficient permissions to the
systemaccount to list/watch/patch the new CRDs
accordingly.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds a level driven controller using
workqueues and shared informers inspired off the
documentation and recommendations done by sig-api-machinery.
This follows suit to existing level driven controllers like
services, egressQoS, egressSVC, apbRoute.

It watches for ANP, BANP, Namespaces and Pods.

Note that in order for easier reviews, this commit doesn't
add the full fledged sync functions yet, they will be added
in the next commit. We can squash these in the end.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
go get <pkg>
go mod tidy
go mod vendor

brings in the clientset gen vendoring for ANP&BANP

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds a new utility called
NewAddressSetOps. This unlike NewAddressSet doesn't
create the address-sets but returns the ops used to
create the address-sets.
This commit also changes NewAddressSet and EnsureAddressSet
to use this ops function and then transact the ops in the
end.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commits adds a utility to update
ACLs on existing portgroups

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This PR adds the main logic of our
central ANP sync function. This will
be the brain behind deciding what to
reconcile, how to reconcile, when to
reconcile. So review with care.

Currently we are being lame and doing
a central lock on controller. This will
be inefficient and we can improve on this
in the future. First step is to write a
race-less controller.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds the logic for syncing
namespaces upon their label updates.
It involves comparing with anpCaches
to see if this namespace was managed
by a policy and if so requeuing the anp
key back to the main queue.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds the logic for syncing
pods upon their label, ip, state updates.
It involves comparing with anpCaches
to see if this pod was managed
by a policy and if so requeuing the anp
key back to the main queue.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This PR adds the main logic of our
central BANP sync function. This will
be the brain behind deciding what to
reconcile, how to reconcile, when to
reconcile. So review with care specially
the locks.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit modifies syncAdminNetworkPolicyNamespace to
also add support for BANP changes. So if the namespace
stopped or started matching BANP we take the appropriate
actions of adding it back to banpQueue for central
loop to take action

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit modifies syncAdminNetworkPolicyPod to
also add support for BANP changes. So if the pod
stopped or started matching BANP we take the appropriate
actions of adding it back to banpQueue for central
loop to take action

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Most of this feature has no dependency on IC
or multi-zones or single zones or anything. It is
unfortunately one of those features that won't benefit
scale wise from using IC mode.

However the two parts we care about for which we need to
introduce the concept of zones are:

a) adding ports to port groups - we need to do this
only for local pods
b) status update support - comes in upcoming commits

The commit also takes care of passing the feature flag
into the IC deployment mode.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds support for updating the ANP
CRD with the configured status so that users/admins
can know what's the state of their ANP without
having to look at logs or DBs to know what went
wrong.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds support for updating the BANP
CRD with the configured status so that users/admins
can know what's the state of their BANP without
having to look at logs or DBs to know what went
wrong.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds an elaborate test that tries to
ensure all plausible perms/combs work for ANP
and its interaction with pods/namespaces.

Steps:
1. creating an admin network policy with 0 rules; check if port group is created for the subject
2. creating a pod that will act as subject of admin network policy; check if lsp is added to port-group after retries
3. update the ANP by adding one ingress rule; check if acl is created and added on the port-group
4. creating a pod that will act as peer of admin network policy; check if IP is added to address-set after retries
5. update the ANP by adding two more ingress rules with allow and pass actions; check if AS-es are created + acls are created and added on the port-group
6. update the ANP by adding three egress rules with deny, allow and pass
   actions; check if acls are created and added on the port-group
7. update the labels of the pod that matches the peer selector; check if IPs are updated in address-sets
8. update the labels of the namespace that matches the peer selector; check if IPs are updated in address-sets
9. update the subject of admin network policy so that subject pod stops matching; check if port group is updated
10. update the resource version and labels of the pod that matches the new subject selector; check if port group is updated
11. update the labels of the namespace that matches the subject selector to stop matching; check if port group is updated
12. update the labels of the namespace that matches the subject selector to start matching; check if port group is updated
13. update the ANP by changing its priority and deleting 1 ingress rule & 1 egress rule; check if all objects are re-created correctly
14. delete pod matching peer selector; check if IPs are updated in address-sets
15. update the subject pod to go into completed state; check if port group and address-set is updated
16. delete the subject and peer selected namespaces; check if port group and address-set's are updated
17. update the ANP by deleting all rules; check if all objects are deleted correctly
18. delete the ANP; check if all objects are deleted correctly

I have also added two unit tests to test OVNK specific
constraints around anp.Spec.Priority field

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds an elaborate test that tries to
ensure all plausible perms/combs work for BANP
and its interaction with pods/namespaces.

Steps:
1. creating a baseline admin network policy with 0 rules; check if port group is created for the subject
2. creating a pod that will act as subject of baseline admin network policy; check if lsp is added to port-group after retries
3. update the BANP by adding one ingress rule; check if acl is created and added on the port-group
4. creating a pod that will act as peer of baseline admin network policy; check if pod ip is added to address-set
5. update the BANP by adding two more ingress rules with allow and deny actions; check if acls are created and added on the port-group
6. update the BANP by adding three egress rules with deny, allow and deny actions; check if acls are created and added on the port-group
7. update the labels of the pod that matches the peer selector; check if IPs are updated in address-sets
8. update the labels of the namespace that matches the peer selector; check if IPs are updated in address-sets
9. update the subject of baseline admin network policy so that subject pod stops matching; check if port group is updated
10. update the resource version and labels of the pod that matches the new subject selector; check if port group is updated
11. update the labels of the namespace that matches the subject selector to stop matching; check if port group is updated
12. update the labels of the namespace that matches the subject selector to start matching; check if port group is updated
13. update the BANP by deleting 1 ingress rule & 1 egress rule; check if all objects are re-created correctly
14. delete pod matching peer selector; check if IPs are updated in address-sets
15. update the subject pod to go into completed state; check if port group is updated
16. delete the subject and peer selected namespaces; check if port group and address-set's are updated
17. update the BANP by deleting all rules; check if all objects are re-created correctly
18. delete the BANP; check if all objects are re-created correctly
19. create new BANP with same name; check if all objects are re-created correctly; test's cache is cleared and then repopulated properly

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
The actual tests are defined upstream,
but this commit ensures we can
consume them from upstream and run it
middlestream before bringing it downstream.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit adds basic repair functionality
that focusses on removing stale items from
the database by considering k8s as the source
of truth.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Create a new folder for documenting network
policies. Move the docs we have for networkpolicies
into that folder.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 5, 2023
@martinkennelly
Copy link
Contributor

martinkennelly commented Aug 6, 2023

Surya, for your last required failing job e2e-vsphere-windows, I have been watching my DS PR and I saw the same failure:

level=fatal msg=failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get vsphere Terraform variables: failed to use cached vsphere image: Checksum mismatch for /tmp/.cache/openshift-installer/image_cache/rhcos-414.92.202307070025-0-vmware.x86_64.ova; expected=101330e9bdb6eb1730d1684a71cc00ada0b6ccf6a60eaa7223b6d5a3f2c49365 found=e90406b91d7f8c7435fed546e1a1102c1ac08c9bb5cacb746655bb1cb286ab2b

Its a flake for sure.

/test e2e-vsphere-windows

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 12, 2023
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

openshift-ci bot commented Feb 24, 2024

@tssurya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-live-migration-sdn-ovn 89c7706 link true /test e2e-aws-live-migration-sdn-ovn
ci/prow/4.16-upgrade-from-stable-4.15-e2e-gcp-ovn-rt-upgrade 89c7706 link true /test 4.16-upgrade-from-stable-4.15-e2e-gcp-ovn-rt-upgrade
ci/prow/e2e-azure-ovn-upgrade 89c7706 link true /test e2e-azure-ovn-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@tssurya tssurya closed this Feb 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants