New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DownstreamMerge] 9-23-22 b - dualstack fixed #1289
[DownstreamMerge] 9-23-22 b - dualstack fixed #1289
Conversation
- Add design doc which depicts the current implementation in OVN-Kubernetes - Add DPU section in ovs_offload docs Note: Currently only a general overview is provided in the in-project documents plus links to google doc. This is a first stage to have some initial documentation in the project. Later on these docs should be converted to Markdown and be part of the project. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
NetworkPolicyNotCreated error only returned from updateACLLoggingForPolicy when network policy that is passed as an argument have np.created = false. updateACLLoggingForPolicy called from 2 places: func (oc *Controller) setNetworkPolicyACLLoggingForNamespace(ns string, nsInfo *namespaceInfo) error { ... for name, policy := range nsInfo.networkPolicies { } ... } func (oc *Controller) addNetworkPolicy(policy *knet.NetworkPolicy) error { ... if err := oc.createNetworkPolicy(np, policy, aclLogDeny, aclLogAllow,portGroupIngressDenyName, portGroupEgressDenyName); ... nsInfo.networkPolicies[policy.Name] = np if err := oc.updateACLLoggingForPolicy(np, nsInfo.aclLogging.Allow); err != nil { ... } np.created is set to true in createNetworkPolicy, and createNetworkPolicy is only called from addNetworkPolicy. To make sure that np.created = true for every updateACLLoggingForPolicy call, we need to check that 1. every policy from nsInfo.networkPolicies has .created = true, which is true since network policy is added to nsInfo.networkPolicies only when createNetworkPolicy successfully completed 2. addNetworkPolicy only calls updateACLLoggingForPolicy after createNetworkPolicy successfully completed, which i=s also true, since error will be returned earlier if createNetworkPolicy fails Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Commit 16ea40b changed the logical switch other-config to be overwriten this needs to be changed so that the hostsubnet values are appended. Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
Signed-off-by: Yun Zhou <yunz@nvidia.com>
GetConfigDurationRecorder is called from two paths. 1) nodeTracker detects a switch or GR has been added for the node and does a full RequestFullSync which calls onServiceAdd 2) onServiceAdd is called from the service handler for adds. If we don't lock the cdr variable we access it from two routines while one routine edits it; the other tries to read it for example. Let's lock this variable to prevent races. 2022-09-09T14:33:01.6982688Z WARNING: DATA RACE 2022-09-09T14:33:01.6982976Z Read at 0x000004402a78 by goroutine 1044: 2022-09-09T14:33:01.6983629Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/metrics.GetConfigDurationRecorder() 2022-09-09T14:33:01.6984387Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/metrics/master.go:847 +0x29b 2022-09-09T14:33:01.6985239Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*Controller).onServiceAdd() 2022-09-09T14:33:01.6986069Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/controller/services/services_controller.go:366 +0x33f 2022-09-09T14:33:01.6986732Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*Controller).RequestFullSync() 2022-09-09T14:33:01.6988638Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/controller/services/services_controller.go:351 +0x2b2 2022-09-09T14:33:01.6989314Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*Controller).RequestFullSync-fm() 2022-09-09T14:33:01.6989705Z <autogenerated>:1 +0x39 2022-09-09T14:33:01.6990287Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*nodeTracker).updateNodeInfo() 2022-09-09T14:33:01.6991204Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/controller/services/node_tracker.go:150 +0x67b 2022-09-09T14:33:01.6993403Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*nodeTracker).updateNode() 2022-09-09T14:33:01.6994200Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/controller/services/node_tracker.go:201 +0x709 2022-09-09T14:33:01.6996051Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.newNodeTracker.func1() 2022-09-09T14:33:01.6996855Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/controller/services/node_tracker.go:78 +0x55 2022-09-09T14:33:01.6998163Z k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd() versus 2022-09-09T14:33:01.7016687Z Previous write at 0x000004402a78 by goroutine 1070: 2022-09-09T14:33:01.7017329Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/metrics.GetConfigDurationRecorder() 2022-09-09T14:33:01.7018197Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/metrics/master.go:848 +0x310 2022-09-09T14:33:01.7018912Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*Controller).onServiceAdd() 2022-09-09T14:33:01.7019866Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/controller/services/services_controller.go:366 +0x33f 2022-09-09T14:33:01.7020529Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*Controller).onServiceAdd-fm() 2022-09-09T14:33:01.7020900Z <autogenerated>:1 +0x4d 2022-09-09T14:33:01.7021431Z k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd() 2022-09-09T14:33:01.7022328Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:232 +0x74 2022-09-09T14:33:01.7022857Z k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnAdd() Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Lock the cdr global variable
The `err` variable was not being set to the result of the endpoint generation, which meant that if - for whatever reason - creating those endpoints failed, the test would continue and fail further down the execution. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
…when-creating-endpoints e2e tests, services: assert endpoints were created
The `checkPingOnContainer` function is duplicated in the tests. It also returns an error, that is both already asserted within the function, and not checked by the caller. Thus, we can safely remove that call. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Mohamed Mahmoud <mmahmoud@redhat.com>
BZ2117255 append instead of overwrite logicalSwitch other-config
Add retry logic to Namespace resources
Due to the way how IPv6 multipath is implemented, it is not possible to filter by ifindex as the ifindex for multihop routes will always be 0. Instead, implement a custom filter. Reported-at: https://issues.redhat.com/browse/OCPBUGS-1318 Signed-off-by: Andreas Karis <ak.karis@gmail.com>
In environments where PodSecurity is enforced we can't use the default ns unless certain labels are set. Namespaces we create with the framework have these labels which let us create pods on them without restrictions. Since we're using disposable namespaces that are deleted in AfterEach there is no longer need to delete the gw pods. Signed-off-by: Ori Braunshtein <obraunsh@redhat.com>
EnsureAddressSet may be called by multiple go threads around the same time. While transactions are serialized, the operations that populate the transactions are not and that is where the determination on whether a row in ovsdb is to be created or updated via libovsdbops. The result of that is that the creation fails due to the constraints on the table index. This change is a placebo. It simply detects and logs this case since a higher level retry logic is expected to succeed on the following invocation. Co-authored-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> Signed-off-by: Flavio Fernandes <flaviof@redhat.com> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2108026
newOvnAddressSet: log mid-air collisions
getDefaultGatewayInterfaceByFamily: custom filter for MultiHop
Since the `br-ex` bridge already has the node masquerade IP address, it is no longer required to add the masquerade route to the nodes. The SNAT/unSNAT traffic to/from the node to use the node masquerade IP address will **still** be used, to specifically support the use case of packets sent from the node having a different source IP address. The `V4MasqueradeSubnet` had to be updated, since follow-up commits will require the `.4` address on that subnet to use as a dummy next-hop for deployments where there isn't a next-hop available. netlink utils: rename method This `LinkRoutesAddOrUpdateMTU` function can now be used to update the route MTU *or* the route source. Thus, rename the method accordingly. A follow up commit should try to come up with a better abstraction, since this function already does too much: It either: - adds a route **or** - updates a route - MTU **or** - source I think we should have 3 different functions, each doing a single thing, and leave to the caller to know which of those to invoke. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com> Co-authored-by: Tim Rozet <trozet@redhat.com> Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
For simplicity, the `.4` address on the node masquerade subnet was chosen - this one was the next available IP address on that subnet. Finally, also provision the MAC binding tables for each of the gateway routers, thus providing ARP resolution for the aforementioned IP address. The resolved MAC address is computed from the IP address, and is the same on each of the nodes, since traffic is not supposed to leave the node - and as such, there is no risk for collisions. The OVN nodes are also updates to use this dummy next-hop, and a permanent neighbor is added for said IP / MAC. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com> Co-authored-by: Tim Rozet <trozet@redhat.com>
Now that host -> service can work without using a default gateway there is no need to require it. With this change no next hops will be written to the l3 gateway config annotation for the node if none are detected. However, this only will work if --gateway-interface config is provided at ovnkube node start time. If that is not provided, we still rely on default gateway lookup to determine which interface should be used to communicate with the external bridge. Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
…ute on br-ex Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
remove networkPolicy.created field
E2E (exgw): create ns for the serving gw pods instead of using default
e2e, external gw test: remove duplicate funcs
This patch makes route filtering generic - it is up to the caller to specify which route to look for, along with the filter mask specifying the relevant attributes. With this, we can now look for a simple route - match stricly on the dst route - when the node is being created, but also have a finer match, also providing the GW, for the services controller. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
We were not creating healthchecks for service updates when the etp changed from local to cluster or back. This PR fixes this. It also fixes some endpoint slices code that earlier assumed svc:endpointslice was 1:1 mapping. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Don't assume policy type when evaluating isolation
Trim ACL names according to RFC1123
Fix flakiness in DNS tests
…brid remove the requirement that hybrid overlay needs subnet .3 address
49f284e introduced an issue where it never release the nsInfo lock. Fix this. Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Fix lock issue in ensureNamespaceLocked
Conflicts due to hybrid overlay OCP HACK Signed-off-by: Andreas Karis <ak.karis@gmail.com> Conflicts: go-controller/hybrid-overlay/pkg/types/types.go go-controller/pkg/ovn/pods_test.go
/test e2e-metal-ipi-ovn-dualstack |
Now that the special masquerade IPs are assigned to the interface, they default flows that end up DNAT'ing return traffic from the GR->host from masquerade ip -> host ip were being overridden with secondary ip flows for extra ips. This fixes it by skipping special ips when evaluting the extra ips on a node. Signed-off-by: Tim Rozet <trozet@redhat.com> (cherry picked from commit 5a056e7)
data race issue for unit tests: ovn-org/ovn-kubernetes#3203 |
/retest-required |
/retest |
@trozet: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreaskaris, trozet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1 similar comment
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreaskaris, trozet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
- What this PR does and why is it needed
- Special notes for reviewers
- How to verify it
- Description for the changelog