Always allow egressip-node-updates #4102

Open

tssurya wants to merge 2 commits into master from the always-allow-egress-node-updates branch

Conversation

@tssurya (Member) commented Jan 26, 2024

  1. ovnkube-node pod goes offline
  2. healthcheck fails
  3. node is deleted from EIP.status and the status is patched
  4. all OVN DB setup is deleted across zones
  5. CPIPC (CloudPrivateIPConfig) delete starts being processed on cloud platforms
  6. ovnkube-node pod comes back online
  7. healthcheck succeeds
  8. egress-node add happens, BUT the patch fails because the CPIPC is still being
     deleted: err: cloud update request failed, CloudPrivateIPConfig: 10.0.128.100 is being deleted
  9. the node add is put on the retry queue and we wait 30 seconds
  10. kubelet sends an update every 10 seconds -> the update succeeds and
      overrides the retry add entry because oldNode.ready == newNode.ready
  11. THUS WE FAIL TO PROCESS AN EVENT, and the patch never happens
  12. OVN setup never happens; we are stuck.

This PR removes the oldNode.ready == newNode.ready; return logic, which was only there for scale reasons, since removing it is needed to allow events to get synced correctly. Not proud of this fix, but this is why we badly need level-driven EIP controllers :)

Does this have scale implications? Maybe a 0.0001% chance, because addEgressNode, which is what will now always be called, checks that EIP.Spec != EIP.Status before calling reconcileEIP, so we should be fine...
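To make step 10 concrete, here is a minimal, self-contained sketch of the kind of gate being removed; the types and helper names are illustrative stand-ins, not the actual ovn-kubernetes handler (the real condition, quoted in the review thread below, also checks whether host CIDRs changed):

```go
package main

import "fmt"

// nodeState is a simplified stand-in for the fields the real handler compares.
type nodeState struct {
	name  string
	ready bool
}

// shouldProcess models the pre-PR gate in the node update handler: when the
// ready state did not change (and host CIDRs did not change), the update was
// dropped entirely, which is what let a routine kubelet heartbeat override the
// pending retry add in step 10 above.
func shouldProcess(oldNode, newNode nodeState, hostCIDRsAltered bool) bool {
	if oldNode.ready == newNode.ready && !hostCIDRsAltered {
		return false // old behaviour: return early, the event is effectively lost
	}
	return true
}

func main() {
	old := nodeState{name: "node-a", ready: true}
	cur := nodeState{name: "node-a", ready: true}
	// A routine kubelet status heartbeat: readiness unchanged, so the old code
	// skipped it. After this PR the handler always runs and relies on the cheap
	// EIP.Spec vs EIP.Status comparison to avoid doing real work.
	fmt.Println(shouldProcess(old, cur, false)) // prints: false
}
```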


@kyrtapz (Contributor) commented Feb 7, 2024

This approach also addresses an issue reported here: https://issues.redhat.com/browse/OCPBUGS-27266.
Without it, the retry mechanism for egressIP nodes doesn't really work for anything other than a reachability change, so I am in favor of getting this in. It would be best to move egressIP to level-driven entirely... :)

If we go with the proposed solution, I think we would have to modify the addNode function, as it currently lists all egress IPs from the API:

// call site in addNode:
egressIPs, err := eIPC.kube.GetEgressIPs()

// GetEgressIPs pages through the API server:
err := pager.New(func(ctx context.Context, opts metav1.ListOptions) (runtime.Object, error) {
	return k.EIPClient.K8sV1().EgressIPs().List(ctx, opts)
})

We might also consider adapting deleteEgressNode and setNodeEgressReachable if possible.

@tssurya (Member, Author) commented Feb 10, 2024

Great point @kyrtapz: yeah, let's start using the informer cache instead of making list calls to the API, because those will cause scale issues.
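As a rough, hedged illustration of that direction (not the actual ovn-kubernetes wiring, which has its own watch factory and generated EgressIP listers), here is a minimal client-go example that serves List calls from a shared informer's local cache instead of hitting the API server on every event; core Nodes stand in for the EgressIP CRD:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A shared informer factory keeps a local, watch-driven cache of objects.
	factory := informers.NewSharedInformerFactory(client, 30*time.Minute)
	nodeLister := factory.Core().V1().Nodes().Lister()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Served entirely from the local cache: no API round trip per event,
	// unlike a direct client.CoreV1().Nodes().List(...) call, which is the
	// pattern the current addNode path effectively uses for EgressIPs.
	nodes, err := nodeLister.List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Println("nodes in cache:", len(nodes))
}
```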

@tssurya (Member, Author) commented Feb 15, 2024

This is a bandaid fix that may or may not hurt another spot in the body :D

@tssurya (Member, Author) commented Feb 15, 2024

@jcaamano will be our Batman, who will apply the permanent fix when he does the level-driven controller.

@tssurya (Member, Author) commented Feb 15, 2024

I need to get back to Pat's offline comments for this one.

@jcaamano (Contributor) commented:

Side question: Do we update the reachability checker with added/removed nodes?

@jcaamano (Contributor) commented:

> Side question: Do we update the reachability checker with added/removed nodes?

Never mind, I was confused by the initialization receiving an array of nodes that is not used for anything:

// Note: the nodes argument is unused; this only starts the periodic reachability checker.
func (eIPC *egressIPClusterController) initEgressNodeReachability(nodes []interface{}) error {
	go eIPC.checkEgressNodesReachability()
	return nil
}

It's also confusing that this is run from the sync function instead of just at normal startup. Wondering if there is a reason behind that.

if !isNewReady {
	klog.Warningf("Node: %s is not ready, deleting it from egress assignment", newNode.Name)
	if err := h.eIPC.deleteEgressNode(newNode.Name); err != nil {
		return err
	}
} else if isNewReady && isNewReachable {
	klog.Infof("Node: %s is ready and reachable, adding it for egress assignment", newNode.Name)
	if isOldReady != isNewReady {
		klog.Infof("Node: %s is ready and reachable, adding it for egress assignment", newNode.Name)
@tssurya (Member, Author):

@kyrtapz suggested not to do this as it's redundant and not useful, so this is fixed now.

@tssurya force-pushed the always-allow-egress-node-updates branch from 422bfd4 to ff8d45c on February 16, 2024 at 15:28
1) ovnkube-node pod goes offline
2) healthcheck fails
3) node is deleted from EIP.status and the status is patched
4) all OVN DB setup is deleted across zones
5) CPIPC starts to process the delete on cloud platforms
6) ovnkube-node pod comes back online
7) healthcheck succeeds
8) egress-node add happens, BUT the patch fails because the CPIPC is still being
   deleted: err: cloud update request failed, CloudPrivateIPConfig: 10.0.128.100 is being deleted
9) the node add is put on the retry queue and we wait 30 seconds
10) kubelet sends an update every 10 seconds -> the update succeeds and
    overrides the retry add entry because oldNode.ready == newNode.ready
11) THUS WE FAIL TO PROCESS AN EVENT, and the patch never happens
12) OVN setup never happens; we are stuck.

This PR removes the oldNode.ready == newNode.ready; return logic, which
was only there for scale reasons, since removing it is needed to allow
events to get synced correctly. Not proud of this fix, but this is why
we badly need level-driven EIP controllers :)

Does this have scale implications? Maybe a 0.0001% chance, because
addEgressNode, which is what will now always be called, checks that
EIP.Spec != EIP.Status before calling reconcileEIP, so we should be fine...

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Since with the previous commit we will always call addEgressNode
for every kubelet update, which happens every 10 seconds for every node,
we must stop listing EIPs through the kapi client since that will not
scale well. Let us use the informer caches to pull EIPs; then, unless
there are EIPs whose status.items != spec.items, the rest will be
short-circuited in this function.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
@tssurya force-pushed the always-allow-egress-node-updates branch from ff8d45c to f9bd58d on February 16, 2024 at 15:29
@tssurya (Member, Author) commented Feb 16, 2024

/assign @trozet

@@ -1579,91 +1579,6 @@ var _ = ginkgo.Describe("OVN cluster-manager EgressIP Operations", func() {
err := app.Run([]string{app.Name})
gomega.Expect(err).NotTo(gomega.HaveOccurred())
})

ginkgo.It("egress node update should not mark the node as reachable if there was no label/readiness change", func() {
Reviewer (Contributor):
So, the original code that bypasses adding/updating a node if the ready states are equal: 1f101eb#diff-796adb9d8a9b5b5636dc81e7350a00ef143e6e5bcb52ec30dc350c866cd9208cR784

This pre-dates the reachability check, so this test case was added later. The test case seems flawed to me in the first place: we should use any event to trigger us to check and ensure that the node is added, whether that is a node update or a reachability check success.

@@ -133,16 +132,12 @@ func (h *egressIPClusterControllerEventHandler) UpdateResource(oldObj, newObj in
}
return nil
}
if isOldReady == isNewReady && !isHostCIDRsAltered {
Reviewer (Contributor):

So, walking through the consequences of not skipping potentially duplicate events:

  • Delete path:
  1. We list all egress IPs with the kube client (bad, and we should fix that); negligible impact.
  2. If we already deleted this node, none of the statuses will match, so no reconcile and no perf impact.
  • Add path:
  1. Set node reachable: takes a global allocator lock. We could improve this with a per-key map lock, but it shouldn't be a big deal, since it's 15 threads competing for a lock on an operation that takes no time.
  2. Add egress node: we get egress IPs via the kube client again (wth...), then we go through every egress IP and reconcile... yikes.

This seems like a pretty big performance issue if there were a lot of egress IPs. We would constantly be churning through all egress IPs on every node update. Or am I missing something?

Would a better approach be to expose the reAddorDelete map that the reachability check uses, so that we also have the state of whether the node wasReachable? Then we could change this if statement to:

if isOldReady == isNewReady && !isHostCIDRsAltered && wasReachable == isReachable ?

@tssurya (Member, Author):

> Would a better approach be to expose the reAddorDelete map that the reachability check uses, so that we also have the state of whether the node wasReachable?

I had the same thought of keeping a previous reachability state, but the problem here is that for the kubelet updates we get, both the oldNode and newNode objects will have a healthy reachability state, so it still doesn't take action :( and doesn't help here.

@tssurya (Member, Author) commented Mar 14, 2024:

> This seems like a pretty big performance issue if there were a lot of egress IPs. We would constantly be churning through all egress IPs on every node update. Or am I missing something?

So this is a legit scale question, one I had too. But the thing to notice inside the add/delete node funcs is that we call reconcileEIP only when status != spec; so we only call reconcile if the EIP actually needs to be reconciled, not otherwise. I meant https://github.com/ovn-org/ovn-kubernetes/pull/4102/files#diff-f706ecf5fbc17ede0f5a27577b1d4b3c08c00fe2837f6c1be461503de71ce21bR760 for the update/add case.

The delete path seems to be doing something else... let me double check.
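A tiny self-contained model of that short-circuit, with field names as simplified stand-ins for the real EgressIP CRD (not the actual types or reconcile logic):

```go
package main

import "fmt"

// egressIP models only the two fields that matter for the short-circuit.
type egressIP struct {
	name   string
	spec   []string // requested egress IPs
	status []string // egress IPs currently assigned to nodes
}

// needsReconcile returns true only when status has drifted from spec, which is
// the condition under which the add/delete node paths actually call reconcile.
func needsReconcile(eip egressIP) bool {
	if len(eip.spec) != len(eip.status) {
		return true
	}
	assigned := make(map[string]bool, len(eip.status))
	for _, ip := range eip.status {
		assigned[ip] = true
	}
	for _, ip := range eip.spec {
		if !assigned[ip] {
			return true
		}
	}
	return false
}

func main() {
	steady := egressIP{name: "eip-ok", spec: []string{"10.0.128.100"}, status: []string{"10.0.128.100"}}
	drifted := egressIP{name: "eip-stuck", spec: []string{"10.0.128.100"}, status: nil}
	// Re-running the add path on every kubelet heartbeat is cheap for "steady"
	// and only does real work for "drifted".
	fmt.Println(needsReconcile(steady), needsReconcile(drifted)) // prints: false true
}
```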

gomega.Expect(egressIPs[0]).To(gomega.Equal(egressIP))
hcClient := fakeClusterManagerOVN.eIPC.allocator.cache[node.Name].healthClient.(*fakeEgressIPHealthClient)
hcClient.FakeProbeFailure = true
// explicitly call check reachability, periodic checker is not active
Reviewer (Contributor):

I assume even if we delete this test case, there is another test case that checks that the reachability functionality works correctly, right?

@tssurya (Member, Author):

Yeah, good point; let me check this.

@tssurya added the kind/bug (All issues that are bugs and PRs opened to fix bugs) and egress-ip (Issues related to EgressIP feature) labels on Mar 14, 2024
Labels: egress-ip (Issues related to EgressIP feature), kind/bug (All issues that are bugs and PRs opened to fix bugs)

Projects: none yet

Linked issues that merging may close: none yet

4 participants