Always allow egressip-node-updates #4102
base: master
Conversation
Force-pushed from 5f9fc9d to 417368c
This approach also addresses an issue reported here: https://issues.redhat.com/browse/OCPBUGS-27266. If we go with the proposed solution, I think we would have to modify ovn-kubernetes/go-controller/pkg/kube/kube.go (lines 411 to 412 at 7447607).
We might also consider adapting deleteEgressNode and setNodeEgressReachable if possible.
Great point @kyrtapz: yeah, let's start using the informer cache instead of making list calls to the API, because that will cause scale issues.
Force-pushed from cd36244 to 422bfd4
This is a bandaid; @jcaamano will be our batman who will do a permanent fix when he does the level-driven controller.
Get back to Pat's offline comments for this one.
Side question: do we update the reachability checker with added/removed nodes?
Never mind, I was confused by the initialization receiving an array of nodes that is not used for anything.
It is also confusing that it is run from the sync function instead of just normal startup. Wondering if there is a reason behind that.
if !isNewReady {
	klog.Warningf("Node: %s is not ready, deleting it from egress assignment", newNode.Name)
	if err := h.eIPC.deleteEgressNode(newNode.Name); err != nil {
		return err
	}
} else if isNewReady && isNewReachable {
	klog.Infof("Node: %s is ready and reachable, adding it for egress assignment", newNode.Name)
	if isOldReady != isNewReady {
		klog.Infof("Node: %s is ready and reachable, adding it for egress assignment", newNode.Name)
@kyrtapz suggested not to do this as it's redundant and not useful, so I fixed this.
Force-pushed from 422bfd4 to ff8d45c
1) ovnkube-node pod goes offline
2) healthcheck fails
3) node deleted from EIP.status, patched
4) all OVN DB setup deleted across zones
5) CPIPC starts to process the delete on cloud platforms
6) ovnkube-node pod comes back online
7) healthcheck succeeds
8) egress-node add happens, BUT the patch fails because the CPIPC is still being deleted: err: cloud update request failed, CloudPrivateIPConfig: 10.0.128.100 is being deleted
9) node add is added to the retry queue, we wait for 30 seconds
10) kubelet sends an update every 10 seconds -> the update succeeds and overrides the retry add entry because oldNode.ready == newNode.ready
11) THUS WE FAIL TO PROCESS AN EVENT, the patch never happens
12) OVN setup never happens, we are stuck.

This PR removes the oldNode.ready == newNode.ready; return logic, which is present for scale reasons, since removing it is needed to allow events to get synced correctly. Not proud of this fix, but this is why we need level driven EIP controllers big time :) Does this have scale implications? Maybe at 0.0001%, because addEgressNode, which is what will be called, always checks EIP.Spec != EIP.Status before calling reconcileEIP, and thus we should be fine... Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Since in the previous commit we will now always call addEgressNode for every kubelet update, which is every 10 seconds for every node, we must stop using kapi listing for EIPs since that will not scale well. Let us use the informer caches to pull EIPs; then, unless there are EIPs with status.items != spec.items, the rest will be short-circuited in this function. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
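For illustration, a minimal sketch of this pattern (not the actual ovn-kubernetes code; EgressIP, eipLister, and reconcile below are simplified stand-ins): read EgressIPs from the informer-backed cache and reconcile only the ones whose status does not yet match their spec.

```go
package egressip

import (
	"fmt"
	"reflect"
)

// EgressIP is a stripped-down stand-in for the real CRD type.
type EgressIP struct {
	Name   string
	Spec   []string // requested egress IPs
	Status []string // currently assigned egress IPs
}

// eipLister mimics an informer lister: it serves objects from the local
// watch cache instead of hitting the API server on every node update.
type eipLister interface {
	List() ([]*EgressIP, error)
}

// addEgressNode walks the cached EgressIPs and reconciles only those whose
// status does not yet match their spec, so frequent kubelet node updates
// stay cheap.
func addEgressNode(lister eipLister, reconcile func(*EgressIP) error) error {
	eips, err := lister.List() // cache read, no API round trip
	if err != nil {
		return err
	}
	for _, eip := range eips {
		if reflect.DeepEqual(eip.Spec, eip.Status) {
			continue // nothing to assign, short-circuit
		}
		if err := reconcile(eip); err != nil {
			return fmt.Errorf("failed to reconcile EgressIP %s: %w", eip.Name, err)
		}
	}
	return nil
}
```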
Force-pushed from ff8d45c to f9bd58d
/assign @trozet
@@ -1579,91 +1579,6 @@ var _ = ginkgo.Describe("OVN cluster-manager EgressIP Operations", func() {
	err := app.Run([]string{app.Name})
	gomega.Expect(err).NotTo(gomega.HaveOccurred())
})
ginkgo.It("egress node update should not mark the node as reachable if there was no label/readiness change", func() {
So the original code that bypasses adding/updating a node if the ready states are equal: 1f101eb#diff-796adb9d8a9b5b5636dc81e7350a00ef143e6e5bcb52ec30dc350c866cd9208cR784
This pre-dates the reachability check, so this test case was added later. The test case seems flawed to me in the first place. We should use any event to trigger us to check and ensure that the node is added, whether that event is a node update or a reachability check success.
@@ -133,16 +132,12 @@ func (h *egressIPClusterControllerEventHandler) UpdateResource(oldObj, newObj in
	}
	return nil
}
if isOldReady == isNewReady && !isHostCIDRsAltered {
So, walking through the consequences of not skipping potentially duplicate events:
- Delete path:
  - We list all egress IPs with the kube client (bad, and we should fix that), negligible impact
  - If we already deleted this node, none of the statuses will match, so no reconcile, no perf impact
- Add path:
  - Set node reachable takes a global allocator lock; we could improve this with a per-key map lock (see the sketch after this list), but it shouldn't be a big deal since it is 15 threads competing for a lock on an operation that takes no time
  - Add egress node: we get egress IPs via the kube client again (wth...), then we go through every egress IP and reconcile... yikes

This seems like a pretty big performance issue if there were a lot of egress IPs. We would constantly be churning through all egress IPs on every node update. Or am I missing something?
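A rough sketch of the per-key lock idea mentioned in the add path above (illustrative only, not the existing allocator code): give each node its own mutex so reachability updates for different nodes do not serialize on one global lock.

```go
package egressip

import "sync"

// nodeState holds per-node reachability data guarded by its own mutex.
type nodeState struct {
	mu        sync.Mutex
	reachable bool
}

// allocator keeps one entry per node; the outer mutex only guards the map
// itself, never the per-node update.
type allocator struct {
	mu    sync.Mutex
	nodes map[string]*nodeState
}

func newAllocator() *allocator {
	return &allocator{nodes: map[string]*nodeState{}}
}

// entry returns the per-node state, creating it on first use.
func (a *allocator) entry(node string) *nodeState {
	a.mu.Lock()
	defer a.mu.Unlock()
	s, ok := a.nodes[node]
	if !ok {
		s = &nodeState{}
		a.nodes[node] = s
	}
	return s
}

// setReachable holds only the node's own lock while updating reachability,
// so health-check threads working on different nodes never contend.
func (a *allocator) setReachable(node string, reachable bool) {
	s := a.entry(node)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.reachable = reachable
}
```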
Would a better approach be to expose the reAddorDelete map that the reachability check uses, so that we can have the state of whether the node wasReachable? Then we could change this if statement to be:
if isOldReady == isNewReady && !isHostCIDRsAltered && wasReachable == isReachable ?
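In code, the proposed early-return might look roughly like this (a sketch only; wasReachable is assumed to be surfaced from the reachability checker's remembered per-node state, which does not exist today):

```go
package egressip

// shouldSkipNodeUpdate sketches the proposed condition: skip the node update
// only when readiness, host CIDRs, and reachability are all unchanged.
func shouldSkipNodeUpdate(isOldReady, isNewReady, isHostCIDRsAltered, wasReachable, isReachable bool) bool {
	return isOldReady == isNewReady && !isHostCIDRsAltered && wasReachable == isReachable
}
```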
Would a better approach be to expose the reAddorDelete map that the reachability check uses, and then we can have the state if the node wasReachable. Then we can change this if statement to be:
I had the same thought, to keep a previous state of reachability, but the problem here is that with the kubelet updates we get, both the oldNode and newNode objects will have a healthy reachability matrix, so it still doesn't take action :( and doesn't help here.
This seems like a pretty big performance issue if there were a lot of egress IPs. We would constantly be churning through all egress IPs on every node update. Or am I missing something?

So this is a legit scale question, one I had too. But the thing to notice inside the add/delete node funcs is that we call reconcileEIP only when status != spec; so we call reconcile only if the EIP actually needs to be reconciled, not otherwise. I meant: https://github.com/ovn-org/ovn-kubernetes/pull/4102/files#diff-f706ecf5fbc17ede0f5a27577b1d4b3c08c00fe2837f6c1be461503de71ce21bR760 for the update/add case.
The delete seems to be doing something else.. let me double check.
gomega.Expect(egressIPs[0]).To(gomega.Equal(egressIP))
hcClient := fakeClusterManagerOVN.eIPC.allocator.cache[node.Name].healthClient.(*fakeEgressIPHealthClient)
hcClient.FakeProbeFailure = true
// explicitly call check reachability, periodic checker is not active
I assume even if we delete this test case, there is another test case that checks that the reachability functionality works correctly, right?
Yeah, good point; let me check this.