Commit
1) ovnkube-node pod goes offline
2) healthcheck fails
3) node deleted from EIP.status, patched
4) all OVN DB setup deleted across zones
5) CPIPC starts to process the delete on the cloud platform
6) ovnkube-node pod comes back online
7) healthcheck succeeds
8) egress-node add happens, BUT the patch fails because the CPIPC is still being deleted:
   err: cloud update request failed, CloudPrivateIPConfig: 10.0.128.100 is being deleted
9) the node add is put on the retry queue, and we wait for 30 seconds
10) kubelet sends an update every 10 seconds -> the update succeeds and overrides the retry add entry because oldNode.ready == newNode.ready
11) THUS WE MISS AN EVENT, and the patch never happens
12) OVN setup never happens, and we are stuck.

This PR removes the "oldNode.ready == newNode.ready; return" logic, which was present for scale reasons, since removing it is needed to allow events to get synced correctly. Not proud of this fix, but this is why we need level-driven EIP controllers big time :)

Does this have scale implications? Maybe at 0.0001%, because addEgressNode, which is what will always be called, checks whether EIP.Spec != EIP.Status before calling reconcileEIP, and thus we should be fine...

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
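
The lost event in steps 8 to 12 can be reproduced with a toy model. This is only a rough sketch and not the ovn-kubernetes code: the Node and controller types, the skipUnchangedReady flag, the retryPending and configured maps, and the simulate helper are all hypothetical names invented for illustration, and the addEgressNode here is a simplified stand-in that only shares its name with the real function.

```go
package main

import "fmt"

// Node is a hypothetical, stripped-down stand-in for the node state
// that the update handler compares.
type Node struct {
	Name  string
	Ready bool
}

// controller is a hypothetical model of the egress IP node handler.
type controller struct {
	skipUnchangedReady bool            // true models the old "ready unchanged; return" logic
	cloudDeleting      bool            // true while the CPIPC delete is still in flight
	retryPending       map[string]bool // nodes with a queued retry of a failed add
	configured         map[string]bool // nodes whose egress setup succeeded
}

// addEgressNode is a simplified stand-in: it fails while the cloud
// object is being deleted and succeeds otherwise.
func (c *controller) addEgressNode(n Node) error {
	if c.cloudDeleting {
		return fmt.Errorf("CloudPrivateIPConfig is being deleted")
	}
	c.configured[n.Name] = true
	return nil
}

// onNodeReady models steps 8 and 9: the add is attempted and, on
// failure, queued for retry.
func (c *controller) onNodeReady(n Node) {
	if err := c.addEgressNode(n); err != nil {
		c.retryPending[n.Name] = true
	}
}

// onUpdate models the periodic kubelet update from step 10.
func (c *controller) onUpdate(oldNode, newNode Node) {
	if c.skipUnchangedReady && oldNode.Ready == newNode.Ready {
		// The update "succeeds" without doing any work and
		// overrides the queued retry entry.
		delete(c.retryPending, newNode.Name)
		return
	}
	if err := c.addEgressNode(newNode); err != nil {
		c.retryPending[newNode.Name] = true
		return
	}
	delete(c.retryPending, newNode.Name)
}

// simulate replays steps 8 to 12 and reports whether the node ends up configured.
func simulate(skipUnchangedReady bool) bool {
	c := &controller{
		skipUnchangedReady: skipUnchangedReady,
		cloudDeleting:      true,
		retryPending:       map[string]bool{},
		configured:         map[string]bool{},
	}
	n := Node{Name: "node-a", Ready: true}

	c.onNodeReady(n)        // add fails, retry is queued
	c.onUpdate(n, n)        // kubelet update arrives before the retry fires
	c.cloudDeleting = false // cloud-side delete completes
	c.onUpdate(n, n)        // a later kubelet update

	// The retry timer fires, but only for entries still in the queue.
	if c.retryPending[n.Name] {
		_ = c.addEgressNode(n)
		delete(c.retryPending, n.Name)
	}
	return c.configured[n.Name]
}

func main() {
	fmt.Println("with short-circuit: configured =", simulate(true))
	fmt.Println("without short-circuit: configured =", simulate(false))
}
```

In this model the node stays unconfigured when the short-circuit is enabled, because every later update returns early and the only pending retry has been dropped. With the short-circuit removed, the next update after the cloud delete completes performs the add.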