Always allow egressip-node-updates #4102

Open. Wants to merge 2 commits into base: master.

Commits on Feb 16, 2024

  1. Always allow egressip-node-updates

    1) ovnkube-node pod goes offline
    2) healthcheck fails
    3) node deleted from EIP.status, patched
    4) all OVN DB setup deleted across zones
    5) CPIPC starts to process delete on Cloud Platforms
    6) ovnkube-node pod comes back online
    7) healthcheck succeeds
    8) egress-node add happens, BUT patch fails because CPIPC is still being
       deleted: err: cloud update request failed, CloudPrivateIPConfig: 10.0.128.100 is being deleted
    9) node add is put on the retry queue, we wait for 30 seconds
    10) kubelet sends an update every 10 seconds -> update succeeds and
        overrides the retry add entry because oldNode.ready == newNode.ready
    11) THUS WE MISS AN EVENT, the patch never happens
    12) OVN setup never happens, we are stuck.
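
The sequence above can be replayed in a small model. This is only a sketch under stated assumptions: `Node`, `Controller`, `onUpdate` and `simulate` are hypothetical names, the retry queue is reduced to "the next update event is the only chance to retry", and none of this is the actual ovn-kubernetes code.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for a Kubernetes node.
type Node struct {
	Name  string
	Ready bool
}

// Controller is a hypothetical model of the EIP node handler, not the real one.
type Controller struct {
	skipWhenReadyUnchanged bool            // true models the old short-circuit
	cloudBusy              bool            // true while the CPIPC is still being deleted
	assigned               map[string]bool // nodes whose egress setup is done
}

// addEgressNode fails while the cloud object is still being deleted.
func (c *Controller) addEgressNode(n Node) error {
	if c.cloudBusy {
		return fmt.Errorf("cloud update request failed for node %s", n.Name)
	}
	c.assigned[n.Name] = true
	return nil
}

// onUpdate models the node update handler driven by periodic kubelet updates.
func (c *Controller) onUpdate(oldNode, newNode Node) error {
	if c.skipWhenReadyUnchanged && oldNode.Ready == newNode.Ready {
		return nil // event dropped: nothing retries the failed add
	}
	if newNode.Ready {
		return c.addEgressNode(newNode)
	}
	return nil
}

// simulate replays steps 8 to 12 and reports whether the node ends up assigned.
func simulate(skipWhenReadyUnchanged bool) bool {
	c := &Controller{
		skipWhenReadyUnchanged: skipWhenReadyUnchanged,
		cloudBusy:              true,
		assigned:               map[string]bool{},
	}
	notReady := Node{Name: "node-a", Ready: false}
	ready := Node{Name: "node-a", Ready: true}

	_ = c.onUpdate(notReady, ready) // step 8: add fails, cloud is still deleting
	c.cloudBusy = false             // the cloud delete finishes
	_ = c.onUpdate(ready, ready)    // step 10: periodic kubelet update
	return c.assigned["node-a"]
}

func main() {
	fmt.Println("old handler assigned:", simulate(true))
	fmt.Println("new handler assigned:", simulate(false))
}
```

With the short-circuit enabled the second update is dropped and the node is never assigned; with it removed, the same update completes the add.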
    
    This PR removes the "oldNode.ready == newNode.ready; return" logic,
    which was there for scale reasons, because dropping it is needed for
    events to get synced correctly. Not proud of this fix, but this is why
    we need level-driven EIP controllers big time :)
    
    Does this have scale implications? Maybe at 0.0001%, because
    addEgressNode, which is what will now always be called, checks if
    EIP.Spec != EIP.Status before calling reconcileEIP, and thus we should
    be fine...
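
As a rough illustration of why the extra calls should stay cheap, the guard can be sketched as follows. The `EgressIP` struct, the `needsReconcile` helper and the length comparison are assumptions made for this example; the real comparison between spec and status may differ.

```go
package main

import "fmt"

// EgressIP is a hypothetical, simplified view of an EgressIP object.
type EgressIP struct {
	Name        string
	SpecIPs     []string // requested egress IPs
	StatusItems []string // IPs that are currently assigned
}

// needsReconcile is an assumed stand-in for the "EIP.Spec != EIP.Status" check.
func needsReconcile(e EgressIP) bool {
	return len(e.SpecIPs) != len(e.StatusItems)
}

// addEgressNode hands only out-of-sync EIPs to reconcile and returns how many
// it handed over. Fully assigned EIPs are skipped without further work.
func addEgressNode(eips []EgressIP, reconcile func(EgressIP)) int {
	reconciled := 0
	for _, e := range eips {
		if !needsReconcile(e) {
			continue
		}
		reconcile(e)
		reconciled++
	}
	return reconciled
}

func main() {
	eips := []EgressIP{
		{Name: "synced", SpecIPs: []string{"10.0.128.100"}, StatusItems: []string{"10.0.128.100"}},
		{Name: "pending", SpecIPs: []string{"10.0.128.101"}},
	}
	n := addEgressNode(eips, func(e EgressIP) {
		fmt.Println("reconciling", e.Name)
	})
	fmt.Println("reconciled:", n)
}
```

In steady state every EIP is in sync, so a call triggered by a periodic kubelet update walks the list and reconciles nothing.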
    
    Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
    tssurya committed Feb 16, 2024
    09bdb5e
  2. Make addEgressNode and deleteEgressNode use caches for pulling EIPs

    Since the previous commit makes us call addEgressNode on every kubelet
    update, which is every 10 seconds for every node, we must stop using
    kapi listing for EIPs since this will not scale well. Let us use the
    informer caches to pull EIPs; unless there are EIPs that have
    status.items != spec.items, the rest is short-circuited in this function.
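
The difference between the two listing strategies can be sketched with a counter. Everything here is hypothetical (`apiClient`, `cache`, `lister`, `apiCallsFor`); a real informer cache is kept in sync by watches, which this model reduces to a single initial fill.

```go
package main

import "fmt"

// EgressIP is a hypothetical placeholder for the real object.
type EgressIP struct {
	Name string
}

// lister is the only thing the node handler needs in this sketch.
type lister interface {
	List() []EgressIP
}

// apiClient is a hypothetical stand-in for listing straight from the API server.
type apiClient struct {
	calls int // number of round trips to the API server
	items []EgressIP
}

func (a *apiClient) List() []EgressIP {
	a.calls++
	return a.items
}

// cache is a hypothetical stand-in for an informer cache: filled once here,
// then served from local memory.
type cache struct {
	items []EgressIP
}

func newCache(a *apiClient) *cache {
	return &cache{items: a.List()}
}

func (c *cache) List() []EgressIP {
	return c.items
}

// apiCallsFor runs the given number of node updates and reports how many
// API round trips they caused.
func apiCallsFor(useCache bool, updates int) int {
	api := &apiClient{items: []EgressIP{{Name: "eip-1"}}}
	var l lister = api
	if useCache {
		l = newCache(api)
	}
	for i := 0; i < updates; i++ {
		_ = l.List() // one listing per kubelet update
	}
	return api.calls
}

func main() {
	fmt.Println("direct listing, API calls:", apiCallsFor(false, 100))
	fmt.Println("cached listing, API calls:", apiCallsFor(true, 100))
}
```

With direct listing the API load grows with the number of node updates, while the cached variant pays only for the initial fill in this model.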
    
    Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
    tssurya committed Feb 16, 2024
    f9bd58d