Bug 1854402: Improve bootstrap reliability on heterogeneous UPI network configurations #385

Commits on Jul 17, 2020

  1. Improve bootstrap reliability on heterogeneous UPI network configurations
    
    Before this change, bootstrap IP discovery assumed that the first address of the
    unicast interface must be the bootstrap IP. This assumption doesn't always hold
    in the face of user-defined interfaces and addresses whose ordering isn't
    guaranteed. When the assumptions are broken and the incorrect bootstrap IP is
    selected, bootstrapping fails because quorum cannot be established.
    
    This change improves the accuracy of bootstrap IP discovery by more flexibly
    accounting for a wider variety of possible network interface configurations.
    
    An IP is now considered the bootstrap IP if all of the following are true.
    
    For IPv4:
    
    * The IP is contained by the machine CIDR defined in the cluster configuration
    * On bare metal platforms, the IP is not the API or DNS VIP in the cluster configuration
    
    For IPv6, the same must be true in addition to the following:
    
    * The IP is not deprecated
    * The IP is routable according to at least one non-default route
    
    This work is adapted from https://github.com/openshift/baremetal-runtimecfg/blob/master/pkg/utils/utils.go.
    ironcladlou committed Jul 17, 2020
    002e22a
  2. Make bootstrap IP discovery backwards compatible with previous assumptions
    
    Before this patch, the new bootstrap IP discovery mechanism would fail bootstrapping
    if no IP could be intelligently discovered. A side effect is that the machine network
    CIDR is effectively validated by asserting that the bootstrap IP can be discovered
    within it. Because there may still be edge cases where detection fails but the old
    assumption of choosing the "first IP" would still work, failing outright would place
    an undue burden on users to fix every existing use of machine network CIDR, even
    when the fallback would continue to work in those cases.
    
    This patch adds a fallback behavior so that when intelligent discovery fails, the
    first listed IP is selected with a warning, preserving the original discovery
    behavior.
    
    This does effectively mean that clusters can still fail to bootstrap if even the
    first IP assumption is wrong, but we can presumably use those failures to further
    improve detection.
    
    A worthwhile future improvement would be to surface more loudly and clearly to the
    user when we're blindly guessing about the IP, as the resulting downstream failure
    may obscure the root cause if bootkube logs are lost.
    ironcladlou committed Jul 17, 2020
    68ef6a9