OCPBUGS-25753,OCPBUGS-22721: Run resolv-prepender entirely async #4102

Merged

Commits on Jan 12, 2024

  1. OCPBUGS-25753: Run resolv-prepender entirely async

    Currently the resolv-prepender dispatcher script starts the systemd
    service and then waits for it to complete. This can cause the
    dispatcher script to time out if the runtimecfg image pull is slow
    or if resolv.conf does not get populated in a timely fashion (it's
    not entirely clear to me why the latter happens, but it does). The
    delay can in turn cause configure-ovs to time out when a large number
    of interfaces on the system trigger the dispatcher script, such as
    when there are many VLANs configured.
    
    To avoid this, we can stop waiting for the systemd service in the
    dispatcher script. In fact, there's an argument that we shouldn't
    wait since we need to be able to handle asynchronous execution
    anyway for the slow image pull case (which was the entire reason the
    script was split into a service the way it is).
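
The non-blocking start described above can be sketched roughly as follows. This is a minimal illustration, not the actual dispatcher script: the unit name `on-prem-resolv-prepender.service` and the `SYSTEMCTL` override are assumptions made for the example. `--no-block` is the standard systemctl option that queues the job and returns without waiting for it to finish.

```shell
#!/bin/bash
# Hypothetical unit name, assumed for illustration only.
PREPENDER_UNIT="${PREPENDER_UNIT:-on-prem-resolv-prepender.service}"
# Allow the systemctl binary to be swapped out (for example, by a stub).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Queue a start job for the prepender service and return immediately.
# --no-block tells systemctl not to wait for the job to complete, so a slow
# image pull cannot make the dispatcher script hit its timeout.
start_prepender_async() {
    "$SYSTEMCTL" start --no-block "$PREPENDER_UNIT"
}
```

Calling `start_prepender_async` from the dispatcher hands the work to systemd and lets the script exit right away, which is what keeps a large number of interface events from piling up behind one slow run.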
    
    I have found a few possible issues with async execution, however:
    * If we start the service with an empty $DHCP6_FQDN_FQDN value and
      then later get a new value for that, we may not correctly apply
      the new value if the service is still running because we only
      ever "systemctl start" the service, which is a no-op if the service
      is already running.
    * Similarly, if new IP4/6_DOMAINS values come in on a later
      connection, that may not be reflected in the service either.
    
    Even though these may sound like the same problem, I mention them
    separately on purpose because the solutions are different:
    * For the DHCP6 case, we can move that logic back into the dispatcher
      script so we will always set the hostname no matter what happens
      with the prepender code. One could argue that this should be in
      its own script anyway since it's largely unrelated to resolv.conf.
    * For the domains case, we do need to restart the service since the
      domains are involved in resolv.conf generation. However, we do not
      want to restart the service every time, since that may be unnecessary,
      and restarting in the middle of the image pull could result in a
      corrupt image (the whole thing we were trying to avoid by running
      this as a service in the first place).
    
      To avoid problems with restarting the service when we don't want to,
      I've added logic that only restarts the service if there are
      changed env values AND the runtimecfg image has already been pulled.
      This should mean the worst case scenario is that we don't properly
      set the domains and resolv.conf is temporarily generated with an
      incorrect search line. That should be corrected the next time any
      event triggers the dispatcher script.
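
The two fixes described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this commit: the unit name, the state directory, the image reference, the function names, and the `SYSTEMCTL`/`HOSTNAMECTL` overrides are all hypothetical. The commands themselves (`systemctl start|restart --no-block`, `hostnamectl set-hostname --static --transient`, `podman image exists`, `cmp -s`) are standard.

```shell
#!/bin/bash
# All names below are hypothetical and chosen for illustration only.
PREPENDER_UNIT="${PREPENDER_UNIT:-on-prem-resolv-prepender.service}"
STATE_DIR="${STATE_DIR:-/run/resolv-prepender}"
RUNTIMECFG_IMAGE="${RUNTIMECFG_IMAGE:-example.invalid/runtimecfg:latest}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
HOSTNAMECTL="${HOSTNAMECTL:-hostnamectl}"

# DHCP6 case: set the hostname directly in the dispatcher script so that it
# does not depend on the state of the prepender service.
set_dhcp6_hostname() {
    if [ -n "${DHCP6_FQDN_FQDN:-}" ]; then
        "$HOSTNAMECTL" set-hostname --static --transient "$DHCP6_FQDN_FQDN"
    fi
}

# Succeeds if the runtimecfg image is already in local storage.
image_pulled() {
    podman image exists "$RUNTIMECFG_IMAGE"
}

# Record the current domain values in env.new and succeed if they differ
# from the values saved by an earlier run (or if nothing was saved yet).
env_changed() {
    mkdir -p "$STATE_DIR"
    printf 'IP4_DOMAINS=%s\nIP6_DOMAINS=%s\n' \
        "${IP4_DOMAINS:-}" "${IP6_DOMAINS:-}" > "$STATE_DIR/env.new"
    ! cmp -s "$STATE_DIR/env.new" "$STATE_DIR/env" 2>/dev/null
}

# Domains case: restart only when the values changed AND the image is
# already pulled; otherwise fall back to a plain non-blocking start.
trigger_prepender() {
    if env_changed && image_pulled; then
        # Safe to restart: no image pull can be interrupted at this point.
        mv "$STATE_DIR/env.new" "$STATE_DIR/env"
        "$SYSTEMCTL" restart --no-block "$PREPENDER_UNIT"
    else
        # A no-op if the service is already running.
        "$SYSTEMCTL" start --no-block "$PREPENDER_UNIT"
    fi
}
```

In this sketch the saved `env` file is only updated when a restart actually happens. If the values change while the image is still being pulled, the next dispatcher event sees the difference again and restarts the service then, which matches the worst case described above.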
    cybertron committed Jan 12, 2024
    10a4774

Commits on Jan 19, 2024

  1. 4609fda