Bug 1975296: Respect MaxUnhealthy limit for external remediation #902

Merged

Commits on Aug 5, 2021

  1. Respect MaxUnhealthy limit for external remediation

    Conventional remediation consists of simply deleting the Machine object.
    In consequence, it was safe to consider that any Machines that do not
    need remediation, have a Node, and are not in the process of being
    deleted, are 'healthy'.
    
    However, external remediation takes place not by deleting a Machine but
    by adding an annotation to it. While the Machine continues to exist (and
    may be associated with a Node for part of the time), it will not be in a
    working state throughout the remediation (generally because it is
    being rebooted).
    
    Because these Machines were considered 'healthy', additional Machines
    could be remediated during this process in violation of the MaxUnhealthy
    limit. If the process of acting on the external remediation annotation
    was delayed, potentially the whole cluster could be remediated
    simultaneously, thus taking it out of service.
    
    To prevent this, treat Machines with the external remediation annotation
    as unhealthy so that the MaxUnhealthy limit is respected.
    
    Note that when a RemediationTemplate (as added in
    338eab5) is provided, it will *not* be
    taken into account in determining whether a Machine is healthy (unless
    it also results in the external remediation annotation being applied to
    the Machine), so the same issue still exists in that case.
    zaneb committed Aug 5, 2021
    202f10b
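
The health check described in the commit message above can be illustrated with a minimal Go sketch. This is not the controller's actual code: the `machine` struct, the annotation key `example.io/external-remediation`, and the function names are hypothetical stand-ins, and MaxUnhealthy is assumed to be an absolute count that permits remediation while the number of unhealthy Machines does not exceed it.

```go
package main

import "fmt"

// Hypothetical annotation key; the key used by the real controller may differ.
const externalRemediationAnnotation = "example.io/external-remediation"

// machine is a simplified, hypothetical stand-in for the Machine API object.
type machine struct {
	Name             string
	Annotations      map[string]string
	HasNode          bool
	Deleting         bool
	NeedsRemediation bool
}

// hasExternalRemediationAnnotation reports whether external remediation
// has been requested for m via the annotation.
func hasExternalRemediationAnnotation(m machine) bool {
	_, ok := m.Annotations[externalRemediationAnnotation] // reading a nil map is safe
	return ok
}

// isHealthy applies the conventional checks (no remediation needed, has a
// Node, not being deleted) and additionally treats a Machine carrying the
// external remediation annotation as unhealthy.
func isHealthy(m machine) bool {
	if m.NeedsRemediation || !m.HasNode || m.Deleting {
		return false
	}
	return !hasExternalRemediationAnnotation(m)
}

// countHealthy returns the number of Machines that pass isHealthy.
func countHealthy(machines []machine) int {
	n := 0
	for _, m := range machines {
		if isHealthy(m) {
			n++
		}
	}
	return n
}

// remediationAllowed reports whether the number of unhealthy Machines is
// within maxUnhealthy (assumed here to be an absolute count).
func remediationAllowed(machines []machine, maxUnhealthy int) bool {
	unhealthy := len(machines) - countHealthy(machines)
	return unhealthy <= maxUnhealthy
}

func main() {
	machines := []machine{
		{Name: "a", HasNode: true},
		{Name: "b", HasNode: true, Annotations: map[string]string{externalRemediationAnnotation: ""}},
		{Name: "c", HasNode: true, NeedsRemediation: true},
	}
	// Machine "b" is still attached to a Node but is under external
	// remediation, so it counts against the limit together with "c".
	fmt.Println(countHealthy(machines), remediationAllowed(machines, 1))
}
```

Without the annotation check in `isHealthy`, Machine "b" would count as healthy and a further remediation would be permitted under a limit of 1.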

Commits on Aug 10, 2021

  1. Refactor external remediation annotation check into its own func

    Signed-off-by: Marc Sluiter <msluiter@redhat.com>
    slintes committed Aug 10, 2021
    e5078df
  2. Add unit test

    Signed-off-by: Marc Sluiter <msluiter@redhat.com>
    slintes committed Aug 10, 2021
    9b85699