Implement explicit reboot mode options #795
|
Hi @rdoxenham. Thanks for your PR. I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This looks like a good start. I have a few comments inline, and there's the TODO item for making a public hard power off API in the provisioner.
We should also make sure metal3-io/metal3-docs#164 is fully up to date with the new API and that it's approved before this PR.
This change adds an additional mode to the reboot annotation that forces all nodes sent for remediation, e.g. via a MachineHealthCheck, to be forcefully rebooted rather than attempting a soft reboot first, as happens today. The primary driver behind this change is to enable quicker recovery of workloads, e.g. for high-availability use cases; by defaulting to a forced hard reboot for remediation we can enable functionality very close to fencing. This change shouldn't impact any non-remediation reboot requests, as the hard reboot only takes place when the mode=hard annotation is applied to the node. All of the work on the BMO can be found in the link below. Whilst we depend on that PR to have a complete solution, we don't have a hard dependency on the two merging together. BMO PR: metal3-io/baremetal-operator#795
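To make the remediation side concrete, here is a minimal Go sketch of how a controller might request a hard reboot through the annotation. The annotation key `reboot.metal3.io/remediation`, the JSON payload shape `{"mode":"hard"}`, and the helper name `requestHardReboot` are assumptions made for illustration; they are not taken from this PR's diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rebootAnnotationKey is a hypothetical annotation key used only for this sketch.
const rebootAnnotationKey = "reboot.metal3.io/remediation"

// rebootArgs is the assumed JSON payload carried in the annotation value.
type rebootArgs struct {
	Mode string `json:"mode"`
}

// requestHardReboot sets the reboot annotation with mode "hard" on the given
// annotation map, creating the map if it is nil. Other annotations are kept.
func requestHardReboot(annotations map[string]string) (map[string]string, error) {
	value, err := json.Marshal(rebootArgs{Mode: "hard"})
	if err != nil {
		return nil, err
	}
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[rebootAnnotationKey] = string(value)
	return annotations, nil
}

func main() {
	annotations, err := requestHardReboot(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(annotations[rebootAnnotationKey])
}
```

Requests that do not set the mode are left untouched by this helper, which matches the stated intent that non-remediation reboots keep their existing behaviour.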
|
This looks really close. The commit messages don't match the code (there's no isHardReboot now) so fixing that up should make this ready to go, as soon as the design is approved. |
|
/test-integration |
|
/ok-to-test |
|
This looks good, let's see what the tests say. I don't see this annotation documented in this repo anywhere at all, so I won't hold up the code change over adding documentation. Maybe we can add that in another PR. |
In this commit we add further integration for the RebootMode type and no longer rely on a boolean for understanding whether the reboot request was for a hardPowerOff() or softPowerOff(). This will allow us to expand the modes we support later down the line if required without any significant modifications required to the provisioner API.
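The move from a boolean to a mode type can be sketched roughly as follows. Only the names `RebootMode`, `hardPowerOff` and `softPowerOff` come from the commit message above; the constant names, the `powerOff` dispatcher, and the string return values are assumptions for illustration and do not reflect the actual provisioner API.

```go
package main

import "fmt"

// RebootMode names how the host should be powered off during a reboot.
type RebootMode string

// Constant names are assumed for this sketch.
const (
	RebootModeHard RebootMode = "hard"
	RebootModeSoft RebootMode = "soft"
)

// hardPowerOff and softPowerOff are stubs standing in for the real routines.
func hardPowerOff() string { return "hard power off" }
func softPowerOff() string { return "soft power off" }

// powerOff selects the power-off routine from the mode. Supporting a new
// mode means adding a constant and a case, not changing the signature.
func powerOff(mode RebootMode) (string, error) {
	switch mode {
	case RebootModeHard:
		return hardPowerOff(), nil
	case RebootModeSoft:
		return softPowerOff(), nil
	default:
		return "", fmt.Errorf("unsupported reboot mode %q", mode)
	}
}

func main() {
	for _, mode := range []RebootMode{RebootModeSoft, RebootModeHard, "warm"} {
		action, err := powerOff(mode)
		fmt.Println(mode, action, err)
	}
}
```

Compared with a `bool` parameter, the typed mode also makes call sites self-describing and lets unknown values be rejected explicitly.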
|
/retitle support hard reboot mode via annotation |
|
/test-integration |
|
/retitle Implement explicit reboot mode options
That should make the title match the design doc PR in metal3-io/metal3-docs#164 a little better. |
|
I'm happy with this version, so after the design doc merges next week, if there are no other negative comments on this PR, it should be able to merge quickly.
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dhellmann, rdoxenham
|
/lgtm |
The default reboot-interface behaviour is to attempt a soft power
off and, if this fails, fall back to a hard power off (PR #294). For
high-availability use cases we require the ability to power off a node
immediately. This PR attempts to address that requirement and is a WIP,
given that other changes to the provisioner API and the CAPBM actuator
will be required to actually enact these changes.
Also see: https://bugzilla.redhat.com/show_bug.cgi?id=1927678
And design doc: metal3-io/metal3-docs#164
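
The difference between the default behaviour and the requested immediate power off can be sketched as below. This is an illustrative model only: `powerOffForReboot`, `powerOffFunc`, and `errSoftFailed` are hypothetical names, and the function values stand in for whatever calls the provisioner really makes.

```go
package main

import (
	"errors"
	"fmt"
)

// errSoftFailed simulates a soft power off that did not complete.
var errSoftFailed = errors.New("soft power off failed")

// powerOffFunc stands in for a power-off call to the host (hypothetical).
type powerOffFunc func() error

// powerOffForReboot returns the steps attempted and the final error.
// With mode "hard" it skips the soft attempt entirely; with any other mode
// it tries soft first and falls back to hard only if soft fails.
func powerOffForReboot(mode string, soft, hard powerOffFunc) ([]string, error) {
	steps := []string{}
	if mode != "hard" {
		steps = append(steps, "soft")
		if err := soft(); err == nil {
			return steps, nil
		}
	}
	steps = append(steps, "hard")
	return steps, hard()
}

func main() {
	failingSoft := func() error { return errSoftFailed }
	workingHard := func() error { return nil }

	// Default path: soft is attempted, fails, then hard is used.
	fmt.Println(powerOffForReboot("soft", failingSoft, workingHard))
	// Explicit hard mode: no time is spent waiting on a soft power off.
	fmt.Println(powerOffForReboot("hard", failingSoft, workingHard))
}
```

In the default path a hung host only reaches the hard power off after the soft attempt has failed, which is the delay the high-availability use case wants to avoid.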