OSDOCS#17158: node replacement procedure #102520
base: main
|
@skopacz1: This pull request references OSDOCS-17158 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@skopacz1: No Jira issue is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
🤖 Tue Nov 25 16:14:30 - Prow CI generated the docs preview: |
8f930b9 to b79c46f
| . Generate `providerID` lines for control plane nodes by running the following command: | ||
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc get -n openshift-machine-api baremetalhost -l installer.openshift.io/role=control-plane -ojson | jq -r '.items[] | "baremetalhost:///openshift-machine-api/" + .metadata.name + "/" + .metadata.uid' | ||
| ---- | ||
|
|
||
| . Identify the cluster by running the following command: |
- Can I get example output to show for both of these commands, if that would be helpful?
- For the second step, is there a more descriptive phrasing I could use for what the purpose of the step is?
Suggested wording for these two commands:
1. Get the providerID for control plane nodes by running the following command:
2. Get the cluster information for labels by running the following command:
<cluster_api_cluster> should be the cluster_name, right?
I can update both commands to use the new suggested wording.
As for whether cluster-api-cluster in the second command is the actual name of the cluster, I am not sure. I can try to find the original authors and ask them if that was their intention.
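To give a sense of what example output for the first command could look like, here is a minimal Python sketch that mirrors the jq expression from the diff. The host names and UIDs in `SAMPLE` are invented for illustration, and the function name `provider_ids` is hypothetical; real values come from `oc get baremetalhost ... -ojson` on the cluster.

```python
import json

# Hypothetical stand-in for the JSON returned by `oc get baremetalhost ... -ojson`.
# The names and UIDs are made up for illustration only.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "master-0", "uid": "11111111-aaaa-bbbb-cccc-000000000000"}},
  {"metadata": {"name": "master-1", "uid": "22222222-aaaa-bbbb-cccc-000000000001"}}
]}
""")


def provider_ids(bmh_list, namespace="openshift-machine-api"):
    """Build one providerID string per host, the same way the jq filter concatenates them."""
    return [
        "baremetalhost:///" + namespace + "/"
        + item["metadata"]["name"] + "/" + item["metadata"]["uid"]
        for item in bmh_list["items"]
    ]


if __name__ == "__main__":
    for line in provider_ids(SAMPLE):
        print(line)
```

Each printed line has the form `baremetalhost:///openshift-machine-api/<name>/<uid>`, which is the shape an example output block in the docs would show.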
| -L machine.openshift.io/cluster-api-cluster | ||
| ---- | ||
|
|
||
| . Create a `Machine` object for the new control plane node by creating a yaml file similar to the following: |
For this step, should I break it out into two substeps, where they create a YAML file like new-machine.yaml and then run oc create -f new-machine.yaml?
One step is fine with me.
| $ oc get bmh -n openshift-machine-api -ojson | jq -r '.items[] | .metadata.name + " ProvisioningState:" + .status.provisioning.state' | ||
| ---- | ||
|
|
||
| .. If the provisioning state is not `unmanaged`: Change the provisioning state by running the following command: |
Since we have only one note here, maybe we don't need a list?
| .. If the provisioning state is not `unmanaged`: Change the provisioning state by running the following command: | |
| If the provisioning state is not `unmanaged`, change the provisioning state by running the following command: |
I can change it to an IMPORTANT admonition rather than having it be a substep, sure. The only thing I shouldn't do is make this instruction a second paragraph of the step to "Review the BareMetalHost object's provisioning state"
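For reviewers who want to see what the provisioning-state check amounts to, the following is a minimal Python sketch of the jq filter from the diff, plus the "not `unmanaged`" condition from the step text. The sample hosts and the function names are hypothetical; this is an illustration under those assumptions, not part of the documented procedure.

```python
# Hypothetical stand-in for `oc get bmh -n openshift-machine-api -ojson` output.
# Host names and states are invented for illustration only.
SAMPLE_BMH = {
    "items": [
        {"metadata": {"name": "master-0"},
         "status": {"provisioning": {"state": "unmanaged"}}},
        {"metadata": {"name": "master-1"},
         "status": {"provisioning": {"state": "available"}}},
    ]
}


def provisioning_report(bmh_list):
    """Format each host as '<name> ProvisioningState:<state>', like the jq filter."""
    lines = []
    for item in bmh_list["items"]:
        name = item["metadata"]["name"]
        # A missing state is treated as an empty string (an assumption of this sketch).
        state = item.get("status", {}).get("provisioning", {}).get("state") or ""
        lines.append(name + " ProvisioningState:" + state)
    return lines


def hosts_not_unmanaged(bmh_list):
    """Return names of hosts whose provisioning state is anything other than 'unmanaged'."""
    return [
        item["metadata"]["name"]
        for item in bmh_list["items"]
        if (item.get("status", {}).get("provisioning", {}).get("state") or "") != "unmanaged"
    ]


if __name__ == "__main__":
    print("\n".join(provisioning_report(SAMPLE_BMH)))
    print("Needs a state change:", hosts_not_unmanaged(SAMPLE_BMH))
```

Only hosts returned by `hosts_not_unmanaged` would fall under the "change the provisioning state" instruction discussed above.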
e80bffe to
a11b7ea
Compare
|
@skopacz1: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/lgtm |
maxwelldb left a comment
Left a couple of comments. I think the formatting and .Procedure ones should be addressed prior to merge.
| * You have access to the cluster as a user with the `cluster-admin` role. | ||
| * You have taken an xref:../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backup-etcd[etcd backup] in case you encounter any issues. | ||
| * You have downloaded and installed the link:https://console.redhat.com/openshift/downloads#tool-coreos-installer[`coreos-installer` CLI]. | ||
| * Your cluster does not have a control plane `machineset`. You can check for `machinesets` by running the following command: |
Could theoretically be a procedure module, particularly if it's a common situation.
| $ oc rsh -n openshift-etcd <running_pod> | ||
| ---- | ||
| + | ||
| Where `<running_pod>` is the name of a running pod shown in the previous step. |
See SSG for user-replaceable value DL format, i.e. where: and so on.
| + | ||
| [IMPORTANT] | ||
| ==== | ||
| * Make note of the ID and the name of the unhealthy etcd member because these values are required later. |
Given the importance, could make this its own step or a second sentence for this list item. Flow consideration. 🤷
| After you remove the `BareMetalHost` and `Machine` objects, the machine controller automatically deletes the `Node` object. | ||
| ==== | ||
|
|
||
| . If deletion of the machine is delayed for any reason or the command is obstructed and delayed: Force deletion by removing the machine object finalizer field. |
Does this need a reformat?
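As background for this step, the sketch below shows one way removing a finalizer field can be expressed: a JSON merge patch that sets `metadata.finalizers` to `null`, which deletes the field. The exact command used in the PR is not visible in this excerpt, so the `oc patch` invocation, the function names, and the machine name are assumptions for illustration only.

```python
import json


def finalizer_patch():
    """Return a JSON merge patch body; a null value removes metadata.finalizers."""
    return json.dumps({"metadata": {"finalizers": None}})


def patch_command(machine_name, namespace="openshift-machine-api"):
    """Build an argv list for an assumed `oc patch` call; nothing is executed here."""
    return [
        "oc", "patch", "machine", machine_name,
        "-n", namespace,
        "--type=merge",
        "-p", finalizer_patch(),
    ]


if __name__ == "__main__":
    # "example-machine" is a hypothetical name, not taken from the PR.
    print(" ".join(patch_command("example-machine")))
```

The sketch only builds the command so that the payload can be reviewed; running it against a cluster would be a separate, deliberate step.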
| .. Remove the `status` section of the file. | ||
| -- | ||
| + | ||
| The resulting file should resemble the following: |
| The resulting file should resemble the following: | |
| The resulting file should resemble the following example: |
| + | ||
| This command also removes the starting `userData` line of the ignition secret. | ||
|
|
||
| . Create an nmstate YAML file titled `new_controlplane_nmstate.yaml` for the new node's network configuration, using the following example for reference: |
nmstate or NMState?
|
|
||
| . Ensure that the Bare Metal Operator is available by running the following command: |
| . Ensure that the Bare Metal Operator is available by running the following command: | |
| .Procedure | |
| . Ensure that the Bare Metal Operator is available by running the following command: |
OSDOCS-17158
Version(s): 4.19+
This PR takes this KCS and puts it into product docs.
QE review:
Preview: Replacing a failed bare-metal control plane node without DHCP and BMC credentials