
@skopacz1 (Contributor) commented Nov 13, 2025

OSDOCS-17158

Version(s): 4.19+

This PR moves the content of this KCS article into the product documentation.

QE review:

  • QE has approved this change.

Preview: Replacing a failed bare-metal control plane node without DHCP and BMC credentials

@openshift-ci-robot added the jira/valid-reference label Nov 13, 2025
@openshift-ci-robot commented Nov 13, 2025

@skopacz1: This pull request references OSDOCS-17158 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@skopacz1 changed the title from "OSDOCS-17158: node replacement procedure" to "OSDOCS#17158: node replacement procedure" Nov 13, 2025
@openshift-ci-robot removed the jira/valid-reference label Nov 13, 2025
@openshift-ci-robot

@skopacz1: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.


@openshift-ci bot added the size/XL label Nov 13, 2025
@skopacz1 added the jira/valid-reference, branch/enterprise-4.19, branch/enterprise-4.20, and branch/enterprise-4.21 labels and removed the size/XL label Nov 13, 2025
@skopacz1 added this to the Continuous Release milestone Nov 13, 2025
@openshift-ci bot added the size/XL label Nov 13, 2025
@ocpdocs-previewbot commented Nov 13, 2025

🤖 Tue Nov 25 16:14:30 - Prow CI generated the docs preview:

https://102520--ocpdocs-pr.netlify.app/openshift-enterprise/latest/nodes/nodes/nodes-nodes-replace-control-plane.html

Comment on lines 13 to 20
. Generate `providerID` lines for control plane nodes by running the following command:
+
[source,terminal]
----
$ oc get -n openshift-machine-api baremetalhost -l installer.openshift.io/role=control-plane -ojson | jq -r '.items[] | "baremetalhost:///openshift-machine-api/" + .metadata.name + "/" + .metadata.uid'
----
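The jq filter in this step builds each `providerID` by joining a fixed `baremetalhost:///openshift-machine-api/` prefix with the host's `metadata.name` and `metadata.uid`. As a minimal sketch of that string logic only (the procedure itself still uses `oc` and `jq`), the following Python reads the list JSON that `oc get ... -ojson` returns. The function name `provider_ids` and the sample host names and UIDs are hypothetical placeholders, not real cluster output.

```python
import json
from typing import List

# Mirrors the jq filter:
#   .items[] | "baremetalhost:///openshift-machine-api/" + .metadata.name + "/" + .metadata.uid
PREFIX = "baremetalhost:///openshift-machine-api/"


def provider_ids(bmh_list_json: str) -> List[str]:
    """Return one providerID string per BareMetalHost in an `oc get -ojson` list."""
    data = json.loads(bmh_list_json)
    return [
        PREFIX + item["metadata"]["name"] + "/" + item["metadata"]["uid"]
        for item in data["items"]
    ]


if __name__ == "__main__":
    # Hypothetical sample: the names and UIDs are placeholders, not real output.
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "master-0", "uid": "uid-0"}},
            {"metadata": {"name": "master-1", "uid": "uid-1"}},
        ]
    })
    for line in provider_ids(sample):
        print(line)
```

Because jq's `-r` flag prints raw strings, each printed line matches one element of the list this sketch returns.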

. Identify the cluster by running the following command:
Contributor Author


  1. Could you provide example output for both of these commands, if that would be helpful?
  2. For the second step, is there a more descriptive way to phrase the purpose of the step?


My suggestions for these two commands:
1. Get `providerID` for control plane nodes by running the following command:
2. Get cluster information for labels by running the following command:

Is `<cluster_api_cluster>` the cluster name?

Contributor Author


I can update both commands to use the new suggested wording.

As for whether `cluster-api-cluster` in the second command is the actual name of the cluster, I am not sure. I can try to find the original authors and ask whether that was their intention.

-L machine.openshift.io/cluster-api-cluster
----

. Create a `Machine` object for the new control plane node by creating a YAML file similar to the following:
Contributor Author


For this step, should I break it out into two substeps, where users create a YAML file such as `new-machine.yaml` and then run `oc create -f new-machine.yaml`?


One step is fine with me.

$ oc get bmh -n openshift-machine-api -ojson | jq -r '.items[] | .metadata.name + " ProvisioningState:" + .status.provisioning.state'
----
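The jq filter in this step prints one `<name> ProvisioningState:<state>` line per host. A rough Python equivalent, assuming the same `oc get bmh -ojson` list shape, is sketched below; it also picks out the hosts whose state is not `unmanaged`, which are the hosts the following instruction asks you to change. The helper names and the sample data are hypothetical, and the sketch is an illustration rather than part of the documented procedure.

```python
import json
from typing import List


def _state(item: dict) -> str:
    # jq treats `"text" + null` as "text", so a host with no provisioning state
    # prints an empty state rather than failing. Mirror that behavior here.
    provisioning = (item.get("status") or {}).get("provisioning") or {}
    return provisioning.get("state") or ""


def provisioning_lines(bmh_list_json: str) -> List[str]:
    """Return the same '<name> ProvisioningState:<state>' lines as the jq filter."""
    items = json.loads(bmh_list_json)["items"]
    return [item["metadata"]["name"] + " ProvisioningState:" + _state(item) for item in items]


def hosts_not_unmanaged(bmh_list_json: str) -> List[str]:
    """Return names of hosts whose provisioning state is not `unmanaged`.

    A host with no reported state is included, because its state is not `unmanaged`.
    """
    items = json.loads(bmh_list_json)["items"]
    return [item["metadata"]["name"] for item in items if _state(item) != "unmanaged"]


if __name__ == "__main__":
    # Hypothetical sample data; the names and states are placeholders.
    sample = json.dumps({"items": [
        {"metadata": {"name": "host-a"}, "status": {"provisioning": {"state": "unmanaged"}}},
        {"metadata": {"name": "host-b"}, "status": {"provisioning": {"state": "externally provisioned"}}},
    ]})
    print("\n".join(provisioning_lines(sample)))
    print("Change state for:", hosts_not_unmanaged(sample))
```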

.. If the provisioning state is not `unmanaged`: Change the provisioning state by running the following command:


Since we have only one note here, maybe we don't need a list?

Suggested change
.. If the provisioning state is not `unmanaged`: Change the provisioning state by running the following command:
If the provisioning state is not `unmanaged`, change the provisioning state by running the following command:

Contributor Author


Sure, I can change it to an IMPORTANT admonition rather than a substep. The only thing I shouldn't do is make this instruction a second paragraph of the "Review the `BareMetalHost` object's provisioning state" step.

@openshift-ci bot commented Nov 25, 2025

@skopacz1: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@zniu1011

/lgtm

@openshift-ci bot added the lgtm label Nov 26, 2025
@skopacz1 added the merge-review-needed label Nov 26, 2025
@maxwelldb self-requested a review November 26, 2025 15:42
@maxwelldb added the merge-review-in-progress label Nov 26, 2025
@maxwelldb (Contributor) left a comment

Left a couple of comments. I think the formatting and `.Procedure` comments should be addressed before merging.

* You have access to the cluster as a user with the `cluster-admin` role.
* You have taken an xref:../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backup-etcd[etcd backup] in case you encounter any issues.
* You have downloaded and installed the link:https://console.redhat.com/openshift/downloads#tool-coreos-installer[`coreos-installer` CLI].
* Your cluster does not have a control plane `machineset`. You can check for `machinesets` by running the following command:
Contributor


Could theoretically be a procedure module, particularly if it's a common situation.

$ oc rsh -n openshift-etcd <running_pod>
----
+
Where `<running_pod>` is the name of a running pod shown in the previous step.
Contributor


See the SSG for the description list (DL) format for user-replaceable values, that is, `where:` and so on.

+
[IMPORTANT]
====
* Make note of the ID and the name of the unhealthy etcd member because these values are required later.
Contributor


Given the importance, could make this its own step or a second sentence for this list item. Flow consideration. 🤷

After you remove the `BareMetalHost` and `Machine` objects, the machine controller automatically deletes the `Node` object.
====

. If deletion of the machine is delayed for any reason or the command is obstructed and delayed: Force deletion by removing the machine object finalizer field.
Contributor


Does this need a reformat?

.. Remove the `status` section of the file.
--
+
The resulting file should resemble the following:
Contributor


Suggested change
The resulting file should resemble the following:
The resulting file should resemble the following example:

+
This command also removes the starting `userData` line of the ignition secret.

. Create an nmstate YAML file titled `new_controlplane_nmstate.yaml` for the new node's network configuration, using the following example for reference:
Contributor


nmstate or NMState?

Comment on lines +10 to +11

. Ensure that the Bare Metal Operator is available by running the following command:
Contributor


Suggested change
. Ensure that the Bare Metal Operator is available by running the following command:
.Procedure
. Ensure that the Bare Metal Operator is available by running the following command:

@maxwelldb removed the merge-review-in-progress and merge-review-needed labels Nov 26, 2025

Labels

branch/enterprise-4.19, branch/enterprise-4.20, branch/enterprise-4.21, jira/valid-reference, lgtm, size/XL

6 participants