diff --git a/_topic_maps/_topic_map.yml b/_topic_maps/_topic_map.yml
index 520e9c4f01c9..2ed6321c279c 100644
--- a/_topic_maps/_topic_map.yml
+++ b/_topic_maps/_topic_map.yml
@@ -2430,6 +2430,8 @@ Topics:
       File: cpmso-troubleshooting
     - Name: Disabling the control plane machine set
       File: cpmso-disabling
+    - Name: Manually scaling control plane machines
+      File: cpmso-manually-scaling-control-planes
   - Name: Managing machines with the Cluster API
     Dir: cluster_api_machine_management
     Topics:
diff --git a/machine_management/control_plane_machine_management/cpmso-manually-scaling-control-planes.adoc b/machine_management/control_plane_machine_management/cpmso-manually-scaling-control-planes.adoc
new file mode 100644
index 000000000000..e0f560d05a95
--- /dev/null
+++ b/machine_management/control_plane_machine_management/cpmso-manually-scaling-control-planes.adoc
@@ -0,0 +1,17 @@
+:_mod-docs-content-type: ASSEMBLY
+[id="cpmso-manually-scaling-control-planes"]
+= Manually scaling control plane machines
+include::_attributes/common-attributes.adoc[]
+:context: cpmso-manually-scaling-control-planes
+
+toc::[]
+
+When you install a cluster on bare-metal infrastructure, you can manually scale the cluster up to 4 or 5 control plane nodes as a postinstallation task. Consider this approach in situations where you need to recover your cluster from a degraded state, perform in-depth debugging, or ensure the stability and security of the control plane in complex scenarios.
+
+[IMPORTANT]
+====
+Red{nbsp}Hat supports a cluster that has 4 or 5 control plane nodes only on bare-metal infrastructure.
+====
+
+// Adding a control plane node to your cluster
+include::modules/creating-control-plane-node.adoc[leveloffset=+1]
diff --git a/modules/creating-control-plane-node.adoc b/modules/creating-control-plane-node.adoc
new file mode 100644
index 000000000000..897652aa0972
--- /dev/null
+++ b/modules/creating-control-plane-node.adoc
@@ -0,0 +1,380 @@
+// Module included in the following assemblies:
+//
+// * machine_management/control_plane_machine_management/cpmso-manually-scaling-control-planes.adoc
+// * post_installation_configuration/cluster-tasks.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="creating-control-plane-node_{context}"]
+= Adding a control plane node to your cluster
+
+When you install a cluster on bare-metal infrastructure, you can manually scale the cluster up to 4 or 5 control plane nodes as a postinstallation task. The example in this procedure uses `node-5` as the new control plane node.
+
+.Prerequisites
+
+* You have installed a healthy cluster with at least three control plane nodes.
+* You have created a single control plane node that you intend to add to your cluster as a postinstallation task.
+
+.Procedure
+
+. Retrieve pending certificate signing requests (CSRs) for the new control plane node by entering the following command:
++
+[source,terminal]
+----
+$ oc get csr | grep Pending
+----
+
+. Approve all pending CSRs for the control plane node by entering the following command:
++
+[source,terminal]
+----
+$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
+----
++
+[IMPORTANT]
+====
+You must approve the CSRs to complete the installation.
+====
+
+. Confirm that the control plane node is in the `Ready` status by entering the following command:
++
+[source,terminal]
+----
+$ oc get nodes
+----
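++
+The output is similar to the following example, where `node-0`, `node-1`, and `node-2` are the existing control plane nodes and `node-5` is the newly added node. The node names, ages, and versions are illustrative values and vary by cluster.
++
+.Example output
+[source,terminal]
+----
+NAME     STATUS   ROLES                  AGE     VERSION
+node-0   Ready    control-plane,master   4h42m   v1.30.4
+node-1   Ready    control-plane,master   4h40m   v1.30.4
+node-2   Ready    control-plane,master   4h39m   v1.30.4
+node-5   Ready    control-plane,master   19m     v1.30.4
+----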
++
+[NOTE]
+====
+On installer-provisioned infrastructure, the etcd Operator relies on the Machine API to manage the control plane and ensure etcd quorum.
+The Machine API then uses `Machine` CRs to represent and manage the underlying control plane nodes.
+====
+
+. Create the `BareMetalHost` and `Machine` CRs and link them to the `Node` CR of the control plane node.
++
+.. Create the `BareMetalHost` CR with a unique `.metadata.name` value, as shown in the following example:
++
+[source,yaml]
+----
+apiVersion: metal3.io/v1alpha1
+kind: BareMetalHost
+metadata:
+  name: node-5
+  namespace: openshift-machine-api
+spec:
+  automatedCleaningMode: metadata
+  bootMACAddress: 00:00:00:00:00:02
+  bootMode: UEFI
+  customDeploy:
+    method: install_coreos
+  externallyProvisioned: true
+  online: true
+  userData:
+    name: master-user-data-managed
+    namespace: openshift-machine-api
+# ...
+----
++
+.. Apply the `BareMetalHost` CR by entering the following command:
++
+[source,terminal]
+----
+$ oc apply -f <1>
+----
+<1> Replace with the name of the file that contains the `BareMetalHost` CR.
++
+.. Create the `Machine` CR by using the unique `.metadata.name` value, as shown in the following example:
++
+[source,yaml]
+----
+apiVersion: machine.openshift.io/v1beta1
+kind: Machine
+metadata:
+  annotations:
+    machine.openshift.io/instance-state: externally provisioned
+    metal3.io/BareMetalHost: openshift-machine-api/node-5
+  finalizers:
+  - machine.machine.openshift.io
+  labels:
+    machine.openshift.io/cluster-api-cluster: <1>
+    machine.openshift.io/cluster-api-machine-role: master
+    machine.openshift.io/cluster-api-machine-type: master
+  name: node-5
+  namespace: openshift-machine-api
+spec:
+  metadata: {}
+  providerSpec:
+    value:
+      apiVersion: baremetal.cluster.k8s.io/v1alpha1
+      customDeploy:
+        method: install_coreos
+      hostSelector: {}
+      image:
+        checksum: ""
+        url: ""
+      kind: BareMetalMachineProviderSpec
+      metadata:
+        creationTimestamp: null
+      userData:
+        name: master-user-data-managed
+# ...
+----
+<1> Replace this value with the name of the specific cluster, for example, `test-day2-1-6qv96`. You can get the cluster name by running the command in the next step.
++
+.. Get the cluster name by running the following command:
++
+[source,terminal]
+----
+$ oc get infrastructure cluster -o=jsonpath='{.status.infrastructureName}{"\n"}'
+----
++
+.. Apply the `Machine` CR by entering the following command:
++
+[source,terminal]
+----
+$ oc apply -f <1>
+----
+<1> Replace with the name of the file that contains the `Machine` CR.
++
+.. Link the `BareMetalHost`, `Machine`, and `Node` objects by running the `link-machine-and-node.sh` script:
++
+... Copy the following `link-machine-and-node.sh` script to a local machine:
++
+[source,text]
+----
+#!/bin/bash
+
+# Credit goes to
+# https://bugzilla.redhat.com/show_bug.cgi?id=1801238.
+# This script will link Machine object
+# and Node object. This is needed
+# in order to have IP address of
+# the Node present in the status of the Machine.
+
+set -e
+
+machine="$1"
+node="$2"
+
+if [ -z "$machine" ] || [ -z "$node" ]; then
+    echo "Usage: $0 MACHINE NODE"
+    exit 1
+fi
+
+node_name=$(echo "${node}" | cut -f2 -d':')
+
+oc proxy &
+proxy_pid=$!
+function kill_proxy {
+    kill $proxy_pid
+}
+trap kill_proxy EXIT SIGINT
+
+HOST_PROXY_API_PATH="http://localhost:8001/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts"
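+
+# Note: the print_nics function below converts the InternalIP addresses of the
+# node into the "nics" entries that the BareMetalHost status patch expects.
+# Only the IP addresses come from the node; the MAC address, model, speed,
+# VLAN ID, and interface name are placeholder values.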
+
+function print_nics() {
+    local ips
+    local eob
+    declare -a ips
+
+    readarray -t ips < <(echo "${1}" \
+        | jq '.[] | select(. | .type == "InternalIP") | .address' \
+        | sed 's/"//g')
+
+    eob=','
+    for (( i=0; i<${#ips[@]}; i++ )); do
+        if [ $((i+1)) -eq ${#ips[@]} ]; then
+            eob=""
+        fi
+        cat <<- EOF
+            {
+              "ip": "${ips[$i]}",
+              "mac": "00:00:00:00:00:00",
+              "model": "unknown",
+              "speedGbps": 10,
+              "vlanId": 0,
+              "pxe": true,
+              "name": "eth1"
+            }${eob}
+EOF
+    done
+}
+
+function wait_for_json() {
+    local name
+    local url
+    local -a curl_opts
+    local timeout
+
+    local start_time
+    local curr_time
+    local time_diff
+
+    name="$1"
+    url="$2"
+    timeout="$3"
+    shift 3
+    curl_opts=("$@")
+    echo -n "Waiting for $name to respond"
+    start_time=$(date +%s)
+    until curl -g -X GET "$url" "${curl_opts[@]}" 2> /dev/null | jq '.' 2> /dev/null > /dev/null; do
+        echo -n "."
+        curr_time=$(date +%s)
+        time_diff=$((curr_time - start_time))
+        if [[ $time_diff -gt $timeout ]]; then
+            printf '\nTimed out waiting for %s' "${name}"
+            return 1
+        fi
+        sleep 5
+    done
+    echo " Success!"
+    return 0
+}
+wait_for_json oc_proxy "${HOST_PROXY_API_PATH}" 10 -H "Accept: application/json" -H "Content-Type: application/json"
+
+addresses=$(oc get node -n openshift-machine-api "${node_name}" -o json | jq -c '.status.addresses')
+
+machine_data=$(oc get machines.machine.openshift.io -n openshift-machine-api -o json "${machine}")
+host=$(echo "$machine_data" | jq '.metadata.annotations["metal3.io/BareMetalHost"]' | cut -f2 -d/ | sed 's/"//g')
+
+if [ -z "$host" ]; then
+    echo "Machine $machine is not linked to a host yet." 1>&2
+    exit 1
+fi
+
+# The address structure on the host doesn't match the node, so extract
+# the values we want into separate variables so we can build the patch
+# we need.
+hostname=$(echo "${addresses}" | jq '.[] | select(. | .type == "Hostname") | .address' | sed 's/"//g')
+
+set +e
+read -r -d '' host_patch << EOF
+{
+  "status": {
+    "hardware": {
+      "hostname": "${hostname}",
+      "nics": [
+$(print_nics "${addresses}")
+      ],
+      "systemVendor": {
+        "manufacturer": "Red Hat",
+        "productName": "product name",
+        "serialNumber": ""
+      },
+      "firmware": {
+        "bios": {
+          "date": "04/01/2014",
+          "vendor": "SeaBIOS",
+          "version": "1.11.0-2.el7"
+        }
+      },
+      "ramMebibytes": 0,
+      "storage": [],
+      "cpu": {
+        "arch": "x86_64",
+        "model": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
+        "clockMegahertz": 2199.998,
+        "count": 4,
+        "flags": []
+      }
+    }
+  }
+}
+EOF
+set -e
+
+echo "PATCHING HOST"
+echo "${host_patch}" | jq .
+
+curl -s \
+    -X PATCH \
+    "${HOST_PROXY_API_PATH}/${host}/status" \
+    -H "Content-type: application/merge-patch+json" \
+    -d "${host_patch}"
+
+oc get baremetalhost -n openshift-machine-api -o yaml "${host}"
+----
++
+... Make the script executable by entering the following command:
++
+[source,terminal]
+----
+$ chmod +x link-machine-and-node.sh
+----
++
+... Run the script by entering the following command:
++
+[source,terminal]
+----
+$ bash link-machine-and-node.sh node-5 node-5
+----
++
+[NOTE]
+====
+The first `node-5` instance represents the machine, and the second instance represents the node.
+====
+
+.Verification
+
+. Confirm the members of etcd from one of the pre-existing control plane nodes:
++
+.. Open a remote shell session to an etcd pod on one of the pre-existing control plane nodes by entering the following command:
++
+[source,terminal]
+----
+$ oc rsh -n openshift-etcd etcd-node-0
+----
++
+.. List the etcd members by entering the following command:
++
+[source,terminal]
+----
+# etcdctl member list -w table
+----
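++
+The output is similar to the following example. The member IDs and IP addresses are illustrative values; the new `node-5` member is listed with `IS LEARNER` set to `false` after it joins the cluster as a voting member.
++
+.Example output
+[source,terminal]
+----
++------------------+---------+--------+-------------------------+-------------------------+------------+
+|        ID        | STATUS  |  NAME  |       PEER ADDRS        |      CLIENT ADDRS       | IS LEARNER |
++------------------+---------+--------+-------------------------+-------------------------+------------+
+| 61e2a86084aafa62 | started | node-0 | https://192.0.2.10:2380 | https://192.0.2.10:2379 |      false |
+| 2c18942ff301e223 | started | node-1 | https://192.0.2.11:2380 | https://192.0.2.11:2379 |      false |
+| ead4f28057169c15 | started | node-2 | https://192.0.2.12:2380 | https://192.0.2.12:2379 |      false |
+| 79153c5a1a890c3e | started | node-5 | https://192.0.2.15:2380 | https://192.0.2.15:2379 |      false |
++------------------+---------+--------+-------------------------+-------------------------+------------+
+----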
+
+. Monitor the etcd Operator configuration process until it completes by entering the following command. The expected output shows `False` under the `PROGRESSING` column.
++
+[source,terminal]
+----
+$ oc get clusteroperator etcd
+----
+
+. Confirm etcd health by running the following commands:
++
+.. Open a remote shell session to an etcd pod by entering the following command:
++
+[source,terminal]
+----
+$ oc rsh -n openshift-etcd etcd-node-0
+----
++
+.. Check the endpoint health by entering the following command. The expected output shows `is healthy` for each endpoint.
++
+[source,terminal]
+----
+# etcdctl endpoint health
+----
+
+. Verify that all nodes are ready by entering the following command. The expected output shows the `Ready` status beside each node entry.
++
+[source,terminal]
+----
+$ oc get nodes
+----
+
+. Verify that all cluster Operators are available by entering the following command. The expected output lists each Operator and shows `True` in the `AVAILABLE` column for each Operator.
++
+[source,terminal]
+----
+$ oc get ClusterOperators
+----
+
+. Verify that the cluster version is correct by entering the following command:
++
+[source,terminal]
+----
+$ oc get ClusterVersion
+----
++
+.Example output
+[source,terminal,subs="attributes+"]
+----
+NAME      VERSION               AVAILABLE   PROGRESSING   SINCE   STATUS
+version   {product-version}.5   True        False         5h57m   Cluster version is {product-version}.5
+----
diff --git a/post_installation_configuration/cluster-tasks.adoc b/post_installation_configuration/cluster-tasks.adoc
index 1d0c692fbe9b..e75560c7db8e 100644
--- a/post_installation_configuration/cluster-tasks.adoc
+++ b/post_installation_configuration/cluster-tasks.adoc
@@ -18,12 +18,9 @@ You complete most of the cluster configuration and customization after you deplo
 If you install your cluster on {ibm-z-name}, not all features and functions are available.
 ====
 
-You modify the configuration resources to configure the major features of the
-cluster, such as the image registry, networking configuration, image build
-behavior, and the identity provider.
+You modify the configuration resources to configure the major features of the cluster, such as the image registry, networking configuration, image build behavior, and the identity provider.
 
-For current documentation of the settings that you control by using these resources, use
-the `oc explain` command, for example `oc explain builds --api-version=config.openshift.io/v1`
+For current documentation of the settings that you control by using these resources, use the `oc explain` command, for example `oc explain builds --api-version=config.openshift.io/v1`
 
 [id="configuration-resources_{context}"]
 === Cluster configuration resources
@@ -236,6 +233,9 @@ include::modules/nodes-cluster-worker-latency-profiles-using.adoc[leveloffset=+2
 
 xref:../machine_management/control_plane_machine_management/cpmso-about.adoc#cpmso-about[Control plane machine sets] provide management capabilities for control plane machines that are similar to what compute machine sets provide for compute machines. The availability and initial status of control plane machine sets on your cluster depend on your cloud provider and the version of {product-title} that you installed. For more information, see xref:../machine_management/control_plane_machine_management/cpmso-getting-started.adoc#cpmso-getting-started[Getting started with control plane machine sets].
 
+// Adding a control plane node to your cluster
+include::modules/creating-control-plane-node.adoc[leveloffset=+2]
+
 [id="post-install-creating-infrastructure-machinesets-production"]
 == Creating infrastructure machine sets for production environments