Merge pull request #8294 from rwsu/AGENT-863
AGENT-906: Script to run monitor-add-nodes in cluster
openshift-merge-bot[bot] committed May 9, 2024
2 parents d9a10f0 + cc833e9 commit bbca50f
Showing 2 changed files with 173 additions and 4 deletions.
62 changes: 58 additions & 4 deletions docs/user/agent/add-node/add-nodes.md
@@ -67,7 +67,7 @@ hosts:
```

## ISO generation
-Run the [node-joiner.sh](./node-joiner.sh):
+Run [node-joiner.sh](./node-joiner.sh):
```bash
$ ./node-joiner.sh
```
@@ -87,11 +87,12 @@ $ ./node-joiner.sh config.yaml
Use the iso image to boot all the nodes listed in the configuration file, and wait for the related
certificate signing requests (CSRs) to appear. When adding a new node to the cluster, two pending CSRs will
be generated, and they must be manually approved by the user.
-Use the following command to monitor the pending certificates:
+
+Use the following command or [node-joiner-monitor.sh](./node-joiner-monitor.sh) described below to monitor the pending certificates:
```
$ oc get csr
```
-User the `oc` `approve` command to approve them:
+Use the `oc` `approve` command to approve them:
```
$ oc adm certificate approve <csr_name>
```
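
When several nodes are added at once, approving each CSR by name gets tedious. The following is a minimal sketch, not part of this commit, that lists pending CSRs and approves them in one pass. The helper names `pending_csr_names` and `approve_pending_csrs` are hypothetical, and the sketch assumes that the last column of `oc get csr --no-headers` is the CONDITION column.

```bash
#!/bin/bash

# Hypothetical helper: read `oc get csr --no-headers` style text on stdin and
# print the name (first column) of every CSR whose last column is "Pending".
# Assumption: CONDITION is the last column of the default table output.
pending_csr_names() {
    awk '$NF == "Pending" { print $1 }'
}

# Hypothetical helper: approve every CSR that is currently pending.
# Review the list first on shared clusters; this approves all of them.
approve_pending_csrs() {
    local names
    names=$(oc get csr --no-headers | pending_csr_names)
    if [ -z "$names" ]; then
        echo "No pending CSRs"
        return 0
    fi
    echo "$names" | xargs oc adm certificate approve
}
```

Because each new node produces two CSRs at different times, `approve_pending_csrs` would typically need to be run twice per node, once for each request.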
@@ -103,4 +104,57 @@ extra-worker-0 Ready worker 1h v1.29.3+8628c3c
master-0 Ready control-plane,master 31h v1.29.3+8628c3c
master-1 Ready control-plane,master 32h v1.29.3+8628c3c
master-2 Ready control-plane,master 32h v1.29.3+8628c3c
```
```

# Monitoring
After a node is booted using the ISO image, progress can be monitored using the node-joiner-monitor.sh script.

Download the [node-joiner-monitor.sh](./node-joiner-monitor.sh) script to a local directory.

The script requires the IP address of the node to monitor.

Run [node-joiner-monitor.sh](./node-joiner-monitor.sh):
```bash
$ ./node-joiner-monitor.sh 192.168.111.90
```
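
The script's own argument check ("At least one IP address must be provided") and its use of `$@` suggest that several addresses can be passed in one invocation. The sketch below wraps the call with a basic sanity check on each argument. `is_ipv4` and `monitor_nodes` are hypothetical names, and the IPv4-only check is an assumption of this example, not a restriction stated by the commit.

```bash
#!/bin/bash

# Hypothetical helper: succeed only if $1 is a dotted-quad IPv4 address
# with every octet in the range 0-255.
is_ipv4() {
    local ip=$1 octet
    [[ $ip =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
    local IFS=.
    for octet in $ip; do
        # 10# forces base 10 so that a leading zero is not read as octal.
        (( 10#$octet <= 255 )) || return 1
    done
}

# Hypothetical wrapper: validate every address, then hand them all to the
# monitor script in a single invocation.
monitor_nodes() {
    local ip
    for ip in "$@"; do
        if ! is_ipv4 "$ip"; then
            echo "Not an IPv4 address: $ip" >&2
            return 1
        fi
    done
    ./node-joiner-monitor.sh "$@"
}
```

For example, `monitor_nodes 192.168.111.90 192.168.111.91` would monitor two nodes, assuming both addresses belong to hosts booted from the ISO.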

The script runs the monitoring command in a temporary namespace in the target cluster.
The namespace name starts with `openshift-node-joiner-`. The output of the command
is printed to stdout.

The script shows useful information about the node as it joins the cluster.
* Pre-flight validations. If the node fails one or more validations, the installation will not start. The output of the failed validations is reported so that users can fix the problems.
* Installation progress indicating the current stage is shown. For example, writing of the image to disk, and initial reboot are reported.
* CSRs requiring the user's approval are shown.

The script exits either after the node has joined the cluster and is in ready state or after 90 minutes have elapsed.

Sample monitoring output:
```
INFO[2024-04-29T22:45:39-04:00] Monitoring IPs: [192.168.111.90]
INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Assisted Service API is available
INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Cluster is adding hosts
INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Updated image information (Image type is "full-iso", SSH public key is set)
INFO[2024-04-29T22:48:22-04:00] Node 192.168.111.90: Host ca241aa5-4f86-42bf-95a3-6b7ab7d4d66a: Successfully registered
WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host couldn't synchronize with any NTP server
WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host extraworker-0: updated status from discovering to insufficient (Host does not meet the minimum hardware requirements: Host couldn't synchronize with any NTP server)
INFO[2024-04-29T22:49:28-04:00] Node 192.168.111.90: Host extraworker-0: updated status from known to installing (Installation is in progress)
INFO[2024-04-29T22:50:28-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 5%
INFO[2024-04-29T22:50:33-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 16%
INFO[2024-04-29T22:50:38-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 28%
INFO[2024-04-29T22:50:43-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 40%
INFO[2024-04-29T22:50:48-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 51%
INFO[2024-04-29T22:50:53-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 67%
INFO[2024-04-29T22:50:58-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 77%
INFO[2024-04-29T22:51:03-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 88%
INFO[2024-04-29T22:51:08-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 93%
INFO[2024-04-29T22:51:13-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Rebooting
INFO[2024-04-29T22:56:35-04:00] Node 192.168.111.90: Kubelet is running
INFO[2024-04-29T22:56:45-04:00] Node 192.168.111.90: First CSR Pending approval
INFO[2024-04-29T22:56:45-04:00] CSR csr-257ms with signerName kubernetes.io/kube-apiserver-client-kubelet and username system:serviceaccount:openshift-machine-config-operator:node-bootstrapper is Pending and awaiting approval
INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Second CSR Pending approval
INFO[2024-04-29T22:58:50-04:00] CSR csr-tc8xt with signerName kubernetes.io/kubelet-serving and username system:node:extraworker-0 is Pending and awaiting approval
INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Node joined cluster
INFO[2024-04-29T23:00:00-04:00] Node 192.168.111.90: Node is Ready
```
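
The CSR names that need approval can be pulled out of this output mechanically. The sketch below assumes the monitor output was saved to a file (for example with `tee`) and that the log lines keep the wording shown in the sample. `csrs_awaiting_approval` and `node_is_ready` are hypothetical helper names.

```bash
#!/bin/bash

# Hypothetical helper: read monitor output on stdin and print the name of
# each CSR reported as "Pending and awaiting approval", one per line.
csrs_awaiting_approval() {
    sed -n 's/.*CSR \([^ ]*\) with signerName .* is Pending and awaiting approval.*/\1/p'
}

# Hypothetical helper: succeed if the monitor output on stdin reports that
# a node reached the Ready state.
node_is_ready() {
    grep -q ': Node is Ready'
}
```

With the sample output above saved as `monitor.log`, `csrs_awaiting_approval < monitor.log` would print `csr-257ms` and `csr-tc8xt`, which can then be passed to `oc adm certificate approve`.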

115 changes: 115 additions & 0 deletions docs/user/agent/add-node/node-joiner-monitor.sh
@@ -0,0 +1,115 @@
#!/bin/bash

set -eu

if [ $# -eq 0 ]; then
  echo "At least one IP address must be provided"
  exit 1
fi

ipAddresses=$@

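# Set up a cleanup function that removes the temporary
# file when the script exits.
cleanup() {
  # The default expansion keeps `set -u` from aborting the cleanup if the
  # trap fires before pullSecretFile has been assigned.
  if [ -f "${pullSecretFile:-}" ]; then
    echo "Removing temporary file $pullSecretFile"
    rm "$pullSecretFile"
  fi
}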
trap cleanup EXIT TERM

# Retrieve the pullsecret and store it in a temporary file.
pullSecretFile=$(mktemp -p "/tmp" -t "nodejoiner-XXXXXXXXXX")
oc get secret -n openshift-config pull-secret -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d > "$pullSecretFile"

# Extract the baremetal-installer image pullspec from the current cluster.
nodeJoinerPullspec=$(oc adm release info --image-for=baremetal-installer --registry-config="$pullSecretFile")

# Use the same random temp file suffix for the namespace.
namespace=$(echo "openshift-node-joiner-${pullSecretFile#/tmp/nodejoiner-}" | tr '[:upper:]' '[:lower:]')

# Create the namespace to run the node-joiner-monitor, along with the required roles and bindings.
staticResources=$(cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${namespace}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-joiner-monitor
  namespace: ${namespace}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-joiner-monitor
rules:
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-joiner-monitor
subjects:
- kind: ServiceAccount
  name: node-joiner-monitor
  namespace: ${namespace}
roleRef:
  kind: ClusterRole
  name: node-joiner-monitor
  apiGroup: rbac.authorization.k8s.io
EOF
)
echo "$staticResources" | oc apply -f -

# Run the node-joiner-monitor to monitor node joining cluster
nodeJoinerPod=$(cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: node-joiner-monitor
  namespace: ${namespace}
  annotations:
    openshift.io/scc: anyuid
  labels:
    app: node-joiner-monitor
spec:
  restartPolicy: Never
  serviceAccountName: node-joiner-monitor
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: node-joiner-monitor
    imagePullPolicy: IfNotPresent
    image: $nodeJoinerPullspec
    command: ["/bin/sh", "-c", "node-joiner monitor-add-nodes $ipAddresses --log-level=info; sleep 5"]
EOF
)
echo "$nodeJoinerPod" | oc apply -f -

oc project "${namespace}"

oc wait --for=condition=Ready=true --timeout=300s pod/node-joiner-monitor

oc logs -f -n "${namespace}" node-joiner-monitor

echo "Cleaning up"
oc delete namespace "${namespace}" --grace-period=0 >/dev/null 2>&1 &
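
The namespace name in the script is derived from the random suffix of the temporary pull-secret file and then lowercased, since Kubernetes namespace names may not contain uppercase letters. The sketch below isolates that derivation so it can be tried without a cluster. `namespace_for` is a hypothetical function name and the file path is a made-up example of what `mktemp` might return.

```bash
#!/bin/bash

# Hypothetical helper mirroring the script's namespace derivation:
# strip the "/tmp/nodejoiner-" prefix from the temp file path, prepend
# "openshift-node-joiner-", and lowercase the result.
namespace_for() {
    local tmpfile=$1
    echo "openshift-node-joiner-${tmpfile#/tmp/nodejoiner-}" | tr '[:upper:]' '[:lower:]'
}

# Made-up temp file path; mktemp generates a different suffix on every run.
namespace_for /tmp/nodejoiner-Ab3dE9xYz0
# prints: openshift-node-joiner-ab3de9xyz0
```

Since the script deletes the namespace in the background as its final step, a leftover namespace with this prefix (visible with `oc get namespaces`) probably indicates a run that was interrupted before cleanup.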
