AGENT-864: New make target to add day2 node with agent #1654

Status: Open · wants to merge 3 commits into base: master
3 changes: 3 additions & 0 deletions Makefile
@@ -33,6 +33,9 @@ agent_gather:
agent_tests:
./agent/agent_tests.sh

agent_add_node:
Review comment (Member):
I like the idea of having a separate step, since it will be a useful way to develop and test the node-joiner locally. From a high-level point of view, we should aim to support two different scenarios:

  1. Dev/local testing
    In this scenario, the dev first runs a make agent to deploy a cluster, then a make agent_add_node multiple times while developing. For this reason, the step should:
  • If there are extra workers already added as nodes, drain/cordon them
  • Stop the extra worker VMs and clean up their disks
  • Rebuild the baremetal-installer image (step 4) and update it in the local registry
  • Re-run the portion of code that adds the extra worker
  2. CI jobs
    From a CI point of view, we should have a way to invoke the agent_add_node step when required. We could either override the DEVSCRIPTS_TARGET in an agent workflow (i.e. here), or always add this step to make agent and exclude it via a config var. Personally I'd be more inclined toward the former option (roughly as sketched below).

Now, since the reference card targets the second scenario (CI jobs), and since the first one may require additional work not strictly needed within this sprint, I think it would be better to open a new card for scenario 1 in the carry-over epic AGENT-896 and limit this one to the second scenario. This should also help simplify the code.
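A rough sketch of the former option, assuming the CI workflow simply exports the variable that dev-scripts consumes (the exact wiring in the CI config is an assumption, not part of this PR):

```
# In the agent workflow's env, point dev-scripts at the day-2 target:
export DEVSCRIPTS_TARGET="agent_add_node"
# dev-scripts then invokes the requested target, e.g.:
make ${DEVSCRIPTS_TARGET:-agent}
```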

./agent/07_agent_add_node.sh

redeploy: ocp_cleanup ironic_cleanup build_installer ironic install_config ocp_run bell

requirements:
92 changes: 9 additions & 83 deletions agent/05_agent_configure.sh
@@ -17,89 +17,6 @@ early_deploy_validation

export CLUSTER_NAMESPACE=${CLUSTER_NAMESPACE:-"cluster0"}

function add_ip_host_entry {
ip=${1}
hostname=${2}

echo "${ip} ${hostname}">>"${OCP_DIR}"/hosts
}

function add_dns_entry {
ip=${1}
hostname=${2}

# Add a DNS entry for this hostname if it's not already defined
if ! $(sudo virsh net-dumpxml ${BAREMETAL_NETWORK_NAME} | xmllint --xpath "//dns/host[@ip = '${ip}']" - &> /dev/null); then
sudo virsh net-update ${BAREMETAL_NETWORK_NAME} add dns-host "<host ip='${ip}'> <hostname>${hostname}</hostname> </host>" --live --config
fi

# Add entries to /etc/hosts for SNO IPV6 to successfully run the openshift conformance tests
if [[ $NUM_MASTERS == 1 && $IP_STACK == "v6" ]]; then
AGENT_NODE0_IPSV6=${ip}
echo "${ip} console-openshift-console.apps.${CLUSTER_DOMAIN}" | sudo tee -a /etc/hosts
echo "${ip} oauth-openshift.apps.${CLUSTER_DOMAIN}" | sudo tee -a /etc/hosts
echo "${ip} thanos-querier-openshift-monitoring.apps.${CLUSTER_DOMAIN}" | sudo tee -a /etc/hosts
fi
}

function get_static_ips_and_macs() {

AGENT_NODES_IPS=()
AGENT_NODES_IPSV6=()
AGENT_NODES_MACS=()
AGENT_NODES_HOSTNAMES=()

if [[ "$AGENT_STATIC_IP_NODE0_ONLY" = "true" ]]; then
static_ips=1
else
static_ips=$NUM_MASTERS+$NUM_WORKERS
fi

if [[ $NETWORKING_MODE == "DHCP" ]]; then
base_ip=20
else
# Set outside the range used for dhcp
base_ip=80
fi

for (( i=0; i<${static_ips}; i++ ))
do
if [[ $i < $NUM_MASTERS ]]; then
AGENT_NODES_HOSTNAMES+=($(printf ${MASTER_HOSTNAME_FORMAT} ${i}))
cluster_name=${CLUSTER_NAME}_master_${i}
else
worker_num=$((${i}-$NUM_MASTERS))
AGENT_NODES_HOSTNAMES+=($(printf ${WORKER_HOSTNAME_FORMAT} ${worker_num}))
cluster_name=${CLUSTER_NAME}_worker_${worker_num}
fi

ip=${base_ip}+${i}
if [[ "$IP_STACK" = "v4" ]]; then
AGENT_NODES_IPS+=($(nth_ip ${EXTERNAL_SUBNET_V4} ${ip}))
add_dns_entry ${AGENT_NODES_IPS[i]} ${AGENT_NODES_HOSTNAMES[i]}
add_ip_host_entry ${AGENT_NODES_IPS[i]} ${AGENT_NODES_HOSTNAMES[i]}
elif [[ "$IP_STACK" = "v6" ]]; then
AGENT_NODES_IPSV6+=($(nth_ip ${EXTERNAL_SUBNET_V6} ${ip}))
add_dns_entry ${AGENT_NODES_IPSV6[i]} ${AGENT_NODES_HOSTNAMES[i]}
add_ip_host_entry ${AGENT_NODES_IPSV6[i]} ${AGENT_NODES_HOSTNAMES[i]}
else
# v4v6
AGENT_NODES_IPS+=($(nth_ip ${EXTERNAL_SUBNET_V4} ${ip}))
AGENT_NODES_IPSV6+=($(nth_ip $EXTERNAL_SUBNET_V6 ${ip}))
add_dns_entry ${AGENT_NODES_IPS[i]} ${AGENT_NODES_HOSTNAMES[i]}
add_ip_host_entry ${AGENT_NODES_IPS[i]} ${AGENT_NODES_HOSTNAMES[i]}
add_dns_entry ${AGENT_NODES_IPSV6[i]} ${AGENT_NODES_HOSTNAMES[i]}
fi

# Get the generated mac addresses
AGENT_NODES_MACS+=($(sudo virsh dumpxml $cluster_name | xmllint --xpath "string(//interface[descendant::source[@bridge = '${BAREMETAL_NETWORK_NAME}']]/mac/@address)" -))
if [[ ! -z "${BOND_PRIMARY_INTERFACE:-}" ]]; then
# For a bond, a random mac is added for the 2nd interface
AGENT_NODES_MACS+=($(sudo virsh domiflist ${cluster_name} | grep ${BAREMETAL_NETWORK_NAME} | grep -v ${AGENT_NODES_MACS[-1]} | awk '{print $5}'))
fi
done
}

function generate_extra_cluster_manifests() {

mkdir -p ${EXTRA_MANIFESTS_PATH}
@@ -230,6 +147,15 @@ function generate_cluster_manifests() {
nodes_hostnames=$(printf '%s,' "${AGENT_NODES_HOSTNAMES[@]}")
export AGENT_NODES_HOSTNAMES_STR=${nodes_hostnames::-1}

extra_workers_ips=$(printf '%s,' "${AGENT_EXTRA_WORKERS_IPS[@]}")
export AGENT_EXTRA_WORKERS_IPS_STR=${extra_workers_ips::-1}
extra_workers_ipsv6=$(printf '%s,' "${AGENT_EXTRA_WORKERS_IPSV6[@]}")
export AGENT_EXTRA_WORKERS_IPSV6_STR=${extra_workers_ipsv6::-1}
extra_workers_macs=$(printf '%s,' "${AGENT_EXTRA_WORKERS_MACS[@]}")
export AGENT_EXTRA_WORKERS_MACS_STR=${extra_workers_macs::-1}
extra_workers_hostnames=$(printf '%s,' "${AGENT_EXTRA_WORKERS_HOSTNAMES[@]}")
export AGENT_EXTRA_WORKERS_HOSTNAMES_STR=${extra_workers_hostnames::-1}

if [[ "${NUM_MASTERS}" > "1" ]]; then
export API_VIPS=${API_VIPS}
export INGRESS_VIPS=${INGRESS_VIPS}
72 changes: 72 additions & 0 deletions agent/07_agent_add_node.sh
@@ -0,0 +1,72 @@
#!/usr/bin/env bash
set -euxo pipefail

SCRIPTDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"

source $SCRIPTDIR/common.sh
source $SCRIPTDIR/network.sh
source $SCRIPTDIR/ocp_install_env.sh
source $SCRIPTDIR/utils.sh
source $SCRIPTDIR/validation.sh
source $SCRIPTDIR/agent/common.sh

early_deploy_validation

function build_node_joiner() {
Review comment (Member):
As per the previous comment, the build function for the node-joiner can be removed for now and addressed in a separate card.

Note for future development: the build_local_release function in step 4 already contains the logic for properly building and pushing a local image; it would be useful to extract it and reuse it whenever a single image update is required.

# Build installer
pushd .
cd $OPENSHIFT_INSTALL_PATH
TAGS="${OPENSHIFT_INSTALLER_BUILD_TAGS:-libvirt baremetal}" DEFAULT_ARCH=$(get_arch) hack/build-node-joiner.sh
popd
}

function approve_csrs() {
while true; do
Review comment (Member):
Since this will be run in a CI job, it would also be useful to have a timeout condition (e.g. 10 minutes), to avoid relying on the CI timeout (usually 4h) in case of an issue during the approval.
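A minimal sketch of what that could look like (the 10-minute bound and the use of bash's SECONDS counter are assumptions, not part of the PR):

```
function approve_csrs() {
    # Stop approving after ~10 minutes instead of looping forever
    local deadline=$((SECONDS + 600))
    while (( SECONDS < deadline )); do
        pending_csrs=$(oc get csr | grep Pending || true)
        if [[ -n ${pending_csrs} ]]; then
            echo "Approving CSRs: $pending_csrs"
            echo "$pending_csrs" | cut -d ' ' -f 1 | xargs oc adm certificate approve
        fi
        sleep 10
    done
}
```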

pending_csrs=$(oc get csr | grep Pending)
if [[ ${pending_csrs} != "" ]]; then
echo "Approving CSRs: $pending_csrs"
echo $pending_csrs | cut -d ' ' -f 1 | xargs oc adm certificate approve
fi
sleep 10
done
}

if [ -z "$KNI_INSTALL_FROM_GIT" ]; then
Review comment (Member):
As suggested before, we don't need this part for local rebuilds right now, so it could be removed.

# Extract node-joiner from the release image
baremetal_installer_image=$(oc adm release info --image-for=baremetal-installer --registry-config=$PULL_SECRET_FILE)
container_id=$(podman create --authfile $PULL_SECRET_FILE ${baremetal_installer_image} ls)
podman start $container_id
podman export $container_id > baremetal-installer.tar
tar xf baremetal-installer.tar usr/bin/node-joiner
cp usr/bin/node-joiner $OCP_DIR/node-joiner
rm -rf usr baremetal-installer.tar
podman rm $container_id
else
# Build the node-joiner from source
build_node_joiner
cp "$OPENSHIFT_INSTALL_PATH/bin/node-joiner" "$OCP_DIR"
fi

rm $OCP_DIR/add-node/.openshift_install_state.json || true
Review comment (Member):
This should never be required. When running transient commands, the ${WORKING_DIR} should be used as the base dir (and cleaned up at the end).
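For illustration, the pattern the comment suggests might look like this (directory name and trap placement are hypothetical):

```
# Keep transient add-node assets under ${WORKING_DIR} and clean them up on exit
assets_dir=$(mktemp -d -p "${WORKING_DIR}" add-node-XXXXXX)
trap 'rm -rf "${assets_dir}"' EXIT
```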


get_static_ips_and_macs
Review comment (Member):
see my comments below


node_joiner="$(realpath "${OCP_DIR}/node-joiner")"

$node_joiner add-nodes --dir $OCP_DIR/add-node --kubeconfig $OCP_DIR/auth/kubeconfig
Review comment (Member):
The primary test interface must be the node-joiner.sh script. What we really need here is just to download it and run it, i.e. like a real user would do.
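Roughly the user flow the comment describes, as a sketch (the raw URL comes from the installer docs referenced in add-nodes.md; the invocation details are assumptions):

```
# Fetch and run the script exactly as a user would
curl -LO https://raw.githubusercontent.com/openshift/installer/master/docs/user/agent/add-node/node-joiner.sh
chmod +x node-joiner.sh
./node-joiner.sh    # reads nodes-config.yaml from the working directory
```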


sudo virt-xml ${CLUSTER_NAME}_extraworker_${AGENT_EXTRAWORKER_NODE_TO_ADD} --add-device --disk "${OCP_DIR}/add-node/node.x86_64.iso,device=cdrom,target.dev=sdc"
sudo virt-xml ${CLUSTER_NAME}_extraworker_${AGENT_EXTRAWORKER_NODE_TO_ADD} --edit target=sda --disk="boot_order=1"
sudo virt-xml ${CLUSTER_NAME}_extraworker_${AGENT_EXTRAWORKER_NODE_TO_ADD} --edit target=sdc --disk="boot_order=2" --start
Review comment (Member):
For this part we currently have a small tech debt that should be addressed, and this sounds like the right occasion to tackle it. The concept of EXTRA_WORKER is really to add a worker node like the others, but not to deploy it (because later someone else will take care of it, as in our case).
Generally speaking, this means that wherever a WORKER is managed in the code, the same should apply to the EXTRA_WORKER currently defined in the config_<user>.sh file. From the agent flow point of view, this means that the EXTRA_WORKER should be included in steps 4 and 5 where required.
This step 7 instead should only take care of:

  1. Generating the desired nodes-config.yaml
  2. Running the node-joiner.sh script (by reusing the current baremetal-installer image, either local or remote)
  3. Mounting the node ISO in all the extra workers currently defined and starting them (see the loop sketched after this list)
  4. Running the monitor script and waiting until done
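For item 3, a sketch of mounting the ISO in every defined extra worker rather than a single indexed one (the loop bound and reuse of the PR's virt-xml calls are assumptions):

```
for (( n=0; n<NUM_EXTRA_WORKERS; n++ )); do
    domain=${CLUSTER_NAME}_extraworker_${n}
    sudo virt-xml ${domain} --add-device --disk "${OCP_DIR}/add-node/node.x86_64.iso,device=cdrom,target.dev=sdc"
    sudo virt-xml ${domain} --edit target=sda --disk="boot_order=1"
    sudo virt-xml ${domain} --edit target=sdc --disk="boot_order=2" --start
done
```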

Review comment (Member):
Specifically for these lines of code, it would probably be worth modifying this portion to also take care of the extra workers, if defined. The only exception is that the VMs for the extra workers must not be started, so the various functions may require an additional param for that. We don't need to address the PXE / appliance cases now, but it could be useful to prepare the ground for future developments.


set +ex
Review comment (Member):
nit: a small comment could be useful here

approve_csrs &
approve_csrs_pid=$!
trap 'kill -TERM ${approve_csrs_pid}; exit' INT EXIT TERM
set -ex

$node_joiner monitor-add-nodes ${AGENT_EXTRA_WORKERS_IPS[${AGENT_EXTRAWORKER_NODE_TO_ADD}]} --kubeconfig $OCP_DIR/auth/kubeconfig
Review comment (@andfasano, May 3, 2024):
As in the previous case, the script execution should be used here too (when available).
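A sketch of running the script instead (URL from the installer docs linked in add-nodes.md; the argument mirrors the PR's monitor-add-nodes call and is an assumption):

```
curl -LO https://raw.githubusercontent.com/openshift/installer/master/docs/user/agent/add-node/node-joiner-monitor.sh
chmod +x node-joiner-monitor.sh
./node-joiner-monitor.sh ${AGENT_EXTRA_WORKERS_IPS[${AGENT_EXTRAWORKER_NODE_TO_ADD}]}
```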


oc get nodes extraworker-${AGENT_EXTRAWORKER_NODE_TO_ADD} | grep Ready
Review comment (Member):
I'm not sure about this check; I'd expect the monitor script to report the proper exit code (thus forcing the current script to fail).


kill -TERM ${approve_csrs_pid}
Review comment (Member):
Is this really needed? The trap should already take care of that in any case

123 changes: 123 additions & 0 deletions agent/common.sh
@@ -44,9 +44,132 @@ function getReleaseImage() {
echo ${releaseImage}
}

# Functions used to determine IP addresses, MACs, and hostnames
function add_ip_host_entry {
ip=${1}
hostname=${2}

echo "${ip} ${hostname}">>"${OCP_DIR}"/hosts
}

function add_dns_entry {
ip=${1}
hostname=${2}

# Add a DNS entry for this hostname if it's not already defined
if ! $(sudo virsh net-dumpxml ${BAREMETAL_NETWORK_NAME} | xmllint --xpath "//dns/host[@ip = '${ip}']" - &> /dev/null); then
sudo virsh net-update ${BAREMETAL_NETWORK_NAME} add dns-host "<host ip='${ip}'> <hostname>${hostname}</hostname> </host>" --live --config
fi

# Add entries to /etc/hosts for SNO IPV6 to successfully run the openshift conformance tests
if [[ $NUM_MASTERS == 1 && $IP_STACK == "v6" ]]; then
AGENT_NODE0_IPSV6=${ip}
echo "${ip} console-openshift-console.apps.${CLUSTER_DOMAIN}" | sudo tee -a /etc/hosts
echo "${ip} oauth-openshift.apps.${CLUSTER_DOMAIN}" | sudo tee -a /etc/hosts
echo "${ip} thanos-querier-openshift-monitoring.apps.${CLUSTER_DOMAIN}" | sudo tee -a /etc/hosts
fi
}

function get_static_ips_and_macs() {
Review comment (Member):
As mentioned before, I'd prefer to keep these functions in step 5, since they are not really generic, and handle the extra workers locally where required.


AGENT_NODES_IPS=()
AGENT_NODES_IPSV6=()
AGENT_NODES_MACS=()
AGENT_NODES_HOSTNAMES=()

if [[ "$AGENT_STATIC_IP_NODE0_ONLY" = "true" ]]; then
static_ips=1
else
static_ips=$NUM_MASTERS+$NUM_WORKERS
Review comment (Member):
Suggested change:
-    static_ips=$NUM_MASTERS+$NUM_WORKERS
+    static_ips=$NUM_MASTERS+$NUM_WORKERS+$NUM_EXTRA_WORKERS

Review comment (Member):
See comment below

fi

if [[ $NETWORKING_MODE == "DHCP" ]]; then
base_ip=20
else
# Set outside the range used for dhcp
base_ip=80
fi

for (( i=0; i<${static_ips}; i++ ))
do
if [[ $i < $NUM_MASTERS ]]; then
AGENT_NODES_HOSTNAMES+=($(printf ${MASTER_HOSTNAME_FORMAT} ${i}))
cluster_name=${CLUSTER_NAME}_master_${i}
else
worker_num=$((${i}-$NUM_MASTERS))
AGENT_NODES_HOSTNAMES+=($(printf ${WORKER_HOSTNAME_FORMAT} ${worker_num}))
cluster_name=${CLUSTER_NAME}_worker_${worker_num}
Review comment (Member):
Another point for managing the extra worker.

fi

ip=${base_ip}+${i}
if [[ "$IP_STACK" = "v4" ]]; then
AGENT_NODES_IPS+=($(nth_ip ${EXTERNAL_SUBNET_V4} ${ip}))
add_dns_entry ${AGENT_NODES_IPS[i]} ${AGENT_NODES_HOSTNAMES[i]}
add_ip_host_entry ${AGENT_NODES_IPS[i]} ${AGENT_NODES_HOSTNAMES[i]}
elif [[ "$IP_STACK" = "v6" ]]; then
AGENT_NODES_IPSV6+=($(nth_ip ${EXTERNAL_SUBNET_V6} ${ip}))
add_dns_entry ${AGENT_NODES_IPSV6[i]} ${AGENT_NODES_HOSTNAMES[i]}
add_ip_host_entry ${AGENT_NODES_IPSV6[i]} ${AGENT_NODES_HOSTNAMES[i]}
else
# v4v6
AGENT_NODES_IPS+=($(nth_ip ${EXTERNAL_SUBNET_V4} ${ip}))
AGENT_NODES_IPSV6+=($(nth_ip $EXTERNAL_SUBNET_V6 ${ip}))
add_dns_entry ${AGENT_NODES_IPS[i]} ${AGENT_NODES_HOSTNAMES[i]}
add_ip_host_entry ${AGENT_NODES_IPS[i]} ${AGENT_NODES_HOSTNAMES[i]}
add_dns_entry ${AGENT_NODES_IPSV6[i]} ${AGENT_NODES_HOSTNAMES[i]}
fi

# Get the generated mac addresses
AGENT_NODES_MACS+=($(sudo virsh dumpxml $cluster_name | xmllint --xpath "string(//interface[descendant::source[@bridge = '${BAREMETAL_NETWORK_NAME}']]/mac/@address)" -))
if [[ ! -z "${BOND_PRIMARY_INTERFACE:-}" ]]; then
# For a bond, a random mac is added for the 2nd interface
AGENT_NODES_MACS+=($(sudo virsh domiflist ${cluster_name} | grep ${BAREMETAL_NETWORK_NAME} | grep -v ${AGENT_NODES_MACS[-1]} | awk '{print $5}'))
fi
done

AGENT_EXTRA_WORKERS_IPS=()
Review comment (Member):
That's really a lot of duplication that I'd prefer not to have. An extra worker is really just an ordinary node that is not actually deployed, and it should be managed in an agnostic way in the code where possible.

The previous suggestions are meant to accommodate that vision by incorporating the extra workers into the loop that is responsible for managing the nodes. In general I'm not against having dedicated AGENT_EXTRA_WORKERS_* vars in addition to the AGENT_NODES_* ones, since I understand it may be useful to distinguish the two sets for the sake of template generation, but without the code duplication.
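As an illustration of that vision, the per-role handling could collapse into a single loop along these lines (a sketch; the names and structure are not from the PR):

```
total_nodes=$((NUM_MASTERS + NUM_WORKERS + NUM_EXTRA_WORKERS))
for (( i=0; i<total_nodes; i++ )); do
    if (( i < NUM_MASTERS )); then
        role=master; num=${i}
    elif (( i < NUM_MASTERS + NUM_WORKERS )); then
        role=worker; num=$((i - NUM_MASTERS))
    else
        role=extraworker; num=$((i - NUM_MASTERS - NUM_WORKERS))
    fi
    cluster_name=${CLUSTER_NAME}_${role}_${num}
    # ...one shared block resolves the IP/MAC/hostname and appends to either
    # the AGENT_NODES_* or the AGENT_EXTRA_WORKERS_* arrays based on ${role}...
done
```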

AGENT_EXTRA_WORKERS_IPSV6=()
AGENT_EXTRA_WORKERS_MACS=()
AGENT_EXTRA_WORKERS_HOSTNAMES=()
extra_workers=$NUM_EXTRA_WORKERS
for (( i=0; i<${extra_workers}; i++ ))
do
AGENT_EXTRA_WORKERS_HOSTNAMES+=($(printf ${EXTRA_WORKER_HOSTNAME_FORMAT} ${i}))
cluster_name=${CLUSTER_NAME}_extraworker_${i}


ip=${base_ip}+${static_ips}+${i}
if [[ "$IP_STACK" = "v4" ]]; then
AGENT_EXTRA_WORKERS_IPS+=($(nth_ip ${EXTERNAL_SUBNET_V4} ${ip}))
add_dns_entry ${AGENT_EXTRA_WORKERS_IPS[i]} ${AGENT_EXTRA_WORKERS_HOSTNAMES[i]}
add_ip_host_entry ${AGENT_EXTRA_WORKERS_IPS[i]} ${AGENT_EXTRA_WORKERS_HOSTNAMES[i]}
elif [[ "$IP_STACK" = "v6" ]]; then
AGENT_EXTRA_WORKERS_IPSV6+=($(nth_ip ${EXTERNAL_SUBNET_V6} ${ip}))
add_dns_entry ${AGENT_EXTRA_WORKERS_IPSV6[i]} ${AGENT_EXTRA_WORKERS_HOSTNAMES[i]}
add_ip_host_entry ${AGENT_EXTRA_WORKERS_IPSV6[i]} ${AGENT_EXTRA_WORKERS_HOSTNAMES[i]}
else
# v4v6
AGENT_EXTRA_WORKERS_IPS+=($(nth_ip ${EXTERNAL_SUBNET_V4} ${ip}))
AGENT_EXTRA_WORKERS_IPSV6+=($(nth_ip $EXTERNAL_SUBNET_V6 ${ip}))
add_dns_entry ${AGENT_EXTRA_WORKERS_IPS[i]} ${AGENT_EXTRA_WORKERS_HOSTNAMES[i]}
add_ip_host_entry ${AGENT_EXTRA_WORKERS_IPS[i]} ${AGENT_EXTRA_WORKERS_HOSTNAMES[i]}
add_dns_entry ${AGENT_EXTRA_WORKERS_IPSV6[i]} ${AGENT_EXTRA_WORKERS_HOSTNAMES[i]}
fi

# Get the generated mac addresses
AGENT_EXTRA_WORKERS_MACS+=($(sudo virsh dumpxml $cluster_name | xmllint --xpath "string(//interface[descendant::source[@bridge = '${BAREMETAL_NETWORK_NAME}']]/mac/@address)" -))
if [[ ! -z "${BOND_PRIMARY_INTERFACE:-}" ]]; then
# For a bond, a random mac is added for the 2nd interface
AGENT_EXTRA_WORKERS_MACS+=($(sudo virsh domiflist ${cluster_name} | grep ${BAREMETAL_NETWORK_NAME} | grep -v ${AGENT_EXTRA_WORKERS_MACS[-1]} | awk '{print $5}'))
fi
done
}

# External load balancer configuration.
# The following ports are opened in firewalld so that libvirt VMs can communicate with haproxy.
export MACHINE_CONFIG_SERVER_PORT=22623
export KUBE_API_PORT=6443
export INGRESS_ROUTER_PORT=443
export AGENT_NODE0_IPSV6=${AGENT_NODE0_IPSV6:-}

export AGENT_EXTRAWORKER_NODE_TO_ADD=${AGENT_EXTRAWORKER_NODE_TO_ADD:-"0"}
Review comment (Member):
As per the previous comment, this should not be required

52 changes: 52 additions & 0 deletions agent/docs/add-nodes.md
@@ -0,0 +1,52 @@
# Add nodes (day 2)

Nodes can be added after installation, using the "agent_add_node" make target.

To add a node, the cluster must be created with extra worker nodes using
the NUM_EXTRA_WORKERS environment variable. This will configure VMs for extra
worker nodes, but not include them in the initial cluster installation.

For example,

```
export NUM_EXTRA_WORKERS=2
```

will create two extra worker nodes, named extraworker-0 and extraworker-1.

The extra worker nodes must have a minimum disk size of 100GB. This can be specified
using:

```
export EXTRA_WORKER_DISK=100
```

After the cluster has been installed, add both nodes to the cluster by running
these commands:

```
AGENT_EXTRAWORKER_NODE_TO_ADD=0 make agent_add_node
AGENT_EXTRAWORKER_NODE_TO_ADD=1 make agent_add_node
```

The nodes will be sequentially added to the cluster.

# agent_add_node target

To add a node to the cluster, the "agent_add_node" target performs the same
steps a user would need to perform to add a node to the cluster using [the
node-joiner tool](https://github.com/openshift/installer/blob/master/docs/user/agent/add-node/add-nodes.md).

Steps:

* Downloads and executes the [node-joiner.sh](https://github.com/openshift/installer/blob/master/docs/user/agent/add-node/node-joiner.sh) script to generate the ISO (node.x86_64.iso) that can be used to join a
node to the cluster. This script requires an input file, nodes-config.yaml, which describes
the node being added. This input file is automatically generated by "agent_add_node" provided
the cluster was created with extra worker nodes.

* Boots the extra worker node using the generated ISO.

* Downloads and executes the [node-joiner-monitor.sh](https://github.com/openshift/installer/blob/master/docs/user/agent/add-node/node-joiner-monitor.sh) script to show progress. The monitor script also displays
CSRs that will need to be approved for the node to join the cluster.

* A user would then approve each CSR using "oc adm certificate approve" or equivalent. This target creates a background process that auto-approves any CSRs.
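For reference, the manual approval would look like this (the CSR name is a placeholder):

```
oc get csr
oc adm certificate approve csr-<generated-id>
```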
12 changes: 12 additions & 0 deletions agent/roles/manifests/tasks/install-config.yml
@@ -32,3 +32,15 @@
src: "agent-config_bond_yaml.j2"
dest: "{{ install_path }}/agent-config.yaml"
when: agent_bond_config != 'none'

- name: create add-node directory
Review comment (Member):
That's not really part of the install-config; it should be placed into its own task file related to adding nodes.

ansible.builtin.file:
path: "{{ install_path }}/add-node"
state: directory
when: num_extra_workers != "0"

- name: write the nodes-config.yaml
template:
src: "nodes-config_yaml.j2"
dest: "{{ install_path }}/add-node/nodes-config.yaml"
when: agent_bond_config == 'none' and num_extra_workers != "0"