Create GS baremetal Agent-Based Install steps, chain, and reference health workflow #78959
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| approvers: &owners | ||
| - cspi-qe-ocp-lp | ||
| - ieng-chaos | ||
| reviewers: *owners |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,124 @@ | ||
| # Agent-based Installer (ABI) | ||
|
|
||
| **Layout (step-registry paths):** `conf/<platform>/` holds manifest / image-input work; `install/<mechanism>/` holds boot and cluster deployment (e.g. **BMC** | ||
| virtual media today; **PXE** or other targets can be added alongside without colliding with bare-metal **conf**). See `conf/bm` and `install/bmc` below. | ||
|
|
||
| **Step Input Parameters (names, defaults, semantics):** | ||
| | Step | Reference (source of truth) | Registry Documentation | | ||
| |---------------------|-----------------------------------------------------------------------|-------------------------------------------------------------------------------| | ||
| | **abi-conf-bm** | [`abi-conf-bm-ref.yaml`](conf/bm/abi-conf-bm-ref.yaml) | [`abi-conf-bm`](https://steps.ci.openshift.org/reference/abi-conf-bm) | | ||
| | **abi-install-bmc** | [`abi-install-bmc-ref.yaml`](install/bmc/abi-install-bmc-ref.yaml) | [`abi-install-bmc`](https://steps.ci.openshift.org/reference/abi-install-bmc) | | ||
|
|
||
| **Steps Execution Order:** [`abi-conf-bm-commands.sh`](conf/bm/abi-conf-bm-commands.sh) → [`abi-install-bmc-commands.sh`](install/bmc/abi-install-bmc-commands.sh) | ||
|
|
||
| **Official Documentation:** [Preparing to install with the Agent-based Installer](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/installing_an_on-premise_cluster_with_the_agent-based_installer/preparing-to-install-with-the-agent-based-installer). | ||
|
|
||
| ## Installation Phases | ||
|
|
||
| | Phase | Comments | | ||
| |---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| | **Day-0** | Cluster Configuration.<br> Creates a bare-minimum `install-config.yaml` and generates an `agent-config.yaml` template. Then `UpdateCfg Day0` applies overrides from `OCP__ABI__CFG_FN`, followed by `OCP__ABI__DAY0_SCRIPTS_YAML`. Both configuration files must be complete before proceeding to Day-1. | | ||
| | **Day-1** | Manifest Customization.<br> Generates the full manifest tree under `openshift/` (`agent create cluster-manifests`). Then `UpdateCfg Day1` applies overrides from `OCP__ABI__CFG_FN`, followed by `OCP__ABI__DAY1_SCRIPTS_YAML`, before the ISO is built. | | ||
| | **Day-1.5** | Post-Bootstrap Operations.<br> Runs after `agent wait-for bootstrap-complete`. Applies custom actions as configured in `OCP__ABI__CFG_FN` (e.g. scale Worker MachineSets to 0 when workers are provisioned directly by ABI). Runs concurrently with `wait-for install-complete`. | | ||
| | **Day-2** | Post-Deployment Customization.<br> Runs after `agent wait-for install-complete` and `KUBECONFIG` is set. Custom post-deployment actions via `OCP__ABI__DAY2_SCRIPTS_YAML` (e.g. install operators, apply policies). | | ||
|
|
||
| `SHARED_DIR` holds inter-step artifacts (tarball, kubeconfig, `kubeconfig-minimal`). Logs and `ocp.tgz` go to `ARTIFACT_DIR`. | ||
|
|
||
| ## OCP__ABI__CFG_FN | ||
|
|
||
| Pre-populate the file named by `OCP__ABI__CFG_FN` (resolved under `${CLUSTER_PROFILE_DIR}`, e.g. `${CLUSTER_PROFILE_DIR}/ocp--abi--cfg.yaml`) with the full `agent-config.yaml` content, | ||
| such as host definitions (NMState network config, BMC addresses), plus any extra configuration needed: | ||
| ```yaml | ||
| Day0: | ||
| config: {} | ||
| configFileOverride: | ||
| yaml+: | ||
| - ...yamlCfg...: | ||
| ...yamlCfgContentToDeepMergeAppendArray... | ||
| yaml-: | ||
| - ...yamlCfg...: | ||
| ...yamlCfgContentToDeepMergeReplaceArray... | ||
| yaml=: | ||
| - ...yamlCfg...: | ||
| ...yamlCfgContentToReplace... | ||
| json+: | ||
| - ...jsonCfg...: | | ||
| ...jsonCfgContentToDeepMergeAppendArray... | ||
| json-: | ||
| - ...jsonCfg...: | | ||
| ...jsonCfgContentToDeepMergeReplaceArray... | ||
| json=: | ||
| - ...jsonCfg...: | | ||
| ...jsonCfgContentToReplace... | ||
| Day1: # Same schema as `Day0` | ||
| ... | ||
| Day1.5: | ||
| config: | ||
| - NodeProv: ...booleanNodeProvisioningStatus... | ||
| Day2: # Same schema as `Day1.5` | ||
| ... | ||
| ``` | ||
|
|
||
| Example: | ||
| ```yaml | ||
| Day0: | ||
| configFileOverride: | ||
| yaml-: | ||
| - install-config.yaml: | ||
| networking: | ||
| machineNetwork: | ||
| - cidr: 10.6.158.0/24 | ||
| platform: | ||
| baremetal: | ||
| apiVIPs: | ||
| - 10.6.158.26 | ||
| ingressVIPs: | ||
| - 10.6.158.27 | ||
| provisioningNetwork: Disabled | ||
| - agent-config.yaml: # Full agent-config.yaml: Host definitions (NMState network config, BMC addresses, roles, rootDeviceHints, etc.) | ||
| apiVersion: v1beta1 | ||
| kind: AgentConfig | ||
| metadata: | ||
| name: integrity-config | ||
| rendezvousIP: 10.6.158.11 | ||
| additionalNTPSources: | ||
| - clock.corp.redhat.com | ||
| hosts: | ||
| - ... # Per-host: hostname, role, rootDeviceHints, interfaces, networkConfig, bmc | ||
| Day1.5: | ||
| config: | ||
| - NodeProv: false | ||
| ``` | ||
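The suffix on each `configFileOverride` key selects how the override is combined with the file already on disk. The sketch below mirrors the `case` statement in `UpdateCfg` from `abi-conf-bm-commands.sh` and only prints the `yq` expression each suffix would produce. `merge_expr` is a hypothetical helper name used for illustration, and no `yq` call is made.

```shell
#!/bin/bash
# Hypothetical helper (not part of the step): maps an override type such as
# `yaml+` to the yq expression that `UpdateCfg` assembles for it.
merge_expr() {
  case "$1" in
    *+) printf '%s' 'select(fileIndex==0) *+ select(fileIndex==1)' ;;  # deep merge, arrays appended
    *-) printf '%s' 'select(fileIndex==0) * select(fileIndex==1)' ;;   # deep merge, arrays replaced
    *=) printf '%s' 'select(fileIndex==1)' ;;                          # whole file replaced
    *)  echo "Invalid type: $1" >&2; return 1 ;;
  esac
}

# Print the expression chosen for a few override types.
for t in yaml+ yaml- yaml= json+; do
  printf '%-6s -> %s\n' "$t" "$(merge_expr "$t")"
done
```

In the example above, `yaml-` on `install-config.yaml` means that list-valued keys such as `apiVIPs` take the override's list instead of having it appended to whatever the generated file contained.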
|
|
||
| ## Tunneling / Chisel | ||
|
|
||
| Refer to [WebApp Services — Chisel Tunneling Service](https://redhat.atlassian.net/wiki/display/MPEXIENG/WebApp+Services#Chisel-Tunneling-Service) | ||
| for the reference setup (which uses **NGINX** as a reverse proxy in front of **Chisel** to achieve configurable data-plane port forwarding). | ||
|
|
||
| Operational layout and port table (if the above reference setup is used): | ||
| [Chisel Tunneling Service](https://redhat.atlassian.net/wiki/display/MPEXIENG/WebApp+Services#Step2.1.2.2.3--Chisel_OperationalTasks). | ||
|
|
||
| Step Input Parameters: `OCP__ABI__TUN_SVC__*` / `OCP__ABI__TEAM_NAME` | ||
|
|
||
| ## BMC / Redfish | ||
|
|
||
| **abi-conf-bm** emits `ocp--bmc--info.json`; **abi-install-bmc** drives virtual media and power via Redfish. Details live in `abi-install-bmc-commands.sh` | ||
| (maintainer-oriented). | ||
|
|
||
| ## Phase Customization Scripts | ||
|
|
||
| The `OCP__ABI__DAY0_SCRIPTS_YAML`, `OCP__ABI__DAY1_SCRIPTS_YAML`, and `OCP__ABI__DAY2_SCRIPTS_YAML` parameters inject arbitrary shell scripts into the | ||
| corresponding installation phase. Scripts run in the order listed, within the step's shell environment. See [Installation Phases](#installation-phases) for when | ||
| each script runs relative to the phase operations. | ||
|
|
||
| Example (`OCP__ABI__DAY0_SCRIPTS_YAML`): | ||
| ```yaml | ||
| OCP__ABI__DAY0_SCRIPTS_YAML: | | ||
| Scripts: | ||
| - | # Complete override of configuration files instead of using `OCP__ABI__CFG_FN` mechanism (not recommended, just serves as an example). | ||
| mkdir -p "${OCP__ABI__CLUSTER_DIR}/openshift" | ||
| cp -f "${CLUSTER_PROFILE_DIR}/install-config.yaml" "${OCP__ABI__CLUSTER_DIR}/install-config.yaml" | ||
| cp -f "${CLUSTER_PROFILE_DIR}/agent-config.yaml" "${OCP__ABI__CLUSTER_DIR}/agent-config.yaml" | ||
| ``` | ||
|
|
||
| Schema: [BuildCustomScriptsFromYAML.sh](https://github.com/RedHatQE/OpenShift-LP-QE--Tools/blob/main/libs/bash/common/BuildCustomScriptsFromYAML.sh). | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../OWNERS |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../OWNERS |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| { | ||
| "path": "abi/chains/bm--bmc/abi-chains-bm--bmc-chain.yaml", | ||
| "owners": { | ||
| "approvers": [ | ||
| "cspi-qe-ocp-lp", | ||
| "ieng-chaos" | ||
| ], | ||
| "reviewers": [ | ||
| "cspi-qe-ocp-lp", | ||
| "ieng-chaos" | ||
| ] | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| chain: | ||
| as: abi-chains-bm--bmc | ||
| env: | ||
| - name: OCP__ABI__CLUSTER_DIR | ||
| default: /tmp/ocpClusterDir | ||
| documentation: |- | ||
| The Steps use a Container Image where the `CWD` is R/O. This overrides the cluster directory to a writable location. | ||
| steps: | ||
| - ref: abi-conf-bm | ||
| - ref: abi-install-bmc | ||
| documentation: |- | ||
| This Chain deploys OpenShift Container Platform (OCP) on Bare Metal with BMC. | ||
|
|
||
| See [ABI overview](https://github.com/openshift/release/blob/main/ci-operator/step-registry/abi/README.md) for details. |
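As a minimal sketch of how a step might consume the `OCP__ABI__CLUSTER_DIR` default declared in this chain: the fallback value and the `mkdir -p` come from the chain and step shown here, while the writability guard is an illustrative assumption that is not part of either step.

```shell
#!/bin/bash
# Fall back to the chain's documented default when the variable is unset.
OCP__ABI__CLUSTER_DIR="${OCP__ABI__CLUSTER_DIR:-/tmp/ocpClusterDir}"
mkdir -p "${OCP__ABI__CLUSTER_DIR}"

# Illustrative guard (an assumption, not in the steps): fail early when the
# chosen location is not writable, e.g. a read-only working directory.
if [ ! -w "${OCP__ABI__CLUSTER_DIR}" ]; then
  echo "Cluster dir ${OCP__ABI__CLUSTER_DIR} is not writable" >&2
  exit 1
fi
echo "Using cluster dir: ${OCP__ABI__CLUSTER_DIR}"
```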
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../OWNERS |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../OWNERS |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,220 @@ | ||
| #!/bin/bash | ||
| # abi-conf-bm — Agent-based installer configuration (bare metal; **conf** phase). | ||
| # | ||
| # Logic in this Step: | ||
| # - Bare-minimum `install-config.yaml` scaffold -> OCP-version-aware defaults -> `baremetal` platform -> `agent-config.yaml` template. | ||
| # - `UpdateCfg Day0` merges, updates, or replaces config entries; `OCP__ABI__DAY0_SCRIPTS_YAML` scripts further customize `install-config.yaml` / `agent-config.yaml`. | ||
| # - Extracts BMC info to `ocp--bmc--info.json`; strips BMC credentials from `agent-config.yaml`. | ||
| # - Generates Cluster manifests. | ||
| # - `UpdateCfg Day1` + `OCP__ABI__DAY1_SCRIPTS_YAML` scripts customize manifests. | ||
| # | ||
| set -euxo pipefail | ||
| shopt -s inherit_errexit | ||
|
|
||
| mkdir -p "${OCP__ABI__CLUSTER_DIR}" | ||
|
|
||
| eval "$( | ||
| curl -fsSL "https://raw.githubusercontent.com/RedHatQE/OpenShift-LP-QE--Tools/main/libs/bash/common/BuildCustomScriptsFromYAML.sh" | ||
| )" | ||
| eval "$( | ||
| curl -fsSL "https://raw.githubusercontent.com/RedHatQE/OpenShift-LP-QE--Tools/main/libs/bash/common/EnsureReqs.sh" | ||
| )"; EnsureReqs yq | ||
|
|
||
|
|
||
| typeset ocpABIcfg="${CLUSTER_PROFILE_DIR}/${OCP__ABI__CFG_FN}"; [ -r "${ocpABIcfg}" ] | ||
|
|
||
| # Extract `openshift-install` from the release image. | ||
| # The `RELEASE_IMAGE_LATEST` is set by CI Operator based on `.releases.latest` in CI Conf. | ||
| oc adm release extract \ | ||
| -a /var/run/secrets/registry-pull--build-farms/.dockerconfigjson \ | ||
| "${RELEASE_IMAGE_LATEST}" \ | ||
| --command=openshift-install \ | ||
| --to="/tmp" | ||
| export PATH="/tmp:${PATH}" | ||
|
|
||
|
|
||
| function openshift-install () { | ||
|
|
||
| typeset -i es=0 | ||
| { | ||
| echo \ | ||
| "$(date -Iseconds)|${FUNCNAME[0]@Q} ${*@Q}"$'\n'"$(printf '%.0s-' {1..80})" | ||
| command openshift-install \ | ||
| --dir "${OCP__ABI__CLUSTER_DIR}/" \ | ||
| --log-level "${OCP__ABI__INSTLR_LOG_LEVEL}" \ | ||
| "$@" 2>&1 || es=$? | ||
| echo "$(printf '%.0s=' {1..80})" | ||
| exit ${es} | ||
| } | tee -a "${ARTIFACT_DIR}/ocp--installer--cluster.log" | ||
| return ${PIPESTATUS[0]} | ||
| } | ||
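The wrapper above pipes the installer's output through `tee` and still returns the installer's own exit status. A reduced sketch of that pattern follows. `run_logged`, the temporary log file, and the sample command are hypothetical names chosen for illustration, not part of the step.

```shell
#!/bin/bash
# Hypothetical helper: run a command, append its combined output to a log,
# and hand the command's exit status (not tee's) back to the caller.
run_logged() {
  local log="$1"; shift
  local -i es=0
  {
    "$@" 2>&1 || es=$?
    exit "${es}"   # ends only the pipeline subshell created for this group
  } | tee -a "${log}"
  return "${PIPESTATUS[0]}"
}

log_file="$(mktemp)"
run_logged "${log_file}" sh -c 'echo "installing"; exit 3' && status=0 || status=$?
echo "exit status: ${status}"
```

Because the braces are one side of a pipeline, `exit` terminates only that subshell. `PIPESTATUS[0]` then carries its status past `tee`, which would otherwise mask a failure when `pipefail` is not set.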
|
Comment on lines +35 to +48
Contributor
nit: This
Contributor
That is a valid point. My concern is that a different installation process might not always share or require these exact same CLI parameters. While being called more than once is a good enough reason to create a function, moving it into an external library requires a much broader use case. Introducing an external library carries a higher cost and increases our dependency burden; it needs stronger justification if it is only going to be used by two Step scripts. I am not opposing the idea right off the bat, as I generally prefer to do things right from the get-go, but unfortunately, time is not on our side for this particular PR. Let's keep this idea in mind for when we start developing more Thanks for the great suggestion! |
||
|
|
||
| function UpdateCfg () { | ||
| typeset topKey="${1:?}"; (($#)) && shift | ||
| typeset cfgType='' cfgFile='' cfgCont='' updateOp='' | ||
| while IFS=$'\t' read -r cfgType cfgFile cfgCont; do | ||
| [[ "${cfgFile}" == */* ]] && | ||
| mkdir -p "${OCP__ABI__CLUSTER_DIR}/${cfgFile%/*}" | ||
| true 1>> "${OCP__ABI__CLUSTER_DIR}/${cfgFile}" | ||
| exec 3< <(cat "${OCP__ABI__CLUSTER_DIR}/${cfgFile}"); wait $! | ||
| case ${cfgType} in | ||
| (*+) updateOp='select(fileIndex==0) *+ ' ;; | ||
| (*-) updateOp='select(fileIndex==0) * ' ;; | ||
| (*=) updateOp='' ;; | ||
| esac | ||
| updateOp+='select(fileIndex==1)' | ||
| case ${cfgType} in | ||
| (yaml+|yaml-|yaml=) | ||
| yq eval-all "${updateOp}" \ | ||
| - \ | ||
| <(set +x; yq -p json -o yaml eval . 0<<<"${cfgCont}") \ | ||
| 0<&3 1>"${OCP__ABI__CLUSTER_DIR}/${cfgFile}" | ||
| ;; | ||
| (json+|json-|json=) | ||
| yq -p json -o json eval-all "${updateOp}" \ | ||
| - \ | ||
| <(set +x; echo "${cfgCont}") \ | ||
| 0<&3 1>"${OCP__ABI__CLUSTER_DIR}/${cfgFile}" | ||
| ;; | ||
| (*) : "Invalid Type: ${cfgType}"; false;; | ||
| esac | ||
| exec 3<&- | ||
| done 0< <( | ||
| yq -o json eval . "${ocpABIcfg}" | | ||
| jq -r --arg k "${topKey}" ' | ||
| (.[$k].configFileOverride // empty) | to_entries[] | | ||
| .key as $type | .value[]? | to_entries[] | | ||
| [$type, .key, ( | ||
| if ($type | startswith("json")) then .value | ||
| else (.value | tojson) | ||
| end | ||
| )] | join("\t") | ||
| ' | ||
| ) | ||
| true | ||
| } | ||
|
|
||
|
|
||
| # Create bare-minimum `install-config.yaml`. | ||
| { | ||
| yq -p yaml -o json eval . | | ||
| jq -c \ | ||
| --arg clsName "${OCP__ABI__BM__CLS_NAME}" \ | ||
| --arg baseDom "${OCP__ABI__BM__BASE_DOM}" \ | ||
| --rawfile pullCrd <(set +x; cat "${CLUSTER_PROFILE_DIR}/pull-secret") \ | ||
| --rawfile sshKey <(set +x; cat "${CLUSTER_PROFILE_DIR}/ssh-publickey") \ | ||
| ' | ||
| .baseDomain=$baseDom | | ||
| .metadata.name=$clsName | | ||
| .pullSecret=($pullCrd | rtrimstr("\n")) | | ||
| .sshKey=$sshKey | ||
| ' | | ||
| yq -p json -o yaml eval . | ||
| } 0<<'fileEOF' 1> "${OCP__ABI__CLUSTER_DIR}/install-config.yaml" | ||
| apiVersion: v1 | ||
| baseDomain: '' | ||
| metadata: | ||
| name: '' | ||
| platform: {none: {}} | ||
| pullSecret: '' | ||
| sshKey: '' | ||
| fileEOF | ||
|
|
||
| # Enrich with OCP-version-aware defaults. | ||
| openshift-install create install-config | ||
| # Update for Bare Metal target. | ||
| yq -i eval \ | ||
| '.platform={"baremetal": {}}' \ | ||
| "${OCP__ABI__CLUSTER_DIR}/install-config.yaml" | ||
|
|
||
| # Create `agent-config.yaml` template. | ||
| openshift-install agent create agent-config-template | ||
| # Stay idempotent on re-run: if the file is missing or empty, recover the template from the installer state. | ||
| [ -s "${OCP__ABI__CLUSTER_DIR}/agent-config.yaml" ] || { | ||
| jq -r \ | ||
| '."*agentconfig.AgentConfig".File.Data' \ | ||
| "${OCP__ABI__CLUSTER_DIR}/.openshift_install_state.json" | | ||
| base64 -d 1> "${OCP__ABI__CLUSTER_DIR}/agent-config.yaml" | ||
| } | ||
|
|
||
| # Customize `install-config.yaml` and complete `agent-config.yaml`. | ||
| UpdateCfg Day0 | ||
| eval "$(BuildCustomScriptsFromYAML OCP__ABI__DAY0_SCRIPTS_YAML)" | ||
|
|
||
| # Retrieve BMC Information from `agent-config.yaml`. | ||
| # Currently, if all Master Nodes are ready to be installed but | ||
| # not all Worker Nodes have registered, | ||
| # `wait-for bootstrap-complete` exits with an error. | ||
| # As a workaround, we boot the Worker Nodes first and the | ||
| # Rendezvous Host last. | ||
| { | ||
| yq -p yaml -o json eval . | | ||
| jq \ | ||
| --rawfile usr <(set +x; cat "${CLUSTER_PROFILE_DIR}/cred--bmc--usr") \ | ||
| --rawfile pwd <(set +x; cat "${CLUSTER_PROFILE_DIR}/cred--bmc--pwd") \ | ||
| --argjson rIP "$(yq -o json '(select( | ||
| (.rendezvousIP | length) > 0) | .rendezvousIP | ||
| ) // ([ | ||
| (.hosts[] | select(.role == "master")), | ||
| (.hosts[] | select(.role == "arbiter")), | ||
| (.hosts[] | select((.role == "") or (.role == null))) | ||
| ] | .[0] | [.networkConfig.interfaces[] | | ||
| select(.ipv4.enabled == true) | | ||
| .ipv4.address[0].ip | ||
| ] | .[0]) // error( | ||
| "rendezvousIP could not be determined" | ||
| ) ' "${OCP__ABI__CLUSTER_DIR}/agent-config.yaml")" \ | ||
| '[( | ||
| (.hosts[] | select(.role == "worker")), | ||
| (( | ||
| (.hosts[] | select((.role == "") or (.role == null))), | ||
| (.hosts[] | select(.role == "auto-assign")), | ||
| (.hosts[] | select(.role == "arbiter")), | ||
| (.hosts[] | select(.role == "master")) | ||
| ) | select(any(( | ||
| .networkConfig.interfaces[] | | ||
| select(.ipv4.enabled == true) | | ||
| .ipv4.address[]?.ip | ||
| ); . == $rIP) | not)), | ||
| (.hosts[] | select(any(( | ||
| .networkConfig.interfaces[] | | ||
| select(.ipv4.enabled == true) | | ||
| .ipv4.address[]?.ip | ||
| ); . == $rIP))) | ||
| ) | { | ||
| url: ("https://" + (.bmc.address | split("://")[-1])), | ||
| usr: (.bmc.username // ($usr | rtrimstr("\n"))), | ||
| pwd: (.bmc.password // ($pwd | rtrimstr("\n"))), | ||
| hostIPv4: ([ | ||
| .networkConfig.interfaces[] | | ||
| select(.ipv4.enabled == true) | | ||
| .ipv4.address[0]?.ip | ||
| ][0] // null) | ||
| }]' | ||
| } 0< "${OCP__ABI__CLUSTER_DIR}/agent-config.yaml" 1> "${SHARED_DIR}/ocp--bmc--info.json" | ||
|
|
||
| # Strip BMC Credentials from `agent-config.yaml`. | ||
| exec 3< <(cat "${OCP__ABI__CLUSTER_DIR}/agent-config.yaml"); wait $! | ||
| { | ||
| yq -p yaml -o json eval . | | ||
| jq '.hosts[].bmc |= del(.username, .password)' | | ||
| yq -p json -o yaml eval . | ||
| } 0<&3 1> "${OCP__ABI__CLUSTER_DIR}/agent-config.yaml" | ||
| exec 3<&- | ||
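The credential-stripping block rewrites a file in place by first snapshotting it onto file descriptor 3. A minimal sketch of that idiom follows, using `grep` instead of `yq`/`jq` and flat sample data that is purely illustrative. It assumes a bash version where `wait $!` can wait on a process substitution, and content small enough to fit in the pipe buffer.

```shell
#!/bin/bash
cfg="$(mktemp)"
printf 'address: redfish://bmc0\nusername: admin\npassword: s3cret\n' > "${cfg}"

# Snapshot the file into a pipe on fd 3 and wait for `cat` to finish, so the
# truncating redirect below cannot race with the read.
# Assumption: the content fits in the pipe buffer; a larger file would block `cat`.
exec 3< <(cat "${cfg}"); wait $!

# Filter the snapshot and overwrite the original file.
grep -Ev '^(username|password):' 0<&3 1> "${cfg}"
exec 3<&-

cat "${cfg}"
```

Reading from `${cfg}` directly while redirecting output to it would truncate the file before the filter sees any input. The snapshot on fd 3 avoids that without a temporary file.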
|
|
||
| # Set ISO Mode. | ||
| ((OCP__ABI__MIN_ISO)) && ( | ||
| export __IMG__ROOT_FS="${OCP__ABI__TUN_SVC__DP_BASE_URL%%/}/${OCP__ABI__TUN_SVC__DP_PORT}/boot-artifacts" | ||
| yq -i eval ' | ||
| .minimalISO=true | | ||
| .bootArtifactsBaseURL=strenv(__IMG__ROOT_FS) | ||
| ' "${OCP__ABI__CLUSTER_DIR}/agent-config.yaml" | ||
| ) | ||
|
|
||
| # Generate full manifest tree. | ||
| openshift-install agent create cluster-manifests | ||
|
|
||
| # Manifest Customization. | ||
| UpdateCfg Day1 | ||
| eval "$(BuildCustomScriptsFromYAML OCP__ABI__DAY1_SCRIPTS_YAML)" | ||
|
|
||
| # Save OCP Installation information for next Step. | ||
| tar zcf "${SHARED_DIR}/ocpClusterInf.tgz" -C "${OCP__ABI__CLUSTER_DIR}/" . | ||
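The archive written to `SHARED_DIR` is the hand-off to the next step. The round trip below is a sketch under assumptions: the consumer-side extraction is illustrative (`abi-install-bmc-commands.sh` is not shown here), and the directories and sample files are temporary stand-ins.

```shell
#!/bin/bash
# Temporary stand-ins for SHARED_DIR and the two steps' cluster directories.
shared_dir="$(mktemp -d)"
producer_dir="$(mktemp -d)"
consumer_dir="$(mktemp -d)"
mkdir -p "${producer_dir}/openshift"
echo 'apiVersion: v1' > "${producer_dir}/install-config.yaml"
echo 'kind: Example'  > "${producer_dir}/openshift/example.yaml"
echo '{}'             > "${producer_dir}/.openshift_install_state.json"

# Producer side: same invocation shape as the step above.
tar zcf "${shared_dir}/ocpClusterInf.tgz" -C "${producer_dir}/" .

# Consumer side (assumed): unpack into a fresh cluster directory.
tar zxf "${shared_dir}/ocpClusterInf.tgz" -C "${consumer_dir}/"
ls -A "${consumer_dir}"
```

Archiving `.` relative to `-C` keeps paths relative and includes hidden files such as `.openshift_install_state.json`, so the installer state travels with the manifests.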