Skip to content

feat(ci): parallelize base and RBAC helm deployments#4679

Open
gustavolira wants to merge 5 commits intoredhat-developer:mainfrom
gustavolira:feat/parallelize-showcase-deployments
Open

feat(ci): parallelize base and RBAC helm deployments#4679
gustavolira wants to merge 5 commits intoredhat-developer:mainfrom
gustavolira:feat/parallelize-showcase-deployments

Conversation

@gustavolira
Copy link
Copy Markdown
Member

Summary

Run base_deployment and rbac_deployment concurrently inside initiate_deployments() and initiate_deployments_osd_gcp(). Both target disjoint namespaces (NAME_SPACE vs NAME_SPACE_RBAC + NAME_SPACE_POSTGRES_DB) so there is no resource conflict, and overlapping the two helm upgrades plus their internal readiness waits saves ~3-5 min of wall-clock time on CI.

Tests continue to run sequentially downstream — parallelizing Playwright on the same CI pod caused OOMKilled in prior experiments (see #4469 for context).

Design

Phase Execution Why
Helm deployments (base + RBAC) Parallel Disjoint namespaces, no resource conflict
Readiness checks + Playwright tests Sequential (unchanged) Keeps this change small; avoids OOM risk

Implementation notes

  • New helper _run_parallel_deployments captures each background job's exit code via wait "$pid" || rc=$? and returns 1 if either deployment fails, so failures always propagate and are individually reported in the logs. This addresses the exit-code propagation concern raised on the previous attempt (feat(ci): parallelize deployments and readiness checks for showcase and showcase-rbac #4469).
  • No changes to callers (ocp-pull.sh, ocp-nightly.sh) — the helper is a drop-in replacement inside the existing initiate_deployments* functions.
  • Scope intentionally minimal: only helm deploys are parallelized. Further optimizations (parallel readiness polling, per-project artifact paths) can be iterated later.

Background

This is a fresh, rebased, and narrowed version of #4469, which was stale, drifted significantly from main, and mixed the deployment parallelization with test-parallelization changes that later had to be reverted. This PR ships only the safe, high-value part.

Test plan

  • OCP PR job (ocp-pull) runs both helm deployments in parallel and logs show overlapping helm upgrade output
  • OCP nightly (ocp-nightly) runs both deployments in parallel on OCP and OSD-GCP paths
  • Downstream tests still run sequentially with no OOMKilled
  • Forced failure of base deployment: RBAC still runs to completion, overall job fails, error reported
  • Forced failure of RBAC deployment: base still runs to completion, overall job fails, error reported

Resolves: RHIDP-12296

🤖 Generated with Claude Code

@rhdh-qodo-merge
Copy link
Copy Markdown

rhdh-qodo-merge Bot commented Apr 22, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Kubeconfig context race🐞 Bug ☼ Reliability
Description
namespace::configure and apply_yaml_files both mutate the current kubeconfig context namespace via
oc config set-context --current --namespace=...; when base/RBAC run concurrently, they race on
shared kubeconfig state. Because apply_yaml_files also runs oc apply/kubectl apply without an
explicit namespace for some resources, those resources can land in the wrong namespace depending on
timing.
Code

.ci/pipelines/utils.sh[R658-664]

+  "${base_fn}" "${base_arg}" &
+  local base_pid=$!
+  log::info "Base deployment started in background (PID: ${base_pid})"
+
+  "${rbac_fn}" "${rbac_arg}" &
+  local rbac_pid=$!
+  log::info "RBAC deployment started in background (PID: ${rbac_pid})"
Relevance

⭐⭐⭐ High

Team previously tightened oc apply namespace usage; racing `oc config set-context --current
--namespace` is a real parallelism risk.

PR-#3738
PR-#4129

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Parallel execution is introduced by _run_parallel_deployments backgrounding both deployments. Both
deployments call namespace::configure and apply_yaml_files, each of which changes the current
context namespace in kubeconfig. apply_yaml_files contains multiple apply commands without
--namespace, which will therefore follow whichever concurrent job last updated the context, making
the target namespace nondeterministic.

.ci/pipelines/utils.sh[650-664]
.ci/pipelines/utils.sh[542-553]
.ci/pipelines/utils.sh[582-597]
.ci/pipelines/lib/namespace.sh[92-108]
.ci/pipelines/utils.sh[297-305]
.ci/pipelines/utils.sh[348-361]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Parallel deployments race on kubeconfig global state because `namespace::configure` and `apply_yaml_files` run `oc config set-context --current --namespace=...`. In parallel, this makes the "current namespace" nondeterministic, and some `oc apply`/`kubectl apply` calls in `apply_yaml_files` omit an explicit namespace.

### Issue Context
This PR introduces concurrent execution of base and RBAC deployments in `_run_parallel_deployments`.

### Fix Focus Areas
- .ci/pipelines/lib/namespace.sh[92-108]
- .ci/pipelines/utils.sh[297-365]
- .ci/pipelines/utils.sh[650-668]

### Implementation notes
- Remove/avoid `oc config set-context --current --namespace=...` in both `namespace::configure` and `apply_yaml_files` (or isolate kubeconfig per background job via a per-job `KUBECONFIG` copy).
- Add `--namespace="${project}"` to the `oc apply` calls that currently omit it.
- Add `-n "${project}"` (or `--namespace`) to the `kubectl apply` call as well.
- After changes, `apply_yaml_files` should be namespace-deterministic without relying on current context.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Shared temp files race🐞 Bug ≡ Correctness
Description
helm::merge_values writes intermediate YAML to fixed paths under /tmp, so concurrent base/RBAC
deployments can overwrite each other and produce corrupted merged values consumed by helm upgrade.
This can cause nondeterministic deployment failures or wrong configuration being deployed.
Code

.ci/pipelines/utils.sh[R658-668]

+  "${base_fn}" "${base_arg}" &
+  local base_pid=$!
+  log::info "Base deployment started in background (PID: ${base_pid})"
+
+  "${rbac_fn}" "${rbac_arg}" &
+  local rbac_pid=$!
+  log::info "RBAC deployment started in background (PID: ${rbac_pid})"
+
+  local base_rc=0 rbac_rc=0
+  wait "${base_pid}" || base_rc=$?
+  wait "${rbac_pid}" || rbac_rc=$?
Relevance

⭐⭐⭐ High

helm::merge_values uses fixed /tmp step files; parallel deployments can overwrite and corrupt merged
values.

PR-#4118

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The PR runs the two deployments concurrently (background jobs + wait). Both base_deployment and
rbac_deployment call helm::merge_values (PR path and OSD-GCP path). helm::merge_values uses
hard-coded intermediate file paths (/tmp/step-without-plugins.yaml and /tmp/step-only-plugins.yaml),
so two concurrent invocations will contend and overwrite each other.

.ci/pipelines/utils.sh[650-668]
.ci/pipelines/utils.sh[542-566]
.ci/pipelines/utils.sh[594-609]
.ci/pipelines/utils.sh[698-721]
.ci/pipelines/utils.sh[727-750]
.ci/pipelines/lib/helm.sh[30-73]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`helm::merge_values` uses fixed intermediate paths (`/tmp/step-without-plugins.yaml`, `/tmp/step-only-plugins.yaml`). With this PR’s parallel deployments, two concurrent calls can overwrite each other, corrupting the merged values file used for `helm upgrade`.

### Issue Context
The new `_run_parallel_deployments` runs `base_deployment*` and `rbac_deployment*` concurrently; both paths call `helm::merge_values`.

### Fix Focus Areas
- .ci/pipelines/lib/helm.sh[30-73]
- .ci/pipelines/utils.sh[650-668]

### Implementation notes
- Replace the fixed `step_1_file`/`step_2_file` with `mktemp` (or derive unique names from `final_file` + PID).
- Ensure cleanup via `trap 'rm -f ...' RETURN` inside `helm::merge_values`.
- Keep `final_file` deterministic; only intermediate files must be unique per invocation.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. In-place YAML edit race🐞 Bug ☼ Reliability
Description
apply_yaml_files rewrites shared YAML templates in-place via common::sed_inplace before applying
them; with base/RBAC running concurrently this can clobber edits and apply manifests with the wrong
namespace fields. This can lead to resources being created/updated in the wrong namespace or failing
validation due to mismatched metadata.
Code

.ci/pipelines/utils.sh[R658-664]

+  "${base_fn}" "${base_arg}" &
+  local base_pid=$!
+  log::info "Base deployment started in background (PID: ${base_pid})"
+
+  "${rbac_fn}" "${rbac_arg}" &
+  local rbac_pid=$!
+  log::info "RBAC deployment started in background (PID: ${rbac_pid})"
Relevance

⭐⭐⭐ High

apply_yaml_files/common::sed_inplace pattern implies shared in-place manifest edits; parallel runs
can clobber changes.

PR-#3904
PR-#4288

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Parallel deployments both call apply_yaml_files; that function performs in-place edits on repo files
(service account, cluster role, cluster role binding) to change namespace: lines. Two concurrent
invocations editing the same files will overwrite each other’s changes and can cause the subsequent
oc apply to use the wrong namespace data in the manifest.

.ci/pipelines/utils.sh[650-664]
.ci/pipelines/utils.sh[542-553]
.ci/pipelines/utils.sh[594-597]
.ci/pipelines/utils.sh[305-314]
.ci/pipelines/lib/common.sh[66-74]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`apply_yaml_files` mutates shared YAML templates in-place (`sed -i`) to set namespaces. With this PR’s parallel deployments, two invocations will race on the same on-disk files and can apply manifests with incorrect `namespace:` fields.

### Issue Context
The templates being mutated live under `${DIR}/resources/...` and are used by both base and RBAC deployments.

### Fix Focus Areas
- .ci/pipelines/utils.sh[297-325]

### Implementation notes
- Do not edit tracked/shared YAML files in-place.
- Safer options:
 - Copy the needed YAMLs into a per-invocation temp directory (e.g., `tmpdir=$(mktemp -d)`), run `sed_inplace` on the copies, and apply the copies.
 - Or avoid filesystem writes entirely: generate the modified YAML on stdout (sed/yq) and `oc apply -f -`.
- Ensure any temp dirs/files are cleaned up via `trap`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@rhdh-qodo-merge
Copy link
Copy Markdown

Review Summary by Qodo

Parallelize base and RBAC helm deployments for CI optimization

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Parallelize base and RBAC helm deployments in CI pipelines
• New _run_parallel_deployments helper runs deployments concurrently
• Captures exit codes and propagates failures individually
• Saves ~3-5 minutes of wall-clock time on CI
Diagram
flowchart LR
  A["initiate_deployments<br/>initiate_deployments_osd_gcp"] -->|calls| B["_run_parallel_deployments"]
  B -->|spawns| C["base_deployment<br/>base_deployment_osd_gcp"]
  B -->|spawns| D["rbac_deployment<br/>rbac_deployment_osd_gcp"]
  C -->|parallel| E["Helm upgrades<br/>+ readiness checks"]
  D -->|parallel| E
  E -->|waits for both| F["Return success/failure<br/>with individual reporting"]
Loading

Grey Divider

File Changes

1. .ci/pipelines/utils.sh ✨ Enhancement +57/-4

Add parallel deployment orchestration helper function

• Added _run_parallel_deployments helper function that runs two deployment functions concurrently
• Helper captures exit codes via wait "$pid" || rc=$? pattern and returns 1 if either deployment
 fails
• Provides individual success/error logging for each deployment (base and RBAC)
• Updated initiate_deployments() to call _run_parallel_deployments instead of sequential calls
• Updated initiate_deployments_osd_gcp() to call _run_parallel_deployments instead of sequential
 calls

.ci/pipelines/utils.sh


Grey Divider

Qodo Logo

@github-actions
Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

gustavolira added a commit to gustavolira/rhdh that referenced this pull request Apr 23, 2026
…eployments

Address code review findings from PR redhat-developer#4679:

1. **Namespace validation**: Add defensive check to ensure NAME_SPACE and
   NAME_SPACE_RBAC are different and non-empty before starting parallel
   deployments. Prevents subtle race conditions if misconfigured.

2. **Improved logging**:
   - Log both namespace names at the start for debugging
   - Add synchronization point logs ("Waiting for parallel deployments..."
     and "Parallel deployments finished") to make log flow clearer when
     output from both background jobs is interleaved

3. **Documentation**: Add "Requires" section to function header documenting
   the namespace disjointness requirement

These changes improve debuggability and fail-fast behavior without changing
the core parallelization logic.
@github-actions
Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@github-actions
Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

Run base_deployment and rbac_deployment concurrently in initiate_deployments()
and initiate_deployments_osd_gcp(). Both target disjoint namespaces (NAME_SPACE
vs NAME_SPACE_RBAC + NAME_SPACE_POSTGRES_DB), so there is no resource conflict.
Overlapping the two helm upgrades and their internal readiness waits saves
~3-5 min of wall-clock time on CI.

The _run_parallel_deployments helper captures each background job's exit code
via `wait "$pid" || rc=$?` and returns 1 if either fails, so failures always
propagate and are individually reported in the logs. Tests continue to run
sequentially downstream (parallel Playwright would OOM given CI memory limits).

Resolves: RHIDP-12296
@gustavolira
Copy link
Copy Markdown
Member Author

/review

…eployments

Address code review findings from PR redhat-developer#4679:

1. **Namespace validation**: Add defensive check to ensure NAME_SPACE and
   NAME_SPACE_RBAC are different and non-empty before starting parallel
   deployments. Prevents subtle race conditions if misconfigured.

2. **Improved logging**:
   - Log both namespace names at the start for debugging
   - Add synchronization point logs ("Waiting for parallel deployments..."
     and "Parallel deployments finished") to make log flow clearer when
     output from both background jobs is interleaved

3. **Documentation**: Add "Requires" section to function header documenting
   the namespace disjointness requirement

These changes improve debuggability and fail-fast behavior without changing
the core parallelization logic.
Address 3 critical race conditions identified by Qodo code review:

**1. Shared temp files in helm::merge_values (Bug #1)**
- BEFORE: Used fixed paths /tmp/step-without-plugins.yaml and
  /tmp/step-only-plugins.yaml
- AFTER: Generate unique temp files per invocation via mktemp
- WHY: Concurrent helm::merge_values calls would overwrite each other's
  intermediate files, corrupting merged values

**2. Kubeconfig global state mutation (Bug #2)**
- BEFORE: namespace::configure and apply_yaml_files both ran
  `oc config set-context --current --namespace=...`
- AFTER: Removed all oc config set-context calls; added explicit
  --namespace flag to all oc/kubectl apply commands that lacked it
- WHY: Parallel deployments racing on current context would apply
  resources to nondeterministic namespaces

**3. In-place YAML file edits (Bug redhat-developer#3)**
- BEFORE: apply_yaml_files used sed -i to modify shared YAML templates
  directly in the repo
- AFTER: Copy templates to per-deployment tmpdir, patch copies, apply
  from tmpdir
- WHY: Concurrent sed -i edits would clobber each other and apply
  manifests with wrong namespace fields

All three bugs would have caused nondeterministic deployment failures
when base_deployment and rbac_deployment run in parallel.

Changes:
- .ci/pipelines/lib/helm.sh: mktemp + trap cleanup for intermediate files
- .ci/pipelines/lib/namespace.sh: Remove oc config set-context
- .ci/pipelines/utils.sh: Tmpdir for YAML patches, explicit --namespace
  on all oc apply calls
@rhdh-qodo-merge
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Trap Overwrite

The function sets a trap ... RETURN for temp-file cleanup. In bash, setting a RETURN trap inside a function can overwrite any previously configured RETURN trap in the same shell context, potentially breaking other cleanup logic. Consider saving/restoring any existing RETURN trap or using an explicit cleanup call pattern.

if [[ "$plugin_operation" == "merge" ]]; then
  # Temp files only needed for the merge path; overwrite writes directly to final_file
  local step_1_file step_2_file
  step_1_file=$(mktemp "${TMPDIR:-/tmp}/helm-merge-step1-XXXXXX.yaml")
  step_2_file=$(mktemp "${TMPDIR:-/tmp}/helm-merge-step2-XXXXXX.yaml")
  trap 'rm -f "${step_1_file}" "${step_2_file}"' RETURN
Trap Overwrite

Multiple functions introduce trap ... RETURN for temporary file/dir cleanup. Similar to other RETURN traps, this can clobber an existing RETURN trap and cause unexpected cleanup behavior elsewhere. Consider preserving/restoring prior trap handlers or avoiding RETURN traps if the script relies on them globally.

  # Patch a temp copy instead of editing the shared template in-place (parallel-deployment safe)
  local pg_cred_file
  pg_cred_file=$(mktemp "${TMPDIR:-/tmp}/postgres-cred-${project}-XXXXXX.yaml")
  sed "s|POSTGRES_PASSWORD:.*|POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}|g;s|POSTGRES_HOST:.*|POSTGRES_HOST: ${POSTGRES_HOST}|g" \
    "${DIR}/resources/postgres-db/postgres-cred.yaml" > "${pg_cred_file}"

  # Validate final configuration apply
  if ! oc apply -f "${pg_cred_file}" --namespace="${project}"; then
    rm -f "${pg_cred_file}"
    log::error "Failed to apply PostgreSQL credentials"
    return 1
  fi
  rm -f "${pg_cred_file}"

  log::success "External PostgreSQL database configured successfully!"
}

apply_yaml_files() {
  local dir=$1
  local project=$2
  local rhdh_base_url=$3
  log::info "Applying YAML files to namespace ${project}"

  # Create temporary directory for namespace-patched YAMLs (parallel-deployment safe)
  local tmpdir
  tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/apply-yaml-${project}-XXXXXX")
  trap 'rm -rf "${tmpdir}"' RETURN
Parallel Safety

Parallel deployments rely on the invariant that no command depends on the kubeconfig “current namespace” and that all side effects are namespace-scoped and/or written to unique paths. Validate that both backgrounded deployment functions (and their downstream helpers) never call oc config set-context --current --namespace=... and never mutate shared on-disk templates or artifact paths that could conflict when run concurrently.

# Run base_deployment and rbac_deployment in parallel.
# Both target disjoint namespaces (NAME_SPACE vs NAME_SPACE_RBAC + NAME_SPACE_POSTGRES_DB)
# so there is no resource conflict. Overlapping the two helm upgrades and their
# internal readiness waits saves ~3-5 min of wall-clock time on CI.
#
# Args:
#   $1 - base_fn: function name for the base deployment
#   $2 - base_arg: artifacts_subdir passed to base_fn
#   $3 - rbac_fn: function name for the RBAC deployment
#   $4 - rbac_arg: artifacts_subdir passed to rbac_fn
#
# Returns:
#   0 if both deployments succeed
#   1 if either (or both) fail — failures are always reported individually
#
# Requires:
#   NAME_SPACE and NAME_SPACE_RBAC must be different to avoid resource conflicts
_run_parallel_deployments() {
  local base_fn=$1
  local base_arg=$2
  local rbac_fn=$3
  local rbac_arg=$4

  # Validate that namespaces are disjoint to prevent race conditions
  if [[ "${NAME_SPACE:-}" == "${NAME_SPACE_RBAC:-}" ]] || [[ -z "${NAME_SPACE:-}" ]] || [[ -z "${NAME_SPACE_RBAC:-}" ]]; then
    log::error "NAME_SPACE ('${NAME_SPACE:-}') and NAME_SPACE_RBAC ('${NAME_SPACE_RBAC:-}') must be different and non-empty for parallel deployment"
    return 1
  fi

  log::section "Starting parallel deployments: base + RBAC"
  log::info "Base namespace: ${NAME_SPACE}, RBAC namespace: ${NAME_SPACE_RBAC}"

  "${base_fn}" "${base_arg}" &
  local base_pid=$!
  log::info "Base deployment started in background (PID: ${base_pid})"

  "${rbac_fn}" "${rbac_arg}" &
  local rbac_pid=$!
  log::info "RBAC deployment started in background (PID: ${rbac_pid})"

  local base_rc=0 rbac_rc=0
  wait "${base_pid}" || base_rc=$?
  wait "${rbac_pid}" || rbac_rc=$?
  log::section "Parallel deployments finished — evaluating results"

  if [[ ${base_rc} -eq 0 ]]; then
    log::success "Base deployment completed"
  else
    log::error "Base deployment failed (exit code: ${base_rc})"
  fi
  if [[ ${rbac_rc} -eq 0 ]]; then
    log::success "RBAC deployment completed"
  else
    log::error "RBAC deployment failed (exit code: ${rbac_rc})"
  fi

  if [[ ${base_rc} -ne 0 || ${rbac_rc} -ne 0 ]]; then
    return 1
  fi
  return 0
}
📄 References
  1. No matching references available

1. postgres-cred.yaml in-place edit race: configure_external_postgres_db
   now patches a temp copy instead of editing the shared template via
   sed_inplace. Same class of bug fixed in apply_yaml_files.

2. Unquoted ${NAME_SPACE} in namespace::configure calls: quote both
   occurrences (lines 553, 721) to prevent word-splitting.

3. mktemp in helm::merge_values overwrite path: move mktemp + trap
   inside the "merge" branch so the "overwrite" path doesn't create
   and immediately discard two unused temp files.

4. Remove redundant "Waiting for parallel deployments" log::section
   that fires right before the blocking wait calls — the bookend
   sections are sufficient.
@gustavolira gustavolira force-pushed the feat/parallelize-showcase-deployments branch from a7771f4 to 9403307 Compare April 24, 2026 17:20
@github-actions
Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

trap RETURN leaks to the calling function in bash < 4.4, causing
unbound variable errors when step_1_file/tmpdir are expanded in
the caller's scope under set -o nounset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

@github-actions
Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant