
Fix medik8s step-registry: skopeo, artifacts, robustness #79820

Merged: openshift-merge-bot[bot] merged 5 commits into openshift:main from razo7:fix/medik8s-step-improvements on May 31, 2026
Conversation


@razo7 razo7 commented May 28, 2026

Summary

Fixes issues found during review of the medik8s step-registry steps and system-tests config. Neither step has been exercised in CI yet, so these fixes prevent failures before they occur.

Commit 1: Fix medik8s-catalogsource step

  • Critical: Replace skopeo commands with Quay API curl calls — the upi-installer image does not include skopeo, causing runtime failures in verify_fbc_image()
  • Add ARTIFACT_DIR usage with EXIT trap for debug artifact persistence
  • Add timestamped logging via log() helper
  • Replace fragile sed IDMS rename with yq-v4
  • Fix wait_for_catalogsource to check before sleeping
  • Write catsrc_name to SHARED_DIR in both modes for step chaining
  • Replace hardcoded /tmp with mktemp
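The logging and artifact bullets above can be sketched roughly as follows. This is a minimal illustration, not the script from the PR: the names log() and collect_artifacts() come from the review summary, while WORK_DIR and the function bodies are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: function names follow the PR text, bodies are assumed.

# Print a message prefixed with a UTC timestamp.
log() {
    printf '[%s] %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$*"
}

# Scratch directory created with mktemp instead of a hardcoded /tmp path.
# WORK_DIR is an illustrative name, not taken from the PR.
WORK_DIR="$(mktemp -d)"

# Copy whatever debug files were written to WORK_DIR into ARTIFACT_DIR.
# Does nothing when ARTIFACT_DIR is unset or is not a directory.
collect_artifacts() {
    local dest="${ARTIFACT_DIR:-}"
    if [[ -z "$dest" || ! -d "$dest" ]]; then
        return 0
    fi
    cp -r "${WORK_DIR}/." "${dest}/" 2>/dev/null || true
}

# Run the collection on every exit path, including failures.
trap collect_artifacts EXIT
```

Because the trap is bound to EXIT, the files are copied whether the step succeeds or fails, which is what makes them useful for debugging a failed run.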

Commit 2: Fix medik8s-operator-subscribe step

  • Replace fixed sleep 10 + single-shot subscription check with wait_for_subscription() retry loop
  • Add ARTIFACT_DIR usage with EXIT trap
  • Add timestamped logging
  • Filter empty entries from comma-separated OPERATORS input
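A retry loop of the kind described for wait_for_subscription() might look like the sketch below. The defaults of 5 attempts and 2 seconds come from the commit message further down; the exact oc invocation and the positional parameters are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bounded retry loop; not the merged implementation.

# Poll until the Subscription exists in INSTALL_NAMESPACE.
# Arguments: package name, attempts (default 5), delay in seconds (default 2).
wait_for_subscription() {
    local pkg="$1"
    local attempts="${2:-5}"
    local delay="${3:-2}"
    local i
    for ((i = 1; i <= attempts; i++)); do
        # The fully qualified resource name avoids clashes with other
        # "subscription" kinds that may be installed on the cluster.
        if oc get subscription.operators.coreos.com "$pkg" \
            -n "${INSTALL_NAMESPACE}" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep only between attempts, not after the last one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}
```

Compared with a fixed sleep 10 followed by one check, this returns as soon as the Subscription appears and still fails clearly when it never does.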

Commit 3: Fix system-tests config

  • Remove unused releases block (both tests use from: src, not a release payload)
  • Adjust resource requests to match medik8s/common pattern for lint/unit repos
  • Regenerate Prow presubmit jobs (remove derived job-release labels)

Commit 4: Address CodeRabbit review

  • Add --retry/--connect-timeout/--max-time to Quay API curl calls
  • Normalize OPERATORS list at parse time; fail if no valid packages remain
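As a rough illustration of the curl hardening, the flags can be collected in one wrapper so that every Quay call gets the same retry and timeout behavior. The wrapper names, the numeric values, and the onlyActiveTags query parameter are assumptions for this sketch; the PR only states that --retry, --connect-timeout, and --max-time were added.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: helper names and numeric values are illustrative.

# GET a URL with bounded retries and timeouts so a slow or flaky
# registry cannot hang the step indefinitely.
quay_api_get() {
    curl --silent --show-error --fail \
        --retry 3 --retry-delay 2 \
        --connect-timeout 10 --max-time 30 \
        "$1"
}

# Fetch the tag listing for a repository path such as "org/repo"
# from the Quay v1 tag API; the JSON is written to stdout.
list_quay_tags_json() {
    local repo_path="$1"
    quay_api_get "https://quay.io/api/v1/repository/${repo_path}/tag/?onlyActiveTags=true"
}
```

With --fail, an HTTP error status becomes a non-zero exit code, so a caller can fall back to another tag instead of parsing an error page.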

Commit 5: Improve MCP rollout detection after IDMS apply

  • Extract wait_for_mcp_rollout() — snapshots MCP rendered config names before IDMS apply, polls for change
  • Replaces the unreliable condition=Updating || true, then condition=Updated wait chain that could pass on stale state
  • Logs before/after config names on change detection for CI debugging
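The snapshot-and-poll idea can be sketched as below. This is an assumption-laden illustration: mcp_snapshot() is a hypothetical helper, the sketch reads spec.configuration.name (the review comment mentions status.configuration.name, and which field the merged script uses is not shown here), and the 30m wait timeout is arbitrary. The 120-second default mirrors the "2 minutes" in the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of rollout detection; not the merged implementation.

# Print one "pool=renderedConfig" line per MachineConfigPool.
mcp_snapshot() {
    oc get mcp -o jsonpath='{range .items[*]}{.metadata.name}={.spec.configuration.name}{"\n"}{end}'
}

# Poll until the snapshot differs from the one taken before the IDMS apply,
# then wait for the pools to report Updated.
# Arguments: pre-apply snapshot, timeout seconds (default 120), interval (default 10).
wait_for_mcp_rollout() {
    local before="$1"
    local timeout="${2:-120}"
    local interval="${3:-10}"
    local elapsed=0
    local after
    while ((elapsed < timeout)); do
        after="$(mcp_snapshot)"
        if [[ "$after" != "$before" ]]; then
            echo "MCP rendered config changed:"
            echo "  before: ${before//$'\n'/ }"
            echo "  after:  ${after//$'\n'/ }"
            # Warn and proceed if the pools do not settle in time.
            oc wait mcp --all --for=condition=Updated --timeout=30m \
                || echo "WARNING: MCP did not report Updated in time"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "No MCP rendered config change after ${timeout}s; proceeding"
    return 0
}
```

A caller would run before="$(mcp_snapshot)" ahead of the apply and pass that value in afterwards, so an Updated condition that was already True cannot be mistaken for a finished rollout.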

Dependencies

Jira: RHWA-1021

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 28, 2026

openshift-ci Bot commented May 28, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


coderabbitai Bot commented May 28, 2026

Walkthrough

Adds UTC-timestamped logging and EXIT-bound artifact collection to Medik8s CI step scripts; refactors CatalogSource image/IDMS handling and readiness diagnostics; restructures operator subscription flow into two phases with improved CSV waiting; adjusts system-test resource requests and memory limits.

Changes

Medik8s CI Logging and Artifact Collection

| Layer | File(s) | Summary |
| --- | --- | --- |
| System test resource profile adjustment | ci-operator/config/medik8s/system-tests/medik8s-system-tests-main.yaml | resources['*'] now sets limits.memory: 4Gi, requests.cpu: 100m, requests.memory: 200Mi replacing prior request values. |
| Operator-subscribe: logging, artifacts, and helpers | ci-operator/step-registry/medik8s/operator-subscribe/medik8s-operator-subscribe-commands.sh | Adds log() and collect_artifacts() wired to EXIT, updates proxy/namespace/operatorgroup helpers to use timestamped logging, and centralizes artifact collection. |
| Operator-subscribe: subscription and CSV orchestration | ci-operator/step-registry/medik8s/operator-subscribe/medik8s-operator-subscribe-commands.sh | Introduces wait_for_subscription(), reworks wait_for_csv() to emit subscription/installplan/events diagnostics and wait for CSV Succeeded via oc wait, refactors main() to create subscriptions first, then wait for subscription existence and CSV readiness, removes fixed sleep, and consolidates final reporting. |
| CatalogSource script: infra and verification | ci-operator/step-registry/medik8s/catalogsource/medik8s-catalogsource-commands.sh | Adds QUAY_REPO_PATH, log()/run() helpers, collect_artifacts() wired to EXIT, logs proxy handling and GitLab SHA resolution (hard-fail on unresolved), and replaces skopeo checks with Quay manifest/tag lookups and fallback tag selection. |
| CatalogSource script: apply, IDMS, readiness, and main flow | ci-operator/step-registry/medik8s/catalogsource/medik8s-catalogsource-commands.sh | Downloads/applies IDMS via a temp file and yq-v4, waits for MCP rollout (warns and proceeds on timeout), quotes spec.image in generated CatalogSource, logs per-iteration readiness polling, emits extra debug outputs (pods, CatalogSource YAML, pod selector YAML, Marketplace events) on timeout, writes ${SHARED_DIR}/catsrc_name, and installs artifact collection in main(). |
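To make the "quotes spec.image" and "writes ${SHARED_DIR}/catsrc_name" items concrete, here is a minimal sketch. The function names render_catalogsource() and publish_catsrc_name() are hypothetical, as is the default namespace; the securityContextConfig: restricted field is taken from the first commit message in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: helper names and the default namespace are assumptions.

# Print a CatalogSource manifest. spec.image is double-quoted so the
# reference is always parsed as a YAML string.
render_catalogsource() {
    local name="$1"
    local image="$2"
    local ns="${3:-openshift-marketplace}"
    cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  sourceType: grpc
  image: "${image}"
  grpcPodConfig:
    securityContextConfig: restricted
EOF
}

# Record the CatalogSource name so a later step can read it.
# SHARED_DIR is expected to be provided by the CI step environment.
publish_catsrc_name() {
    printf '%s' "$1" > "${SHARED_DIR}/catsrc_name"
}
```

The rendered manifest could then be piped to oc apply -f -, and a later step could read the name back with cat "${SHARED_DIR}/catsrc_name".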

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • openshift/release#79547: Refactors and enhances the medik8s-operator-subscribe script; closely related to the operator-subscribe changes in this PR.

Suggested labels

lgtm, rehearsals-ack

Suggested reviewers

  • clobrano
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 4.55% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title summarizes the main improvements across the medik8s step-registry changes: replacing skopeo with API calls, adding artifact persistence, and improving robustness. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR contains no Ginkgo test files. Changes are limited to CI operator configuration YAML and shell scripts, which are outside the scope of the Ginkgo test naming check. |
| Test Structure And Quality | ✅ Passed | PR contains only YAML config and bash shell script changes; no Ginkgo test code is present, so the test structure check is not applicable. |
| Microshift Test Compatibility | ✅ Passed | PR contains no Ginkgo e2e tests. Changes are CI infrastructure: one YAML configuration file and two bash shell scripts (step-registry commands), not test code. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR adds only CI/operator shell scripts and YAML config files; no Ginkgo e2e tests are introduced, so SNO compatibility check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies CI config and test scripts only. No Kubernetes deployment manifests or scheduling constraints are introduced. |
| Ote Binary Stdout Contract | ✅ Passed | PR contains no OTE binary code; it only modifies YAML CI config and shell step-registry scripts. OTE Stdout Contract applies only to compiled binaries, not CI infrastructure. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR contains only CI configuration (YAML) and bash helper scripts; no Ginkgo e2e tests are added, so the IPv6/disconnected network compatibility check does not apply. |
| No-Weak-Crypto | ✅ Passed | No weak cryptography (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons detected in modified files. |
| Container-Privileges | ✅ Passed | No container privilege escalations found. Pod Security Standard namespace labels added; no securityContext.privileged, hostPID/Network/IPC, SYS_ADMIN caps, or allowPrivilegeEscalation in containers. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No sensitive data exposed in logs. All logged content consists of non-sensitive Kubernetes metadata, configuration, and public container image references. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@razo7 razo7 marked this pull request as ready for review May 28, 2026 12:15
@openshift-ci openshift-ci Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 28, 2026
@openshift-ci openshift-ci Bot requested review from beekhof and maximunited May 28, 2026 12:17

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ci-operator/step-registry/medik8s/operator-subscribe/medik8s-operator-subscribe-commands.sh (1)

192-197: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject an all-empty OPERATORS list after trimming.

OPERATORS=", , " passes the current env-var check, every parsed entry is skipped here, and the step exits green without creating any Subscription. Please normalize once, keep only non-empty package names, and fail if the resulting list is empty.

One way to make the validation explicit
-    IFS=',' read -ra OPERATOR_LIST <<< "$OPERATORS"
+    IFS=',' read -ra RAW_OPERATOR_LIST <<< "$OPERATORS"
+    OPERATOR_LIST=()
+    for pkg in "${RAW_OPERATOR_LIST[@]}"; do
+        pkg="${pkg//[[:space:]]/}"
+        [[ -n "$pkg" ]] && OPERATOR_LIST+=("$pkg")
+    done
+
+    if [[ ${#OPERATOR_LIST[@]} -eq 0 ]]; then
+        log "ERROR: OPERATORS did not contain any non-empty package names"
+        exit 1
+    fi
 
     for pkg in "${OPERATOR_LIST[@]}"; do
-        pkg="${pkg//[[:space:]]/}"
-        [[ -z "$pkg" ]] && continue
         log ""
         log "--- Installing operator: ${pkg} ---"
         wait_for_package_manifest "$pkg" || exit 1
         create_subscription "$pkg"
     done
@@
     local failed=0
     for pkg in "${OPERATOR_LIST[@]}"; do
-        pkg="${pkg//[[:space:]]/}"
-        [[ -z "$pkg" ]] && continue
         if ! wait_for_subscription "$pkg"; then
             log "ERROR: Subscription ${pkg} does not exist in ${INSTALL_NAMESPACE}"
             failed=1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/medik8s/operator-subscribe/medik8s-operator-subscribe-commands.sh`
around lines 192 - 197, The script currently splits OPERATORS into OPERATOR_LIST
and then skips empty/whitespace-only entries inside the for-loop, allowing
inputs like " , , " to silently produce no work; update the parsing/validation
so you first normalize and filter OPERATOR_LIST to remove any empty or
whitespace-only package names (trim each item and keep only non-empty strings),
then if the resulting list is empty log a clear error and exit non-zero (e.g.,
processLogger/echo error + exit 1) before entering the loop; refer to
OPERATOR_LIST, OPERATORS, and pkg when making the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/medik8s/catalogsource/medik8s-catalogsource-commands.sh`:
- Around line 80-95: The Quay curl probes for the FBC manifest and the fallback
tag list (the calls that reference FBC_COMMIT_SHA, QUAY_REPO_PATH and image_name
and populate fallback_tag) lack timeouts and retries; update both curl
invocations to include sensible retry and timeout flags (e.g. --retry,
--retry-delay, --connect-timeout, --max-time) or wrap them in a small retry loop
so transient network/Quay issues don’t hang the script and the fallback logic
reliably executes. Ensure the same stderr handling and jq/grep pipeline behavior
is preserved and that failures still allow the existing FBC_SHA_PINNED check and
fallback_tag assignment to behave as before.
- Around line 130-133: The current logic ignores the `oc wait mcp
--for=condition=Updating || true` and then relies on `oc wait ... Updated`,
which can pass on a stale MCP; instead capture a concrete MCP rendered config
before/after apply: read and store the MCP `status.configuration.name` (or
rendered config id) for the relevant MCPs before applying the IDMS, run the
apply, then poll/`oc wait` until the MCPs' `status.configuration.name` changes
from the stored pre-apply value (or matches the new expected name); remove the
silent ignore of the Updating wait (or at least only ignore errors from
not-started rollout) and prefer the configuration-name change as the success
criterion—use the existing references `IDMS_NAME`, `oc wait mcp --all
--for=condition=Updating`, and `status.configuration.name` to locate and update
the logic.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 591d8916-074b-4026-bda1-ff9aebc42fa3

📥 Commits

Reviewing files that changed from the base of the PR and between 5c280bc and 8e118d8.

📒 Files selected for processing (3)
  • ci-operator/config/medik8s/system-tests/medik8s-system-tests-main.yaml
  • ci-operator/step-registry/medik8s/catalogsource/medik8s-catalogsource-commands.sh
  • ci-operator/step-registry/medik8s/operator-subscribe/medik8s-operator-subscribe-commands.sh

Comment thread ci-operator/step-registry/medik8s/catalogsource/medik8s-catalogsource-commands.sh Outdated
Comment thread ci-operator/step-registry/medik8s/catalogsource/medik8s-catalogsource-commands.sh Outdated
razo7 and others added 2 commits May 28, 2026 15:25
Replace skopeo commands with Quay API curl calls — the upi-installer
image does not include skopeo, causing runtime failures. Use Quay v2
manifest API for image verification and v1 tag API for fallback tag
listing.

Add ARTIFACT_DIR usage with EXIT trap to persist debug artifacts
(CatalogSource YAML, marketplace pods, events) in Prow artifacts.

Add grpcPodConfig.securityContextConfig: restricted to CatalogSource
spec, required for OCP >= 4.12 pod security admission.

Additional improvements:
- Add timestamped logging via log() helper function
- Replace fragile sed IDMS rename with yq-v4 metadata.name set
- Fix wait_for_catalogsource to check before sleeping (was sleep-first)
- Write catsrc_name to SHARED_DIR in both modes for step chaining
- Replace hardcoded /tmp path with mktemp
- Quote image field in CatalogSource YAML heredoc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
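
The "check before sleeping" fix to wait_for_catalogsource can be illustrated with the sketch below. It is an assumption for illustration only: the parameter order, the defaults, and the use of status.connectionState.lastObservedState as the readiness signal are not confirmed by this PR page.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of check-first polling; not the merged implementation.

# Poll a CatalogSource until it reports READY.
# Arguments: name, namespace, attempts (default 30), delay seconds (default 10).
wait_for_catalogsource() {
    local name="$1"
    local ns="${2:-openshift-marketplace}"
    local attempts="${3:-30}"
    local delay="${4:-10}"
    local i state
    for ((i = 1; i <= attempts; i++)); do
        # Check first: an already-ready catalog returns without any sleep.
        state="$(oc get catalogsource "$name" -n "$ns" \
            -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null || true)"
        echo "attempt ${i}/${attempts}: state=${state:-unknown}"
        if [[ "$state" == "READY" ]]; then
            return 0
        fi
        # Sleep only when another attempt will follow.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}
```

A sleep-first loop always pays at least one full delay, even when the catalog is ready immediately; checking first removes that fixed cost.
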

Replace fixed sleep 10 + single-shot subscription check with a
wait_for_subscription() retry loop (5 attempts, 2s apart) for more
resilient subscription verification.

Add ARTIFACT_DIR usage with EXIT trap to persist debug artifacts
(CSVs, Subscriptions, InstallPlans, OperatorGroup, events) in Prow
artifacts for post-mortem debugging.

Additional improvements:
- Add timestamped logging via log() helper function
- Filter empty entries from comma-separated OPERATORS input
- Add shellcheck SC1090 directive for proxy source

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@razo7 razo7 force-pushed the fix/medik8s-step-improvements branch from 8e118d8 to 43ef055 Compare May 28, 2026 12:25

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0


razo7 and others added 2 commits May 28, 2026 16:02
Both tests (lint, unit) run in a container from src and do not
require an OCP release payload. The releases block adds unnecessary
OCP release resolution overhead on every CI run and requires
version updates when new OCP releases come out.

Also adjust resource requests to cpu: 100m, memory: 200Mi with a
4Gi memory limit, matching the medik8s/common config pattern for
lint/unit-only repos.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add --retry/--connect-timeout/--max-time to Quay API curl calls in
verify_fbc_image(), matching the retry flags used by all other curl
calls in the script.

Normalize OPERATORS list at parse time: trim whitespace, filter empty
entries, and fail if no valid package names remain. Prevents silent
success on inputs like ", , ".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
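
The parse-time normalization can be packaged as a standalone function, along the lines of the diff suggested in the review above. The function name normalize_operators() is hypothetical; the trimming and empty-list check follow the suggested change.

```shell
#!/usr/bin/env bash
# Hypothetical sketch based on the review suggestion; the function name is assumed.

# Split a comma-separated string into OPERATOR_LIST, dropping whitespace
# and empty entries. Returns 1 if nothing usable remains.
normalize_operators() {
    local raw="$1"
    local pkg
    local -a parts
    IFS=',' read -ra parts <<< "$raw"
    OPERATOR_LIST=()
    for pkg in "${parts[@]}"; do
        pkg="${pkg//[[:space:]]/}"
        if [[ -n "$pkg" ]]; then
            OPERATOR_LIST+=("$pkg")
        fi
    done
    if [[ ${#OPERATOR_LIST[@]} -eq 0 ]]; then
        echo "ERROR: OPERATORS did not contain any non-empty package names" >&2
        return 1
    fi
}
```

Calling normalize_operators "$OPERATORS" || exit 1 once at the top of main() lets both later loops iterate over OPERATOR_LIST without repeating the trimming.
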
@razo7 razo7 force-pushed the fix/medik8s-step-improvements branch from 2b21c47 to 6722305 Compare May 28, 2026 13:04

The previous approach waited for condition=Updating (with || true
to swallow timeouts) then waited for condition=Updated. This could
pass on stale state: if the MCP rollout hadn't started yet, Updated
was already True and the wait returned immediately.

Extract wait_for_mcp_rollout() that snapshots MCP rendered config
names before IDMS apply, then polls for a change. If the rendered
config changes, a rollout was triggered and we wait for Updated.
If no change after 2 minutes, the IDMS didn't require a
MachineConfig update and we proceed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@razo7 razo7 force-pushed the fix/medik8s-step-improvements branch from 6722305 to 4e767cc Compare May 28, 2026 13:08
@openshift-merge-bot

[REHEARSALNOTIFIER]
@razo7: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-medik8s-system-tests-main-lint | medik8s/system-tests | presubmit | Ci-operator config changed |
| pull-ci-medik8s-system-tests-main-unit | medik8s/system-tests | presubmit | Ci-operator config changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.


razo7 commented May 28, 2026

/pj-rehearse auto-ack

@openshift-merge-bot

@razo7: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot openshift-merge-bot Bot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label May 28, 2026
@ugreener

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 31, 2026

openshift-ci Bot commented May 31, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: razo7, ugreener



razo7 commented May 31, 2026

/retest


openshift-ci Bot commented May 31, 2026

@razo7: all tests passed!


@openshift-merge-bot openshift-merge-bot Bot merged commit 63ca5bb into openshift:main May 31, 2026
20 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. rehearsals-ack Signifies that rehearsal jobs have been acknowledged


2 participants