Fix medik8s step-registry: skopeo, artifacts, robustness#79820
Fix medik8s step-registry: skopeo, artifacts, robustness#79820openshift-merge-bot[bot] merged 5 commits into
Conversation
|
Skipping CI for Draft Pull Request. |
WalkthroughAdds UTC-timestamped logging and EXIT-bound artifact collection to Medik8s CI step scripts; refactors CatalogSource image/IDMS handling and readiness diagnostics; restructures operator subscription flow into two phases with improved CSV waiting; adjusts system-test resource requests and memory limits. ChangesMedik8s CI Logging and Artifact Collection
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ci-operator/step-registry/medik8s/operator-subscribe/medik8s-operator-subscribe-commands.sh (1)
192-197:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject an all-empty
OPERATORSlist after trimming.
OPERATORS=", , "passes the current env-var check, every parsed entry is skipped here, and the step exits green without creating anySubscription. Please normalize once, keep only non-empty package names, and fail if the resulting list is empty.One way to make the validation explicit
- IFS=',' read -ra OPERATOR_LIST <<< "$OPERATORS" + IFS=',' read -ra RAW_OPERATOR_LIST <<< "$OPERATORS" + OPERATOR_LIST=() + for pkg in "${RAW_OPERATOR_LIST[@]}"; do + pkg="${pkg//[[:space:]]/}" + [[ -n "$pkg" ]] && OPERATOR_LIST+=("$pkg") + done + + if [[ ${`#OPERATOR_LIST`[@]} -eq 0 ]]; then + log "ERROR: OPERATORS did not contain any non-empty package names" + exit 1 + fi for pkg in "${OPERATOR_LIST[@]}"; do - pkg="${pkg//[[:space:]]/}" - [[ -z "$pkg" ]] && continue log "" log "--- Installing operator: ${pkg} ---" wait_for_package_manifest "$pkg" || exit 1 create_subscription "$pkg" done @@ local failed=0 for pkg in "${OPERATOR_LIST[@]}"; do - pkg="${pkg//[[:space:]]/}" - [[ -z "$pkg" ]] && continue if ! wait_for_subscription "$pkg"; then log "ERROR: Subscription ${pkg} does not exist in ${INSTALL_NAMESPACE}" failed=1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/medik8s/operator-subscribe/medik8s-operator-subscribe-commands.sh` around lines 192 - 197, The script currently splits OPERATORS into OPERATOR_LIST and then skips empty/whitespace-only entries inside the for-loop, allowing inputs like " , , " to silently produce no work; update the parsing/validation so you first normalize and filter OPERATOR_LIST to remove any empty or whitespace-only package names (trim each item and keep only non-empty strings), then if the resulting list is empty log a clear error and exit non-zero (e.g., processLogger/echo error + exit 1) before entering the loop; refer to OPERATOR_LIST, OPERATORS, and pkg when making the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/medik8s/catalogsource/medik8s-catalogsource-commands.sh`:
- Around line 80-95: The Quay curl probes for the FBC manifest and the fallback
tag list (the calls that reference FBC_COMMIT_SHA, QUAY_REPO_PATH and image_name
and populate fallback_tag) lack timeouts and retries; update both curl
invocations to include sensible retry and timeout flags (e.g. --retry,
--retry-delay, --connect-timeout, --max-time) or wrap them in a small retry loop
so transient network/Quay issues don’t hang the script and the fallback logic
reliably executes. Ensure the same stderr handling and jq/grep pipeline behavior
is preserved and that failures still allow the existing FBC_SHA_PINNED check and
fallback_tag assignment to behave as before.
- Around line 130-133: The current logic ignores the `oc wait mcp
--for=condition=Updating || true` and then relies on `oc wait ... Updated`,
which can pass on a stale MCP; instead capture a concrete MCP rendered config
before/after apply: read and store the MCP `status.configuration.name` (or
rendered config id) for the relevant MCPs before applying the IDMS, run the
apply, then poll/`oc wait` until the MCPs' `status.configuration.name` changes
from the stored pre-apply value (or matches the new expected name); remove the
silent ignore of the Updating wait (or at least only ignore errors from
not-started rollout) and prefer the configuration-name change as the success
criterion—use the existing references `IDMS_NAME`, `oc wait mcp --all
--for=condition=Updating`, and `status.configuration.name` to locate and update
the logic.
---
Outside diff comments:
In
`@ci-operator/step-registry/medik8s/operator-subscribe/medik8s-operator-subscribe-commands.sh`:
- Around line 192-197: The script currently splits OPERATORS into OPERATOR_LIST
and then skips empty/whitespace-only entries inside the for-loop, allowing
inputs like " , , " to silently produce no work; update the parsing/validation
so you first normalize and filter OPERATOR_LIST to remove any empty or
whitespace-only package names (trim each item and keep only non-empty strings),
then if the resulting list is empty log a clear error and exit non-zero (e.g.,
processLogger/echo error + exit 1) before entering the loop; refer to
OPERATOR_LIST, OPERATORS, and pkg when making the changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 591d8916-074b-4026-bda1-ff9aebc42fa3
📒 Files selected for processing (3)
ci-operator/config/medik8s/system-tests/medik8s-system-tests-main.yamlci-operator/step-registry/medik8s/catalogsource/medik8s-catalogsource-commands.shci-operator/step-registry/medik8s/operator-subscribe/medik8s-operator-subscribe-commands.sh
Replace skopeo commands with Quay API curl calls — the upi-installer image does not include skopeo, causing runtime failures. Use Quay v2 manifest API for image verification and v1 tag API for fallback tag listing. Add ARTIFACT_DIR usage with EXIT trap to persist debug artifacts (CatalogSource YAML, marketplace pods, events) in Prow artifacts. Add grpcPodConfig.securityContextConfig: restricted to CatalogSource spec, required for OCP >= 4.12 pod security admission. Additional improvements: - Add timestamped logging via log() helper function - Replace fragile sed IDMS rename with yq-v4 metadata.name set - Fix wait_for_catalogsource to check before sleeping (was sleep-first) - Write catsrc_name to SHARED_DIR in both modes for step chaining - Replace hardcoded /tmp path with mktemp - Quote image field in CatalogSource YAML heredoc Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace fixed sleep 10 + single-shot subscription check with a wait_for_subscription() retry loop (5 attempts, 2s apart) for more resilient subscription verification. Add ARTIFACT_DIR usage with EXIT trap to persist debug artifacts (CSVs, Subscriptions, InstallPlans, OperatorGroup, events) in Prow artifacts for post-mortem debugging. Additional improvements: - Add timestamped logging via log() helper function - Filter empty entries from comma-separated OPERATORS input - Add shellcheck SC1090 directive for proxy source Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8e118d8 to
43ef055
Compare
|
Actionable comments posted: 0 |
1 similar comment
|
Actionable comments posted: 0 |
Both tests (lint, unit) run in a container from src and do not require an OCP release payload. The releases block adds unnecessary OCP release resolution overhead on every CI run and requires version updates when new OCP releases come out. Also adjust resource requests to cpu: 100m, memory: 200Mi with a 4Gi memory limit, matching the medik8s/common config pattern for lint/unit-only repos. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add --retry/--connect-timeout/--max-time to Quay API curl calls in verify_fbc_image(), matching the retry flags used by all other curl calls in the script. Normalize OPERATORS list at parse time: trim whitespace, filter empty entries, and fail if no valid package names remain. Prevents silent success on inputs like ", , ". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2b21c47 to
6722305
Compare
The previous approach waited for condition=Updating (with || true to swallow timeouts) then waited for condition=Updated. This could pass on stale state: if the MCP rollout hadn't started yet, Updated was already True and the wait returned immediately. Extract wait_for_mcp_rollout() that snapshots MCP rendered config names before IDMS apply, then polls for a change. If the rendered config changes, a rollout was triggered and we wait for Updated. If no change after 2 minutes, the IDMS didn't require a MachineConfig update and we proceed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6722305 to
4e767cc
Compare
|
[REHEARSALNOTIFIER]
Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/pj-rehearse auto-ack |
|
@razo7: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: razo7, ugreener The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
@razo7: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Fixes issues found during review of the medik8s step-registry steps and system-tests config. Neither step has been exercised in CI yet, so these fixes prevent failures before they occur.
Commit 1: Fix medik8s-catalogsource step
skopeocommands with Quay APIcurlcalls — theupi-installerimage does not includeskopeo, causing runtime failures inverify_fbc_image()ARTIFACT_DIRusage with EXIT trap for debug artifact persistencelog()helpersedIDMS rename withyq-v4wait_for_catalogsourceto check before sleepingcatsrc_nametoSHARED_DIRin both modes for step chaining/tmpwithmktempCommit 2: Fix medik8s-operator-subscribe step
sleep 10+ single-shot subscription check withwait_for_subscription()retry loopARTIFACT_DIRusage with EXIT trapCommit 3: Fix system-tests config
releasesblock (both tests usefrom: src, not a release payload)medik8s/commonpattern for lint/unit reposjob-releaselabels)Commit 4: Address CodeRabbit review
--retry/--connect-timeout/--max-timeto Quay API curl callsCommit 5: Improve MCP rollout detection after IDMS apply
wait_for_mcp_rollout()— snapshots MCP rendered config names before IDMS apply, polls for changecondition=Updating || true→condition=Updatedchain that could pass on stale stateDependencies
Jira: RHWA-1021