Add install-trustee-operator step to azure-ipi-coco test#79996
Add install-trustee-operator step to azure-ipi-coco test#79996tbuskey wants to merge 19 commits into
Conversation
…dation Critical fixes for kbs-client connectivity test: 1. **Update kbs-client image**: v0.17.0 → v0.19.0 - Newer version with updated KBS protocol support - Fixes compatibility issues with current KBS 2. **Auto-discover latest tag with skopeo**: - NEW: get_kbs_client_tag() function - Checks KBS_CLIENT_TAG env var first (for overrides) - Uses skopeo to find latest v.X.Y.Z tag automatically - Falls back to v0.19.0 if skopeo fails - Command: skopeo list-tags docker://quay.io/confidential-containers/kbs-client 3. **Environment variable support**: - NEW: KBS_CLIENT_TAG env var in ref.yaml - Allows pinning specific version in CI config - Empty (default) = auto-discover latest - Set to specific tag = use that version 4. **Fix resource validation logic**: - REMOVED: 404 treated as success (was incorrect) - NOW: Must actually retrieve a resource to pass - Test resource: default/cosign-keys/key-0 (exists from KbsConfig) - Only succeeds if kbs-client exit code is 0 (resource retrieved) 5. **Better error detection**: - 404: Resource not found - KbsConfig secret publishing issue - 401: Unauthorized - Attestation failure - Timeout: KBS service unreachable - SSL/TLS: Wrong URL protocol Previous behavior: - Hardcoded v0.17.0 (outdated) - Got 401 Unauthorized but treated as success - Never actually retrieved a resource - False positive on connectivity New behavior: - Auto-discovers latest kbs-client version - Must successfully retrieve cosign-keys/key-0 resource - Fails if 404, 401, timeout, or any error - Proves KBS connectivity AND attestation work - Allows version override via KBS_CLIENT_TAG env var Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Remove INITDATA content from logs (security) - Focus on failures, not success/progress messages - Keep version logging (trustee image, kbs-client tag) - Keep attestation patterns (POST attest, GET resource) - Reduce TEE platform warnings to single line (WARN:) - Remove duplicate/verbose progress updates
The RCA (Resource-Centric Authorization) protocol flow includes: 1. GET resource → 401 (no token) - triggers attestation 2. POST /auth + POST /attest - get token via attestation 3. GET resource → 200 (with token) - success kbs-client emits WARN messages during this normal flow: - 'No TEE platform detected. Sample Attester will be used.' - 'Authenticating with KBS failed. Perform a new RCAR handshake' - 'Attestation Token not found' These are expected protocol messages, not errors. Changes: - Separate stdout (resource data) from stderr (warnings) - On success: show resource size and first line only - On failure: show full stderr and stdout for debugging - Add comments explaining RCA protocol flow - Update attestation patterns to show full flow including 401 This eliminates confusing WARN messages that made successful operations appear to have failed.
When operator deployment fails to appear, check: - CatalogSource status and pod - Subscription status and conditions - InstallPlan existence - CSV existence This will help diagnose why OLM isn't creating the operator deployment.
Instead of just checking for deployment, poll through each OLM stage: 1. CatalogSource READY (60s timeout) 2. Subscription has InstallPlan reference (60s) 3. InstallPlan Complete (60s) 4. CSV Succeeded (60s) 5. Deployment Available (60s) Each stage polls every 5s (12 attempts). Total max wait: 5 minutes (was 150s before). Shows exactly which stage fails with appropriate error output.
Changes: - Add Secret kbsres1 with key1=cmVzMXZhbDEK for testing - Add kbsres1 to KbsConfig kbsSecretResources - Test kbsres1/key1 instead of cosign-keys/key-0 for better validation - Show full oc exec command in logs - Display retrieved resource value in logs Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Issue: stringData auto-base64-encodes values, causing double-encoding: - stringData: cmVzMXZhbDEK → stored as Y21Wek1YWmhiREVL - KBS returns: Y21Wek1YWmhiREVL (wrong) Fix: Use data field to store pre-encoded value directly: - data: cmVzMXZhbDEK → stored as cmVzMXZhbDEK - KBS returns: cmVzMXZhbDEK (correct, base64 of "res1val1") Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…election Changes: - Add TRUSTEE_CATALOG_SOURCE_NAME (default: redhat-operators) - Add TRUSTEE_CATALOG_SOURCE_IMAGE (default: empty) - Only create CatalogSource if TRUSTEE_CATALOG_SOURCE_IMAGE is set - Check if CatalogSource exists before creating to avoid overwrites - Skip CatalogSource readiness wait when using existing catalog - Update Subscription to use configurable catalog source name - Deprecate TRUSTEE_IMAGE_REPO/TAG (only used with custom image) Usage patterns: 1. Default (redhat-operators): Set nothing, uses existing catalog 2. Custom catalog (brew-catalog): Set TRUSTEE_CATALOG_SOURCE_NAME only 3. New custom catalog: Set both NAME and IMAGE to create new catalog Benefits: - Simpler than OSC (no auto-discovery of latest image tags) - Supports existing catalogs without modification - Allows custom catalogs when needed - Prevents overwriting existing catalog sources Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes: - Add map_trustee_to_kbs_client_version() function - Extract trustee version from CSV name (e.g., trustee-operator.v1.1.0) - Map trustee versions to compatible kbs-client versions: - trustee 1.1.x → kbs-client v0.17.0 - trustee 1.11.x → kbs-client v0.19.0 - Export TRUSTEE_CSV_NAME from wait_for_operator for use in mapping - Update get_kbs_client_tag() to use version mapping before auto-discovery Version selection priority: 1. KBS_CLIENT_TAG env var (explicit override) 2. Version mapping from trustee CSV (semantic versioning) 3. Auto-discovery (latest semver tag from registry) 4. Fallback: v0.17.0 This ensures kbs-client compatibility with the installed trustee version. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a CI step, wiring, and a new installer script that conditionally deploys the Trustee operator and operands, discovers endpoints, generates encoded INITDATA, patches cluster config, and verifies connectivity via a temporary kbs-client pod. ChangesTrustee Operator Installation Workflow
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (13 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-coco |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
- Add install-trustee-operator step to sandboxed-containers-operator-pre chain - Enable TRUSTEE_INSTALL=true for azure-ipi-coco test - Add TRUSTEE_CATALOG_SOURCE_IMAGE and TRUSTEE_CATALOG_SOURCE_NAME env vars This integrates the trustee operator installation step into the CoCo test workflow, allowing automated deployment and testing of confidential containers with KBS (Key Broker Service) support. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
4b0a0e1 to
3b4bf64
Compare
|
@tbuskey: job(s): periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-coco either don't exist or were not found to be affected, and cannot be rehearsed |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tbuskey The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 11
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/scripts/analyze-prowjob.sh:
- Around line 24-25: The cached checkout logic ignores PROWJOB_ANALYZER_BRANCH
so a preexisting CACHE_DIR can end up on the wrong branch; update the script to
either key the cache by branch (e.g., incorporate ${BRANCH} into CACHE_DIR) or,
before running dig.py, ensure the repo in CACHE_DIR is on the requested branch
by running a fetch and explicit checkout/reset to origin/${BRANCH} (git fetch
origin && git checkout -B "${BRANCH}" "origin/${BRANCH}" || git reset --hard
"origin/${BRANCH}"). Apply the same change to the other checkout blocks
referenced (around lines 47-69 and 84-95) so every cached use honors BRANCH.
- Around line 159-162: The script currently treats no positional arguments the
same as --help and exits 0; change the logic so missing required PROW_JOB_URL
causes a non-zero exit: call show_usage (preferably printing to stderr) and exit
with a failure code (e.g., exit 1) when no args are provided, but continue to
treat --help or -h as success and exit 0; update the if/branch that currently
checks [[ $# -eq 0 || "$1" == "--help" || "$1" == "-h" ]] to distinguish the
empty-args case from the help flags and reference show_usage and PROW_JOB_URL in
your change.
In @.claude/scripts/monitor-rehearsal.sh:
- Around line 61-79: Replace the current grep+awk approach and the "|| echo"
swallow with a fail-fast, single JSON query: call gh pr checks ${PR_NUM} --repo
openshift/release --json name,state,url (or link) and exit if the gh command
returns non-zero; then filter the returned JSON to find exactly one check whose
"name" equals ${JOB_NAME} (or apply the explicit matching rule) and set
job_status from the check's "state" field and PROW_URL from the check's
"url"/"link" field; apply this change in both the final-status branch (where
final_status is derived) and the polling branch (where
status_line/job_status/PROW_URL are set) so network/auth errors surface
immediately and columns are not relied on.
In @.claude/scripts/prow-fetch.sh:
- Around line 121-126: The pr_checks function is currently swallowing failures
from the gh command by piping its stderr to /dev/null and using | grep ... ||
true; change it so gh pr checks failures are propagated while still treating
grep "no matches" as non-fatal: run gh pr checks "${pr}" --repo
openshift/release and check its exit status first (or enable pipefail),
capturing its stdout into a variable (e.g., checks_output) without discarding
stderr, return/exit if gh fails, then run grep "${pattern}" against that
captured output and ignore only grep’s non-zero exit for no matches; update the
pr_checks function accordingly so gh errors surface but missing pattern does not
fail the script.
- Around line 79-88: In fetch_url(), the curl invocation uses curl_opts=(-sS)
which doesn't fail on HTTP 4xx/5xx; update the curl options in the fetch_url
function (the curl_opts array) to include -f (e.g., curl_opts=(-sS -f)) so that
HTTP errors cause curl to exit non-zero; preserve the existing
PROW_FETCH_HEADERS handling and logging (log_info "Fetching: ${url}") and ensure
callers of fetch_url will observe failure return codes.
In @.claude/scripts/trigger-rehearsal.sh:
- Line 1: Replace the incorrect commented shebang "#\!/bin/bash" at the top of
.claude/scripts/trigger-rehearsal.sh with a valid bash shebang (start the file
with "#!/bin/bash") so the script executes under Bash when invoked directly;
ensure the corrected line is the very first line of the file and has no leading
whitespace or comment characters.
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh`:
- Around line 297-301: The current pipeline swallows real oc apply failures by
piping get_trustee_catalog_source_manifest into oc apply -f - || true; instead
capture the manifest output from get_trustee_catalog_source_manifest
(referencing that function) into a variable or temp file, check if it is
non-empty (i.e. a manifest was emitted using TRUSTEE_CATALOG_SOURCE_NAME and
TRUSTEE_CATALOG_SOURCE_IMAGE substitutions), and only then run oc apply -f -
without "|| true" so failures fail the step and surface immediately; if the
manifest is empty, skip the apply path gracefully.
- Around line 693-715: The KBS client pod manifest returned by
get_kbs_client_manifest currently omits readOnlyRootFilesystem, resource
requests/limits, and leaves a service account token automounted; update the Pod
spec (in get_kbs_client_manifest) to set automountServiceAccountToken: false on
the Pod, add container securityContext.readOnlyRootFilesystem: true (in the
kbs-client container), keep allowPrivilegeEscalation: false and runAsNonRoot:
true, and add resource requests and limits (cpu and memory) for the kbs-client
container to satisfy the repo manifest requirements.
- Around line 88-90: When detecting an existing CatalogSource named by
TRUSTEE_CATALOG_SOURCE_NAME (the branch around the oc get check), also fetch its
spec.image and compare it to TRUSTEE_CATALOG_SOURCE_IMAGE; if the images differ,
fail fast with a non-zero exit and a clear error message describing the name and
both image values so we don't silently point subscriptions to the wrong catalog.
Update the existing conditional that currently returns 0 to perform the image
comparison (using oc to read spec.image for the CatalogSource) and only skip
creation when they match; otherwise exit with an error asking for a distinct
TRUSTEE_CATALOG_SOURCE_NAME or to update TRUSTEE_CATALOG_SOURCE_IMAGE.
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml`:
- Around line 66-67: The "NO NETWORK ACCESS REQUIRED" statement is incorrect
because the step invokes skopeo list-tags and pulls
quay.io/confidential-containers/kbs-client:${tag}, so either update the claim to
indicate external network access is required or change the step to use a
configurable/mirrored image and avoid remote tag lookups; specifically modify
the logic that calls skopeo list-tags and the hardcoded
quay.io/confidential-containers/kbs-client:${tag} reference to instead read an
injected variable (e.g., KBS_CLIENT_IMAGE or IMAGE_REGISTRY+IMAGE_NAME+TAG) and
skip skopeo when restrict_network_access is true, or document the requirement to
have the image mirrored locally.
- Around line 32-45: The three advertised step inputs TRUSTEE_IMAGE_REPO,
TRUSTEE_IMAGE_TAG, and TRUSTEE_CHARTS_REF are unused and must either be removed
from the step spec or actually consumed; choose one fix: (A) Remove the three
entries from the step inputs in
sandboxed-containers-operator-install-trustee-operator-ref.yaml to avoid
exposing dead knobs, or (B) implement consumption by plumbing those env vars
into the companion install-trustee-operator command/script and use them where
the trustee catalog image and chart ref are built/queried (e.g., in the code
paths that construct the trustee catalog source image name and the charts git
ref); ensure variable names TRUSTEE_IMAGE_REPO, TRUSTEE_IMAGE_TAG, and
TRUSTEE_CHARTS_REF are read from the environment and override defaults when
present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: ac30892a-d2fb-40f9-b4ee-857c8cdb5821
📒 Files selected for processing (8)
.claude/scripts/analyze-prowjob.sh.claude/scripts/monitor-rehearsal.sh.claude/scripts/prow-fetch.sh.claude/scripts/trigger-rehearsal.shci-operator/config/openshift/sandboxed-containers-operator/openshift-sandboxed-containers-operator-devel__downstream-candidate.yamlci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.shci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yamlci-operator/step-registry/sandboxed-containers-operator/pre/sandboxed-containers-operator-pre-chain.yaml
| CACHE_DIR="${PROWJOB_ANALYZER_CACHE:-${HOME}/.cache/prowjob-analyzer}" | ||
| BRANCH="${PROWJOB_ANALYZER_BRANCH:-devel}" |
There was a problem hiding this comment.
Honor PROWJOB_ANALYZER_BRANCH for cached checkouts.
Once ${CACHE_DIR} already exists, the requested branch is ignored until the 24-hour refresh, and git pull origin "${BRANCH}" then pulls that remote branch into whatever local branch is currently checked out. That can leave this wrapper running the wrong analyzer revision or a mixed checkout. Either key the cache by branch or explicitly fetch/switch/reset to the requested branch before invoking dig.py.
Suggested fix
-CACHE_DIR="${PROWJOB_ANALYZER_CACHE:-${HOME}/.cache/prowjob-analyzer}"
+CACHE_ROOT="${PROWJOB_ANALYZER_CACHE:-${HOME}/.cache/prowjob-analyzer}"
+CACHE_DIR="${CACHE_ROOT}/${BRANCH}"
BRANCH="${PROWJOB_ANALYZER_BRANCH:-devel}" function update_analyzer() {
pushd "${CACHE_DIR}" > /dev/null
- if git pull origin "${BRANCH}" >&2; then
+ if git fetch origin "${BRANCH}" >&2 \
+ && git checkout -B "${BRANCH}" "origin/${BRANCH}" >&2; then
date +%s > "${CACHE_DIR}/.last_update"
log_info "Prowjob-analyzer updated successfully"
elseAlso applies to: 47-69, 84-95
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/scripts/analyze-prowjob.sh around lines 24 - 25, The cached checkout
logic ignores PROWJOB_ANALYZER_BRANCH so a preexisting CACHE_DIR can end up on
the wrong branch; update the script to either key the cache by branch (e.g.,
incorporate ${BRANCH} into CACHE_DIR) or, before running dig.py, ensure the repo
in CACHE_DIR is on the requested branch by running a fetch and explicit
checkout/reset to origin/${BRANCH} (git fetch origin && git checkout -B
"${BRANCH}" "origin/${BRANCH}" || git reset --hard "origin/${BRANCH}"). Apply
the same change to the other checkout blocks referenced (around lines 47-69 and
84-95) so every cached use honors BRANCH.
| # Handle --help | ||
| if [[ $# -eq 0 || "$1" == "--help" || "$1" == "-h" ]]; then | ||
| show_usage | ||
| exit 0 |
There was a problem hiding this comment.
Return a failure code when the required URL is missing.
PROW_JOB_URL is documented as required, but the empty-args path exits 0. That turns a caller bug into a false success for any automation wrapping this helper. Keep --help/-h as success, and exit non-zero when no positional args are provided.
Suggested fix
function main() {
- # Handle --help
- if [[ $# -eq 0 || "$1" == "--help" || "$1" == "-h" ]]; then
+ if [[ $# -eq 0 ]]; then
+ show_usage
+ exit 1
+ fi
+
+ # Handle --help
+ if [[ "$1" == "--help" || "$1" == "-h" ]]; then
show_usage
exit 0
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Handle --help | |
| if [[ $# -eq 0 || "$1" == "--help" || "$1" == "-h" ]]; then | |
| show_usage | |
| exit 0 | |
| if [[ $# -eq 0 ]]; then | |
| show_usage | |
| exit 1 | |
| fi | |
| # Handle --help | |
| if [[ "$1" == "--help" || "$1" == "-h" ]]; then | |
| show_usage | |
| exit 0 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/scripts/analyze-prowjob.sh around lines 159 - 162, The script
currently treats no positional arguments the same as --help and exits 0; change
the logic so missing required PROW_JOB_URL causes a non-zero exit: call
show_usage (preferably printing to stderr) and exit with a failure code (e.g.,
exit 1) when no args are provided, but continue to treat --help or -h as success
and exit 0; update the if/branch that currently checks [[ $# -eq 0 || "$1" ==
"--help" || "$1" == "-h" ]] to distinguish the empty-args case from the help
flags and reference show_usage and PROW_JOB_URL in your change.
| final_status=$(gh pr checks ${PR_NUM} --repo openshift/release 2>&1 | grep "${JOB_NAME}" || echo "Status unavailable") | ||
| echo "Final Status: ${final_status}" | ||
| echo "" | ||
| echo "Monitoring ended. Check Prow URL for full details." | ||
| exit 0 | ||
| fi | ||
|
|
||
| # Check job status | ||
| status_line=$(gh pr checks ${PR_NUM} --repo openshift/release 2>&1 | grep "${JOB_NAME}" || echo "") | ||
|
|
||
| if [ -n "${status_line}" ]; then | ||
| job_status=$(echo "${status_line}" | awk '{print $2}') | ||
| timestamp=$(date '+%Y-%m-%d %H:%M:%S') | ||
| elapsed_mins=$(( (current_time - (END_TIME - (DURATION_HOURS * 3600))) / 60 )) | ||
|
|
||
| # Extract Prow URL if not already set | ||
| if [ -z "${PROW_URL}" ] && [ "${job_status}" != "pending" ]; then | ||
| PROW_URL=$(echo "${status_line}" | awk '{print $4}') | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether the target PR currently produces multiple matching rows
# or whether gh failures are being masked.
set -euo pipefail
PR_NUM="${1:?PR number required}"
JOB_NAME="${2:?short job name required}"
echo "=== raw gh pr checks output ==="
gh pr checks "${PR_NUM}" --repo openshift/release
echo
echo "=== rows matched by current grep ==="
gh pr checks "${PR_NUM}" --repo openshift/release | grep -n -- "${JOB_NAME}" || trueRepository: openshift/release
Length of output: 102
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the relevant portions of the script (around the referenced lines)
sed -n '1,140p' .claude/scripts/monitor-rehearsal.sh | nl -baRepository: openshift/release
Length of output: 104
🌐 Web query:
gh pr checks output columns NAME STATUS URL awk $4
💡 Result:
The gh pr checks command in the GitHub CLI outputs CI status for a pull request as line-based, plain-text by default [1][2]. In its default text mode, the output is structured into columns typically containing the status, the name of the check, and other associated metadata (such as duration or links) [1][3]. When using awk to parse this output, $4 refers to the fourth column of text [4]. Depending on your specific CLI version and terminal configuration, the columns may vary, but they generally follow this layout: 1. Status (e.g., ✓, X, or a state label like "pass" or "fail") 2. Name of the check 3. Duration/Timestamp or other metadata 4. Additional information, such as a URL or a secondary status description [1][3] To work with this data programmatically, it is highly recommended to use the --json flag instead of parsing plain text [2]. This ensures stable, machine-readable output [2]. You can specify the desired fields using the --json flag and filter or format the results using --jq [1][2]. Example of using JSON for better reliability: gh pr checks --json name,state,link --jq '.[] | [.name,.state,.link] | @tsv' This approach avoids the fragility of column-based parsing with awk, as the output structure of the text-based table is subject to change across CLI versions [2].
Citations:
- 1: https://cli.github.com/manual/gh_pr_checks
- 2: https://cli.github.com/manual/gh_help_formatting
- 3: pr checks add ability to see output for a specific check cli/cli#1528
- 4: https://dev.to/137foundry/parsing-server-logs-for-seo-a-practical-analysts-guide-2aab
🏁 Script executed:
#!/bin/bash
# Show the exact code around the referenced lines, with line numbers (no nl dependency)
awk 'NR>=45 && NR<=100 {printf "%5d\t%s\n", NR, $0}' .claude/scripts/monitor-rehearsal.shRepository: openshift/release
Length of output: 2784
🏁 Script executed:
#!/bin/bash
wc -l .claude/scripts/monitor-rehearsal.shRepository: openshift/release
Length of output: 102
🏁 Script executed:
#!/bin/bash
# Show the argument parsing + any JOB_NAME documentation and how PROW_URL is later used
awk 'NR<=140 {printf "%5d\t%s\n", NR, $0}' .claude/scripts/monitor-rehearsal.shRepository: openshift/release
Length of output: 7072
🏁 Script executed:
#!/bin/bash
# Show the remainder to see if grep/awk parsing is used elsewhere, and how JOB_NAME is defined
awk 'NR>140 {printf "%5d\t%s\n", NR, $0}' .claude/scripts/monitor-rehearsal.shRepository: openshift/release
Length of output: 1934
🌐 Web query:
gh pr checks default text output columns example status check name url columns
💡 Result:
The GitHub CLI command gh pr checks does not natively support configurable columns for its default table output [1]. Users cannot specify which columns to display or hide using a flag like --columns [1]. However, you can achieve custom output by using the --json flag to retrieve specific fields and formatting them with the --template flag [2][1]. Default behavior: The command outputs a standard table showing the status of CI checks for a pull request. While the CLI does not provide a command to alter this specific table layout, users often use Go templates to mimic custom columns [1]. Customizing output (example): To display only specific information, such as the status and the link for each check, you can use the following command structure: gh pr checks --json state,link --template '{{tablerow "STATE" "URL"}}{{range.}}{{tablerow.state.link}}{{end}}' Available JSON fields: When using the --json flag, you can access the following fields to build your custom output [2][3]: - bucket - completedAt - description - event - link - name - startedAt - state - workflow Important limitation: The --json and --template options are not compatible with the --watch flag [1]. If you need to monitor checks until completion, you must use the standard table output without customization [1].
Citations:
- 1: Allow configuration of columns for
gh pr checkscli/cli#10848 - 2: https://cli.github.com/manual/gh_pr_checks
- 3: https://man.archlinux.org/man/gh-pr-checks.1.en
Make gh pr checks fail-fast and parse a single deterministic check (avoid grep + fixed-column awk).
In .claude/scripts/monitor-rehearsal.sh (around lines 61-79), the script swallows any non-zero gh pr checks output via || echo ..., so real auth/network/rate-limit failures become indistinguishable from “no check yet” and monitoring keeps polling. Also, grep "${JOB_NAME}" can match multiple checks, and then awk '{print $2}' / awk '{print $4}' assumes a stable column layout, so job_status/PROW_URL can be taken from the wrong row.
Capture gh pr checks once and exit on non-zero, then use gh pr checks --json name,state,link ... to select exactly one matching check (by exact name, or an explicit matching rule) and derive status/URL from the JSON fields (apply in both the final-status and polling paths).
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 61-61: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 69-69: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/scripts/monitor-rehearsal.sh around lines 61 - 79, Replace the
current grep+awk approach and the "|| echo" swallow with a fail-fast, single
JSON query: call gh pr checks ${PR_NUM} --repo openshift/release --json
name,state,url (or link) and exit if the gh command returns non-zero; then
filter the returned JSON to find exactly one check whose "name" equals
${JOB_NAME} (or apply the explicit matching rule) and set job_status from the
check's "state" field and PROW_URL from the check's "url"/"link" field; apply
this change in both the final-status branch (where final_status is derived) and
the polling branch (where status_line/job_status/PROW_URL are set) so
network/auth errors surface immediately and columns are not relied on.
| function fetch_url() { | ||
| local url="$1" | ||
| local curl_opts=(-sS) | ||
|
|
||
| if [[ "${PROW_FETCH_HEADERS:-0}" == "1" ]]; then | ||
| curl_opts+=(-i) | ||
| fi | ||
|
|
||
| log_info "Fetching: ${url}" | ||
| curl "${curl_opts[@]}" "${url}" |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does curl exit with a non-zero status for HTTP 404/500 responses by default, and what changes when using -f or --fail-with-body?
💡 Result:
By default, curl exits with status 0 even if the HTTP response code is 404 or 500; curl does not treat HTTP response codes as transfer failures unless you enable the “fail on HTTP errors” options. This is stated in the curl docs: “By default, curl does not consider HTTP response codes to indicate failure.” [1] When you use -f or --fail, curl will fail and exit non-zero (error 22) for HTTP responses with codes 400 or greater, and it will not output the response body. [1] When you use --fail-with-body, curl also exits non-zero (error 22) for HTTP response codes 400 or greater, but it differs from --fail by still outputting the response body (useful for debugging/parsing error responses). [1] Practical summary: - Default (no -f/--fail-with-body): 404/500 typically still exit 0; only transport errors (e.g., network) cause non-zero.[1] - -f/--fail: 404/500 cause exit code 22 and suppress the body.[1] - --fail-with-body: 404/500 cause exit code 22 but keep the body.[1]
Citations:
Fail HTTP errors in fetch_url()
curl -sS does not treat HTTP 404/500 as failures by default, so callers can continue as if the fetch succeeded (e.g., parsing an HTML error page as an artifact). Add -f to make HTTP 4xx/5xx exit non-zero.
Suggested fix
function fetch_url() {
local url="$1"
- local curl_opts=(-sS)
+ local curl_opts=(-fsS)
if [[ "${PROW_FETCH_HEADERS:-0}" == "1" ]]; then
curl_opts+=(-i)
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function fetch_url() { | |
| local url="$1" | |
| local curl_opts=(-sS) | |
| if [[ "${PROW_FETCH_HEADERS:-0}" == "1" ]]; then | |
| curl_opts+=(-i) | |
| fi | |
| log_info "Fetching: ${url}" | |
| curl "${curl_opts[@]}" "${url}" | |
| function fetch_url() { | |
| local url="$1" | |
| local curl_opts=(-fsS) | |
| if [[ "${PROW_FETCH_HEADERS:-0}" == "1" ]]; then | |
| curl_opts+=(-i) | |
| fi | |
| log_info "Fetching: ${url}" | |
| curl "${curl_opts[@]}" "${url}" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/scripts/prow-fetch.sh around lines 79 - 88, In fetch_url(), the curl
invocation uses curl_opts=(-sS) which doesn't fail on HTTP 4xx/5xx; update the
curl options in the fetch_url function (the curl_opts array) to include -f
(e.g., curl_opts=(-sS -f)) so that HTTP errors cause curl to exit non-zero;
preserve the existing PROW_FETCH_HEADERS handling and logging (log_info
"Fetching: ${url}") and ensure callers of fetch_url will observe failure return
codes.
| function pr_checks() { | ||
| local pr="$1" | ||
| local pattern="${2:-rehearse}" | ||
|
|
||
| log_info "Getting PR checks for #${pr}" | ||
| gh pr checks "${pr}" --repo openshift/release 2>/dev/null | grep "${pattern}" || true |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What exit status does gh pr checksreturn on authentication or request failures, and willcmd | grep ... || truemask that failure underset -o pipefail?
💡 Result:
The GitHub CLI command gh pr checks follows standard conventions for exit codes, generally returning 1 for generic failures [1][2][3]. Specifically, it returns an exit code of 4 when authentication is required [2][3]. Other documented exit codes for this command include 2 if the command is cancelled and 8 if there are pending checks [4][1][2][3]. Note that if no checks are found, the command may return exit code 16 [5]. Regarding your shell pipeline cmd | grep ... || true under set -o pipefail: Yes, appending || true will effectively mask the failure of the entire pipeline [6][7]. When set -o pipefail is enabled, a pipeline returns the exit status of the last command in the pipeline that failed, or 0 if all commands succeeded [6][8][7]. By placing || true at the end of the pipeline, you force the entire expression to return 0, regardless of whether cmd or grep encountered an error [6][7]. While this successfully prevents the script from exiting (if you also have set -e enabled), it also prevents you from detecting if cmd failed due to a genuine error, such as an authentication failure or a request error [6][7]. If you need to verify the success of the first command while ignoring grep's non-zero exit (which is common when grep finds no matches), it is better to handle the exit codes explicitly using the ${PIPESTATUS[@]} array in Bash [6][8].
Citations:
- 1: pr checks returns failure code if there are no checks cli/cli#9390
- 2: https://cli.github.com/manual/gh_help_exit-codes
- 3: https://man.archlinux.org/man/gh-pr-checks.1.en
- 4: https://cli.github.com/manual/gh_pr_checks
- 5:
gh pr checksuses a unique exit code when no checks are found cli/cli#9691 - 6: https://www.commandinline.com/bash-pipefail-fix-pipeline-errors/
- 7: https://unixy.io/blog/set-euo-pipefail-not-enough/
- 8: https://unix.stackexchange.com/questions/14270/get-exit-status-of-process-thats-piped-to-another
Don’t swallow gh pr checks failures as “no matches”
In .claude/scripts/prow-fetch.sh (lines 121-126), gh pr checks ... | grep "${pattern}" || true forces a zero exit even when gh pr checks fails (e.g., auth/network/permission errors—gh commonly exits non-zero). Since errors are also sent to /dev/null, the script can succeed with no output when GitHub lookup failed. Only the grep “no matches” case should be ignored.
Suggested fix
function pr_checks() {
local pr="$1"
local pattern="${2:-rehearse}"
+ local checks
log_info "Getting PR checks for #${pr}"
- gh pr checks "${pr}" --repo openshift/release 2>/dev/null | grep "${pattern}" || true
+ if ! checks=$(gh pr checks "${pr}" --repo openshift/release 2>/dev/null); then
+ log_error "gh pr checks failed for PR #${pr}"
+ return 1
+ fi
+
+ grep "${pattern}" <<< "${checks}" || true
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function pr_checks() { | |
| local pr="$1" | |
| local pattern="${2:-rehearse}" | |
| log_info "Getting PR checks for #${pr}" | |
| gh pr checks "${pr}" --repo openshift/release 2>/dev/null | grep "${pattern}" || true | |
| function pr_checks() { | |
| local pr="$1" | |
| local pattern="${2:-rehearse}" | |
| local checks | |
| log_info "Getting PR checks for #${pr}" | |
| if ! checks=$(gh pr checks "${pr}" --repo openshift/release 2>/dev/null); then | |
| log_error "gh pr checks failed for PR #${pr}" | |
| return 1 | |
| fi | |
| grep "${pattern}" <<< "${checks}" || true | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/scripts/prow-fetch.sh around lines 121 - 126, The pr_checks function
is currently swallowing failures from the gh command by piping its stderr to
/dev/null and using | grep ... || true; change it so gh pr checks failures are
propagated while still treating grep "no matches" as non-fatal: run gh pr checks
"${pr}" --repo openshift/release and check its exit status first (or enable
pipefail), capturing its stdout into a variable (e.g., checks_output) without
discarding stderr, return/exit if gh fails, then run grep "${pattern}" against
that captured output and ignore only grep’s non-zero exit for no matches; update
the pr_checks function accordingly so gh errors surface but missing pattern does
not fail the script.
| if oc get catalogsource -n openshift-marketplace "${TRUSTEE_CATALOG_SOURCE_NAME}" &>/dev/null; then | ||
| echo ">>> CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} already exists, skipping creation" | ||
| return 0 |
There was a problem hiding this comment.
Fail fast when the requested trustee catalog name already exists with a different image.
With the new azure-ipi-coco wiring, TRUSTEE_CATALOG_SOURCE_NAME=brew-catalog already exists for the OSC catalog, so this branch skips creation and silently ignores TRUSTEE_CATALOG_SOURCE_IMAGE. The subscription then points at the wrong catalog source. Compare the existing spec.image and error out on mismatch, or require a distinct Trustee catalog name.
Suggested fix
if oc get catalogsource -n openshift-marketplace "${TRUSTEE_CATALOG_SOURCE_NAME}" &>/dev/null; then
- echo ">>> CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} already exists, skipping creation"
- return 0
+ local existing_image=""
+ existing_image=$(oc get catalogsource -n openshift-marketplace "${TRUSTEE_CATALOG_SOURCE_NAME}" -o jsonpath='{.spec.image}' 2>/dev/null || true)
+ if [[ -n "${TRUSTEE_CATALOG_SOURCE_IMAGE}" && -n "${existing_image}" && "${existing_image}" != "${TRUSTEE_CATALOG_SOURCE_IMAGE}" ]]; then
+ echo ">>> ERROR: CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} already exists with image ${existing_image}, expected ${TRUSTEE_CATALOG_SOURCE_IMAGE}" >&2
+ return 1
+ fi
+ echo ">>> CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} already exists, reusing it"
+ return 0
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if oc get catalogsource -n openshift-marketplace "${TRUSTEE_CATALOG_SOURCE_NAME}" &>/dev/null; then | |
| echo ">>> CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} already exists, skipping creation" | |
| return 0 | |
| if oc get catalogsource -n openshift-marketplace "${TRUSTEE_CATALOG_SOURCE_NAME}" &>/dev/null; then | |
| local existing_image="" | |
| existing_image=$(oc get catalogsource -n openshift-marketplace "${TRUSTEE_CATALOG_SOURCE_NAME}" -o jsonpath='{.spec.image}' 2>/dev/null || true) | |
| if [[ -n "${TRUSTEE_CATALOG_SOURCE_IMAGE}" && -n "${existing_image}" && "${existing_image}" != "${TRUSTEE_CATALOG_SOURCE_IMAGE}" ]]; then | |
| echo ">>> ERROR: CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} already exists with image ${existing_image}, expected ${TRUSTEE_CATALOG_SOURCE_IMAGE}" >&2 | |
| return 1 | |
| fi | |
| echo ">>> CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} already exists, reusing it" | |
| return 0 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh`
around lines 88 - 90, When detecting an existing CatalogSource named by
TRUSTEE_CATALOG_SOURCE_NAME (the branch around the oc get check), also fetch its
spec.image and compare it to TRUSTEE_CATALOG_SOURCE_IMAGE; if the images differ,
fail fast with a non-zero exit and a clear error message describing the name and
both image values so we don't silently point subscriptions to the wrong catalog.
Update the existing conditional that currently returns 0 to perform the image
comparison (using oc to read spec.image for the CatalogSource) and only skip
creation when they match; otherwise exit with an error asking for a distinct
TRUSTEE_CATALOG_SOURCE_NAME or to update TRUSTEE_CATALOG_SOURCE_IMAGE.
| # Apply CatalogSource if needed (only if TRUSTEE_CATALOG_SOURCE_IMAGE is set and catalog doesn't exist) | ||
| get_trustee_catalog_source_manifest | \ | ||
| sed "s@TRUSTEE_CATALOG_SOURCE_NAME_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_NAME}@g" | \ | ||
| sed "s@TRUSTEE_CATALOG_SOURCE_IMAGE_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_IMAGE}@g" | \ | ||
| oc apply -f - || true |
There was a problem hiding this comment.
Don’t swallow CatalogSource apply failures.
oc apply -f - || true turns real creation errors into later OLM timeouts, which makes this step much harder to debug. It’s better to skip oc apply only when no manifest was emitted and let actual apply failures stop the step.
Suggested fix
- get_trustee_catalog_source_manifest | \
- sed "s@TRUSTEE_CATALOG_SOURCE_NAME_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_NAME}`@g`" | \
- sed "s@TRUSTEE_CATALOG_SOURCE_IMAGE_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_IMAGE}`@g`" | \
- oc apply -f - || true
+ local catalog_manifest=""
+ catalog_manifest="$(
+ get_trustee_catalog_source_manifest | \
+ sed "s@TRUSTEE_CATALOG_SOURCE_NAME_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_NAME}`@g`" | \
+ sed "s@TRUSTEE_CATALOG_SOURCE_IMAGE_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_IMAGE}`@g`"
+ )"
+ if [[ -n "${catalog_manifest//[[:space:]]/}" ]]; then
+ printf '%s\n' "${catalog_manifest}" | oc apply -f -
+ fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Apply CatalogSource if needed (only if TRUSTEE_CATALOG_SOURCE_IMAGE is set and catalog doesn't exist) | |
| get_trustee_catalog_source_manifest | \ | |
| sed "s@TRUSTEE_CATALOG_SOURCE_NAME_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_NAME}@g" | \ | |
| sed "s@TRUSTEE_CATALOG_SOURCE_IMAGE_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_IMAGE}@g" | \ | |
| oc apply -f - || true | |
| # Apply CatalogSource if needed (only if TRUSTEE_CATALOG_SOURCE_IMAGE is set and catalog doesn't exist) | |
| local catalog_manifest="" | |
| catalog_manifest="$( | |
| get_trustee_catalog_source_manifest | \ | |
| sed "s@TRUSTEE_CATALOG_SOURCE_NAME_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_NAME}`@g`" | \ | |
| sed "s@TRUSTEE_CATALOG_SOURCE_IMAGE_PLACEHOLDER@${TRUSTEE_CATALOG_SOURCE_IMAGE}`@g`" | |
| )" | |
| if [[ -n "${catalog_manifest//[[:space:]]/}" ]]; then | |
| printf '%s\n' "${catalog_manifest}" | oc apply -f - | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh`
around lines 297 - 301, The current pipeline swallows real oc apply failures by
piping get_trustee_catalog_source_manifest into oc apply -f - || true; instead
capture the manifest output from get_trustee_catalog_source_manifest
(referencing that function) into a variable or temp file, check if it is
non-empty (i.e. a manifest was emitted using TRUSTEE_CATALOG_SOURCE_NAME and
TRUSTEE_CATALOG_SOURCE_IMAGE substitutions), and only then run oc apply -f -
without "|| true" so failures fail the step and surface immediately; if the
manifest is empty, skip the apply path gracefully.
| function get_kbs_client_manifest() { | ||
| cat << 'MANIFEST_EOF' | ||
| --- | ||
| apiVersion: v1 | ||
| kind: Pod | ||
| metadata: | ||
| name: KBS_CLIENT_POD_PLACEHOLDER | ||
| namespace: KBS_CLIENT_NAMESPACE_PLACEHOLDER | ||
| spec: | ||
| containers: | ||
| - name: kbs-client | ||
| image: KBS_CLIENT_IMAGE_PLACEHOLDER | ||
| command: ["sleep", "infinity"] | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| runAsNonRoot: true | ||
| seccompProfile: | ||
| type: RuntimeDefault | ||
| capabilities: | ||
| drop: | ||
| - ALL | ||
| restartPolicy: Never | ||
| MANIFEST_EOF |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Harden the temporary kbs-client pod manifest.
This pod still gets a service-account token by default and it omits readOnlyRootFilesystem plus any resource requests/limits. For a one-shot verification pod, that’s unnecessary exposure and it also misses the repo’s manifest requirements.
Suggested fix
spec:
+ automountServiceAccountToken: false
containers:
- name: kbs-client
image: KBS_CLIENT_IMAGE_PLACEHOLDER
command: ["sleep", "infinity"]
securityContext:
allowPrivilegeEscalation: false
runAsNonRoot: true
+ readOnlyRootFilesystem: true
seccompProfile:
type: RuntimeDefault
capabilities:
drop:
- ALL
+ resources:
+ requests:
+ cpu: 10m
+ memory: 64Mi
+ limits:
+ cpu: 100m
+ memory: 256Mi
restartPolicy: NeverAs per coding guidelines: securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false, Resource limits (cpu, memory) on every container, and automountServiceAccountToken: false unless needed.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function get_kbs_client_manifest() { | |
| cat << 'MANIFEST_EOF' | |
| --- | |
| apiVersion: v1 | |
| kind: Pod | |
| metadata: | |
| name: KBS_CLIENT_POD_PLACEHOLDER | |
| namespace: KBS_CLIENT_NAMESPACE_PLACEHOLDER | |
| spec: | |
| containers: | |
| - name: kbs-client | |
| image: KBS_CLIENT_IMAGE_PLACEHOLDER | |
| command: ["sleep", "infinity"] | |
| securityContext: | |
| allowPrivilegeEscalation: false | |
| runAsNonRoot: true | |
| seccompProfile: | |
| type: RuntimeDefault | |
| capabilities: | |
| drop: | |
| - ALL | |
| restartPolicy: Never | |
| MANIFEST_EOF | |
| function get_kbs_client_manifest() { | |
| cat << 'MANIFEST_EOF' | |
| --- | |
| apiVersion: v1 | |
| kind: Pod | |
| metadata: | |
| name: KBS_CLIENT_POD_PLACEHOLDER | |
| namespace: KBS_CLIENT_NAMESPACE_PLACEHOLDER | |
| spec: | |
| automountServiceAccountToken: false | |
| containers: | |
| - name: kbs-client | |
| image: KBS_CLIENT_IMAGE_PLACEHOLDER | |
| command: ["sleep", "infinity"] | |
| securityContext: | |
| allowPrivilegeEscalation: false | |
| runAsNonRoot: true | |
| readOnlyRootFilesystem: true | |
| seccompProfile: | |
| type: RuntimeDefault | |
| capabilities: | |
| drop: | |
| - ALL | |
| resources: | |
| requests: | |
| cpu: 10m | |
| memory: 64Mi | |
| limits: | |
| cpu: 100m | |
| memory: 256Mi | |
| restartPolicy: Never | |
| MANIFEST_EOF |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh`
around lines 693 - 715, The KBS client pod manifest returned by
get_kbs_client_manifest currently omits readOnlyRootFilesystem, resource
requests/limits, and leaves a service account token automounted; update the Pod
spec (in get_kbs_client_manifest) to set automountServiceAccountToken: false on
the Pod, add container securityContext.readOnlyRootFilesystem: true (in the
kbs-client container), keep allowPrivilegeEscalation: false and runAsNonRoot:
true, and add resource requests and limits (cpu and memory) for the kbs-client
container to satisfy the repo manifest requirements.
| - name: TRUSTEE_IMAGE_REPO | ||
| default: "quay.io/redhat-user-workloads/ose-osc-tenant/trustee-test-fbc" | ||
| documentation: |- | ||
| (DEPRECATED - only used when TRUSTEE_CATALOG_SOURCE_IMAGE is set) | ||
| The container image repository for the trustee operator catalog | ||
| - name: TRUSTEE_IMAGE_TAG | ||
| default: "1.1.0-1776506656" | ||
| documentation: |- | ||
| (DEPRECATED - only used when TRUSTEE_CATALOG_SOURCE_IMAGE is set) | ||
| The container image tag for the trustee operator catalog | ||
| - name: TRUSTEE_CHARTS_REF | ||
| default: "main" | ||
| documentation: |- | ||
| The git ref (branch/tag/commit) to use from the confidential-devhub/charts repository |
There was a problem hiding this comment.
Remove or implement the dead step inputs.
TRUSTEE_IMAGE_REPO, TRUSTEE_IMAGE_TAG, and TRUSTEE_CHARTS_REF are advertised here, but the companion commands script never consumes any of them. Right now callers can set three public knobs and get no behavior change, which is a config-contract break.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml`
around lines 32 - 45, The three advertised step inputs TRUSTEE_IMAGE_REPO,
TRUSTEE_IMAGE_TAG, and TRUSTEE_CHARTS_REF are unused and must either be removed
from the step spec or actually consumed; choose one fix: (A) Remove the three
entries from the step inputs in
sandboxed-containers-operator-install-trustee-operator-ref.yaml to avoid
exposing dead knobs, or (B) implement consumption by plumbing those env vars
into the companion install-trustee-operator command/script and use them where
the trustee catalog image and chart ref are built/queried (e.g., in the code
paths that construct the trustee catalog source image name and the charts git
ref); ensure variable names TRUSTEE_IMAGE_REPO, TRUSTEE_IMAGE_TAG, and
TRUSTEE_CHARTS_REF are read from the environment and override defaults when
present.
| NO NETWORK ACCESS REQUIRED: This step uses pre-rendered manifests with runtime variable | ||
| substitution via sed, eliminating the need for helm or git. Works with restrict_network_access: true. |
There was a problem hiding this comment.
The “no network access required” claim is too strong.
The script still does a skopeo list-tags against Quay and launches a pod from quay.io/confidential-containers/kbs-client:${tag}. That means this step is not actually self-contained for restricted-network jobs unless those external pulls are already mirrored elsewhere.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml`
around lines 66 - 67, The "NO NETWORK ACCESS REQUIRED" statement is incorrect
because the step invokes skopeo list-tags and pulls
quay.io/confidential-containers/kbs-client:${tag}, so either update the claim to
indicate external network access is required or change the step to use a
configurable/mirrored image and avoid remote tag lookups; specifically modify
the logic that calls skopeo list-tags and the hardcoded
quay.io/confidential-containers/kbs-client:${tag} reference to instead read an
injected variable (e.g., KBS_CLIENT_IMAGE or IMAGE_REGISTRY+IMAGE_NAME+TAG) and
skip skopeo when restrict_network_access is true, or document the requirement to
have the image mirrored locally.
- Add missing OWNERS file for step-registry-metadata check - Fix shellcheck SC2155 warnings by declaring and assigning variables separately Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.metadata.json`:
- Line 17: There is an extraneous backslash character "\" immediately after the
closing brace "}" in the JSON metadata, which breaks JSON syntax; remove that
trailing backslash so the file ends with a single "}" (no trailing escape
character) to restore valid JSON structure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: e6bfc52e-3b14-4541-87c4-a22c2a2db804
📒 Files selected for processing (1)
ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.metadata.json
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-coco |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@tbuskey: job(s): periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-coco either don't exist or were not found to be affected, and cannot be rehearsed |
|
/test generated-config |
|
/pj-rehearse list |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@tbuskey: job(s): periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aws-ipi-coco either don't exist or were not found to be affected, and cannot be rehearsed |
1 similar comment
|
@tbuskey: job(s): periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aws-ipi-coco either don't exist or were not found to be affected, and cannot be rehearsed |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Add TRUSTEE_INSTALL=true and trustee catalog env vars to: - azure-ipi-coco (all candidate versions) - aro-ipi-coco (all candidate versions) - aws-ipi-coco (already done, this completes the set) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Enable TRUSTEE_INSTALL for all CoCo jobs (aws-ipi-coco, azure-ipi-coco, aro-ipi-coco) across candidate versions 4.17-4.21. Fixed: Used Python YAML library which reformatted all quotes. Now using surgical line insertion to preserve exact formatting. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
@tbuskey: |
|
@openshift-ci[bot]: your |
|
@tbuskey: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/pj-rehearse list |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aws-ipi-coco |
The determinize-ci-operator tool alphabetically sorted TRUSTEE_* environment variables in all candidate configs. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aws-ipi-coco |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@tbuskey: job(s): periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aws-ipi-coco either don't exist or were not found to be affected, and cannot be rehearsed |
|
/pj-rehearse list |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
1 similar comment
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Changed from brew-catalog to trustee-catalog in all candidate configs. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
[REHEARSALNOTIFIER]
A total of 31 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here
Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/pj-rehearse abort periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@tbuskey: job(s): abort, periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco either don't exist or were not found to be affected, and cannot be rehearsed |
|
@tbuskey: job(s): periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco either don't exist or were not found to be affected, and cannot be rehearsed |
Summary
This PR integrates the
install-trustee-operatorstep into the CoCo test workflow for the sandboxed-containers-operator.Changes
install-trustee-operatorstep tosandboxed-containers-operator-prechainTRUSTEE_INSTALL=trueforazure-ipi-cocotestTRUSTEE_CATALOG_SOURCE_IMAGEandTRUSTEE_CATALOG_SOURCE_NAMEenvironment variablesDetails
The install-trustee-operator step:
Testing
The step will be tested via rehearsal on the azure-ipi-coco job.
🤖 Generated with Claude Code
Summary by CodeRabbit
This PR updates OpenShift CI configuration in the openshift/release repository to add an automated trustee-operator installation step into the Sandboxed Containers Operator CI workflow and enables it for the azure-ipi-coco test job.
What changed (practical terms)
Files/locations changed
Impact