CRT Config Monitor for Ship Status #79397

Open
thiagoalessio wants to merge 1 commit into openshift:main from thiagoalessio:add-crt-to-ship-status

thiagoalessio commented May 18, 2026

Related PR: https://github.com/openshift/continuous-release-jobs/pull/1792

This PR adds monitoring and dashboard configuration for CRT (Continuous Release Team) services in the OpenShift CI infrastructure's Ship Status system.

Changes

Monitor Configuration (component-monitor-config.yaml):

  • Added HTTP health monitors for Release Controller services across multiple architectures (amd64, arm64, multi, ppc64le, s390x) and their privileged variants, configured with appropriate expected HTTP status codes (200 for standard endpoints, 403 for privileged endpoints) and 5-second retry intervals.
  • Added HTTP health monitors for two CRT services: Backstage (expecting 403 response) and CI-Search (expecting 200 response), both with 5-second retry intervals.
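For readers unfamiliar with expected-status monitoring, here is a minimal sketch of the idea, not the Ship Status component monitor itself. It assumes a probe counts as healthy only when the observed HTTP status equals the configured `code`, which is why a 403 from a privileged endpoint can be the healthy answer. The function names `fetch_status` and `is_healthy` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar errors.
        return None


def is_healthy(observed: Optional[int], expected: int) -> bool:
    """Assumed rule: healthy only if a response arrived with the expected code."""
    return observed is not None and observed == expected
```

Under that assumption, `is_healthy(fetch_status(url), 403)` would report a privileged endpoint as healthy when it rejects an unauthenticated request, and as unhealthy when it returns anything else or does not respond.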

Dashboard Configuration (dashboard-config.yaml):

  • Introduced a "Release Controller" component to the Ship Status dashboard with sub-components for each Release Controller variant and architecture, owned by the continuous-release-team and monitored via the CRT component monitor.
  • Introduced a "CRT" component to the Ship Status dashboard with sub-components for Backstage and CI-Search services, similarly configured with ownership and monitoring settings.

These changes enable the Ship Status dashboard to display the health and status of CRT services alongside existing infrastructure components, providing visibility into the continuous release infrastructure's operational state.

openshift-ci Bot requested review from Prucek and pruan-rht May 18, 2026 10:26

openshift-ci Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: thiagoalessio
Once this PR has been reviewed and has the lgtm label, please assign neisw for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-bot Bot added the rehearsals-ack label (Signifies that rehearsal jobs have been acknowledged) May 18, 2026
@openshift-merge-bot

[REHEARSALNOTIFIER]
@thiagoalessio: no rehearsable tests are affected by this change

Note: If this PR includes changes to step registry files (ci-operator/step-registry/) and you expected jobs to be found, try rebasing your PR onto the base branch. This helps pj-rehearse accurately detect changes when the base branch has moved forward.

coderabbitai Bot commented May 18, 2026

Walkthrough

This PR adds monitoring and dashboard configuration for Release Controller and CRT services. Two YAML configuration files are updated with HTTP health check monitors, dashboard components, ownership metadata, and service account associations to enable ship-status visibility and alerting.

Changes

Ship-status Monitoring and Dashboard Configuration

| Layer / File(s) | Summary |
|---|---|
| Release Controller monitoring and dashboard: core-services/ship-status/component-monitor-config.yaml, core-services/ship-status/dashboard-config.yaml | Release Controller HTTP monitors are configured for multiple architecture variants (amd64, arm64, multi, ppc64le, s390x) and privileged (-priv) sub-components with expected HTTP status codes (200 for standard, 403 for privileged endpoints). Dashboard component definition includes sub-component metadata, monitoring configuration targeting crt-component-monitor, and service account ownership. |
| CRT services monitoring and dashboard: core-services/ship-status/component-monitor-config.yaml, core-services/ship-status/dashboard-config.yaml | CRT service HTTP monitors for backstage (expecting 403) and ci-search (expecting 200) with 5-second retry intervals. Dashboard component adds corresponding sub-components with monitoring targets and consistent ownership metadata. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'CRT Config Monitor for Ship Status' directly and clearly summarizes the main change: adding CRT (Continuous Release Team) configuration monitoring to the Ship Status system. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR modifies only YAML configuration files. No Ginkgo tests or test definitions exist. Custom check for test names is not applicable. |
| Test Structure And Quality | ✅ Passed | This PR modifies only YAML configuration files. The custom check requires reviewing Ginkgo test code. No Go test files are present in this PR, so the check is not applicable. |
| Microshift Test Compatibility | ✅ Passed | Custom check not applicable. PR modifies only YAML configuration files for ship-status monitoring and dashboard setup. No Ginkgo e2e tests are added or modified. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR adds only YAML configuration files for ship-status monitors and dashboards. No Ginkgo e2e tests are added, so the SNO test compatibility check does not apply. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR introduces only configuration file changes (monitoring and dashboard configs), not deployment manifests, operator code, or controllers. No scheduling constraints present. Check not applicable. |
| Ote Binary Stdout Contract | ✅ Passed | PR contains only YAML configuration file changes for Ship Status monitoring and dashboard components. No code with stdout writes, logging initialization, or OTE binary interactions is present. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Check not applicable. PR adds only YAML configuration files, not Ginkgo e2e tests. The check applies only when new Ginkgo e2e tests are added. |

openshift-ci Bot commented May 18, 2026

@thiagoalessio: all tests passed!

coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core-services/ship-status/component-monitor-config.yaml`:
- Around line 558-625: Add a missing monitor entry for the release-controller
sub_component_slug "dpcr-openshift-release": create a block matching the other
release-controller entries (component_slug "release-controller",
sub_component_slug "dpcr-openshift-release") with an http_monitor that points to
the dpcr-openshift-release service (e.g.,
https://dpcr-openshift-release.apps.ci.l2s4.p1.openshiftapps.com), set the
expected code to 200 and retry_after to 5s, and insert it alongside the other
Release Controller entries just before the "END: Release Controller entries"
marker.
- Line 564: Check and confirm whether the shorter retry interval is intentional:
review the Release Controller entries and CRT Services entries that set
"retry_after: 5s" and either (a) change them to match existing Prow monitors'
"retry_after: 4m" if they should follow the same backoff policy, or (b) keep
"retry_after: 5s" but add an inline comment above those entries explaining the
rationale and risk tradeoffs for the 5s interval; update the Release Controller
and CRT Services monitor blocks that currently contain "retry_after: 5s"
accordingly so the intent is explicit.

In `@core-services/ship-status/dashboard-config.yaml`:
- Around line 543-545: Update the incorrect namespace on the component-monitor
service account references: replace occurrences of
"system:serviceaccount:crt-argocd:component-monitor" with
"system:serviceaccount:ship-status:component-monitor" (the entries under the
owners list where the service account is specified) so they point to the actual
service account defined in the ship-status namespace and match other components
in this file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8a1f20e2-9dc6-4a81-92f3-3c8ce3dbbd74

📥 Commits

Reviewing files that changed from the base of the PR and between 5a5a25f and c7672b5.

📒 Files selected for processing (2)
  • core-services/ship-status/component-monitor-config.yaml
  • core-services/ship-status/dashboard-config.yaml

Comment on lines +558 to +625
  # BEGIN: Release Controller entries
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release"
    http_monitor:
      url: "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com"
      code: 200
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "origin-release"
    http_monitor:
      url: "https://origin-release.apps.ci.l2s4.p1.openshiftapps.com"
      code: 200
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release-arm64"
    http_monitor:
      url: "https://openshift-release-arm64.apps.ci.l2s4.p1.openshiftapps.com"
      code: 200
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release-multi"
    http_monitor:
      url: "https://openshift-release-multi.apps.ci.l2s4.p1.openshiftapps.com"
      code: 200
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release-ppc64le"
    http_monitor:
      url: "https://openshift-release-ppc64le.apps.ci.l2s4.p1.openshiftapps.com"
      code: 200
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release-s390x"
    http_monitor:
      url: "https://openshift-release-s390x.apps.ci.l2s4.p1.openshiftapps.com"
      code: 200
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release-priv"
    http_monitor:
      url: "https://openshift-release-priv.apps.ci.l2s4.p1.openshiftapps.com"
      code: 403
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release-arm64-priv"
    http_monitor:
      url: "https://openshift-release-arm64-priv.apps.ci.l2s4.p1.openshiftapps.com"
      code: 403
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release-multi-priv"
    http_monitor:
      url: "https://openshift-release-multi-priv.apps.ci.l2s4.p1.openshiftapps.com"
      code: 403
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release-ppc64le-priv"
    http_monitor:
      url: "https://openshift-release-ppc64le-priv.apps.ci.l2s4.p1.openshiftapps.com"
      code: 403
      retry_after: 5s
  - component_slug: "release-controller"
    sub_component_slug: "openshift-release-s390x-priv"
    http_monitor:
      url: "https://openshift-release-s390x-priv.apps.ci.l2s4.p1.openshiftapps.com"
      code: 403
      retry_after: 5s
  # END: Release Controller entries

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing monitor configuration for dpcr-openshift-release sub-component.

The dashboard configuration (dashboard-config.yaml lines 535-542) includes a dpcr-openshift-release sub-component, but there is no corresponding monitor entry in this file. This will cause the dashboard to display a component without any health monitoring.
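A gap like this can be caught mechanically by comparing the two files. The following is a minimal sketch under stated assumptions, not part of the PR: the helper name `unmonitored` is hypothetical, the sample data is abbreviated to three dashboard sub-components, and in practice both inputs would be loaded from the YAML files rather than written inline.

```python
from typing import Dict, Iterable, List, Set, Tuple

Slug = Tuple[str, str]  # (component_slug, sub_component_slug)


def unmonitored(dashboard: Set[Slug], monitors: Iterable[Dict[str, str]]) -> List[Slug]:
    """Return dashboard sub-components that have no matching monitor entry."""
    monitored = {(m["component_slug"], m["sub_component_slug"]) for m in monitors}
    return sorted(pair for pair in dashboard if pair not in monitored)


# Abbreviated, illustrative data based on the slugs quoted in this review.
dashboard_subcomponents = {
    ("release-controller", "openshift-release"),
    ("release-controller", "openshift-release-priv"),
    ("release-controller", "dpcr-openshift-release"),
}
monitor_entries = [
    {"component_slug": "release-controller", "sub_component_slug": "openshift-release"},
    {"component_slug": "release-controller", "sub_component_slug": "openshift-release-priv"},
]

missing = unmonitored(dashboard_subcomponents, monitor_entries)
print(missing)  # [('release-controller', 'dpcr-openshift-release')]
```

Keying on the (component, sub-component) pair rather than the sub-component slug alone avoids false matches if two components ever reuse a slug.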

📊 Proposed fix: Add monitor for dpcr-openshift-release

Add the following entry after line 624 (before the "END: Release Controller entries" comment):

       code: 403
       retry_after: 5s
+  - component_slug: "release-controller"
+    sub_component_slug: "dpcr-openshift-release"
+    http_monitor:
+      url: "https://openshift-release.apps.cr.j7t7.p1.openshiftapps.com"
+      code: 200
+      retry_after: 5s
   # END: Release Controller entries

    http_monitor:
      url: "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com"
      code: 200
      retry_after: 5s

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check retry_after values across all HTTP monitors
rg -n 'retry_after:' core-services/ship-status/component-monitor-config.yaml

Repository: openshift/release

Length of output: 449


🏁 Script executed:

# Check the context around the new retry_after values to see if there are explanatory comments
head -n 650 core-services/ship-status/component-monitor-config.yaml | tail -n 100

Repository: openshift/release

Length of output: 3537


🏁 Script executed:

# Check git history for when these lines were added
git log --oneline -p core-services/ship-status/component-monitor-config.yaml | head -n 200

Repository: openshift/release

Length of output: 6442


Confirm that retry_after: 5s is intentional for Release Controller and CRT services.

The Release Controller entries (lines 564–624) and CRT Services entries (lines 633–639) all use retry_after: 5s, while existing Prow monitors use retry_after: 4m (lines 8, 109). This 48× difference in retry frequency is consistent across all new entries, suggesting intentional design, but no comments explain the rationale for the difference.
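The 48× figure can be checked with a few lines of arithmetic. This is a small sketch, assuming `retry_after` values are single-unit duration strings such as `5s` or `4m`; the helper `to_seconds` is hypothetical and does not handle compound values like `1m30s`. The retries-per-hour numbers are only an upper bound that assumes retries continue for the whole hour of a sustained failure, which the configuration shown here does not confirm.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(value: str) -> int:
    """Convert a single-unit duration string such as '5s' or '4m' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


new_interval = to_seconds("5s")   # 5
prow_interval = to_seconds("4m")  # 240

ratio = prow_interval / new_interval
print(ratio)  # 48.0

# Upper bound on retries per hour during a sustained failure (assumption).
print(3600 // new_interval, 3600 // prow_interval)  # 720 15
```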

Comment on lines +543 to +545
owners:
- rover_group: "continuous-release-team"
- service_account: "system:serviceaccount:crt-argocd:component-monitor"

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the crt-argocd service account exists and compare with referenced component-monitor

# Look for any references to crt-argocd namespace in the codebase
rg -n 'crt-argocd' --type yaml

# Look for component-monitor service account definitions
fd -e yaml -e yml | xargs rg -l 'kind:\s*ServiceAccount' | xargs rg -A5 'name:\s*component-monitor'

Repository: openshift/release

Length of output: 3189


🏁 Script executed:

# Search for all service accounts in crt-argocd namespace
rg -A3 'namespace:\s*crt-argocd' --type yaml | rg -B3 'kind:\s*ServiceAccount'

# Also search for any ServiceAccount definitions with crt-argocd context
fd -e yaml -e yml | xargs rg -B5 -A5 'namespace:\s*crt-argocd' | grep -A10 'ServiceAccount'

# Check if crt-argocd namespace is defined anywhere
rg 'namespace.*crt-argocd' --type yaml -B2 -A2

Repository: openshift/release

Length of output: 43


🏁 Script executed:

# Search for all references to crt-argocd in YAML files
rg 'crt-argocd' --type yaml -B2 -A2

# Search for all ServiceAccount definitions in the codebase
fd -e yaml -e yml -exec grep -l 'kind:\s*ServiceAccount' {} \; | head -20

# Look for Release Controller and CRT related configs
rg -i 'release.*controller|continuous.*release' --type yaml -B1 -A1 | head -50

Repository: openshift/release

Length of output: 4665


🏁 Script executed:

# Search for all ServiceAccount definitions
rg 'kind:\s*ServiceAccount' --type yaml -A2

# Look for all namespaces in serviceaccount.yaml to understand the full context
cat clusters/build-clusters/common/ship-status/serviceaccount.yaml | head -100

Repository: openshift/release

Length of output: 50374


🏁 Script executed:

cat clusters/build-clusters/common/ship-status/serviceaccount.yaml

Repository: openshift/release

Length of output: 1048


Fix namespace reference for component-monitor service account.

The service account referenced at lines 545 and 568 uses system:serviceaccount:crt-argocd:component-monitor, but this account does not exist in the crt-argocd namespace. The component-monitor service account is defined only in the ship-status namespace (clusters/build-clusters/common/ship-status/serviceaccount.yaml). Update the references to use system:serviceaccount:ship-status:component-monitor to match the actual service account location, consistent with other components in this file.
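Kubernetes service account usernames follow the pattern `system:serviceaccount:<namespace>:<name>`, so the namespace can be extracted and compared against the accounts that are actually defined. The sketch below is illustrative only: `parse_service_account` and `ServiceAccountRef` are hypothetical names, and the `defined_accounts` set reflects just the single account this review found in clusters/build-clusters/common/ship-status/serviceaccount.yaml.

```python
from typing import NamedTuple


class ServiceAccountRef(NamedTuple):
    namespace: str
    name: str


def parse_service_account(username: str) -> ServiceAccountRef:
    """Split 'system:serviceaccount:<namespace>:<name>' into its parts."""
    parts = username.split(":")
    well_formed = (
        len(parts) == 4
        and parts[:2] == ["system", "serviceaccount"]
        and all(parts[2:])
    )
    if not well_formed:
        raise ValueError(f"not a service account username: {username!r}")
    return ServiceAccountRef(namespace=parts[2], name=parts[3])


# Assumption: the only component-monitor account is the one in ship-status.
defined_accounts = {ServiceAccountRef("ship-status", "component-monitor")}

referenced = parse_service_account("system:serviceaccount:crt-argocd:component-monitor")
proposed = parse_service_account("system:serviceaccount:ship-status:component-monitor")

print(referenced in defined_accounts)  # False
print(proposed in defined_accounts)    # True
```

A static check like this only confirms that the reference matches a manifest in the repository; it cannot rule out an account created outside it.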
