
AGENT-1488: Aggregate IRI status from MachineConfigNodes#5841

Open
bfournie wants to merge 1 commit into openshift:main from bfournie:iri-status-aggregation

Conversation

Contributor

@bfournie bfournie commented Apr 9, 2026

Add cluster-wide InternalReleaseImage status aggregation from all master MachineConfigNodes. For each release bundle, aggregates conditions (Available, Degraded, Installing, Removing) across nodes.

Status messages include node counts and degraded node names for operational visibility. Follows the Node Controller → MCP aggregation pattern.

- What I did

Add cluster-wide InternalReleaseImage status aggregation from all master MachineConfigNodes. For each release bundle, aggregates conditions (Available, Degraded, Installing, Removing) across nodes.
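The per-release aggregation described above can be sketched as follows. This is a minimal illustration with simplified stand-in types; the real controller works on MachineConfigNode status conditions, and the NotAllNodesAvailable reason string is hypothetical, not necessarily what the controller emits:

```go
package main

import "fmt"

// nodeRelease is an illustrative stand-in for the per-node release
// status reported by a MachineConfigNode (not the real MCO type).
type nodeRelease struct {
	Node      string
	Available bool
	Degraded  bool
}

// aggregateAvailable collapses per-node statuses into a cluster-wide
// Available condition, mirroring the "all N nodes" messages shown in
// the example output below.
func aggregateAvailable(release string, nodes []nodeRelease) (status, reason, message string) {
	available := 0
	for _, n := range nodes {
		if n.Available {
			available++
		}
	}
	if len(nodes) > 0 && available == len(nodes) {
		return "True", "AllNodesAvailable",
			fmt.Sprintf("Release %s is available on all %d nodes", release, len(nodes))
	}
	// NotAllNodesAvailable is an assumed reason name for illustration.
	return "False", "NotAllNodesAvailable",
		fmt.Sprintf("Release %s is available on %d/%d nodes", release, available, len(nodes))
}

func main() {
	nodes := []nodeRelease{
		{"master-0", true, false},
		{"master-1", true, false},
		{"master-2", true, false},
	}
	s, r, m := aggregateAvailable("ocp-release-bundle-4.22.0-x86_64", nodes)
	fmt.Println(s, r, m)
}
```

The same counting pattern extends to the Degraded, Installing, and Removing conditions, with degraded node names appended to the message.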

- How to verify it

View the cluster-wide aggregated status:

oc get internalreleaseimage cluster -o yaml

Example output:

status:
  releases:
  - name: ocp-release-bundle-4.22.0-x86_64
    image: localhost:22625/openshift/release-images@sha256:...
    conditions:
    - type: Available
      status: "True"
      reason: AllNodesAvailable
      message: "Release ocp-release-bundle-4.22.0-x86_64 is available on all 3 nodes"
    - type: Degraded
      status: "False"
      reason: NoNodesDegraded

Scenario A: Check that Available=True when all master MCNs report Available=True

oc get internalreleaseimage cluster -o jsonpath='{.status.releases[0].conditions[?(@.type=="Available")]}'

Expected: status: "True", reason: "AllNodesAvailable", message shows "X/X nodes"

Scenario B: Simulate degraded node (if possible)

  • Cause one MCN to report Degraded=True for a release
  • Verify IRI status shows:
    - Available: "False" with node count in message
    - Degraded: "True" with degraded node name in message

- Description for the changelog

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 9, 2026

@bfournie: This pull request references AGENT-1488 which is a valid jira issue.

Details

In response to this: (the PR description, quoted above)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Apr 9, 2026
@coderabbitai

coderabbitai bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

The InternalReleaseImage controller now watches MachineConfigNode resources, aggregates per-master-node release statuses into cluster-wide release bundles, enqueues reconciliation on MCN status changes, and updates controller wiring and tests to include the MCN informer and aggregation logic.

Changes

- Controller Initialization (cmd/machine-config-controller/start.go): Pass the MachineConfigNodes() informer into the InternalReleaseImage controller constructor.
- Aggregation Logic (pkg/controller/internalreleaseimage/aggregation.go): New file implementing aggregateMCNIRIStatus, aggregateReleaseStatus, and helpers to filter master MCNs, count per-release condition states (Available, Degraded, Installing, Removing, Mounted), select the cluster image pullspec, and build deterministic aggregated InternalReleaseImageBundleStatus entries.
- Aggregation Tests (pkg/controller/internalreleaseimage/aggregation_test.go): New unit tests covering aggregation scenarios (all-available, partial availability with installing, degraded node) and tests for filterMasterMCNs and findReleaseInMCN; includes helpers to build MCN test objects.
- Controller: MCN Watchers & Status Flow (pkg/controller/internalreleaseimage/internalreleaseimage_controller.go): Adds mcnLister/mcnListerSynced fields, accepts mcnInformer in New(...), registers MCN add/update/delete handlers, includes the MCN cache sync in Run(), enqueues the singleton IRI on master-MCN events, computes aggregatedReleases during sync, and replaces the status updater with updateInternalReleaseImageStatusWithReleases, which updates status.releases only when changed (and fails the sync on aggregation error).
- Controller Tests Updated (pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go): Wires the MCN informer into the test fixture, sets mcnListerSynced = alwaysReady, and adds TestMCNEventHandlersEnqueueIRI to verify enqueue behavior on add/update/delete and on internal release status changes.
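The enqueue-on-master-MCN-event flow summarized above can be sketched as follows; queue and the pool-name check are simplified stand-ins for the controller's workqueue and master filtering, not the actual controller types:

```go
package main

import "fmt"

// queue is an illustrative stand-in for the controller's workqueue.
type queue struct{ keys []string }

func (q *queue) Add(key string) { q.keys = append(q.keys, key) }

// enqueueOnMCNEvent mirrors the handler behavior: only events for
// master MCNs enqueue the singleton IRI, which is named "cluster".
func enqueueOnMCNEvent(q *queue, poolName string) {
	if poolName != "master" {
		return // worker MCNs do not feed the cluster-wide aggregation
	}
	q.Add("cluster")
}

func main() {
	q := &queue{}
	enqueueOnMCNEvent(q, "worker")
	enqueueOnMCNEvent(q, "master")
	fmt.Println(q.keys)
}
```

Because the IRI is a singleton, every qualifying event collapses into the same key, so the workqueue naturally deduplicates bursts of MCN updates.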

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Contributor

openshift-ci bot commented Apr 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bfournie
Once this PR has been reviewed and has the lgtm label, please assign djoshy for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 9, 2026

@bfournie: This pull request references AGENT-1488 which is a valid jira issue.

Details

In response to this: (the PR description, quoted above)


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go (1)

361-363: Double controller instantiation may cause test flakiness.

newFixture() already creates a controller internally (line 259), but this test calls f.newController() again, creating a second controller instance. The second controller has a fresh queue while the first one's queue may contain stale items from informer event handlers.

Consider using the controller already created by the fixture:

 func TestMCNEventHandlersEnqueueIRI(t *testing.T) {
 	f := newFixture(t, objs(iri(), cconfig(), clusterVersion(), iriCertSecret())())
-	c := f.newController()
+	c := f.controller
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go`
around lines 361 - 363, The test TestMCNEventHandlersEnqueueIRI creates two
controllers which can cause flaky behavior; remove the extra controller
instantiation by using the controller already created inside newFixture()
instead of calling f.newController() again — locate the test function and delete
the f.newController() call (and any assignment to a second controller variable),
then use the existing controller instance managed by newFixture/newFixture's
fields (and ensure any references to the removed variable are updated to use the
fixture's controller or its queue).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0e0e8a4a-f44f-47b2-b58e-b6bafc3554d6

📥 Commits

Reviewing files that changed from the base of the PR and between a327c35 and b499a8b.

📒 Files selected for processing (5)
  • cmd/machine-config-controller/start.go
  • pkg/controller/internalreleaseimage/aggregation.go
  • pkg/controller/internalreleaseimage/aggregation_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go

@bfournie force-pushed the iri-status-aggregation branch from b499a8b to 0d0ca5c on April 10, 2026 at 00:59

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go (1)

303-314: Fixture MCN wiring is only partial for sync-path coverage.

newController now depends on MCN informer/lister sync, but the fixture still doesn’t plumb *mcfgv1.MachineConfigNode from initialObjects into a dedicated MCN lister/indexer path. That makes MCN-backed sync tests harder and leaves aggregation integration coverage weaker than it should be.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go`
around lines 303 - 314, The test fixture doesn't populate an MCN informer/lister
with the MachineConfigNode objects from initialObjects, yet newController now
requires the MCN informer sync (c.mcnListerSynced) for full sync-path coverage;
fix by wiring the MCN objects into a dedicated indexer/lister used by the
controller test: create an MCN fake informer/indexer, add any
*mcfgv1.MachineConfigNode instances from initialObjects into that indexer, set
c.mcnListerSynced to the informer's HasSynced and wire the informer's Lister
into the controller (same pattern used for other informers like
c.mcListerSynced/c.ccListerSynced), ensuring newController receives the
populated MCN lister so MCN-backed sync tests exercise the aggregation path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go`:
- Around line 387-404: The test currently only checks c.queue.Len() after
calling addMachineConfigNode and updateMachineConfigNode; change both
enqueue-path assertions in TestMCNEventHandlersEnqueueIRI to dequeue the key
(using c.queue.Get()), assert that the returned key equals
ctrlcommon.InternalReleaseImageInstanceName (casting to string if needed), and
call c.queue.Done(key) as before; update the t.Errorf messages to report the
unexpected key value so the test fails if the wrong item was enqueued rather
than just the length being correct.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 089e78d1-2ef1-4fef-99ec-ca9777c9aebe

📥 Commits

Reviewing files that changed from the base of the PR and between b499a8b and 0d0ca5c.

📒 Files selected for processing (5)
  • cmd/machine-config-controller/start.go
  • pkg/controller/internalreleaseimage/aggregation.go
  • pkg/controller/internalreleaseimage/aggregation_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/controller/internalreleaseimage/aggregation.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/machine-config-controller/start.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go

@bfournie force-pushed the iri-status-aggregation branch from 0d0ca5c to 45af2c7 on April 10, 2026 at 01:21

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`:
- Around line 564-578: The status write is being forced because
reflect.DeepEqual(newIRI.Status.Releases, aggregatedReleases) sees every
aggregated condition as changed due to a freshly assigned LastTransitionTime;
change the logic so equality ignores or preserves LastTransitionTime before the
DeepEqual: either (A) when building aggregatedReleases in aggregation.go,
preserve the existing LastTransitionTime from the matching condition in
newIRI.Status.Releases (match by condition Type/Status/Reason/Message) so
unchanged conditions retain their original LastTransitionTime, or (B) perform
the equality check by comparing releases and their conditions while explicitly
excluding LastTransitionTime (i.e., compare
Type/Status/Reason/Message/ObservedGeneration but not LastTransitionTime) and
only set aggregatedReleases' LastTransitionTime when the condition truly
transitions; then use that stable aggregatedReleases in the existing
reflect.DeepEqual/newIRI.Status.Releases check before calling UpdateStatus.
- Around line 307-309: The controller currently skips calling
ctrl.enqueueAllInternalReleaseImages() when
mcn.Status.InternalReleaseImage.Releases is empty, which misses master-set and
pool-name changes; change the logic so enqueueAllInternalReleaseImages() is
invoked not only when len(Status.InternalReleaseImage.Releases)>0 but also
whenever the object represents a master (master membership changes) or
Spec.Pool.Name changes (i.e., compare old vs new Spec.Pool.Name in update
handlers); specifically update the handlers that call
ctrl.enqueueAllInternalReleaseImages() to remove the strict non-empty-only guard
and add checks for master membership changes and Spec.Pool.Name diffs so master
add/remove or pool name updates always trigger
ctrl.enqueueAllInternalReleaseImages().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dfade0d3-80c5-4e53-ab5a-11c071f0752b

📥 Commits

Reviewing files that changed from the base of the PR and between 0d0ca5c and 45af2c7.

📒 Files selected for processing (5)
  • cmd/machine-config-controller/start.go
  • pkg/controller/internalreleaseimage/aggregation.go
  • pkg/controller/internalreleaseimage/aggregation_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/controller/internalreleaseimage/aggregation.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/machine-config-controller/start.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go

@bfournie force-pushed the iri-status-aggregation branch from 45af2c7 to c11b9a0 on April 10, 2026 at 01:46

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/controller/internalreleaseimage/aggregation_test.go (1)

11-129: Lock down the full aggregation contract.

The suite covers part of the new API surface, but it never asserts aggregated Reason values and doesn't exercise the Removing condition. A regression there would still pass unnoticed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/aggregation_test.go` around lines 11 -
129, The tests for aggregateReleaseStatus
(TestAggregateReleaseStatus_AllNodesAvailable, _PartiallyAvailable,
_OneDegraded) do not assert the aggregated Condition.Reason fields and there is
no test exercising the Removing condition; update these tests to assert expected
Reason values for conditions returned by aggregateReleaseStatus and add a new
test that constructs MCNs via createMCNWithReleaseStatus that set a
Removing=true state to ensure aggregateReleaseStatus emits a Removing condition
with the correct Status and Reason. Specifically, add assertions using
findCondition(result.Conditions, "...").Reason for "Available", "Degraded",
"Installing" in the existing tests and add a TestAggregateReleaseStatus_Removing
that calls aggregateReleaseStatus and verifies the Removing condition presence,
Status, and Reason (and that the Reason string contains the node name), so
regressions to aggregateReleaseStatus's Reason semantics are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller.go`:
- Around line 456-463: The aggregation error from aggregateMCNIRIStatus is
currently only logged and not propagated, causing successful syncs with stale
Degraded=False; detect aggErr after calling aggregateMCNIRIStatus and
incorporate it into the overall sync error (e.g., set syncErr =
errors.Wrap(aggErr, "...") or return aggErr immediately) so the deferred status
update sees a non-nil error and flips the Degraded condition/causes a retry;
update references to aggregatedReleases only when aggErr is nil and ensure you
surface aggErr via the existing syncErr variable or return path so the reconcile
fails appropriately.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8a8c3428-50a4-45bf-8bcf-642c4ddb1e88

📥 Commits

Reviewing files that changed from the base of the PR and between 45af2c7 and c11b9a0.

📒 Files selected for processing (5)
  • cmd/machine-config-controller/start.go
  • pkg/controller/internalreleaseimage/aggregation.go
  • pkg/controller/internalreleaseimage/aggregation_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • cmd/machine-config-controller/start.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
  • pkg/controller/internalreleaseimage/aggregation.go

Add cluster-wide InternalReleaseImage status aggregation from all
master MachineConfigNodes. For each release bundle, aggregates
conditions (Available, Degraded, Installing, Removing) across nodes.

Status messages include node counts and degraded node names for
operational visibility. Follows the Node Controller → MCP aggregation
pattern.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@bfournie force-pushed the iri-status-aggregation branch from c11b9a0 to a2a0bd3 on April 10, 2026 at 02:03

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
pkg/controller/internalreleaseimage/aggregation_test.go (1)

11-129: Add a Removing aggregation case.

This suite validates Available, Degraded, and Installing, but not Removing, which is part of the new aggregated release contract. A focused case here would keep that condition from drifting untested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/aggregation_test.go` around lines 11 -
129, Add a new test case in aggregation_test.go that exercises the "Removing"
aggregation path: call aggregateReleaseStatus(releaseName, mcns, nil) where one
MachineConfigNode created via createMCNWithReleaseStatus(...) has "Removing":
true (others Available=true), then assert that findCondition(result.Conditions,
"Removing") returns a condition with Status==metav1.ConditionTrue and its
Message mentions the node name; also verify overall Available/Degraded
expectations as appropriate to the scenario to avoid regressions.
pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go (1)

406-415: Cover master pool transitions in the update-handler test.

updateMachineConfigNode also enqueues on worker↔master pool changes, but this test only exercises IRI-status diffs. Add assertions for enter/leave-master transitions so the membership logic doesn’t regress silently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go`
around lines 406 - 415, The test for updateMachineConfigNode currently only
asserts enqueue behavior for IRI status changes; add cases to also assert
enqueue behavior when a node's pool membership changes between worker and
master. Specifically, in the test around mcn/oldMCN and calls to
c.updateMachineConfigNode, add two additional scenarios: one where oldMCN has
pool "worker" and mcn changes to "master" (should enqueue) and one where oldMCN
is "master" and mcn changes to "worker" (should enqueue), using
assertQueuedIRIKey to verify the queue counts for these transitions; keep the
existing IRI-diff and no-change assertions unchanged. Ensure you mutate the
MachineConfigNode pool field on the DeepCopy instances (oldMCN/mcn) used in the
update calls so the membership change is detected by updateMachineConfigNode.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4a746903-1bc2-4116-8201-49cf23db0bea

📥 Commits

Reviewing files that changed from the base of the PR and between c11b9a0 and a2a0bd3.

📒 Files selected for processing (5)
  • cmd/machine-config-controller/start.go
  • pkg/controller/internalreleaseimage/aggregation.go
  • pkg/controller/internalreleaseimage/aggregation_test.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller.go
  • pkg/controller/internalreleaseimage/internalreleaseimage_controller_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/controller/internalreleaseimage/aggregation.go

@openshift-ci
Contributor

openshift-ci bot commented Apr 10, 2026

@bfournie: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-gcp-op-ocl-part1 | a2a0bd3 | link | false | /test e2e-gcp-op-ocl-part1
ci/prow/e2e-gcp-op-ocl-part2 | a2a0bd3 | link | false | /test e2e-gcp-op-ocl-part2
ci/prow/e2e-gcp-op-ocl | a2a0bd3 | link | false | /test e2e-gcp-op-ocl
ci/prow/verify | a2a0bd3 | link | true | /test verify

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

```go
mcn := obj.(*mcfgv1.MachineConfigNode)
klog.V(4).Infof("Adding MachineConfigNode %s", mcn.Name)

if mcn.Spec.Pool.Name == "master" {
```

I don't think this approach is robust enough (i.e. pool names could change).
In this case we're only interested in watching events for the MCNs belonging to control plane nodes, so a safer approach would be to retrieve the associated node (i.e. ctrl.nodeLister.Get(mcn.Name)) and check for the node-role.kubernetes.io/master or node-role.kubernetes.io/control-plane label.
cc @yuqi-zhang in case a better approach would be recommended to handle such cases
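To illustrate the suggestion, here is a minimal, self-contained sketch of label-based filtering. The `node` struct is a stand-in for `corev1.Node` (in the real controller the node would come from `ctrl.nodeLister.Get(mcn.Name)`); only the label check is the point here.

```go
package main

import "fmt"

// node is a stand-in for corev1.Node, modeling only the labels we need.
type node struct {
	labels map[string]string
}

// isControlPlane checks the standard control plane role labels instead of
// relying on a pool name, which could change.
func isControlPlane(n *node) bool {
	_, master := n.labels["node-role.kubernetes.io/master"]
	_, cp := n.labels["node-role.kubernetes.io/control-plane"]
	return master || cp
}

func main() {
	worker := &node{labels: map[string]string{"node-role.kubernetes.io/worker": ""}}
	cp := &node{labels: map[string]string{"node-role.kubernetes.io/control-plane": ""}}
	fmt.Println(isControlPlane(worker), isControlPlane(cp)) // prints: false true
}
```

Checking both labels covers clusters that carry the legacy `master` role, the newer `control-plane` role, or both.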


(the same filtering logic should also be applied to the update and delete event handlers, so it may be better to have a common method for that)


```go
// enqueueAllInternalReleaseImages enqueues all IRI resources for reconciliation.
// Since IRI is a singleton named "cluster", this just enqueues that one resource.
func (ctrl *Controller) enqueueAllInternalReleaseImages() {
```

I'm not opposed to removing code duplication, but:

  • The method name is wrong and misleading: since IRI is a singleton by design, it should be something like enqueueInternalReleaseImage - the odd thing is that the comment itself already says so...
  • Remove the duplication in the other parts of the code where the same action is performed


```go
// Aggregate MCN IRI status into cluster-wide IRI status
var aggErr error
aggregatedReleases, aggErr = ctrl.aggregateMCNIRIStatus(iri)
```

Since the sync handler manages the IRI status in the deferred method updateInternalReleaseImageStatusWithReleases, I think it's better to keep such logic centralized in one place. So it's better to move this part directly inside that method. (I'm still not 100% convinced the deferred design will work well in all cases, but since it's the currently accepted design let's follow it properly; it will also make the code simpler to maintain.)

```go
if aggErr != nil {
	return fmt.Errorf("failed to aggregate MCN IRI status: %w", aggErr)
}
```


Adding the comment here since there's no better place, but... I was expecting that updateInternalReleaseImageStatusWithReleases would have been replaced by the new logic?

```go
var masterMCNs []*mcfgv1.MachineConfigNode
for _, mcn := range mcns {
	// Check if MCN belongs to master pool
	if mcn.Spec.Pool.Name == "master" {
```

See my previous comment on properly filtering the MCNs for control plane nodes

```go
//
// This function implements the pattern used by the Node Controller for aggregating
// MCN conditions into MachineConfigPool status.
func (ctrl *Controller) aggregateMCNIRIStatus(iri *mcfgv1alpha1.InternalReleaseImage) ([]mcfgv1alpha1.InternalReleaseImageBundleStatus, error) {
```

All this code looks unnecessarily complex to me and I'd strongly suggest reviewing it to make it simpler and more readable (the fact that it required a separate file may be a sign of that).

Let me recap the main points of the expected aggregation logic, which I think is somewhat simpler than the current one:

  1. Before looking at the MCN releases, the MCN itself should be checked for any error. If the condition InternalReleaseImageDegraded is set, it means that something bad happened and the whole node may no longer be able to serve the images. As an initial conservative logic, we could consider that if at least one MCN is IRI degraded, then the IRI should be marked as degraded. The IRI degraded condition should report which nodes are currently degraded.

  2. Every OCP bundle release can have up to 5 different conditions: Degraded, Available, Mounted, Installing, Removing.
    For the scope of this PR / feature, only Degraded and Available must be considered - the other three will be handled in the NoRegistryClusterUpgrade feature and we can safely ignore them.

  3. Similarly to the previous logic, if at least one MCN release bundle is Degraded, then the related IRI release should be marked as degraded. Since the same release could be degraded on different nodes, the IRI degraded condition for that release should report which nodes are currently degraded.

  4. Every MCN image reports a pullspec local to that node, i.e. something like localhost:22625/openshift/release-images@sha256:33ca1f3b76fc24060b323bd93f8491cf69a63d16949944d4ab3404.... This is not very useful for the end user, so localhost should be replaced with the canonical api-int.<cluster domain> URL.

To avoid any ambiguity, I will also prepare and share some concrete examples of the expected resource status, so that we can use them as acceptance criteria for the story.

Anyhow, in general, I'd strive to simplify this code as much as possible
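The recap above can be sketched as a small, self-contained aggregation function. The `nodeRelease` struct and the helper names are illustrative stand-ins for the real MCN condition types, and `clusterDomain` is a placeholder value; this is a sketch of the logic described in points 1-4, not the PR's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeRelease models one release bundle's state as reported by a single MCN.
type nodeRelease struct {
	node      string
	degraded  bool
	available bool
}

// aggregate applies the conservative logic described above: the release is
// Degraded if any node reports it degraded (naming those nodes), and
// Available only if every node reports it available.
func aggregate(reports []nodeRelease) (degraded, available bool, degradedNodes []string) {
	available = len(reports) > 0
	for _, r := range reports {
		if r.degraded {
			degraded = true
			degradedNodes = append(degradedNodes, r.node)
		}
		if !r.available {
			available = false
		}
	}
	return degraded, available, degradedNodes
}

// canonicalPullspec rewrites a node-local pullspec to the cluster-wide
// api-int endpoint, per point 4 above.
func canonicalPullspec(local, clusterDomain string) string {
	return strings.Replace(local, "localhost", "api-int."+clusterDomain, 1)
}

func main() {
	deg, avail, nodes := aggregate([]nodeRelease{
		{node: "master-0", available: true},
		{node: "master-1", degraded: true},
	})
	fmt.Println(deg, avail, nodes) // prints: true false [master-1]
	fmt.Println(canonicalPullspec("localhost:22625/openshift/release-images@sha256:abc", "example.com"))
}
```

The degraded-node names collected here are what would feed the Degraded condition's message, matching the expected status examples in the PR description.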


```go
// TestMCNEventHandlersEnqueueIRI verifies that MachineConfigNode event handlers
// correctly enqueue the IRI resource for reconciliation.
func TestMCNEventHandlersEnqueueIRI(t *testing.T) {
```

I don't think this test is really useful (at least as a unit test); I'd prefer other tests - reusing the current infrastructure - that verify the expected releases status


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
