@perdasilva
Contributor

@perdasilva perdasilva commented Nov 18, 2025

Description

Summary

This PR refactors the ClusterExtensionRevision status conditions to provide clearer, more actionable feedback about revision lifecycle states. The changes introduce a new Progressing condition, simplify condition reasons, and ensure that revision-level errors (retrying states) are properly surfaced to the parent ClusterExtension.

This PR introduces breaking API changes, but only to experimental APIs, so it should be OK to override.

Key Changes

  1. Refactored ClusterExtensionRevision Status Conditions

Simplified condition types:

  • Retained Available and Succeeded conditions
  • Added new Progressing condition for better visibility into rollout state
  • Removed obsolete condition reasons and consolidated error handling

Updated condition reasons to be more semantic and actionable:

  • Before: ReconcileFailure, RevisionValidationFailure, PhaseValidationError, ObjectCollisions, RolloutSuccess, Incomplete, Progressing
  • After: Retrying, Blocked, RollingOut, Archived, Migrated

API Changes:

  • Added Progressing column to kubectl get clusterextensionrevisions output
  2. Surfaced Revision Retrying Condition to ClusterExtension

Modified the Boxcutter applier to propagate retrying errors from ClusterExtensionRevision to the parent ClusterExtension. When a revision is in a Progressing=True state with Reason=Retrying, the error is now surfaced to the ClusterExtension, providing better visibility into failed reconciliation attempts. When the revision is Progressing=True/Succeeded the applier returns success.
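The propagation described above can be sketched roughly as follows. This is a minimal illustration, not the applier's actual code: `Condition` is a simplified stand-in for `metav1.Condition`, and `rolloutResult` and the constant names are hypothetical. It assumes a revision's Progressing condition maps to a (done, message, error) triple, with Retrying surfaced as an error and Succeeded reported as success.

```go
package main

import (
	"errors"
	"fmt"
)

// Condition is a simplified stand-in for metav1.Condition (assumption).
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

const (
	typeProgressing = "Progressing"
	reasonRetrying  = "Retrying"
	reasonSucceeded = "Succeeded"
)

// rolloutResult is a hypothetical helper: it inspects the revision's
// Progressing condition and decides what the parent should see.
func rolloutResult(conds []Condition) (done bool, msg string, err error) {
	for _, c := range conds {
		if c.Type != typeProgressing {
			continue
		}
		switch c.Reason {
		case reasonSucceeded:
			// Rollout finished: report success.
			return true, "", nil
		case reasonRetrying:
			// Recoverable failure: surface it as an error to the parent.
			return false, "", errors.New(c.Message)
		}
		// Any other reason (for example RollingOut): still in progress.
		return false, c.Message, nil
	}
	// No Progressing condition observed yet.
	return false, "", nil
}

func main() {
	_, _, err := rolloutResult([]Condition{
		{Type: typeProgressing, Status: "True", Reason: reasonRetrying, Message: "object collision"},
	})
	fmt.Println("surfaced error:", err)
}
```

Returning the condition message as the error text keeps the ClusterExtension's view identical to what the revision itself reports.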

  3. Added Comprehensive E2E Tests

Added end-to-end tests for ClusterExtensionRevision lifecycle scenarios to validate the new condition states and transitions.

  4. Added ClusterExtension documentation for the RolloutInProgress Progressing reason

ClusterExtensionRevision Status Conditions

Available Condition

Indicates whether the revision's objects are available and passing the object status probes.

| Status | Reason | Description |
|---|---|---|
| True | ProbesSucceeded | Objects are available and pass all probes |
| False | ProbeFailed | One or more objects are failing their availability probes |
| Unknown | Reconciling | Probes could not be observed / intermittent reconciliation error |
| Unknown | Archived | Revision is archived |
| Unknown | Migrated | Revision was migrated from Helm release |

Progressing Condition

Indicates whether the revision is progressing to its next state. It follows the same semantics as the Progressing condition on ClusterExtension and Deployment.

| Status | Reason | Description |
|---|---|---|
| True | Retrying | Reconciliation failed due to validation errors, object collisions, or other recoverable errors; will retry |
| True | RollingOut | Revision is actively rolling out objects across phases |
| True | Succeeded | Revision has completed its rollout successfully |
| False | Archived | Revision has been archived and is no longer progressing |

Not a part of this PR yet, but Progressing can also be Blocked for non-retryable errors.

Succeeded Condition

Indicates whether the revision has successfully completed its rollout.

| Status | Reason | Description |
|---|---|---|
| True | Succeeded | Revision succeeded rolling out all objects and passed all probes |

Migration Notes

  • The Progressing condition replaces the previous pattern of setting Available=False with various failure reasons
  • All retrying scenarios (validation failures, collisions, reconcile errors) now consistently set Progressing=True, Reason=Retrying and Available=Unknown
  • The new condition structure provides clearer separation between "something is wrong" (Progressing=Retrying) vs "rollout is in progress" (Progressing=RollingOut)
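As a rough illustration of the second note, the sketch below sets both conditions together for any retrying scenario. It is not the controller's code: `Condition` is a simplified stand-in for `metav1.Condition`, and `setCondition` and `markRetrying` are hypothetical helpers. The `Reconciling` reason for Available=Unknown is taken from the Available table above.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition (assumption).
type Condition struct{ Type, Status, Reason, Message string }

// setCondition replaces the condition with the same Type, or appends it.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// markRetrying is a hypothetical helper: every retrying scenario ends up
// with Progressing=True/Retrying and Available=Unknown.
func markRetrying(conds []Condition, message string) []Condition {
	conds = setCondition(conds, Condition{
		Type: "Progressing", Status: "True", Reason: "Retrying", Message: message,
	})
	return setCondition(conds, Condition{
		Type: "Available", Status: "Unknown", Reason: "Reconciling", Message: message,
	})
}

func main() {
	conds := markRetrying(nil, "revision validation failed")
	for _, c := range conds {
		fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

Funnelling validation failures, collisions, and reconcile errors through one helper is what makes the resulting state consistent across failure types.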

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@perdasilva perdasilva requested a review from a team as a code owner November 18, 2025 12:39
Copilot AI review requested due to automatic review settings November 18, 2025 12:39
@netlify

netlify bot commented Nov 18, 2025

Deploy Preview for olmv1 ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 98b7083 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/olmv1/deploys/6920277f65edfb00084c2ccd |
| 😎 Deploy Preview | https://deploy-preview-2340--olmv1.netlify.app |

@openshift-ci openshift-ci bot requested review from grokspawn and oceanc80 November 18, 2025 12:39
@openshift-ci

openshift-ci bot commented Nov 18, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign perdasilva for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@perdasilva perdasilva marked this pull request as draft November 18, 2025 12:40
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2025
Copilot finished reviewing on behalf of perdasilva November 18, 2025 12:42
Copilot AI left a comment

Pull Request Overview

This PR refactors the ClusterExtensionRevision conditions system to better distinguish between availability and progression states. The changes introduce a new "Progressing" condition type and update condition reasons to be more descriptive of the revision lifecycle states.

Key changes:

  • Introduces Progressing condition type alongside refactored Available and Succeeded conditions
  • Replaces generic condition reasons with specific lifecycle states (e.g., RollingOut, RolledOut, Retrying, ProbesSucceeded)
  • Adds comprehensive E2E tests for ClusterExtensionRevision condition behavior
  • Removes enum validation for CollisionProtection field in CRDs

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| api/v1/clusterextensionrevision_types.go | Refactors condition type constants and adds Progressing type, updates condition reason constants to reflect specific lifecycle states |
| internal/operator-controller/controllers/clusterextensionrevision_controller.go | Implements condition setting logic with new helper functions and updates reconciliation flow to properly set Progressing/Available conditions |
| internal/operator-controller/controllers/clusterextensionrevision_controller_test.go | Updates unit tests to validate new condition reasons and adds test coverage for error scenarios |
| internal/operator-controller/applier/boxcutter.go | Adds handling for Retrying reason in progressing condition |
| test/e2e/cluster_extension_revision_test.go | Adds comprehensive E2E test covering revision conditions, availability probes, and archiving |
| test/e2e/e2e_suite_test.go | Adds Kubernetes clientset for pod exec operations in E2E tests |
| manifests/experimental.yaml, manifests/experimental-e2e.yaml, helm/olmv1/base/operator-controller/crd/experimental/olm.operatorframework.io_clusterextensionrevisions.yaml | Adds Progressing column to kubectl output and removes CollisionProtection enum validation |

@perdasilva perdasilva changed the title :sparking: ClusterExtentionRevision conditions ✨ ClusterExtentionRevision conditions Nov 18, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 18, 2025
Copilot AI review requested due to automatic review settings November 18, 2025 18:24
Copilot finished reviewing on behalf of perdasilva November 18, 2025 18:26
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.


@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 18, 2025
@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 85.71429% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.99%. Comparing base (1355ff7) to head (98b7083).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...controllers/clusterextensionrevision_controller.go | 91.66% | 6 Missing ⚠️ |
| internal/operator-controller/applier/boxcutter.go | 0.00% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2340      +/-   ##
==========================================
- Coverage   74.23%   70.99%   -3.25%     
==========================================
  Files          91       91              
  Lines        7239     7199      -40     
==========================================
- Hits         5374     5111     -263     
- Misses       1433     1656     +223     
  Partials      432      432              
| Flag | Coverage Δ |
|---|---|
| e2e | 44.56% <0.00%> (+0.20%) ⬆️ |
| experimental-e2e | 14.35% <0.00%> (-34.13%) ⬇️ |
| unit | 58.88% <85.71%> (+0.33%) ⬆️ |

@perdasilva perdasilva changed the title ✨ ClusterExtentionRevision conditions ⚠️ ClusterExtentionRevision conditions Nov 19, 2025
Copilot AI review requested due to automatic review settings November 19, 2025 15:01
Copilot finished reviewing on behalf of perdasilva November 19, 2025 15:02
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

internal/operator-controller/controllers/clusterextensionrevision_controller_test.go:915

  • Corrected spelling of 'InTransistion' to 'InTransition'.
func (m mockRevisionResult) InTransistion() bool {

internal/operator-controller/controllers/clusterextensionrevision_controller_test.go:953

  • Corrected spelling of 'InTransistion' to 'InTransition'.
func (m mockPhaseResult) InTransistion() bool {

Copilot AI review requested due to automatic review settings November 19, 2025 15:26
@openshift-ci openshift-ci bot requested a review from tmshort November 19, 2025 18:00
Copilot finished reviewing on behalf of perdasilva November 19, 2025 18:04
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.



func setRetryingConditions(cer *ocv1.ClusterExtensionRevision, message string) {
	markAsProgressing(cer, ocv1.ClusterExtensionRevisionReasonRetrying, message)
	markAsAvailableUnknown(cer, message)
Contributor

If Available is not already present in the status, we should not introduce it with an Unknown status.
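One way to read this suggestion is sketched below, under assumptions: `Condition` is a simplified stand-in for `metav1.Condition`, and `findCondition` and `downgradeAvailable` are hypothetical names, not functions from this PR. The idea is to touch Available only when a previous reconcile has already reported it.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition (assumption).
type Condition struct{ Type, Status, Reason, Message string }

// findCondition returns a pointer to the condition of the given type, or nil.
func findCondition(conds []Condition, t string) *Condition {
	for i := range conds {
		if conds[i].Type == t {
			return &conds[i]
		}
	}
	return nil
}

// downgradeAvailable is a hypothetical helper: it sets Available=Unknown
// only if Available was already present, and reports whether it changed
// anything. It never adds a new condition.
func downgradeAvailable(conds []Condition, message string) bool {
	c := findCondition(conds, "Available")
	if c == nil {
		return false
	}
	c.Status, c.Reason, c.Message = "Unknown", "Reconciling", message
	return true
}

func main() {
	// A revision that has never reported Available stays without it.
	fresh := []Condition{{Type: "Progressing", Status: "True", Reason: "Retrying"}}
	fmt.Println(downgradeAvailable(fresh, "object collision"), len(fresh))
}
```

With this guard, a revision that fails on its first reconcile shows only Progressing=True/Retrying, rather than an Available=Unknown entry that was never actually probed.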

Copilot AI review requested due to automatic review settings November 19, 2025 19:17
Copilot finished reviewing on behalf of perdasilva November 19, 2025 19:19
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.


Copilot AI review requested due to automatic review settings November 19, 2025 19:34
Copilot finished reviewing on behalf of perdasilva November 19, 2025 19:35
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.


if err != nil {
	if rres != nil {
		l.Error(err, "revision reconcile failed")
		l.V(1).Info("reconcile failure report", "report", rres.String())
Contributor

@perdasilva here we need to keep the debug logs.
I will clarify that in the code and create a test,
OR we must remove those reports, since they are too verbose.

https://issues.redhat.com/browse/OCPBUGS-62964

Contributor

@pedjak @perdasilva You both have been working with this more recently.
I also don't find the report very valuable in the logs; it's confusing and hard to read.

What do you think about removing the report entirely and instead formatting the message in a clearer way, similar to what we do in other places, so it becomes more meaningful? It would also solve the bug above.

Contributor Author

I've re-added it (but only in debug mode). Let's leave it like that for this PR and address this separately.

Contributor

@perdasilva

I got up and was looking at this today.
I’m thinking the best solution might be similar to what we do for crddiff.

We need a format to extract the report information and normalize it into both machine-readable messages and human-readable logs. Then we can use it for any situation—info or error—while normalizing and stripping out only the data that is actually validatable.

This definitely needs to be done in a separate PR, and we’ll need to discuss a better long-term approach for handling this.

Thank you for the attention !!!

c/c @pedjak

Copilot AI review requested due to automatic review settings November 19, 2025 20:33
Copilot finished reviewing on behalf of perdasilva November 19, 2025 20:35
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.


Co-authored-by: Per Goncalves da Silva <pegoncal@redhat.com>
Co-authored-by: Predrag Knezevic <pknezevi@redhat.com>
Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
// When Progressing is True and the Reason is Retrying, the ClusterExtension has encountered an error that could be resolved on subsequent reconciliation attempts.
// When Progressing is False and the Reason is Blocked, the ClusterExtension has encountered an error that requires manual intervention for recovery.
// <opcon:experimental:description>
// When Progressing is True and Reason is RolloutInProgress, the ClusterExtension has one or more ClusterExtensionRevisions in active roll out.
Contributor

Why not unify here so that the extension and the revision use the same reason? Right now we have RolloutInProgress on the extension and RollingOut on the revision.


// ClusterExtensionRevision is the Schema for the clusterextensionrevisions API
// +kubebuilder:printcolumn:name="Available",type=string,JSONPath=`.status.conditions[?(@.type=='Available')].status`
// +kubebuilder:printcolumn:name="Progressing",type=string,JSONPath=`.status.conditions[?(@.type=='Progressing')].status`
Contributor

Without exposing the reason as well, the status alone does not tell us enough.
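To illustrate the point, the sketch below emulates what a status column and a reason column would display for two revisions. The extra `Reason` printcolumn marker shown in the comment is a hypothetical addition (its column name is an assumption), and `Condition` and `progressingColumns` are simplified stand-ins rather than project code.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition (assumption).
type Condition struct{ Type, Status, Reason string }

// The existing marker exposes only the status. A second, hypothetical
// marker could expose the reason next to it:
//
// +kubebuilder:printcolumn:name="Progressing",type=string,JSONPath=`.status.conditions[?(@.type=='Progressing')].status`
// +kubebuilder:printcolumn:name="Reason",type=string,JSONPath=`.status.conditions[?(@.type=='Progressing')].reason`

// progressingColumns returns the values those two columns would show.
func progressingColumns(conds []Condition) (status, reason string) {
	for _, c := range conds {
		if c.Type == "Progressing" {
			return c.Status, c.Reason
		}
	}
	return "", ""
}

func main() {
	retrying := []Condition{{Type: "Progressing", Status: "True", Reason: "Retrying"}}
	rolling := []Condition{{Type: "Progressing", Status: "True", Reason: "RollingOut"}}

	s1, r1 := progressingColumns(retrying)
	s2, r2 := progressingColumns(rolling)

	// Same status for both revisions; only the reason tells them apart.
	fmt.Println("same status:", s1 == s2, "different reason:", r1 != r2)
}
```

Because Retrying, RollingOut, and Succeeded all report Progressing=True, a status-only column cannot separate a failing revision from a healthy rollout.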

	case ocv1.ClusterExtensionRevisionReasonRetrying:
		return false, "", errors.New(progressingCondition.Message)
	}
	return false, progressingCondition.Message, nil
Contributor

Please move this line into the switch above as the default case.

6 participants