Skip to content

Conversation

@tmshort
Copy link
Contributor

@tmshort tmshort commented Nov 12, 2025

Issue:
Manifest ordering inconsistency: CRDs from Helm release manifest and
bundle manifest appeared in different orders, causing PhaseSort to
produce different phase structures even though they contained the same
objects.

Solution:
Added deterministic sorting in PhaseSort (phase.go):

  • Sort objects within each phase by Group, Version, Kind, Namespace, Name
  • Ensures consistent phase structure regardless of input order
  • Critical for comparing revisions from different sources

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

Description

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@tmshort tmshort requested a review from a team as a code owner November 12, 2025 20:46
@openshift-ci openshift-ci bot requested review from dtfranz and trgeiger November 12, 2025 20:46
@netlify
Copy link

netlify bot commented Nov 12, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 48e05e1
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/691cc02223cecb00080a1c66
😎 Deploy Preview https://deploy-preview-2329--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@codecov
Copy link

codecov bot commented Nov 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.52%. Comparing base (6ef62de) to head (48e05e1).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2329      +/-   ##
==========================================
- Coverage   74.19%   70.52%   -3.68%     
==========================================
  Files          91       91              
  Lines        7228     7239      +11     
==========================================
- Hits         5363     5105     -258     
- Misses       1433     1699     +266     
- Partials      432      435       +3     
Flag Coverage Δ
e2e 44.32% <0.00%> (-0.08%) ⬇️
experimental-e2e 14.08% <0.00%> (-34.31%) ⬇️
unit 58.54% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@joelanford
Copy link
Member

CollisionProtection mismatch: The migrated revision had collisionProtection=None (needed to adopt Helm-managed resources) while the bundle-generated revision had collisionProtection=Prevent (the default value).

I think this is intentional. We need None in order to adopt the existing objects. But then we want Prevent to ensure that we stop automatically adopting those objects in the future. I'm pretty sure that @thetechnick intended the adoption mechanics of None to be a one-time thing.

@perdasilva
Copy link
Contributor

CollisionProtection mismatch: The migrated revision had collisionProtection=None (needed to adopt Helm-managed resources) while the bundle-generated revision had collisionProtection=Prevent (the default value).

I think this is intentional. We need None in order to adopt the existing objects. But then we want Prevent to ensure that we stop automatically adopting those objects in the future. I'm pretty sure that @thetechnick intended the adoption mechanics of None to be a one-time thing.

I think the original behavior makes sense. We shouldn't keep None beyond the initial migration to assume a conservative posture by default going forward. Leaving it as None opens the cluster to possible unforeseen pain in the future. I'd also say it's fine to generate a new revision with the new collisionProtection setting to keep an audit trail.

// to ensure consistent ordering regardless of input order. This is critical for
// Helm-to-Boxcutter migration where the same resources may come from different sources
// (Helm release manifest vs bundle manifest) and need to produce identical phases.
func compareClusterExtensionRevision(a, b ocv1.ClusterExtensionRevisionObject) int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func compareClusterExtensionRevision(a, b ocv1.ClusterExtensionRevisionObject) int {
func compareClusterExtensionRevisionObject(a, b ocv1.ClusterExtensionRevisionObject) int {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even just compareRevisionObject ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a rename for this, but I'm waiting on it, in case the decision is that we don't want the CollisionProtection change.

@tmshort
Copy link
Contributor Author

tmshort commented Nov 13, 2025

CollisionProtection mismatch: The migrated revision had collisionProtection=None (needed to adopt Helm-managed resources) while the bundle-generated revision had collisionProtection=Prevent (the default value).

I think this is intentional. We need None in order to adopt the existing objects. But then we want Prevent to ensure that we stop automatically adopting those objects in the future. I'm pretty sure that @thetechnick intended the adoption mechanics of None to be a one-time thing.

I think the original behavior makes sense. We shouldn't keep None beyond the initial migration to assume a conservative posture by default going forward. Leaving it as None opens the cluster to possible unforeseen pain in the future. I'd also say it's fine to generate a new revision with the new collisionProtection setting to keep an audit trail.

So, removing either of the changes will cause the bug (two CERs created) to assert itself. If we are OK with this, then we don't necessarily need this fix, although sorting the objects during the phase is probably still a good idea. @dtfranz and I found the same solution to the problem:

On its own, the manifest sanitization does not completely fix the other issue, but it does make it quite a bit better. Following the repro steps in OCPBUGS-62943 on main, I get infinite additional CERs. When I add my fix for OPRUN-4238, I only get one additional CER. This extra CER is being created due to the diff between revisions created by GenerateRevisionFromHelmRelease and GenerateRevision; the former sets the CollisionProtection flag, and the latter does not. The objects within the phases may also need to be sorted. If you resolve those differences, the extra CER does not get generated. I'm not 100% sure of the fix method (or, as @perdasilva mentioned, if it's even an issue that needs fixing), but I'm pretty sure we need to disregard the CollisionProtection flag when comparing observed vs desired CERs. If we do that then the controller should understand that no new revision is required.

If we're good with having two initial CERs upon transition from the helm applier to the boxcutter applier, then I can reduce this to be just the sorting changes, and remove the CollisionProtection change. I would then close the bug as intentional.

@tmshort
Copy link
Contributor Author

tmshort commented Nov 14, 2025

@joelanford @dtfranz giving me 👍 doesn't really tell me what you want! Do we want to fix the two CER issue or just fix the sorting?

@tmshort tmshort force-pushed the fix-extra-cer branch 2 times, most recently from ec218e0 to 0a3158f Compare November 17, 2025 19:26
@tmshort tmshort changed the title 🐛 Fix duplicate ClusterExtensionRevisions during Helm-to-Boxcutter migr… 🐛 Fix Boxcutter manifest ordering inconsistency Nov 17, 2025
@tmshort
Copy link
Contributor Author

tmshort commented Nov 17, 2025

@joelanford @dtfranz now only the sorting remains.

@dtfranz
Copy link
Contributor

dtfranz commented Nov 18, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 18, 2025
@camilamacedo86
Copy link
Contributor

/lgtm

// to ensure consistent ordering regardless of input order. This is critical for
// Helm-to-Boxcutter migration where the same resources may come from different sources
// (Helm release manifest vs bundle manifest) and need to produce identical phases.
func compareClusterExtensionRevisionObjects(a, b ocv1.ClusterExtensionRevisionObject) int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would using cmp.Compare be a nice alternative here? e.g.

cmp.Compare(cmp.Or(aGVK.Group, bGVK.Group), cmp.Or(...), ...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return cmp.Compare(
  cmp.Or(a.Object.GroupVersionKind().Group, b.Object.GroupVersionKind().Group),
  cmp.Or(a.Object.GroupVersionKind().Version, b.Object.GroupVersionKind().Version),
  cmp.Or(a.Object.GroupVersionKind().Kind, b.Object.GroupVersionKind().Kind),
  cmp.Or(a.Object.GetNamespace(), b.Object.GetNamespace()),
  cmp.Or(a.Object.GetName(), b.Object.GetName()),
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only a suggestion - if it doesn't spark joy, I'm happy to approve. Let me know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with that API, right now, it's not sparking joy (😉), but what's here works... so let's leave as-is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np - it's the cool kid in town: https://pkg.go.dev/cmp basically obviates a lot the branching here

@openshift-ci
Copy link

openshift-ci bot commented Nov 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: camilamacedo86, perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 18, 2025
}

// Compare Version
if aGVK.Version < bGVK.Version {
Copy link
Contributor

@perdasilva perdasilva Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add another couple of resources to the test that exercise the version comparison (or just change one of the current resources to use the apps/v2 api group or something) - I think that will clear the codecov problem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do this, then I might as well look at cmp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's still less than threshold... 😢

Issue:
Manifest ordering inconsistency: CRDs from Helm release manifest and
bundle manifest appeared in different orders, causing PhaseSort to
produce different phase structures even though they contained the same
objects.

Solution:
Added deterministic sorting in PhaseSort (phase.go):
 - Sort objects within each phase by Group, Version, Kind, Namespace, Name
 - Ensures consistent phase structure regardless of input order
 - Critical for comparing revisions from different sources

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 18, 2025
@perdasilva
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 18, 2025
@tmshort
Copy link
Contributor Author

tmshort commented Nov 18, 2025

/override project

@openshift-ci
Copy link

openshift-ci bot commented Nov 18, 2025

@tmshort: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • project

Only the following failed contexts/checkruns were expected:

  • Verify PR title
  • codecov/project
  • crd-diff
  • e2e-kind
  • extension-developer-e2e
  • go-apidiff
  • go-verdiff
  • goreleaser
  • lint
  • netlify/olmv1/deploy-preview
  • tide
  • unit-test-basic
  • upgrade-e2e
  • verify

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override project

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@tmshort
Copy link
Contributor Author

tmshort commented Nov 18, 2025

/override codecov/project

@openshift-ci
Copy link

openshift-ci bot commented Nov 18, 2025

@tmshort: Overrode contexts on behalf of tmshort: codecov/project

In response to this:

/override codecov/project

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot openshift-merge-bot bot merged commit f3569d5 into operator-framework:main Nov 18, 2025
25 of 26 checks passed
@tmshort tmshort deleted the fix-extra-cer branch November 18, 2025 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants