NO-ISSUE: Synchronize From Upstream Repositories #1306

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from openshift-bot:synchronize-upstream
May 25, 2026

Conversation

@openshift-bot
Contributor

@openshift-bot openshift-bot commented May 22, 2026

The staging/ and vendor/ directories have been synchronized from the upstream repositories, pulling in the following commits:

| Date | Commit | Author | Message |
| --- | --- | --- | --- |
| 2026-05-21 16:11:29 | operator-framework/operator-lifecycle-manager@4aab00c | Rohit Patil | Improve bundle unpack failure handling and user experience (#3832) |

This pull request is expected to merge without any human intervention. If tests are failing here, changes must land upstream to fix any issues so that future downstreaming efforts succeed.

/assign @openshift/openshift-team-operator-runtime

This commit improves the handling and messaging of bundle unpack failures
in OLM to provide better user experience and follows Kubernetes controller
best practices.

Key improvements:

1. User-friendly error messages
   - Provide clear, actionable error messages for bundle unpack failures
   - Include common causes and troubleshooting steps
   - Add auto-retry information and manual remediation commands

2. Reduced etcd payload bloat
   - Keep subscription condition messages concise
   - Emit detailed troubleshooting guidance as Kubernetes Events
   - Prevents storing large 329-char strings repeatedly in etcd

3. Prevent duplicate guidance in messages
   - Check if guidance already exists before appending
   - Avoids message duplication from underlying conditions

4. Fix variable shadowing
   - Rename inner 'cond' to 'unpackingCond' for clarity
   - Improves code readability and prevents confusion

5. Prevent queue churn from repeated requeues
   - Track state transitions with isNewFailure flag
   - Only schedule delayed requeue on new failures
   - Prevents repeated AddAfter calls for persistent failures

6. Use exact constant comparison for JobIncompleteReason
   - Replace strings.Contains() with bundle.JobIncompleteReason
   - More deterministic and type-safe
   - Prevents matching unintended substring values

7. Improve test maintainability
   - Use substring assertions instead of exact message matching
   - Tests won't break on minor wording/punctuation changes
   - Verify key components: prefix, reason, error, guidance

All tests pass. Changes follow Kubernetes controller best practices.
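
Items 3 and 6 above can be illustrated with a minimal sketch. It is not the upstream code: `jobIncompleteReason` stands in for `bundle.JobIncompleteReason` (its literal value here is an assumption), and `guidance`, `isJobIncomplete`, and `appendGuidance` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// jobIncompleteReason is a stand-in for bundle.JobIncompleteReason; the
// literal value is an assumption made for illustration only.
const jobIncompleteReason = "JobIncomplete"

// guidance is a hypothetical, shortened troubleshooting hint.
const guidance = "Check the bundle image reference and pull secrets."

// isJobIncomplete compares the reason exactly, so a reason that merely
// contains the constant as a substring does not match.
func isJobIncomplete(reason string) bool {
	return reason == jobIncompleteReason
}

// appendGuidance adds the hint only when the message does not already carry it.
func appendGuidance(msg string) string {
	if strings.Contains(msg, guidance) {
		return msg
	}
	return msg + " " + guidance
}

func main() {
	fmt.Println(isJobIncomplete("JobIncomplete"))
	fmt.Println(isJobIncomplete("NotJobIncompleteYet"))

	once := appendGuidance("bundle unpacking failed.")
	twice := appendGuidance(once)
	fmt.Println(once == twice)
}
```

With a `strings.Contains` check, the second reason would have matched as well. The exact comparison rules that out, and calling `appendGuidance` twice leaves the message unchanged.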

Co-authored-by: Rohit Patil <ropatil@ropatil-thinkpadp16vgen1.bengluru.csb>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: 4aab00c4e145519e53495801d995d5ec3d6d2b1e
@openshift-bot openshift-bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. labels May 22, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 22, 2026
@openshift-ci-robot

@openshift-bot: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented May 22, 2026

Walkthrough

Bundle unpack jobs are now cleaned up automatically via a Kubernetes TTL (300 seconds), and error and pending diagnostic messages are enriched with targeted troubleshooting guidance. Subscription status updates track new failures separately and emit detailed warning events.
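
The split between a short condition message and a detailed warning event might look roughly like the sketch below. It assumes simplified local types: `eventSink`, `condition`, `reportUnpackFailure`, and the `troubleshooting` text are hypothetical and are not the OLM or client-go APIs.

```go
package main

import "fmt"

// condition mirrors only the fields of a subscription condition used here.
type condition struct {
	Type    string
	Message string
}

// eventSink is a hypothetical stand-in for a Kubernetes event recorder.
type eventSink struct {
	warnings []string
}

// warn records a warning event for the named object.
func (s *eventSink) warn(object, reason, message string) {
	s.warnings = append(s.warnings, fmt.Sprintf("%s %s: %s", object, reason, message))
}

// troubleshooting is placeholder guidance text, not the upstream wording.
const troubleshooting = "Check the bundle image reference, registry access, and pull secrets."

// reportUnpackFailure emits the long guidance as an event and returns a
// short condition message, so the stored status stays small.
func reportUnpackFailure(sink *eventSink, subscription, cause string) condition {
	sink.warn(subscription, "BundleUnpackFailed", troubleshooting)
	return condition{
		Type:    "BundleUnpackFailed",
		Message: "bundle unpacking failed: " + cause,
	}
}

func main() {
	sink := &eventSink{}
	cond := reportUnpackFailure(sink, "my-namespace/my-subscription", "image pull failed")
	fmt.Println(cond.Message)
	fmt.Println(sink.warnings[0])
}
```

The condition carries only the cause, while the longer guidance goes to the event stream, where it is not rewritten on every status update.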

Changes

Bundle Unpack Job TTL and Enhanced Diagnostics

| Layer / File(s) | Summary |
| --- | --- |
| Job TTL cleanup and lifecycle: staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go, staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker_test.go | Bundle unpack Job now enables Kubernetes TTL cleanup with TTLSecondsAfterFinished: 300. Comments clarified to note pod/log availability limited until cleanup. Test expectations updated across custom timeout, digest, pending, and failed job scenarios to include TTL assertion. New TestBundleUnpackJobHasTTL validates TTL setting and critical job spec fields. |
| Bundle unpack error and pending message enrichment: staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go, staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker_test.go | When bundle unpack job is pending with container status, code appends targeted troubleshooting guidance for image pull failures (e.g., ImagePullBackOff, ErrImagePull, manifest unknown, unauthorized), including TTL cleanup timing and manual retry instructions. Test expected pending messages updated with guidance. Error construction assertions refactored to compare parsed errors instead of hardcoded strings. |
| Subscription status failure and pending diagnostics: staging/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go, staging/operator-lifecycle-manager/pkg/controller/operators/catalog/subscriptions_test.go, staging/operator-lifecycle-manager/test/e2e/subscription_e2e_test.go | Catalog operator enriches BundleUnpackFailed and BundleUnpacking subscription conditions: on failure, builds condition message with conditional guidance, deduplicates subscription updates, emits per-subscription warning events, and schedules delayed requeue only for newly observed failures. On pending, extracts specific image pull messages from bundle lookups. Test assertions refactored to flexible substring matching to avoid brittle exact message comparison. E2E test updated to verify multiple failure substrings. |
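
The pending-message enrichment summarized above could be sketched as follows. This is an illustration under assumptions, not the code in bundle_unpacker.go: the hint list reuses the examples named in the summary, while `pullGuidance` and `enrichPendingMessage` are hypothetical names with placeholder wording.

```go
package main

import (
	"fmt"
	"strings"
)

// imagePullHints lists substrings that suggest an image pull problem; they
// are the examples named in the summary above.
var imagePullHints = []string{
	"ImagePullBackOff",
	"ErrImagePull",
	"manifest unknown",
	"unauthorized",
}

// pullGuidance is a hypothetical hint; the real wording may differ.
const pullGuidance = "Verify that the bundle image exists and that pull credentials are valid."

// enrichPendingMessage appends guidance when the container status text looks
// like an image pull failure, and otherwise returns it unchanged.
func enrichPendingMessage(status string) string {
	for _, hint := range imagePullHints {
		if strings.Contains(status, hint) {
			return status + " " + pullGuidance
		}
	}
	return status
}

func main() {
	fmt.Println(enrichPendingMessage("ImagePullBackOff: back-off pulling image"))
	fmt.Println(enrichPendingMessage("ContainerCreating"))
}
```

Only statuses that match one of the hints receive the extra text, so unrelated pending states keep their original message.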

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

verified

Suggested reviewers

  • bentito
  • camilamacedo86
🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ⚠️ Warning | Ginkgo e2e test lacks timeouts: Consistently() call (line 2507) missing timeout/interval parameters, defaulting to insufficient 100ms for cluster operations. | Add explicit timeout/interval to Consistently: Consistently(..., 5*time.Minute, interval). Add diagnostic messages to assertions in TestBundleUnpackJobHasTTL. |
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'NO-ISSUE: Synchronize From Upstream Repositories' accurately describes the main purpose of the PR, which is to synchronize staging and vendor directories from upstream repositories as part of a systematic sync process. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All test names are stable and deterministic with no dynamic values (pod names, timestamps, UUIDs, namespaces, etc.). Test titles are static descriptive strings. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; only modifications to existing test assertions. The new test added is a Go unit test, not a Ginkgo e2e test, so check is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added in this PR. TestBundleUnpackJobHasTTL is a unit test, and subscription_e2e_test.go changes only updated existing tests. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds topology-aware tolerations for bundle unpacker Job targeting 'node-role.kubernetes.io/master'. Does not break SNO, TNF, TNA, or HyperShift topologies. |
| Ote Binary Stdout Contract | ✅ Passed | No process-level stdout writes. All fmt calls are Errorf/Sprintf. Test code in It/Describe blocks. No klog/glog writes or init() violations. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added. Only an assertion matcher was updated in an existing test. External registry references are pre-existing. |

@openshift-ci openshift-ci Bot requested review from pedjak and thetechnick May 22, 2026 00:06
@openshift-ci
Contributor

openshift-ci Bot commented May 22, 2026

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: openshift-bot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
staging/operator-lifecycle-manager/pkg/controller/operators/catalog/subscriptions_test.go (1)

1327-1340: ⚡ Quick win

Avoid index-based condition assertions in this test

Using fetched.Status.Conditions[0] assumes stable ordering. Prefer fetching SubscriptionBundleUnpackFailed by type, then asserting on that message.
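
A lookup by condition type, as suggested here, might look like the sketch below. It assumes a simplified local `condition` type and a hypothetical `findCondition` helper. The string values used for the condition types are assumptions, and the real test would operate on the v1alpha1 subscription types instead.

```go
package main

import "fmt"

// condition is a simplified stand-in for a subscription condition.
type condition struct {
	Type    string
	Status  string
	Message string
}

// findCondition returns the first condition of the given type and reports
// whether one was found, independent of its position in the slice.
func findCondition(conds []condition, condType string) (condition, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c, true
		}
	}
	return condition{}, false
}

func main() {
	conds := []condition{
		{Type: "CatalogSourcesUnhealthy", Status: "False"},
		{Type: "BundleUnpackFailed", Status: "True", Message: "bundle unpacking failed"},
	}
	if c, ok := findCondition(conds, "BundleUnpackFailed"); ok {
		fmt.Println(c.Message)
	}
}
```

Assertions written against the returned condition keep working if another condition is later added ahead of it in the slice.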

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@staging/operator-lifecycle-manager/pkg/controller/operators/catalog/subscriptions_test.go`
around lines 1327 - 1340, The test is indexing Status.Conditions[0] which
assumes stable ordering; instead, locate the condition with Type
"SubscriptionBundleUnpackFailed" on both fetched and the expected s (e.g.,
iterate over fetched.Status.Conditions to find cond.Type ==
"SubscriptionBundleUnpackFailed") and copy that condition's Message into the
matching expected condition before require.Equal, then run the require.Contains
assertions against that found condition's Message. Update all uses of
fetched.Status.Conditions[0] and s.Status.Conditions[0] in the
"NoStatus/NoCurrentCSV/BundleUnpackFailed" case to use the condition lookup by
Type so assertions no longer rely on array index ordering.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a1febac9-4c7a-44b1-96ff-4aa178b1abab

📥 Commits

Reviewing files that changed from the base of the PR and between 3f946ad and aa437b9.

⛔ Files ignored due to path filters (2)
  • vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (5)
  • staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go
  • staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker_test.go
  • staging/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go
  • staging/operator-lifecycle-manager/pkg/controller/operators/catalog/subscriptions_test.go
  • staging/operator-lifecycle-manager/test/e2e/subscription_e2e_test.go

Comment on lines +1401 to +1410
// Check if any subscription is transitioning to failed state (not already failed)
// to avoid scheduling redundant delayed requeues on every reconcile
isNewFailure := false
for _, sub := range subs {
	existingCond := sub.Status.GetCondition(v1alpha1.SubscriptionBundleUnpackFailed)
	if existingCond.Status != corev1.ConditionTrue {
		isNewFailure = true
		break
	}
}
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Retry scheduling can stop after subsequent failed attempts

isNewFailure only checks whether SubscriptionBundleUnpackFailed=True already exists. On later failed attempts, that stays true, so Line 1477 skips AddAfter, which can stop automatic retry progression after TTL cleanup.

💡 Suggested fix
-			isNewFailure := false
+			isNewFailure := false
 			for _, sub := range subs {
-				existingCond := sub.Status.GetCondition(v1alpha1.SubscriptionBundleUnpackFailed)
-				if existingCond.Status != corev1.ConditionTrue {
+				existingFailed := sub.Status.GetCondition(v1alpha1.SubscriptionBundleUnpackFailed)
+				existingUnpacking := sub.Status.GetCondition(v1alpha1.SubscriptionBundleUnpacking)
+				// Treat transition from unpacking->failed as a new failure cycle.
+				if existingFailed.Status != corev1.ConditionTrue || existingUnpacking.Status == corev1.ConditionTrue {
 					isNewFailure = true
 					break
 				}
 			}

Also applies to: 1475-1483
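
The difference between the quoted check and the suggested fix can be seen in a small, self-contained sketch. `subStatus` is a hypothetical reduction of the subscription status to the two condition statuses involved. The two functions mirror the logic of the diff above and are not the code in operator.go.

```go
package main

import "fmt"

// subStatus reduces a subscription to the two condition statuses the check
// reads; it is a simplified stand-in, not the v1alpha1 type.
type subStatus struct {
	UnpackFailed bool // SubscriptionBundleUnpackFailed is True
	Unpacking    bool // SubscriptionBundleUnpacking is True
}

// originalIsNewFailure mirrors the quoted check: a failure is new when any
// subscription is not already marked as failed.
func originalIsNewFailure(subs []subStatus) bool {
	for _, s := range subs {
		if !s.UnpackFailed {
			return true
		}
	}
	return false
}

// suggestedIsNewFailure mirrors the suggested fix: an unpacking-to-failed
// transition also counts as a new failure cycle.
func suggestedIsNewFailure(subs []subStatus) bool {
	for _, s := range subs {
		if !s.UnpackFailed || s.Unpacking {
			return true
		}
	}
	return false
}

func main() {
	// A retry that is still marked failed from the previous attempt.
	retry := []subStatus{{UnpackFailed: true, Unpacking: true}}
	fmt.Println(originalIsNewFailure(retry), suggestedIsNewFailure(retry))
}
```

For a subscription that is still marked failed while a new unpack attempt is in progress, only the suggested variant reports a new failure, which is the case the review comment is concerned about.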

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@staging/operator-lifecycle-manager/pkg/controller/operators/catalog/operator.go`
around lines 1401 - 1410, The current isNewFailure check only schedules retries
when a SubscriptionBundleUnpackFailed condition flips to True, which prevents
further AddAfter retries once the condition stays True; change the logic that
computes isNewFailure to instead detect any subscription that currently has
SubscriptionBundleUnpackFailed == True (iterate subs and if existingCond.Status
== corev1.ConditionTrue set a flag), and call the existing AddAfter scheduling
path for those failing subscriptions even when the condition was already True so
retries continue; keep the rest of the existing AddAfter call/path unchanged
(use the same queue key and delay) and remove the branch that skips scheduling
solely because the failure is not "new".

@openshift-ci
Contributor

openshift-ci Bot commented May 22, 2026

@openshift-bot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-upgrade-ovn-single-node | aa437b9 | link | false | /test e2e-aws-upgrade-ovn-single-node |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@bandrade
Contributor

/label qe-approved
/verified by @bandrade

@openshift-ci openshift-ci Bot added the qe-approved Signifies that QE has signed off on this PR label May 25, 2026
@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 25, 2026
@openshift-ci-robot

@bandrade: This PR has been marked as verified by @bandrade.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 3f946ad and 2 for PR HEAD aa437b9 in total

@openshift-merge-bot openshift-merge-bot Bot merged commit 328957c into openshift:main May 25, 2026
16 of 17 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR verified Signifies that the PR passed pre-merge verification criteria
