NO-ISSUE: test/e2e/upgrade: Raise default update-ack timeout to 10m #30917
Conversation
It's failing in [1]'s client-go bump to Kube 1.35, with timelines like [2]:

* 11:17:27, test-suite patches ClusterVersion to request the update.
* 11:17:27, CVO Events `UpgradeStarted` (although it hasn't actually pivoted the target yet, might want to adjust this Event name/timing) and `RetrievePayload: Retrieving and verifying payload…`
* 11:17:28, CVO creates `version--fgn9s`.
* 11:17:28, CVO starts watching `version--fgn9s`.
* 11:17:33, `version--fgn9s` container started. So it's not slow image pulls.
* 11:17:34, `version--fgn9s` container started again? Not clear what happened here.
* 11:17:37, `rename-to-final-location` container started (the last container in `version-...` Pods, it should be an atomic, single-filesystem `mv`).
* 11:17:38, `rename-to-final-location` exits 0, so success, but the CVO does not notice.
* 11:19:27, test-case times out after 2m.
* 11:21:27, CVO refreshes the watch on `version--fgn9s`.

and CVO logs like:

```
I0320 11:17:58.306414 1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 18.795159461s ago"
I0320 11:18:08.306345 1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 28.795076029s ago"
...
I0320 11:21:18.307818 1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 3m38.796537922s ago"
I0320 11:21:27.596115 1 trace.go:236] Trace[23813503]: "Reflector WatchList" name:k8s.io/client-go/tools/watch/informerwatcher.go:162 (20-Mar-2026 11:17:28.301) (total time: 239294ms):
```

21 - 17 = 4m, and I'm giving it 10m to be safe.
The bump allows work like [1] to move forward while we troubleshoot the client issue. We'll definitely want to revert this once the 1.35 clients are fixed.

[1]: openshift/cluster-version-operator#1282
[2]: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1282/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn-upgrade-out-of-change/2034935978228977664
Pipeline controller notification: For optional jobs, comment. This repository is configured in automatic mode.
@wking: This pull request explicitly references no jira issue.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough

The default acknowledgment timeout for the Cluster Version Operator (CVO) update was increased from 2 minutes to 10 minutes in the upgrade test file. This timeout serves as the fallback value for non-BareMetal and non-OpenStack platforms and impacts flake-threshold checks.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it is a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: ""
See https://golangci-lint.run/docs/product/migration-guide for migration instructions
🧹 Nitpick comments (1)
test/e2e/upgrade/upgrade.go (1)
85-85: Decouple ack timeout from flake-threshold signal

This constant now controls both the wait timeout and the flake threshold (Line 528), which effectively removes the slow-ack flake signal on current platform branches. Consider introducing a separate flake-threshold constant (or an explicit TODO plus issue expiry) so this temporary timeout bump doesn't silently reduce regression visibility.
As per coding guidelines: "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/upgrade/upgrade.go` at line 85, The constant defaultCVOUpdateAckTimeout currently gates both the wait timeout and the flake-threshold signal (referenced by defaultCVOUpdateAckTimeout and the flake check usage), which hides flake regressions when you temporarily bump the timeout; introduce a separate constant (e.g., defaultCVOFlakeThreshold or cvoUpdateFlakeThreshold) and replace the flake-threshold usage to read that new constant while leaving defaultCVOUpdateAckTimeout as the actual wait timeout, and update any references in the flake-detection logic and comments to use the new name so the two behaviors are decoupled.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 46d340d1-0aad-4cf6-8be6-9a4bf5751029
📒 Files selected for processing (1)
test/e2e/upgrade/upgrade.go
/lgtm

/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, neisw, wking. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Scheduling required tests: Scheduling tests matching the
@wking: all tests passed! Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/verified by E2E passing
@JoelSpeed: This PR has been marked as verified by `E2E passing`.

In response to this: