Skip to content

NO-ISSUE: test/e2e/upgrade: Raise default update-ack timeout to 10m#30917

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
wking:raise-update-ack-timeout
Mar 21, 2026
Merged

NO-ISSUE: test/e2e/upgrade: Raise default update-ack timeout to 10m#30917
openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
wking:raise-update-ack-timeout

Conversation

@wking
Copy link
Member

@wking wking commented Mar 20, 2026

It's failing in openshift/cluster-version-operator#1282 's client-go bump to Kube 1.35, with timelines like:

  • 11:17:27, test-suite patches ClusterVersion to request the update.
  • 11:17:27, CVO Events UpgradeStarted (although it hasn't actually pivoted the target yet, might want to adjust this Event name/timing) and RetrievePayload: Retrieving and verifying payload…
  • 11:17:28, CVO creates version–fgn9s.
  • 11:17:28, CVO starts watching version--fgn9s.
  • 11:17:33, version--fgn9s container started. So it's not slow image pulls.
  • 11:17:34, version--fgn9s container started again? Not clear what happened here.
  • 11:17:37, rename-to-final-location container started (the last container in version-... Pods, it should be an atomic, single-filesystem mv).
  • 11:17:38, rename-to-final-location exits 0, so success, but the CVO does not notice.
  • 11:19:27, test-case times out after 2m.
  • 11:21:27, CVO refreshes the watch on version--fgn9s.

and CVO logs like:

  I0320 11:17:58.306414       1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 18.795159461s ago"
  I0320 11:18:08.306345       1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 28.795076029s ago"
  ...
  I0320 11:21:18.307818       1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 3m38.796537922s ago"
  I0320 11:21:27.596115       1 trace.go:236] Trace[23813503]: "Reflector WatchList" name:k8s.io/client-go/tools/watch/informerwatcher.go:162 (20-Mar-2026 11:17:28.301) (total time: 239294ms):

21 - 17 = 4m, and I'm giving it 10m to be safe. The bump allows work like openshift/cluster-version-operator#1282 to move forward while we troubleshoot the client issue. We'll definitely want to revert this once the 1.35 clients are fixed.

It's failing in [1]'s client-go bump to Kube 1.35, with timelines like
[2]:

* 11:17:27, test-suite patches ClusterVersion to request the update.
* 11:17:27, CVO Events UpgradeStarted (although it hasn't actually pivoted the target yet, might want to adjust this Event name/timing) and RetrievePayload: Retrieving and verifying payload…
* 11:17:28, CVO creates version–fgn9s.
* 11:17:28, CVO starts watching version--fgn9s.
* 11:17:33, version--fgn9s container started.  So it's not slow image pulls.
* 11:17:34, version--fgn9s container started again?  Not clear what happened here.
* 11:17:37, rename-to-final-location container started (the last container in version-... Pods, it should be an atomic, single-filesystem mv).
* 11:17:38, rename-to-final-location exits 0, so success, but the CVO does not notice.
* 11:19:27, test-case times out after 2m.
* 11:21:27, CVO refreshes the watch on version--fgn9s.

and CVO logs like:

  I0320 11:17:58.306414       1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 18.795159461s ago"
  I0320 11:18:08.306345       1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 28.795076029s ago"
  ...
  I0320 11:21:18.307818       1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 3m38.796537922s ago"
  I0320 11:21:27.596115       1 trace.go:236] Trace[23813503]: "Reflector WatchList" name:k8s.io/client-go/tools/watch/informerwatcher.go:162 (20-Mar-2026 11:17:28.301) (total time: 239294ms):

21 - 17 = 4m, and I'm giving it 10m to be safe.  The bump allows work
like [1] to move forward while we troubleshoot the client issue.
We'll definitely want to revert this once the 1.35 clients are fixed.

[1]: openshift/cluster-version-operator#1282
[2]: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1282/pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn-upgrade-out-of-change/2034935978228977664
@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 20, 2026
@openshift-ci-robot
Copy link

@wking: This pull request explicitly references no jira issue.

Details

In response to this:

It's failing in openshift/cluster-version-operator#1282 's client-go bump to Kube 1.35, with timelines like:

  • 11:17:27, test-suite patches ClusterVersion to request the update.
  • 11:17:27, CVO Events UpgradeStarted (although it hasn't actually pivoted the target yet, might want to adjust this Event name/timing) and RetrievePayload: Retrieving and verifying payload…
  • 11:17:28, CVO creates version–fgn9s.
  • 11:17:28, CVO starts watching version--fgn9s.
  • 11:17:33, version--fgn9s container started. So it's not slow image pulls.
  • 11:17:34, version--fgn9s container started again? Not clear what happened here.
  • 11:17:37, rename-to-final-location container started (the last container in version-... Pods, it should be an atomic, single-filesystem mv).
  • 11:17:38, rename-to-final-location exits 0, so success, but the CVO does not notice.
  • 11:19:27, test-case times out after 2m.
  • 11:21:27, CVO refreshes the watch on version--fgn9s.

and CVO logs like:

 I0320 11:17:58.306414       1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 18.795159461s ago"
 I0320 11:18:08.306345       1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 28.795076029s ago"
 ...
 I0320 11:21:18.307818       1 reflector.go:1159] "Warning: event bookmark expired" err="k8s.io/client-go/tools/watch/informerwatcher.go:162: hasn't received required bookmark event marking the end of initial events stream, received last event 3m38.796537922s ago"
 I0320 11:21:27.596115       1 trace.go:236] Trace[23813503]: "Reflector WatchList" name:k8s.io/client-go/tools/watch/informerwatcher.go:162 (20-Mar-2026 11:17:28.301) (total time: 239294ms):

21 - 17 = 4m, and I'm giving it 10m to be safe. The bump allows work like openshift/cluster-version-operator#1282 to move forward while we troubleshoot the client issue. We'll definitely want to revert this once the 1.35 clients are fixed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@wking wking changed the title NO-issue: test/e2e/upgrade: Raise default update-ack timeout to 10m NO-ISSUE: test/e2e/upgrade: Raise default update-ack timeout to 10m Mar 20, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 20, 2026

Walkthrough

The default acknowledgment timeout for the Cluster Version Operator (CVO) update was increased from 2 minutes to 10 minutes in the upgrade test file. This timeout serves as the fallback value for non-BareMetal and non-OpenStack platforms and impacts flake-threshold checks.

Changes

Cohort / File(s) Summary
CVO acknowledgment timeout configuration
test/e2e/upgrade/upgrade.go
Increased default cvoAckTimeout value from 2 * time.Minute to 10 * time.Minute for CVO update acknowledgment handling and flake-threshold validation.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from p0lyn0mial and sjenning March 20, 2026 19:09
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/upgrade/upgrade.go (1)

85-85: Decouple ack timeout from flake-threshold signal

This constant now controls both the wait timeout and the flake threshold (Line 528), which effectively removes the slow-ack flake signal on current platform branches. Consider introducing a separate flake-threshold constant (or an explicit TODO+issue expiry) so this temporary timeout bump doesn’t silently reduce regression visibility.

As per coding guidelines, "-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/upgrade/upgrade.go` at line 85, The constant
defaultCVOUpdateAckTimeout currently gates both the wait timeout and the
flake-threshold signal (referenced by defaultCVOUpdateAckTimeout and the flake
check usage), which hides flake regressions when you temporarily bump the
timeout; introduce a separate constant (e.g., defaultCVOFlakeThreshold or
cvoUpdateFlakeThreshold) and replace the flake-threshold usage to read that new
constant while leaving defaultCVOUpdateAckTimeout as the actual wait timeout,
and update any references in the flake-detection logic and comments to use the
new name so the two behaviors are decoupled.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e/upgrade/upgrade.go`:
- Line 85: The constant defaultCVOUpdateAckTimeout currently gates both the wait
timeout and the flake-threshold signal (referenced by defaultCVOUpdateAckTimeout
and the flake check usage), which hides flake regressions when you temporarily
bump the timeout; introduce a separate constant (e.g., defaultCVOFlakeThreshold
or cvoUpdateFlakeThreshold) and replace the flake-threshold usage to read that
new constant while leaving defaultCVOUpdateAckTimeout as the actual wait
timeout, and update any references in the flake-detection logic and comments to
use the new name so the two behaviors are decoupled.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 46d340d1-0aad-4cf6-8be6-9a4bf5751029

📥 Commits

Reviewing files that changed from the base of the PR and between 1883631 and 6c8653c.

📒 Files selected for processing (1)
  • test/e2e/upgrade/upgrade.go

@JoelSpeed
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 20, 2026
@neisw
Copy link
Contributor

neisw commented Mar 20, 2026

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, neisw, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2026
@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-ovn-upgrade-rollback

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 20, 2026

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JoelSpeed
Copy link
Contributor

/verified by E2E passing

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 21, 2026
@openshift-ci-robot
Copy link

@JoelSpeed: This PR has been marked as verified by E2E passing.

Details

In response to this:

/verified by E2E passing

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 0509de9 into openshift:main Mar 21, 2026
22 checks passed
@wking wking deleted the raise-update-ack-timeout branch March 21, 2026 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants