🐛 Fix race condition in e2e code coverage collection #2644
Conversation
kubectl scale --replicas=0 is non-blocking and returns as soon as the API server accepts the change, not when pods have terminated. The existing wait on the copy pod was a no-op since it was already running. This meant kubectl cp could run before manager pods had terminated and flushed coverage data to the PVC. Wait for each deployment's .status.replicas to reach 0 before copying, ensuring the Go coverage runtime has written its data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Fixes intermittent e2e coverage loss by ensuring manager deployments have fully scaled down (and thus flushed coverage data to the PVC) before kubectl cp runs.
Changes:
- Replaces “wait for copy pod ready” with waits for both manager deployments’ replica counts to reach 0.
- Adds explicit timeouts to the waits to avoid hanging indefinitely.
hack/test/e2e-coverage.sh
Outdated
kubectl -n "$OPERATOR_CONTROLLER_NAMESPACE" wait --for=jsonpath='{.status.replicas}'=0 deployment/"$OPERATOR_CONTROLLER_MANAGER_DEPLOYMENT_NAME" --timeout=60s
kubectl -n "$CATALOGD_NAMESPACE" wait --for=jsonpath='{.status.replicas}'=0 deployment/"$CATALOGD_MANAGER_DEPLOYMENT_NAME" --timeout=60s
DeploymentStatus.replicas is an optional (omitempty) field and may be absent when it is 0. In that case, kubectl wait --for=jsonpath='{.status.replicas}'=0 may never match (empty string != 0) and can time out intermittently. A more robust approach is to wait for the underlying pods to be deleted (e.g., kubectl wait --for=delete pod -l <selector> --timeout=...) or implement a small polling loop that treats empty .status.replicas as 0.
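A minimal sketch of the polling alternative the comment suggests (the helper names and the 60-second deadline are illustrative assumptions, not code from the PR): treat an empty `.status.replicas` as 0 before comparing.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names and the deadline are
# assumptions, not code from the PR.

# DeploymentStatus.replicas is omitempty, so an absent field (empty
# jsonpath output) must be treated the same as an explicit 0.
replicas_is_zero() {
  [ -z "$1" ] || [ "$1" = "0" ]
}

# Poll the deployment until its replica count reads as zero or the
# deadline passes.
wait_for_scale_down() {
  local ns="$1" deploy="$2" deadline=$((SECONDS + 60))
  while [ "$SECONDS" -lt "$deadline" ]; do
    local count
    count="$(kubectl -n "$ns" get deployment "$deploy" \
      -o jsonpath='{.status.replicas}')"
    if replicas_is_zero "$count"; then
      return 0
    fi
    sleep 2
  done
  echo "timed out waiting for deployment/$deploy to scale down" >&2
  return 1
}
```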
hack/test/e2e-coverage.sh
Outdated
kubectl -n "$OPERATOR_CONTROLLER_NAMESPACE" wait --for=jsonpath='{.status.replicas}'=0 deployment/"$OPERATOR_CONTROLLER_MANAGER_DEPLOYMENT_NAME" --timeout=60s
kubectl -n "$CATALOGD_NAMESPACE" wait --for=jsonpath='{.status.replicas}'=0 deployment/"$CATALOGD_MANAGER_DEPLOYMENT_NAME" --timeout=60s
Hard-coding --timeout=60s risks CI flakes on slower clusters or during API server pressure (scale-down and termination can exceed 60s). Consider making the timeout configurable (env var with a sensible default) and/or increasing it to a more conservative value to reduce intermittent failures.
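One way to implement the comment's suggestion (the `E2E_SCALEDOWN_TIMEOUT` variable name and the 180s default are assumptions, not part of the PR):

```shell
# Hypothetical env-var override with a conservative default; the
# variable name is illustrative, not from the PR.
E2E_SCALEDOWN_TIMEOUT="${E2E_SCALEDOWN_TIMEOUT:-180s}"
echo "waiting up to $E2E_SCALEDOWN_TIMEOUT for managers to scale down"

# Guarded so the sketch stays runnable without a cluster.
if command -v kubectl >/dev/null 2>&1 && [ -n "${CATALOGD_NAMESPACE:-}" ]; then
  kubectl -n "$CATALOGD_NAMESPACE" wait \
    --for=jsonpath='{.status.replicas}'=0 \
    deployment/"$CATALOGD_MANAGER_DEPLOYMENT_NAME" \
    --timeout="$E2E_SCALEDOWN_TIMEOUT"
fi
```

CI can then export a longer value for known-slow clusters without editing the script.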
kubectl scale --replicas=0 is non-blocking and returns as soon as the API server accepts the change, not when pods have terminated. The existing wait on the copy pod was a no-op since it was already running. This meant kubectl cp could run before manager pods had terminated and flushed coverage data to the PVC. Wait for manager pods to be deleted before copying, ensuring the Go coverage runtime has written its data on process exit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 878686f to df9cdbe.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #2644      +/-   ##
==========================================
+ Coverage   68.92%   68.94%   +0.02%
==========================================
  Files         140      140
  Lines        9905     9905
==========================================
+ Hits         6827     6829       +2
+ Misses       2566     2565       -1
+ Partials      512      511       -1

Flags with carried forward coverage won't be shown.
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: camilamacedo86
Merged c641e2f into operator-framework:main
Description
hack/test/e2e-coverage.sh has a race condition that causes intermittent coverage data loss in CI.

kubectl scale --replicas=0 is non-blocking: it returns as soon as the API server accepts the change, not when pods have terminated. The existing wait --for=condition=ready on the copy pod was a no-op since it was already running. This meant kubectl cp could execute before manager pods had terminated and flushed coverage data to the PVC.

The fix replaces the no-op wait with kubectl wait --for=delete on the manager pods, ensuring they have fully terminated and the Go coverage runtime has written its data before copying.
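The fixed sequence can be sketched as follows (the helper name, label-selector argument, and 120s timeout are illustrative; the actual script uses its own names):

```shell
# Illustrative sketch of the fixed flow; names are assumptions,
# not code from the PR.
scale_down_and_wait() {
  local ns="$1" deploy="$2" selector="$3"
  # Non-blocking: returns once the API server accepts the change.
  kubectl -n "$ns" scale --replicas=0 deployment/"$deploy"
  # Blocks until the matching pods are actually gone, so the Go
  # coverage runtime has flushed its data on process exit.
  kubectl -n "$ns" wait --for=delete pod -l "$selector" --timeout=120s
}
```

Because `--for=delete` also succeeds when nothing matches the selector, this wait cannot hang on pods that have already terminated.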
Reviewer Checklist