Run MultiNetworkPolicy test 20x in parallel for flake detection #79572

Open
weliang1 wants to merge 5 commits into openshift:main from weliang1:multinetworkpolicy-20x-parallel-test

@weliang1
Contributor

@weliang1 weliang1 commented May 20, 2026

Summary

Configure the nightly-4.22 e2e-aws-ovn-dedicated-serial-techpreview periodic job to execute the [sig-network][Feature:MultiNetworkPolicy] test 20 times in parallel for improved flake detection and stability testing.

Changes

  • Increased shard_count: 2 → 20 to create 20 parallel job instances
  • Added TEST_ARGS filter: --run \[sig-network\]\[Feature:MultiNetworkPolicy\] to run only MultiNetworkPolicy tests
  • Each of the 20 shards runs the same test filter concurrently
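
The backslashes in the `--run` value matter if the filter is interpreted as a regular expression, which the escaping suggests but this PR does not state outright. The following is a minimal sketch using Python's `re` module, as an illustration only and not the matcher that `openshift-tests` itself uses. The test names are hypothetical. It shows how an unescaped pattern turns the bracketed tags into character classes and so matches unrelated tests:

```python
import re

# The filter as written in TEST_ARGS, and the same text without escaping.
ESCAPED = r"\[sig-network\]\[Feature:MultiNetworkPolicy\]"
UNESCAPED = "[sig-network][Feature:MultiNetworkPolicy]"

# Hypothetical test names, used only for illustration.
TEST_NAMES = [
    "[sig-network][Feature:MultiNetworkPolicy] should enforce policy",
    "[sig-storage] volumes should mount",
]


def select(pattern: str, names: list[str]) -> list[str]:
    """Return the names that contain a match for the pattern."""
    return [name for name in names if re.search(pattern, name)]


if __name__ == "__main__":
    # Escaped brackets match the literal tags, so only the first name is kept.
    print(select(ESCAPED, TEST_NAMES))
    # Unescaped, each bracket pair is a character class, so any two adjacent
    # characters drawn from the two classes (for example "si") match.
    print(select(UNESCAPED, TEST_NAMES))
```

Under this assumption, dropping the backslashes would not narrow the run at all, which is one thing to rule out when a filtered job executes more tests than expected.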

Purpose

This configuration provides 20x test coverage for the MultiNetworkPolicy feature to:

  • Detect flaky tests that may not appear in single runs
  • Ensure test stability across concurrent executions
  • Support investigation of OCPBUGS-85529 (policy race on secondary IPv6 network)

Modified Files

  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22.yaml - Source configuration
  • ci-operator/jobs/openshift/release/openshift-release-main-periodics.yaml - Generated Prow jobs

Testing Plan

  • Use /pj-rehearse to validate job configuration
  • Monitor rehearsal results for any configuration errors
  • Verify all 20 shards execute correctly

Related

  • OCPBUGS-85529: Policy race on secondary IPv6 network

/assign @weliang1

Changes to OpenShift Release CI Configuration

This PR updates OpenShift release CI (openshift/release) configuration for nightly-4.22 e2e runs to target the sig-network Feature:MultiNetworkPolicy tests and to enable rehearsal of a filtered, sharded job.

Practical effects:

  • The e2e job(s) for nightly-4.22 were modified to add TEST_ARGS: --run [sig-network][Feature:MultiNetworkPolicy] so CI runs are filtered to only the MultiNetworkPolicy tests.
  • A temporary presubmit-style job was added (optional, run_if_changed scoped to this YAML) so the filtered workflow can be rehearsed pre-merge.
  • The original plan to run 20 parallel shards (shard_count: 20) to detect flakes was attempted and then rolled back in later commits: shard_count is now 2 for both the periodic job and the presubmit test variant, and the regenerated periodics no longer contain the 20 shard jobs.
  • The author experimented with setting TEST_SUITE at the job level (openshift/conformance/serial) to force the filter to apply; this was later removed because, per the follow-up commit, the run then executed all 2290 serial conformance tests instead of only the filtered MultiNetworkPolicy tests.

Why this was done:

  • Purpose is to detect flaky MultiNetworkPolicy tests and investigate OCPBUGS-85529 (policy race on a secondary IPv6 network) by exercising the same filtered tests across multiple CI instances.

Validation and troubleshooting done / recommended:

  • The author ran /pj-rehearse and repeated rehearse commands to validate job wiring; initial rehearsals showed the filter did not run as expected.
  • A follow-up fix attempted to set TEST_SUITE at job-level to resolve ambiguity; the author recorded monitoring steps to check rehearsal JUnit output and, if tests still did not run, investigate (a) whether the tests exist in the 4.22 openshift-tests binary, (b) alternative test-suite/filter syntax, or (c) requiring a custom test step.
  • A later commit added the temporary presubmit job for rehearsal; the commit history shows that shard_count was reduced from 20 to 2 after the initial 20x attempt.

Files changed:

  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22.yaml (adds TEST_ARGS filter; presubmit job metadata)
  • ci-operator/jobs/openshift/release/openshift-release-main-periodics.yaml (generated job manifests updated; shows the effective periodic job definitions)

Summary conclusion:

  • The PR narrows e2e runs in the nightly-4.22 pipeline to the sig-network Feature:MultiNetworkPolicy tests and adds a rehearsal path. The aggressive 20x sharding objective was attempted but not retained in the current generated configuration; the author iterated on TEST_SUITE and shard_count to ensure the filter actually limits executed tests without unintentionally running the whole serial suite across many shards.

Configure the nightly-4.22 e2e-aws-ovn-dedicated-serial-techpreview
periodic job to execute the [sig-network][Feature:MultiNetworkPolicy]
test 20 times in parallel.

Changes:
- Set shard_count to 20 (was 2) to create 20 parallel job instances
- Add TEST_ARGS filter to run only MultiNetworkPolicy tests
- Each shard runs the same test filter concurrently

Purpose: Detect flaky tests and ensure test stability across
multiple concurrent executions for OCPBUGS-85529.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@weliang1
Contributor Author

/pj-rehearse

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 32f75865-2bf9-4267-a58b-a885ef6d74b3

📥 Commits

Reviewing files that changed from the base of the PR and between c1a0b52 and 71a0bf4.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/release/openshift-release-main-presubmits.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (1)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-4.22.yaml

Walkthrough

Adds env.TEST_ARGS to the e2e-aws-ovn-dedicated-serial-techpreview job to run only sig-network tests labeled Feature:MultiNetworkPolicy. Also adds e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test as an optional presubmit with run_if_changed limited to this YAML and shard_count: 2.

Changes

Test Job Configuration

| Layer / File(s) | Summary |
| --- | --- |
| Update serial techpreview job (ci-operator/config/openshift/release/openshift-release-main__nightly-4.22.yaml) | Adds env.TEST_ARGS to run only sig-network tests with Feature:MultiNetworkPolicy for e2e-aws-ovn-dedicated-serial-techpreview. |
| Add presubmit techpreview job (ci-operator/config/openshift/release/openshift-release-main__nightly-4.22.yaml) | Introduces e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test with optional: true, run_if_changed scoped to this file, shard_count: 2, and env.TEST_ARGS plus DEDICATED_HOST/FEATURE_SET. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

ok-to-test

Suggested reviewers

  • petr-muller
🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Run MultiNetworkPolicy test 20x in parallel for flake detection' clearly and specifically describes the main change: configuring jobs to run MultiNetworkPolicy tests with increased parallelism (20 shards) for flake detection, matching the PR's core objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR modifies only CI configuration YAML files, not Ginkgo test code. No test definitions with It(), Describe(), etc. are present. |
| Test Structure And Quality | ✅ Passed | The custom check targets Ginkgo test code quality, but this PR only modifies CI/CD configuration files (YAML) without adding or changing any test code. The check is not applicable. |
| Microshift Test Compatibility | ✅ Passed | PR does not add new Ginkgo e2e tests; only CI configuration changes to filter and parallelize existing tests. Check not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR modifies only YAML CI job configuration, not new test implementations. SNO check applies only to new Ginkgo e2e tests; no test code added here. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | This PR modifies CI test configuration, not deployment manifests, operator code, or controllers. The check applies only to those categories and is not applicable here. |
| Ote Binary Stdout Contract | ✅ Passed | PR contains only YAML configuration changes to CI infrastructure. No source code modifications to OTE binaries or process-level code that could violate the stdout contract. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR modifies CI configuration (YAML) to run existing tests in parallel; no new Ginkgo test implementations are added. The check applies only to new tests, not test configuration changes. |


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from petr-muller and smg247 May 20, 2026 21:23
@weliang1
Contributor Author

/pj-rehearse ack

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot openshift-merge-bot Bot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label May 20, 2026

- Add TEST_SUITE: openshift/conformance/serial to env section
- Ensures test suite is explicitly defined for MultiNetworkPolicy filtering
- Matches configuration pattern used in multus-networkpolicy repo

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@weliang1
Contributor Author

Fix Attempt: Explicitly Set TEST_SUITE

Issue Found: Initial rehearsal runs showed that MultiNetworkPolicy tests were not executed despite the TEST_ARGS filter being configured.

Fix Applied: Explicitly set TEST_SUITE: openshift/conformance/serial in the env section to ensure the test suite is unambiguously defined for the MultiNetworkPolicy test filter.

Rationale: While the workflow openshift-e2e-aws-ovn-serial sets TEST_SUITE by default, explicitly setting it at the job level ensures there's no ambiguity in test suite selection.

Next Steps:

  1. Monitor new rehearsal jobs triggered by commit a711e6b
  2. Verify that MultiNetworkPolicy tests actually execute (check junit XML files)
  3. If tests still don't run, investigate:
    • Whether MultiNetworkPolicy tests exist in the openshift-tests binary for 4.22
    • Alternative test suite or filter syntax
    • Potential need for custom test step configuration
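
Step 2 above (checking the JUnit XML files) can be sketched as follows. This is a minimal example, assuming the rehearsal artifacts use the common JUnit layout of `<testcase>` elements with an optional `<skipped>` child; the helper name `executed_tests` and the sample XML are hypothetical and are not taken from an actual rehearsal run.

```python
import xml.etree.ElementTree as ET

# Literal tag text expected in the names of the tests of interest.
TAG = "[sig-network][Feature:MultiNetworkPolicy]"


def executed_tests(junit_xml: str, tag: str = TAG) -> list[str]:
    """Return names of test cases that carry the tag and were not skipped."""
    root = ET.fromstring(junit_xml)
    return [
        case.get("name", "")
        for case in root.iter("testcase")
        if tag in case.get("name", "") and case.find("skipped") is None
    ]


# Hypothetical JUnit content, for illustration only.
SAMPLE = """<testsuite name="openshift-tests" tests="3">
  <testcase name="[sig-network][Feature:MultiNetworkPolicy] should enforce policy"/>
  <testcase name="[sig-network][Feature:MultiNetworkPolicy] should deny traffic">
    <skipped message="not applicable"/>
  </testcase>
  <testcase name="[sig-storage] volumes should mount"/>
</testsuite>"""

if __name__ == "__main__":
    ran = executed_tests(SAMPLE)
    print(f"{len(ran)} MultiNetworkPolicy test(s) executed")
```

An empty result would point toward the first sub-item above (the tests are absent from the binary or are all skipped) rather than toward a job wiring problem.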

/pj-rehearse

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot openshift-merge-bot Bot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label May 21, 2026
weliang1 and others added 2 commits May 21, 2026 09:02
Remove TEST_SUITE to properly filter MultiNetworkPolicy tests.
The previous config ran all 2290 serial conformance tests instead of
just the filtered MultiNetworkPolicy tests because TEST_SUITE loaded
the entire openshift/conformance/serial suite before applying the
--run filter.

Changes:
- Remove TEST_SUITE: openshift/conformance/serial
- Reduce shard_count from 20 to 2 (only running filtered tests now)
- Keep TEST_ARGS: --run \[sig-network\]\[Feature:MultiNetworkPolicy\]

This will significantly reduce test execution time by only running
the MultiNetworkPolicy tests across 2 shards instead of 2290 tests
across 20 shards.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This regenerates the periodic jobs to match the updated config where
shard_count was reduced from 20 to 2 for the MultiNetworkPolicy test job.

Generated changes:
- Removed shards 3-20 (18 jobs no longer needed)
- Updated to 2 shards: 1of2 and 2of2
- Total reduction: 1536 lines removed

Generated with: make jobs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
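
The shard suffixes mentioned in the commit above (`1of2`, `2of2`) and elsewhere in this thread (`10of20`) suggest a simple naming pattern. The sketch below reproduces it to show why going from 20 shards to 2 leaves 18 fewer generated jobs. The helper `shard_job_names` is hypothetical and only mirrors the names visible in this PR; it is not the implementation behind `make jobs`.

```python
# Base name of the periodic job, as it appears in this PR.
BASE = (
    "periodic-ci-openshift-release-main-nightly-4.22-"
    "e2e-aws-ovn-dedicated-serial-techpreview"
)


def shard_job_names(base: str, shard_count: int) -> list[str]:
    """Return one job name per shard, suffixed '<index>of<shard_count>'."""
    return [f"{base}-{i}of{shard_count}" for i in range(1, shard_count + 1)]


if __name__ == "__main__":
    before = shard_job_names(BASE, 20)
    after = shard_job_names(BASE, 2)
    print(f"jobs before: {len(before)}, after: {len(after)}")
    print(f"net reduction: {len(before) - len(after)} jobs")
    for name in after:
        print(name)
```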
@openshift-merge-bot openshift-merge-bot Bot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label May 21, 2026
@weliang1
Contributor Author

/pj-rehearse

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot
Contributor

@weliang1: no rehearsable tests are affected by this change

@weliang1
Contributor Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot
Contributor

@weliang1: job(s): periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview either don't exist or were not found to be affected, and cannot be rehearsed

@weliang1
Contributor Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-1of2

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot
Contributor

@weliang1: job(s): periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-1of2 either don't exist or were not found to be affected, and cannot be rehearsed

This adds a TEMPORARY presubmit version of the periodic job that can be
rehearsed for pre-merge testing.

Jobs created:
- pull-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test-1of2
- pull-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test-2of2

Configuration (same as periodic):
- shard_count: 2
- TEST_ARGS: --run \[sig-network\]\[Feature:MultiNetworkPolicy\]
- TEST_SUITE: removed (filtering only via TEST_ARGS)

Usage:
- /pj-rehearse nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test
- or /test nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test-1of2

**NOTE: This is TEMPORARY for testing only. Will be removed before final merge.**

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: weliang1
Once this PR has been reviewed and has the lgtm label, please assign xueqzhan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@weliang1
Contributor Author

/pj-rehearse nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot openshift-merge-bot Bot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label May 21, 2026
@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@weliang1: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test-1of2 | openshift/release | presubmit | Presubmit changed |
| pull-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test-2of2 | openshift/release | presubmit | Presubmit changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-merge-bot
Contributor

@weliang1: job(s): nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test either don't exist or were not found to be affected, and cannot be rehearsed
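
The short name above is rejected, while the REHEARSALNOTIFIER table lists full names that start with `pull-ci-` and end with a shard suffix. The sketch below derives such a full name from the config file path and the test name. The pattern is inferred only from the job names that appear in this thread and assumes the config file has a `__<variant>` part; the helpers `job_prefix` and `full_job_name` are hypothetical and are not part of any CI tooling.

```python
from pathlib import PurePosixPath

# Config file path as listed in this PR.
CONFIG = (
    "ci-operator/config/openshift/release/"
    "openshift-release-main__nightly-4.22.yaml"
)


def job_prefix(config_path: str, kind: str) -> str:
    """Build '<kind>-ci-<org>-<repo>-<branch>-<variant>' from a config path."""
    path = PurePosixPath(config_path)
    org, repo = path.parts[-3], path.parts[-2]
    # The file stem is assumed to be '<org>-<repo>-<branch>__<variant>'.
    rest = path.stem[len(f"{org}-{repo}-"):]
    branch, _, variant = rest.partition("__")
    return f"{kind}-ci-{org}-{repo}-{branch}-{variant}"


def full_job_name(config_path: str, kind: str, test: str,
                  shard: int, shard_count: int) -> str:
    """Append the test name and the '<shard>of<shard_count>' suffix."""
    return f"{job_prefix(config_path, kind)}-{test}-{shard}of{shard_count}"


if __name__ == "__main__":
    test = "e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test"
    for shard in (1, 2):
        print("/pj-rehearse", full_job_name(CONFIG, "pull", test, shard, 2))
```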

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@weliang1: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-10of20 | a711e6b | link | unknown | /pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-10of20 |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@weliang1
Contributor Author

/pj-rehearse

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@weliang1
Contributor Author

/test nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test-1of2

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@weliang1: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test boskos-config
/test boskos-config-generation
/test check-gh-automation
/test check-gh-automation-tide
/test check-trigger-trusted-apps
/test ci-operator-config
/test ci-operator-config-metadata
/test ci-operator-registry
/test ci-secret-bootstrap-config-validation
/test ci-testgrid-allow-list
/test clusterimageset-validate
/test config
/test core-valid
/test generated-config
/test generated-dashboards
/test hyperfleet-risk-scorer-test
/test image-mirroring-config-validation
/test jira-lifecycle-config
/test labels
/test openshift-image-mirror-mappings
/test ordered-prow-config
/test owners
/test pr-reminder-config
/test prow-config
/test prow-config-filenames
/test prow-config-semantics
/test pylint
/test release-config
/test release-controller-config
/test rover-groups-config-validation
/test secret-generator-config-valid
/test services-valid
/test stackrox-stackrox-stackrox-stackrox-check
/test step-registry-metadata
/test step-registry-shellcheck
/test sync-rover-groups
/test verified-config
/test yamllint

The following commands are available to trigger optional jobs:

/test check-cluster-profiles-config

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-release-check-gh-automation
pull-ci-openshift-release-main-ci-operator-config
pull-ci-openshift-release-main-ci-operator-config-metadata
pull-ci-openshift-release-main-ci-operator-registry
pull-ci-openshift-release-main-config
pull-ci-openshift-release-main-core-valid
pull-ci-openshift-release-main-generated-config
pull-ci-openshift-release-main-ordered-prow-config
pull-ci-openshift-release-main-owners
pull-ci-openshift-release-main-prow-config-filenames
pull-ci-openshift-release-main-prow-config-semantics
pull-ci-openshift-release-main-release-controller-config
pull-ci-openshift-release-openshift-image-mirror-mappings
pull-ci-openshift-release-yamllint
Details

In response to this:

/test nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test-1of2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@weliang1
Contributor Author

/pj-rehearse pull-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test-1of2

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@weliang1
Contributor Author

/pj-rehearse pull-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated-serial-techpreview-presubmit-test-2of2

@openshift-merge-bot
Contributor

@weliang1: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
