DNM/TEST: Test udn settling sleep #31226

Open
jluhrsen wants to merge 2 commits into openshift:main from jluhrsen:test-udn-settling-sleep

@jluhrsen
Contributor

@jluhrsen jluhrsen commented May 28, 2026

Summary by CodeRabbit

  • Tests
    • Added a 15s wait after pod IPs are obtained to allow network settling before reachability checks.
    • Reworked connectivity checks into a 30s, time-windowed retry loop with repeated health probes.
    • Improved failure handling to tolerate occasional timeouts (limited) while failing on persistent or non-timeout errors.
    • Enhanced test logs with namespace/pod identifiers and elapsed time for easier diagnosis.

UDN tests are known to be heavy, and these tests fail on the
first dropped connection, which seems to happen randomly, albeit
rarely.

test: allow single timeout in UDN KAPI reachability check

Hardens e2e test against transient networking blips while
preserving strict failure detection for real regressions.

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@jluhrsen jluhrsen changed the title Test udn settling sleep DNM/TEST: Test udn settling sleep May 28, 2026
@coderabbitai

coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 82f81192-f886-447b-83b5-48c51c3b363e

📥 Commits

Reviewing files that changed from the base of the PR and between e398282 and 9ae736b.

📒 Files selected for processing (1)
  • test/extended/networking/network_segmentation.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/extended/networking/network_segmentation.go

Walkthrough

Adds a 15s settling wait after default-network pod IPs are discovered, replaces the single in-pod curl healthz assertion with a 30s bounded retry loop that requires repeated ok responses and tolerates one curl timeout, and adds isCurlExitCode28 to detect curl exit code 28 timeouts.

Changes

Network Reachability Test Resilience

| Layer / File(s) | Summary |
| --- | --- |
| Network settling wait (test/extended/networking/network_segmentation.go) | Adds a fixed 15s sleep after default-network pod IPs are obtained and logs start/finish with namespace/pod and elapsed time. |
| Curl timeout detection helper (test/extended/networking/network_segmentation.go) | Adds isCurlExitCode28(err error) bool to detect curl timeout failures by matching rc: 28 in RunKubectl error output. |
| In-pod kapi healthz retry loop (test/extended/networking/network_segmentation.go) | Replaces the prior single "curl eventually succeeds" check with a 30s retry loop inside the UDN pod that requires multiple successful curl https://kubernetes.default/healthz responses (stdout trimmed == ok), tolerates at most one curl timeout, tracks consecutive failures, and fails immediately on any non-timeout curl error or excessive timeout/consecutive-failure conditions. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Structure And Quality | ⚠️ Warning | Test violates single responsibility principle: "is isolated from the default network" test asserts 6+ unrelated network isolation behaviors in one block instead of separate tests. | Split the isolation test into separate tests for each behavior: default→UDN isolation, default↔default connectivity, UDN→host isolation, UDN→KAPI reachability, and UDN→KAPI non-reachability via default interface. |
| Ipv6 And Disconnected Network Test Compatibility | ⚠️ Warning | Line 493 uses fmt.Sprintf("https://%s/healthz", kapiIP) where kapiIP may be IPv6. IPv6 URLs require brackets; without them, colons are misinterpreted as port separators, breaking IPv6-only clusters. | Use net.JoinHostPort (which brackets IPv6 literals and takes an explicit port), or wrap the address in brackets only when it is IPv6; unconditional brackets, as in fmt.Sprintf("https://[%s]/healthz", kapiIP), would produce an invalid host for IPv4 addresses. |
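As a rough illustration of the IPv6 warning above, the sketch below builds the healthz URL with `net.JoinHostPort`, which adds brackets only when the host contains a colon. The helper name `healthzURL` and the explicit port `443` are assumptions made for this example; the code under review relies on the https default port.

```go
package main

import (
	"fmt"
	"net"
)

// healthzURL builds the healthz URL for an IPv4 or IPv6 literal.
// The explicit port "443" is an assumption of this sketch, needed because
// net.JoinHostPort always takes a port.
func healthzURL(ip string) string {
	return fmt.Sprintf("https://%s/healthz", net.JoinHostPort(ip, "443"))
}

func main() {
	fmt.Println(healthzURL("10.0.0.1")) // https://10.0.0.1:443/healthz
	fmt.Println(healthzURL("fd00::1"))  // https://[fd00::1]:443/healthz
}
```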
✅ Passed checks (13 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'DNM/TEST: Test udn settling sleep' directly describes the main change: adding a 15-second wait/settling period before reachability checks in UDN networking tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All Ginkgo test titles are static and deterministic with no dynamic information. Dynamic values appear only in test bodies, not in test names. |
| Microshift Test Compatibility | ✅ Passed | PR modifies existing tests in network_segmentation.go, not adding new tests. Custom check applies only to new Ginkgo e2e tests. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR modifies existing test, not adding new tests. Network isolation test doesn't require multiple nodes—works on SNO with all pods on one node. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies only test/extended/networking/network_segmentation.go (a test file). No Deployments, StatefulSets, DaemonSets, or operator code. Check applies only to production code, not tests. |
| Ote Binary Stdout Contract | ✅ Passed | All logging writes explicitly to stderr via fmt.Fprintln(os.Stderr,...) and framework.Logf() within test case blocks; no process-level stdout writes detected. |
| No-Weak-Crypto | ✅ Passed | No weak cryptography (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons found in the modified test file. |
| Container-Privileges | ✅ Passed | PR modifies only test code with timing/retry logic; no privileged containers, hostPID, hostNetwork, hostIPC, SYS_ADMIN, or allowPrivilegeEscalation settings introduced. |
| No-Sensitive-Data-In-Logs | ✅ Passed | Code changes contain no logging of passwords, tokens, API keys, PII, session IDs, or customer data; only test infrastructure identifiers and timing/retry metrics are logged. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@jluhrsen
Contributor Author

/hold
testing purposes only

@openshift-ci
Contributor

openshift-ci Bot commented May 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jluhrsen
Once this PR has been reviewed and has the lgtm label, please assign knobunc for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 28, 2026
@openshift-ci openshift-ci Bot requested review from kyrtapz and martinkennelly May 28, 2026 04:42
@jluhrsen
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 28, 2026

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cb599d00-5a4f-11f1-9407-4e0b19a9d78a-0

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/networking/network_segmentation.go`:
- Around line 1326-1328: The isCurlExitCode28 function should use a typed
exit-status check instead of string matching: unwrap the error (from RunKubectl)
with errors.As into a k8s.io/utils/exec.CodeExitError (or interface type
exposing ExitStatus()), then return exitErr.ExitStatus() == 28; fall back to
false if errors.As fails or err is nil. Update isCurlExitCode28 to perform this
typed check rather than strings.Contains(err.Error(), "rc: 28").

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 903494b6-a845-4c96-9bb8-f66fcd68e3f8

📥 Commits

Reviewing files that changed from the base of the PR and between a29f970 and 98bb327.

📒 Files selected for processing (1)
  • test/extended/networking/network_segmentation.go

Comment on lines +1326 to +1328
func isCurlExitCode28(err error) bool {
	return err != nil && strings.Contains(err.Error(), "rc: 28")
}
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect current classifier
rg -n -C4 'func isCurlExitCode28\(' test/extended/networking/network_segmentation.go

# Verify RunKubectl builds CodeExitError with an exit code
rg -n -C6 'CodeExitError|rc :=|return stdout.String\(\), stderr.String\(\)' \
  vendor/k8s.io/kubernetes/test/e2e/framework/kubectl/builder.go

# Verify CodeExitError supports typed exit-status checks
rg -n -C4 'type CodeExitError|func \(.*CodeExitError.*ExitStatus|func \(.*CodeExitError.*Error' \
  vendor/k8s.io/utils/exec

Repository: openshift/origin

Length of output: 2715


Replace "rc: 28" error-string matching with typed exit-status checks

isCurlExitCode28 currently depends on strings.Contains(err.Error(), "rc: 28"), but RunKubectl wraps failures as k8s.io/utils/exec.CodeExitError, which exposes the exit code via ExitStatus(). Unwrap with errors.As and return exitErr.ExitStatus() == 28 instead.

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci
Contributor

openshift-ci Bot commented May 28, 2026

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-vsphere-ovn | 98bb327 | link | true | /test e2e-vsphere-ovn |
| ci/prow/e2e-vsphere-ovn-upi | 98bb327 | link | true | /test e2e-vsphere-ovn-upi |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jluhrsen jluhrsen force-pushed the test-udn-settling-sleep branch from 98bb327 to e398282 Compare May 28, 2026 20:03
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@jluhrsen
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 28, 2026

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ff5b7fc0-5ad0-11f1-888e-1f5d7ebbb367-0

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@jluhrsen
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/93d65130-5b7a-11f1-869b-b7c8064d8c97-0

@jluhrsen
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7bec6bc0-5b7c-11f1-8d24-b2e44a8ded3b-0

@jluhrsen
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/29020080-5b7e-11f1-9ee6-41df389bed03-0

@jluhrsen
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d637c770-5b7f-11f1-8c45-4881bb3f02fa-0

@jluhrsen
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/84e70f00-5b81-11f1-8809-4cc3c58e4698-0

@jluhrsen
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e58fc340-5b88-11f1-9845-8e380a4d5f32-0

This is a data collection PR to test whether network timing issues on
RHCOS10+Azure are resolved by allowing more time for network setup to
stabilize before running connectivity checks.

After pods are created and IPs retrieved, wait 3 minutes for:
- OVN flows to be fully installed
- Network policies to be enforced
- Pod networking to stabilize

Using 3 minutes (conservative) to ensure we capture any slow
initialization scenarios on RHCOS10+Azure.

NOT FOR MERGE - data collection only
@jluhrsen
Contributor Author

/payload-aggregate periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview 10

@jluhrsen jluhrsen force-pushed the test-udn-settling-sleep branch from e398282 to 9ae736b Compare May 29, 2026 19:03
@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/25066e40-5b91-11f1-9b91-50ae04db2950-0

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
