CORENET-6097: OVN BGP NoOverlay workflows for baremetal clusters#77768
CORENET-6097: OVN BGP NoOverlay workflows for baremetal clusters#77768asood-rh wants to merge 8 commits intoopenshift:mainfrom
Conversation
|
@asood-rh: This pull request references CORENET-6097 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdds CI jobs, test definitions, step-registry workflows, step references, and setup scripts to run OVN dual-stack BGP "no-overlay" E2E tests (managed LGW and unmanaged SGW) on Equinix metal; includes presubmits, workflows, nftables/FRR setup scripts, chains, OWNERS, and metadata. Changes
Sequence Diagram(s)sequenceDiagram
participant User as Developer (PR)
participant Prow as Prow/CI
participant CI as ci-operator workflow
participant Ofcir as Packet Server (ofcir)
participant Remote as Packet Host (SSH)
participant Cluster as Target Cluster
participant Tests as E2E Test Suite
User->>Prow: create PR triggers presubmit
Prow->>CI: start workflow (baremetalds-e2e-...)
CI->>Ofcir: acquire packet servers (ofcir-acquire)
CI->>Remote: run nftables-pre and no-overlay BGP scripts (SSH)
Remote-->>Remote: install nftables rules, start FRR container (if needed)
CI->>Cluster: apply generated Network/FRR manifests during install
CI->>Tests: run baremetalds-ipi-test
Tests-->>CI: test results
CI->>Prow: post debug artifacts and test outcome
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 9 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (9 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@asood-rh: This pull request references CORENET-6097 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
ci-operator/step-registry/baremetalds/e2e/ovn/bgp/nftables-pre/baremetalds-e2e-ovn-bgp-nftables-pre-commands.sh (1)
35-39: Consider parameterizing the topology constants.The pod CIDRs, bootstrap subnets, and
ostestbmare hard-coded here. Since this step already pulls shared host data earlier, lifting these into shared config or env would make future no-overlay variants less brittle.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ci-operator/step-registry/baremetalds/e2e/ovn/bgp/nftables-pre/baremetalds-e2e-ovn-bgp-nftables-pre-commands.sh` around lines 35 - 39, The script hard-codes topology constants (CLUSTER_NETWORK_V4, CLUSTER_NETWORK_V6, BOOTSTRAP_NETWORK, BOOTSTRAP_NETWORK_V6, INTERFACE_NAME); change these to read from shared configuration or environment variables with sensible defaults so callers can override them. Update the variable assignments to prefer exported env vars or values loaded from the shared host data (falling back to the current literals), and document/validate required values (e.g., error if INTERFACE_NAME is unset in no-default environments); adjust any downstream uses in the script to use the new variables unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@ci-operator/step-registry/baremetalds/e2e/ovn/bgp/dualstack-local-gw-no-overlay/baremetalds-e2e-ovn-bgp-dualstack-local-gw-no-overlay-workflow.yaml`:
- Around line 37-39: The workflow description incorrectly states that
baremetalds-e2e-ovn-bgp-pre is applied before the OFCIR pre step; update the
human-readable description to reflect the actual ordering where
baremetalds-ofcir-no-overlay-pre (OFCIR pre) runs before
baremetalds-e2e-ovn-bgp-pre by changing the sentence mentioning these steps so
it accurately lists baremetalds-ofcir-no-overlay-pre first and then
baremetalds-e2e-ovn-bgp-pre.
In
`@ci-operator/step-registry/baremetalds/e2e/ovn/bgp/nftables-pre/baremetalds-e2e-ovn-bgp-nftables-pre-commands.sh`:
- Line 29: The generated helper script fix-openshift-firewall.sh is not failing
fast: add strict shell options (set -euo pipefail and IFS=$'\n\t') at the top of
the script and remove the masking "|| true" from critical commands (the sysctl
mutations and nft commands referenced in the script) so any failure in those
commands (e.g., chain lookup or rule insert) causes the script to exit non‑zero;
update every occurrence (including the blocks covering lines noted: 43-70,
76-78, 91-99) to propagate errors rather than swallowing them.
- Around line 119-130: The wrapper loop calls
/usr/local/bin/fix-openshift-firewall.sh but does not propagate failures, so the
script can print success and exit 0 even when the helper fails; update the while
loop that invokes /usr/local/bin/fix-openshift-firewall.sh to fail fast by
enabling errexit (set -e) at the top of the script or by explicitly checking the
helper's exit status and exiting non‑zero on failure (i.e., after calling
/usr/local/bin/fix-openshift-firewall.sh, test its return code and if non‑zero
echo an error and exit 1) so systemd sees the real failure.
In
`@ci-operator/step-registry/baremetalds/e2e/ovn/bgp/no-overlay/baremetalds-e2e-ovn-bgp-no-overlay-commands.sh`:
- Around line 45-57: The BGP config only enables IPv4 unicast and hard-codes the
IPv4 pod CIDR; update the script to configure both IPv4 and IPv6 address
families and include both pod CIDRs in the listen ranges. Specifically, add a
"bgp listen range" line for the IPv6 pod CIDR (the cluster pod IPv6 range used
elsewhere) alongside the existing "bgp listen range 192.168.111.0/24 peer-group
NODES", and mirror the IPv4 address-family block by adding an "address-family
ipv6 unicast" block that activates "neighbor NODES" and marks it as
"route-reflector-client". Also update any other occurrences (the similar block
referenced at lines 120-141) so both families and both pod CIDRs are rendered
consistently.
In
`@ci-operator/step-registry/baremetalds/e2e/ovn/bgp/no-overlay/baremetalds-e2e-ovn-bgp-no-overlay-ref.yaml`:
- Around line 23-26: Update the documentation string under the documentation:
field to explicitly state that the FRR route reflector container and related FRR
provider manifests are created only when cluster routing is not set to "Managed"
(the step logic skips external FRR setup for routing == "Managed"); mention that
when routing == "Managed" the step will not set up the external FRR reflector
and that operators should configure routing accordingly.
In
`@ci-operator/step-registry/baremetalds/e2e/ovn/bgp/pre/baremetalds-e2e-ovn-bgp-pre-commands.sh`:
- Around line 45-49: The early guard that checks NO_OVERLAY_ROUTING currently
exits the script and prevents the LGW ipForwarding=Global workaround from
running; instead, change that block to set a flag (e.g.,
skip_external_bgp_setup="true") when NO_OVERLAY_ROUTING is "Managed" rather than
exiting, so the script continues to run the LGW workaround (the
ipForwarding=Global block referenced in the comment); then add a check after the
LGW workaround that exits if skip_external_bgp_setup is "true" (if [
"${skip_external_bgp_setup}" = "true" ]; then exit 0; fi) so only the external
FRR/BGP setup is skipped.
- Around line 418-420: The cleanup of stale BGP routes (the two commands using
"$IP route delete" and "$IP -6 route delete") must run before any best-effort
"$IP route add" / "$IP -6 route add" attempts so an add failure doesn't leave
stale routes that are then removed and strand the host; reorder the script to
execute the shown deletion block (the lines calling "$IP route show proto bgp |
grep '^10\.' ..." and "$IP -6 route show proto bgp | grep '^fd01\:' ...")
earlier—before the route-add logic—or ensure those delete commands are run
unconditionally at the top of the route-install section so stale IPv4/IPv6 BGP
routes are removed prior to any adds.
---
Nitpick comments:
In
`@ci-operator/step-registry/baremetalds/e2e/ovn/bgp/nftables-pre/baremetalds-e2e-ovn-bgp-nftables-pre-commands.sh`:
- Around line 35-39: The script hard-codes topology constants
(CLUSTER_NETWORK_V4, CLUSTER_NETWORK_V6, BOOTSTRAP_NETWORK,
BOOTSTRAP_NETWORK_V6, INTERFACE_NAME); change these to read from shared
configuration or environment variables with sensible defaults so callers can
override them. Update the variable assignments to prefer exported env vars or
values loaded from the shared host data (falling back to the current literals),
and document/validate required values (e.g., error if INTERFACE_NAME is unset in
no-default environments); adjust any downstream uses in the script to use the
new variables unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 6902f64e-09d0-48f8-b21f-43c1a9021888
📒 Files selected for processing (31)
ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-master.yamlci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-release-4.22.yamlci-operator/config/openshift/origin/openshift-origin-main.yamlci-operator/config/openshift/origin/openshift-origin-release-4.22.yamlci-operator/config/openshift/ovn-kubernetes/openshift-ovn-kubernetes-master.yamlci-operator/config/openshift/ovn-kubernetes/openshift-ovn-kubernetes-release-4.22.yamlci-operator/jobs/openshift/cluster-network-operator/openshift-cluster-network-operator-master-presubmits.yamlci-operator/jobs/openshift/cluster-network-operator/openshift-cluster-network-operator-release-4.22-presubmits.yamlci-operator/jobs/openshift/origin/openshift-origin-main-presubmits.yamlci-operator/jobs/openshift/origin/openshift-origin-release-4.22-presubmits.yamlci-operator/jobs/openshift/ovn-kubernetes/openshift-ovn-kubernetes-master-presubmits.yamlci-operator/jobs/openshift/ovn-kubernetes/openshift-ovn-kubernetes-release-4.22-presubmits.yamlci-operator/step-registry/baremetalds/e2e/ovn/bgp/dualstack-local-gw-no-overlay/OWNERSci-operator/step-registry/baremetalds/e2e/ovn/bgp/dualstack-local-gw-no-overlay/baremetalds-e2e-ovn-bgp-dualstack-local-gw-no-overlay-workflow.metadata.jsonci-operator/step-registry/baremetalds/e2e/ovn/bgp/dualstack-local-gw-no-overlay/baremetalds-e2e-ovn-bgp-dualstack-local-gw-no-overlay-workflow.yamlci-operator/step-registry/baremetalds/e2e/ovn/bgp/dualstack-no-overlay/OWNERSci-operator/step-registry/baremetalds/e2e/ovn/bgp/dualstack-no-overlay/baremetalds-e2e-ovn-bgp-dualstack-no-overlay-workflow.metadata.jsonci-operator/step-registry/baremetalds/e2e/ovn/bgp/dualstack-no-overlay/baremetalds-e2e-ovn-bgp-dualstack-no-overlay-workflow.yamlci-operator/step-registry/baremetalds/e2e/ovn/bgp/nftables-pre/OWNERSci-operator/step-registry/baremetalds/e2e/ovn/bgp/nftables-pre/baremetalds-e2e-ovn-bgp-nftables-pre-commands.shci-operator/step-registry/baremetalds/e2e/ovn/bgp/nftables-pre/baremetalds-e2e-ovn-bgp-nftables-pre-ref.metadata.jsonci-operator/step-registry/baremetalds/e2e/ovn/bgp/nftables-pre/baremetalds-e2e-ovn-bgp-nftables-pre-ref.yamlci-operator/step-registry/baremetalds/e2e/ovn/bgp/no-overlay/OWNERSci-operator/step-registry/baremetalds/e2e/ovn/bgp/no-overlay/baremetalds-e2e-ovn-bgp-no-overlay-commands.shci-operator/step-registry/baremetalds/e2e/ovn/bgp/no-overlay/baremetalds-e2e-ovn-bgp-no-overlay-ref.metadata.jsonci-operator/step-registry/baremetalds/e2e/ovn/bgp/no-overlay/baremetalds-e2e-ovn-bgp-no-overlay-ref.yamlci-operator/step-registry/baremetalds/e2e/ovn/bgp/pre/baremetalds-e2e-ovn-bgp-pre-commands.shci-operator/step-registry/baremetalds/ofcir/no-overlay/OWNERSci-operator/step-registry/baremetalds/ofcir/no-overlay/pre/OWNERSci-operator/step-registry/baremetalds/ofcir/no-overlay/pre/baremetalds-ofcir-no-overlay-pre-chain.metadata.jsonci-operator/step-registry/baremetalds/ofcir/no-overlay/pre/baremetalds-ofcir-no-overlay-pre-chain.yaml
|
@asood-rh: This pull request references CORENET-6097 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-master.yaml`:
- Around line 521-526: The DEVSCRIPTS_CONFIG currently hardcodes an ephemeral CI
payload via the OPENSHIFT_RELEASE_IMAGE variable
(OPENSHIFT_RELEASE_IMAGE=registry.build10.ci.openshift.org/ci-ln-8drmbm2/release:latest),
which can break this job later; remove that hardcoded line from
DEVSCRIPTS_CONFIG (or replace it with a stable/release stream reference or leave
it unset) so the job uses the intended persistent release payload selection,
ensuring you edit the DEVSCRIPTS_CONFIG block that contains
OPENSHIFT_RELEASE_IMAGE and verify FEATURE_SET remains unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5e96c8c1-c8d2-433f-8364-50dc5a9e07a2
📒 Files selected for processing (1)
ci-operator/config/openshift/cluster-network-operator/openshift-cluster-network-operator-master.yaml
|
/pj-rehearse pull-ci-openshift-cluster-network-operator-master-e2e-metal-ovn-dualstack-no-overlay-managed-lgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
baa8c59 to
08499c1
Compare
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
af85010 to
c4475aa
Compare
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-release-4.22-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse abort |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack-bgp |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@asood-rh, |
c4475aa to
fc343cb
Compare
|
@asood-rh, Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
fc343cb to
1d95467
Compare
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack-bgp |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse abort |
4ad128e to
c0c2295
Compare
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
c0c2295 to
7245cf4
Compare
7245cf4 to
1f5ca3f
Compare
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
[REHEARSALNOTIFIER]
A total of 135 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ovn-dualstack-no-overlay-unmanaged-sgw-techpreview |
|
@asood-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@asood-rh: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Cherry picking commits from #73335
Summary by CodeRabbit
New Features
Tests