Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 2040671: Fix the way the network stack is determined #239

Merged
merged 2 commits into from Jan 25, 2022

Conversation

sadasu
Copy link
Contributor

@sadasu sadasu commented Jan 20, 2022

We were looking at apiServerInternalURI and apiServerURL fields in the Infrastructure CR to figure out if the installation was a dualstack one. The problem with that approach was that these two URLs always had only one IP associated with them even in dualstack scenarios. That resulted in CBO passing the wrong IP_OPTIONS to metal3 containers.

With this revised approach, we are looking at the ServiceNetwork in the Network CR . In dualstack installs, the ServiceNetwork is expected to have 2 sets of IPs one for IPv4 and the other for IPv6.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 20, 2022
@sadasu
Copy link
Contributor Author

sadasu commented Jan 20, 2022

/test e2e-metal-ipi-ovn-dualstack

@openshift-ci openshift-ci bot requested review from hardys and kirankt January 20, 2022 15:22
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 20, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sadasu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2022
@zaneb
Copy link
Member

zaneb commented Jan 20, 2022

The cluster failed to build because of errors with the authentication and console operators.
Let's try again to make sure.
/retest

@zaneb
Copy link
Member

zaneb commented Jan 20, 2022

It's still passing

    - name: IP_OPTIONS
      value: ip=dhcp
2022-01-20T16:07:47.042548326Z I0120 16:07:47.042494       1 provisioning_controller.go:495] "Network stack calculation" APIServerInternalHost="api.ostest.test.metalkube.org" NetworkStack=1

@sadasu sadasu changed the title WIP: Fix the way the network stack is determined Bug 2040671: Fix the way the network stack is determined Jan 20, 2022
@openshift-ci openshift-ci bot added bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jan 20, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 20, 2022

@sadasu: This pull request references Bugzilla bug 2040671, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (augol@redhat.com), skipping review request.

In response to this:

Bug 2040671: Fix the way the network stack is determined

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bfournie
Copy link
Contributor

Looks like -metal-ipi-ovn-dualstack is failing because only 1 internal IP is defined
(https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/network/dual_stack.go#L65)

STEP: Destroying namespace "e2e-dualstack-497" for this suite.
fail [k8s.io/kubernetes@v1.23.0/test/e2e/network/dual_stack.go:65]: Expected
: 1
to equal
: 2

failed: (3.5s) 2022-01-20T20:07:15 "[sig-network] [Feature:IPv6DualStack] should have ipv4 and ipv6 internal node ip [Suite:openshift/conformance/parallel] [Suite:k8s]"

@sadasu sadasu force-pushed the sadasu-2040671 branch 2 times, most recently from 5ad7a59 to b6fb52a Compare January 21, 2022 06:52
@sadasu
Copy link
Contributor Author

sadasu commented Jan 21, 2022

/test e2e-metal-ipi-ovn-dualstack

@sadasu
Copy link
Contributor Author

sadasu commented Jan 21, 2022

/retest

@zaneb
Copy link
Member

zaneb commented Jan 21, 2022

Even if this doesn't work it should make what we're trying to do explicit, instead of the previous code that was highly misleading about what would actually happen.
/lgtm
/hold
(holding to give us time to see dualstack test results, feel free to unhold once it's complete)

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 21, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 21, 2022
@sadasu
Copy link
Contributor Author

sadasu commented Jan 21, 2022

We were looking at apiServerInternalURI and apiServerURL fields in the Infrastructure CR to figure out if the installation was a dualstack one. The problem with that approach was that these two URLs always had only one IP associated with them even in dualstack scenarios. That resulted in CBO passing the wrong IP_OPTIONS to metal3 containers.

With this revised approach, we are looking at the ServiceNetwork in the Network CR . In dualstack installs, the ServiceNetwork is expected to have 2 sets of IPs one for IPv4 and the other for IPv6.

@zaneb
Copy link
Member

zaneb commented Jan 21, 2022

We were looking at apiServerInternalURI and apiServerURL fields in the Infrastructure CR to figure out if the installation was a dualstack one. The problem with that approach was that these two URLs always had only one IP associated with them even in dualstack scenarios. That resulted in CBO passing the wrong IP_OPTIONS to metal3 containers.

With this revised approach, we are looking at the ServiceNetwork in the Network CR . In dualstack installs, the ServiceNetwork is expected to have 2 sets of IPs one for IPv4 and the other for IPv6.

This would be good info for the commit message :)

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 21, 2022

@sadasu: This pull request references Bugzilla bug 2040671, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (augol@redhat.com), skipping review request.

In response to this:

Bug 2040671: Fix the way the network stack is determined

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cybertron
Copy link
Member

/lgtm

For some reason it looks like the dual stack job is still passing ip=dhcp to the nodes, but I agree that this looks like a step in the right direction.

@andfasano
Copy link
Contributor

/test e2e-metal-ipi-serial-ipv4

@sadasu
Copy link
Contributor Author

sadasu commented Jan 21, 2022

/test e2e-metal-ipi-ovn-ipv6

@ardaguclu
Copy link
Member

Latest ipv6 failure is not related to this job;

/test e2e-metal-ipi-ovn-ipv6

@zaneb
Copy link
Member

zaneb commented Jan 21, 2022

/lgtm

For some reason it looks like the dual stack job is still passing ip=dhcp to the nodes, but I agree that this looks like a step in the right direction.

I don't see that on the workers. I see it passing ip=auto, which is the default - see coreos/fedora-coreos-tracker#1000

I do see that on the control plane, because obviously this change doesn't affect the installer.

However, none of the worker Nodes came up with an IPv6 address, so I suspect we will need to use ip=dhcp,dhcp6.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 21, 2022
@sadasu
Copy link
Contributor Author

sadasu commented Jan 21, 2022

/test e2e-metal-ipi-ovn-dualstack

@sadasu
Copy link
Contributor Author

sadasu commented Jan 21, 2022

/test e2e-agnostic

current failures don't seem related to current changes

@sadasu
Copy link
Contributor Author

sadasu commented Jan 21, 2022

/test e2e-metal-ipi

@sadasu
Copy link
Contributor Author

sadasu commented Jan 22, 2022

/retest

@zaneb
Copy link
Member

zaneb commented Jan 24, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 24, 2022
@sadasu
Copy link
Contributor Author

sadasu commented Jan 24, 2022

/retest

@sadasu
Copy link
Contributor Author

sadasu commented Jan 24, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 24, 2022
@openshift-bot
Copy link

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@sadasu
Copy link
Contributor Author

sadasu commented Jan 25, 2022

/override ci/prow/e2e-metal-ipi-ovn-dualstack
Expected to pass with this and openshift/machine-config-operator#2927.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 25, 2022

@sadasu: Overrode contexts on behalf of sadasu: ci/prow/e2e-metal-ipi-ovn-dualstack

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-dualstack
Expected top pass with this and openshift/machine-config-operator#2927.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sadasu
Copy link
Contributor Author

sadasu commented Jan 25, 2022

/retest

@openshift-bot
Copy link

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 1d3dd18 into openshift:master Jan 25, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 25, 2022

@sadasu: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh.

Bugzilla bug 2040671 has not been moved to the MODIFIED state.

In response to this:

Bug 2040671: Fix the way the network stack is determined

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 25, 2022

@sadasu: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants