
Dual stack vips #3269

Merged (8 commits) on Sep 16, 2022

Conversation

cybertron
Member

Previously in dual stack clusters we only provided an IPv4 VIP because it was assumed that everything in the cluster would have access to either v4 or v6. This turns out to have been a bad assumption because some users are deploying single-stack IPv6 applications in dual stack clusters and these applications then have no access to API or Ingress services.

This PR is the MCO part of the implementation of dual stack VIPs. It switches the keepalived config to consume a list of VIPs from runtimecfg. It also implements sync logic for the PlatformStatus field of the Infrastructure object to migrate values from the deprecated VIP fields to the new plural VIP fields.

Depends on openshift/baremetal-runtimecfg#176

- What I did

- How to verify it

- Description for the changelog
Add support for dual stack VIPs in on-prem platforms
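
As a rough illustration of the sync logic described in the PR description above, here is a minimal, self-contained Go sketch of migrating a deprecated singular VIP field into the new plural field. The helper name, its plain-string signature, and the addresses are hypothetical; this is not the PR's actual implementation.

```go
package main

import "fmt"

// syncVIPs seeds the new plural VIPs field from the deprecated singular
// field when the plural field is still empty; an already populated plural
// field is left untouched. Hypothetical helper for illustration only.
func syncVIPs(deprecatedVIP string, vips []string) []string {
	if len(vips) == 0 && deprecatedVIP != "" {
		return []string{deprecatedVIP}
	}
	return vips
}

func main() {
	// Cluster created before dual stack VIPs: only the singular field is set.
	fmt.Println(syncVIPs("192.168.111.5", nil))
	// Dual stack cluster: the plural field already carries both address families.
	fmt.Println(syncVIPs("192.168.111.5", []string{"192.168.111.5", "fd2e:6f44:5dd8::5"}))
}
```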

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Jul 26, 2022
@openshift-merge-robot openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Jul 26, 2022
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label on Aug 31, 2022
@cybertron
Member Author

The dependency in openshift/baremetal-runtimecfg#176 has merged. I believe this should be ready to go.

/test e2e-metal-ipi
/test e2e-vsphere-upgrade

@cybertron cybertron changed the title from "WIP: Dual stack vips" to "Dual stack vips" on Aug 31, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label on Aug 31, 2022
@dobsonj
Member

dobsonj commented Aug 31, 2022

To resolve the unit test failures, you just need to remove the CSIMigrationAWS: false, CSIMigrationGCE: false, and PodSecurity: true lines from both of these template files:
https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/_base/files/kubelet.yaml
https://github.com/openshift/machine-config-operator/blob/master/templates/worker/01-worker-kubelet/_base/files/kubelet.yaml

There are some feature gates that need to be removed in order for
the KubeletConfiguration to be valid with the new api.
@cybertron
Member Author

To resolve the unit test failures, you just need to remove the CSIMigrationAWS: false, CSIMigrationGCE: false, and PodSecurity: true lines from both of these template files: https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/_base/files/kubelet.yaml https://github.com/openshift/machine-config-operator/blob/master/templates/worker/01-worker-kubelet/_base/files/kubelet.yaml

Shoot, I knew that and forgot to do it on this branch. Thanks for the reminder.

@cybertron
Member Author

/test e2e-metal-ipi
/test e2e-vsphere-upgrade

The singular API and Ingress VIP fields are deprecated and cause
verification failures. The same value can be found in the first
entry in the plural VIPs version of the fields.
@nee1esh

nee1esh commented Sep 1, 2022

/retest-required

@cybertron
Member Author

/hold

This also has a dependency on openshift/installer#5798, which is approved but waiting for CI to pass.

@openshift-ci openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Sep 1, 2022
@sinnykumari
Contributor

openshift/installer#5798 has been merged, so this should be good to go.
/test e2e-metal-ipi
/test e2e-vsphere-upgrade
/test e2e-openstack

@soltysh soltysh mentioned this pull request Sep 2, 2022
@cybertron
Member Author

/retest

We're probably good here since metal-ipi passed and the openstack and vsphere failures look unrelated, but I'd really prefer to see them pass.

@cybertron
Member Author

/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-upi

@dougsland
Contributor

LGTM, just nit comments. CI is breaking because of the dependency.

Review threads (resolved):
pkg/controller/template/render.go
pkg/controller/template/render.go (outdated)
pkg/operator/render.go (outdated)
pkg/operator/render.go (outdated)
pkg/operator/render.go (outdated)
@cybertron
Member Author

I noticed a possible issue with the vsphere logic, so I modified that to handle either nil values or zero-length arrays. Hopefully that will get those jobs passing.
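
For context on why a single length check handles both cases: in Go, len of a nil slice is 0, so one guard covers a missing field and an explicitly empty list alike. A tiny standalone snippet (not the PR's code):

```go
package main

import "fmt"

// firstOrEmpty returns the first VIP if any; len(nil) == 0 in Go, so this
// one check covers both a nil slice (field absent) and an empty list.
func firstOrEmpty(vips []string) string {
	if len(vips) > 0 {
		return vips[0]
	}
	return ""
}

func main() {
	var missing []string // nil slice
	fmt.Printf("%q\n", firstOrEmpty(missing))                   // ""
	fmt.Printf("%q\n", firstOrEmpty([]string{}))                // ""
	fmt.Printf("%q\n", firstOrEmpty([]string{"192.168.111.5"})) // "192.168.111.5"
}
```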

/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-upi
/test e2e-metal-ipi
/test e2e-vsphere-upgrade

@nee1esh

nee1esh commented Sep 3, 2022

/retest-required

@@ -26,7 +26,7 @@ contents: |
{{ .Images.baremetalRuntimeCfgImage }} \
node-ip \
set --retry-on-failure \
-{{ onPremPlatformAPIServerInternalIP . }}; \
+{{- range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}; \
Member


@cybertron is it intended to suppress the newline here?

Member

@mandre mandre Sep 12, 2022


Clearly not, that's the error, and it's what is causing the IP to be prefixed with the \ character from the previous line.
I suggest you also add quotes around the IP address:

Suggested change:
-{{- range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}; \
+{{range onPremPlatformAPIServerInternalIPs . }}"{{.}}" {{end}}; \

Member Author


Huh, I wonder how that ever worked. I wonder if we just got lucky and the default kubelet behavior was working even though nodeip-configuration failed? In any case, I've removed that and pushed a new version. Let's see what ci says.

This isn't needed and breaks the formatting of the service.
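
To make the whitespace-trimming behaviour concrete, here is a small standalone text/template sketch showing why the "{{-" marker glues the first VIP onto the trailing backslash of the previous line. The template strings and data are simplified placeholders, not the actual MCO unit template or the onPremPlatformAPIServerInternalIPs helper.

```go
package main

import (
	"os"
	"text/template"
)

// Cut-down stand-ins for the unit file lines; the real template ranges over
// the onPremPlatformAPIServerInternalIPs helper, replaced here by a slice.
const withTrim = "set --retry-on-failure \\\n{{- range .}}{{.}} {{end}}; \\\n"
const withoutTrim = "set --retry-on-failure \\\n{{range .}}{{.}} {{end}}; \\\n"

func main() {
	vips := []string{"192.168.111.5", "fd2e:6f44:5dd8::5"}

	// "{{-" trims the preceding newline, so the first VIP lands directly
	// after the trailing backslash and the shell line continuation breaks:
	//   set --retry-on-failure \192.168.111.5 fd2e:6f44:5dd8::5 ; \
	template.Must(template.New("trim").Parse(withTrim)).Execute(os.Stdout, vips)

	// Without the trim marker the newline survives and the backslash keeps
	// working as a line continuation:
	//   set --retry-on-failure \
	//   192.168.111.5 fd2e:6f44:5dd8::5 ; \
	template.Must(template.New("plain").Parse(withoutTrim)).Execute(os.Stdout, vips)
}
```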
Member Author

@cybertron cybertron left a comment


/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-ipi-ovn-dualstack
/test e2e-ovirt-upgrade

-return cfg.Infra.Status.PlatformStatus.VSphere.APIServerInternalIP, nil
+if len(cfg.Infra.Status.PlatformStatus.VSphere.APIServerInternalIPs) > 0 {
+return cfg.Infra.Status.PlatformStatus.VSphere.APIServerInternalIPs[0],
+nil
Member Author


Oops, didn't mean to do that. Fixed.

@@ -26,7 +26,7 @@ contents: |
{{ .Images.baremetalRuntimeCfgImage }} \
node-ip \
set --retry-on-failure \
-{{ onPremPlatformAPIServerInternalIP . }}; \
+{{- range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}; \
Member Author


Huh, I wonder how that ever worked. I wonder if we just got lucky and the default kubelet behavior was working even though nodeip-configuration failed? In any case, I've removed that and pushed a new version. Let's see what ci says.

@sinnykumari
Contributor

e2e-metal-ipi-ovn-ipv6 test is passing with the latest changes. The latest commit only affects IPv6 changes, so I would say the failing tests are unrelated. Adding my approval; feel free to add lgtm when it looks fine to the on-prem team.
/approve

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Sep 13, 2022
@cybertron
Member Author

I forgot to run vsphere-upi again, but since the latest change only affects IPI that should be okay. It looks like all of the metal scenarios are passing now, so I think we're good?

/retest-required

@jcpowermac
Contributor

/test e2e-vsphere-upi

@dougsland
Contributor

/retest

@mandre
Member

mandre commented Sep 14, 2022

Inspecting the latest e2e-vsphere-upi failure, we can see that the network operator entered a crash loop:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1b840e7]

goroutine 1026 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	sigs.k8s.io/controller-runtime@v0.12.0/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x21a7ec0, 0x3cd09f0})
	runtime/panic.go:838 +0x207
github.com/openshift/cluster-network-operator/pkg/controller/infrastructureconfig.(*apiAndIngressVipsSynchronizer).VipsSynchronize(0x25be4dc, 0xc000df4000?)
	github.com/openshift/cluster-network-operator/pkg/controller/infrastructureconfig/sync_vips.go:34 +0x107
github.com/openshift/cluster-network-operator/pkg/controller/infrastructureconfig.(*ReconcileInfrastructureConfig).Reconcile(0xc000bf9cb0, {0x295dd10, 0xc000df2060}, {{{0x0?, 0x10?}, {0xc0004d6ed0?, 0x413f07?}}})
	github.com/openshift/cluster-network-operator/pkg/controller/infrastructureconfig/infrastructureconfig_controller.go:93 +0x368
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x295dc68?, {0x295dd10?, 0xc000df2060?}, {{{0x0?, 0x2411bc0?}, {0xc0004d6ed0?, 0x4095d4?}}})
	sigs.k8s.io/controller-runtime@v0.12.0/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000c132c0, {0x295dc68, 0xc000bab140}, {0x228e8a0?, 0xc000bc80e0?})
	sigs.k8s.io/controller-runtime@v0.12.0/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000c132c0, {0x295dc68, 0xc000bab140})
	sigs.k8s.io/controller-runtime@v0.12.0/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	sigs.k8s.io/controller-runtime@v0.12.0/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	sigs.k8s.io/controller-runtime@v0.12.0/pkg/internal/controller/controller.go:230 +0x325

Meaning that we were missing Status.PlatformStatus.VSphere.APIServerInternalIPs in the cluster infrastructure object. Not sure if this is a known bug, or perhaps caused by this patch. Someone should look into it.

edit: Indeed, the must gather confirms that Status.PlatformStatus.VSphere is missing from the infrastructure object.

/retest

@creydr
Member

creydr commented Sep 14, 2022

... Indeed, the must gather confirms that Status.PlatformStatus.VSphere is missing from the infrastructure object.

In the VIP sync logic in CNO I wasn't aware of the case that vSphere UPI doesn't populate the VSphere field. I created a patch for CNO: openshift/cluster-network-operator#1558
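
The crash above has the classic shape of dereferencing an optional platform-status section without a nil check. A minimal sketch of the kind of guard such a patch would add, using hypothetical trimmed-down types rather than the actual openshift/api or CNO definitions:

```go
package main

import "fmt"

// Hypothetical stand-ins for the Infrastructure status types.
type VSpherePlatformStatus struct {
	APIServerInternalIPs []string
	IngressIPs           []string
}

type PlatformStatus struct {
	VSphere *VSpherePlatformStatus // nil on vSphere UPI clusters without VIPs
}

// syncVSphereVIPs returns early when the platform-specific section is absent
// instead of dereferencing a nil pointer.
func syncVSphereVIPs(ps *PlatformStatus) []string {
	if ps == nil || ps.VSphere == nil {
		return nil // nothing to sync; UPI clusters bring their own load balancer
	}
	return ps.VSphere.APIServerInternalIPs
}

func main() {
	upi := &PlatformStatus{} // VSphere section missing, as on vSphere UPI
	fmt.Println(syncVSphereVIPs(upi)) // [] -- no panic

	ipi := &PlatformStatus{VSphere: &VSpherePlatformStatus{
		APIServerInternalIPs: []string{"192.168.111.5"},
	}}
	fmt.Println(syncVSphereVIPs(ipi)) // [192.168.111.5]
}
```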

@jcpowermac
Contributor

jcpowermac commented Sep 14, 2022

... Indeed, the must gather confirms that Status.PlatformStatus.VSphere is missing from the infrastructure object.

In the VIP sync logic in CNO I wasn't aware of the case, that VSphere UPI doesn't populate the VSphere field. I created a patch for CNO: openshift/cluster-network-operator#1558

Right, vSphere UPI doesn't use the static pods for API lb/keepalived and ingress keepalived. The expectation is the user creates an LB before install.

@jcpowermac
Contributor

jcpowermac commented Sep 14, 2022

Meaning that we were missing Status.PlatformStatus.VSphere.APIServerInternalIPs in the cluster infrastructure object. Not sure if this is a known bug, or perhaps caused by this patch. Someone should look into it.

Not a bug, by design. When developing vSphere IPI it was expected that UPI would stay as-is and not require the API or ingress VIP.

@yuqi-zhang yuqi-zhang mentioned this pull request Sep 14, 2022
@sinnykumari sinnykumari mentioned this pull request Sep 15, 2022
@sinnykumari
Contributor

Retesting as openshift/cluster-network-operator#1558 has been merged
/test e2e-vsphere-upi

@sinnykumari
Contributor

/test e2e-vsphere-upi

@kikisdeliveryservice
Contributor

vSphere UPI job passed. @jcpowermac can this merge?

@jcpowermac
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Sep 15, 2022
@openshift-ci
Contributor

openshift-ci bot commented Sep 15, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, jcpowermac, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kikisdeliveryservice
Contributor

/retest-required

@kikisdeliveryservice
Contributor

Infra failure:

level=error msg=Error: Error launching source instance: InvalidNetworkInterfaceID.NotFound: The networkInterface ID 'eni-010489741e0d5aa99' does not exist
level=error msg=	status code: 400, request id: 92c1d8a0-b645-4212-9ddc-bb6eeac33fea
level=error
level=error msg=  with module.masters.aws_instance.master[2],
level=error msg=  on master/main.tf line 129, in resource "aws_instance" "master":
level=error msg= 129: resource "aws_instance" "master" {
level=error
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "cluster" stage: failed to create cluster: failed to apply Terraform: exit status 1 

Reporting and retesting:
/test e2e-agnostic-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Sep 16, 2022

@cybertron: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
ci/prow/e2e-vsphere-upgrade | 0357c4b | false | /test e2e-vsphere-upgrade
ci/prow/e2e-ovirt-upgrade | a586894 | false | /test e2e-ovirt-upgrade
ci/prow/e2e-openstack | a586894 | false | /test e2e-openstack


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD a985910 and 2 for PR HEAD a586894 in total

@openshift-merge-robot openshift-merge-robot merged commit 8050e55 into openshift:master Sep 16, 2022
mandre added a commit to shiftstack/machine-config-operator that referenced this pull request Sep 21, 2022
The keepalived config template change for workers was missing from openshift#3269. This adds it.
Labels: approved, lgtm