Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDN-3708: Support configurable masquerade subnet in ovn-k #1807

Merged
merged 1 commit into from Jan 22, 2024

Conversation

bpickard22
Copy link
Contributor

@bpickard22 bpickard22 commented May 8, 2023

Adds the necessary knobs in the cno to consume the api change (openshift/api#1410) exposing the internal masquerade subnet in ovn-k where we reserve addresses for ip masquerading

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 8, 2023
@openshift-ci openshift-ci bot requested review from jcaamano and tssurya May 8, 2023 22:07
@bpickard22 bpickard22 changed the title [WIP}Add knobs in cno referencing api change Support configurable masquerade subnet in ovn-k May 15, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 15, 2023
@bpickard22
Copy link
Contributor Author

/hold depends on openshift/api#1410

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 15, 2023
@tssurya
Copy link
Contributor

tssurya commented May 16, 2023

/assign @tssurya

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 18, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2023
@@ -179,6 +179,7 @@ func TestRenderedOVNKubernetesConfig(t *testing.T) {
egressIPConfig *operv1.EgressIPConfig
masterIPs []string
v4InternalSubnet string
v4InternalMasqueradeSubnet string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT align all lines above and below this to accommodate the longer string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IDE should do it automagically :)

@ricky-rav
Copy link
Contributor

Copy link
Contributor

@ricky-rav ricky-rav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave the PR a first pass. We need the OVNK PR to merge first, the API PR second, and then once we update vendor/github.com/openshift/api/ with the new API, we can finally test this CNO PR.

pkg/network/ovn_kubernetes.go Outdated Show resolved Hide resolved
pkg/network/ovn_kubernetes.go Outdated Show resolved Hide resolved
pkg/network/ovn_kubernetes.go Outdated Show resolved Hide resolved
pkg/network/ovn_kubernetes.go Outdated Show resolved Hide resolved
pkg/network/ovn_kubernetes.go Outdated Show resolved Hide resolved
pkg/network/ovn_kubernetes.go Outdated Show resolved Hide resolved
pkg/network/ovn_kubernetes.go Outdated Show resolved Hide resolved
@@ -179,6 +179,7 @@ func TestRenderedOVNKubernetesConfig(t *testing.T) {
egressIPConfig *operv1.EgressIPConfig
masterIPs []string
v4InternalSubnet string
v4InternalMasqueradeSubnet string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IDE should do it automagically :)

}
_, v4Net, err = net.ParseCIDR(oc.V4InternalSubnet)
if err != nil {
out = append(out, errors.Errorf("v4InternalSubnet is invalid: %s", err))
}
if !isV4InternalSubnetLargeEnough(conf) {
out = append(out, errors.Errorf("v4InternalSubnet is no large enough for the maximum number of nodes which can be supported by ClusterNetwork"))
out = append(out, errors.Errorf("v4InternalSubnet is not large enough for the maximum number of nodes which can be supported by ClusterNetwork"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's print what v4InternalSubnet is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the join subnet which is not part of this feature, but we can go ahead and update that log

@@ -920,7 +923,7 @@ func validateOVNKubernetes(conf *operv1.NetworkSpec) []error {
out = append(out, errors.Errorf("v6InternalSubnet is invalid: %s", err))
}
if !isV6InternalSubnetLargeEnough(conf) {
out = append(out, errors.Errorf("v6InternalSubnet is no large enough for the maximum number of nodes which can be supported by ClusterNetwork"))
out = append(out, errors.Errorf("v6InternalSubnet is not large enough for the maximum number of nodes which can be supported by ClusterNetwork"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's print what v6InternalSubnet is

@ricky-rav
Copy link
Contributor

@bpickard22 so in order to test CNO against the API changes that are still to be merged, you can modify the go.mod file to point to your own fork of openshift/api with the commit hash from your API PR. I tried it here: ricky-rav@94ac16e

Basically in the "replace" part of the go.mod file, you type:
github.com/openshift/api => github.com/bpickard22/api 680fd20165a4385e0b92a50ecd91d6c3763bd5d1

Then you run go mod vendor and it will rewrite the commit ID into the "v0.0.0..." form:
github.com/openshift/api => github.com/bpickard22/api v0.0.0-20230804184907-680fd20165a4

Of course you'll undo this once the API PR merges, but at least we can move forward with this CNO in the meantime :)

@zshi-redhat
Copy link
Contributor

@bpickard22 Is this PR ready to merge?

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 3, 2024
Copy link
Contributor

@jcaamano jcaamano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the first commit needs to be dropped as it should be merged already.

Comment on lines +69 to +75
{{- if (index . "V4InternalMasqueradeSubnet")}}
v4-internal-masquerade-subnet="{{.V4InternalMasqueradeSubnet}}"
{{- end }}
{{- if (index . "V6InternalMasqueradeSubnet")}}
v6-internal-masquerade-subnet="{{.V6InternalMasqueradeSubnet}}"
{{- end }}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this will cause a rollout/restart of ovn-k, making it a day-1 only configuration parameter. Is that ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that is the current intended behavior

Comment on lines 890 to 895
_, v4Net, err = net.ParseCIDR(oc.GatewayConfig.IPv4.InternalMasqueradeSubnet)
if err != nil {
out = append(out, errors.Errorf("v4InternalMasqueradeSubnet is invalid: %s", err))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see in other places that the error returned from net.ParseCIDR is ignored, I suspect because it is assumed to be properly validated by the API server. But here you don't. If there is an error here, are you sure you can derenference safely *v4Net later on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct the api would catch any invalid subnet configuration (or at least it should), just being cautious here to catch anything that i missed with my api validations. I am unsure if a value that could cause a panic would occur here because we already check for nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you check for nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we check that the gatewayConfig is not nil, so if no value is passed into the internalMasqueradeSubnet it should just be an empty string im pretty sure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you have

_, v4MasqNet, err = net.ParseCIDR(oc.GatewayConfig.IPv4.InternalMasqueradeSubnet)

if that returns non-nil err and potentially nil v4MasqNet then you just append the error but continue on

out = append(out, errors.Errorf("v4InternalMasqueradeSubnet is invalid: %s", err)

and then a few lines below you dereference v4MasqNet which will panic if it is nil

if iputil.NetsOverlap(*v4MasqNet, *v4ClusterNet)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcaamano added a check to make sure that the v4/v6 masqNet return isnt nil

for _, sn := range conf.ServiceNetwork {
if utilnet.IsIPv4CIDRString(sn) {
_, v4ServiceNet, _ := net.ParseCIDR(sn)
if iputil.NetsOverlap(*v4Net, *v4ServiceNet) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that there is no overlap with the join subnet as well? Perhaps it would be useful to have an utility method to pass a map of network names to networks, that just checks that there are no overlaps among any of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would like to get this in before code freeze and then add some utility that allows us to check for overlap when we add the transit switch config this coming release, I can add a todo here if thats alright.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of the utility, don't you think we need to check for overlaps with the join subnet in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes will add that now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcaamano added a check for just the join and masq subnets here, will add a utility when the transit swtich config is introduced here

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2024
@bpickard22 bpickard22 force-pushed the masq-subnet branch 2 times, most recently from 0a0a0d4 to f8b2469 Compare January 15, 2024 22:03
@bpickard22
Copy link
Contributor Author

/retest-required

@bpickard22 bpickard22 force-pushed the masq-subnet branch 2 times, most recently from 3358439 to c6ef9f9 Compare January 16, 2024 14:33
@bpickard22
Copy link
Contributor Author

/retest-required

Comment on lines 835 to 838
if v4JoinNet == nil {
out = append(out, errors.Errorf("Unable to parse cidr for v4InternalSubnet %s", oc.V4InternalSubnet))
return out
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If v4JoinNet == nil then err != nil so it doesn't make sense to check and append an error for both things.
Also this method is trying to validate all without breaking out early, I think we can keep doing that.

I would do this

_, v4JoinNet, err = net.ParseCIDR(oc.V4InternalSubnet)
if err != nil {
    out = append(out, errors.Errorf("v4InternalSubnet is invalid: %s", err))
} else if !isV4InternalSubnetLargeEnough(conf) {
    out = append(out, errors.Errorf("v4InternalSubnet %s is not large enough for the maximum number of nodes which can be supported by ClusterNetwork", oc.V4InternalSubnet))
}

...
...

    if v4JoinNet != nil {
        _, v4ClusterNet, _ := net.ParseCIDR(cn.CIDR)
	if iputil.NetsOverlap(*v4JoinNet, *v4ClusterNet) {
	    out = append(out, errors.Errorf("v4InternalSubnet %s overlaps with ClusterNetwork %s", oc.V4InternalSubnet, cn.CIDR))
	}
    }

Similar approach on the other instances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest change implements this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missed the second part, I added a comment for it

@bpickard22
Copy link
Contributor Author

/retest-required

if v4JoinNet != nil {
for _, cn := range conf.ClusterNetwork {
if utilnet.IsIPv4CIDRString(cn.CIDR) {
if oc.V4InternalSubnet != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should replace oc.V4InternalSubnet != "" with v4JoinNet != nil

Similar in other instances

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think that is a redundant if check that can just be removed

@bpickard22
Copy link
Contributor Author

/retest-required

Adds the necessary knobs in the cno to consume the api change exposing
the internal masquerade subnet in ovn-k where we reserve addresses for
ip masquerading

Had to rebase on top of small conflict with addition of logging for
libovsdb in the config yamls

Also adds unit tests and verification in cno for configured masquerade
subnets and corrects some tests/verification of the join subnet

Signed-off-by: Ben Pickard <bpickard@redhat.com>
@jcaamano
Copy link
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 19, 2024
Copy link
Contributor

openshift-ci bot commented Jan 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bpickard22, jcaamano, ricky-rav

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 19, 2024
Copy link
Contributor

openshift-ci bot commented Jan 19, 2024

@bpickard22: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.15-upgrade-from-stable-4.14-images 6a66258 link true /test 4.15-upgrade-from-stable-4.14-images
ci/prow/4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade 6a66258 link false /test 4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade
ci/prow/4.15-upgrade-from-stable-4.14-e2e-azure-ovn-upgrade 6a66258 link false /test 4.15-upgrade-from-stable-4.14-e2e-azure-ovn-upgrade
ci/prow/4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade 6a66258 link false /test 4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade
ci/prow/e2e-azure-ovn-dualstack b8fe87e link false /test e2e-azure-ovn-dualstack
ci/prow/e2e-azure-ovn 1b4d11d link false /test e2e-azure-ovn
ci/prow/e2e-openstack-ovn 1b4d11d link false /test e2e-openstack-ovn
ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 1b4d11d link false /test e2e-vsphere-ovn-dualstack-primaryv6
ci/prow/4.16-upgrade-from-stable-4.15-e2e-azure-ovn-upgrade 1b4d11d link false /test 4.16-upgrade-from-stable-4.15-e2e-azure-ovn-upgrade
ci/prow/e2e-aws-hypershift-ovn-kubevirt 1b4d11d link false /test e2e-aws-hypershift-ovn-kubevirt
ci/prow/4.16-upgrade-from-stable-4.15-e2e-gcp-ovn-upgrade 1b4d11d link false /test 4.16-upgrade-from-stable-4.15-e2e-gcp-ovn-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@bpickard22 bpickard22 changed the title Support configurable masquerade subnet in ovn-k SDN-3708: Support configurable masquerade subnet in ovn-k Jan 22, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 22, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jan 22, 2024

@bpickard22: This pull request references SDN-3708 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

Adds the necessary knobs in the cno to consume the api change (openshift/api#1410) exposing the internal masquerade subnet in ovn-k where we reserve addresses for ip masquerading

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 2e23c82 and 2 for PR HEAD 1b4d11d in total

@openshift-merge-bot openshift-merge-bot bot merged commit dc94dc0 into openshift:master Jan 22, 2024
33 of 39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants