Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPNET-351: Extend infra CR to store VIPs and MachineNetwork #1593

Merged
merged 1 commit into from
Nov 20, 2023

Conversation

mkowalski
Copy link
Contributor

@mkowalski mkowalski commented Sep 20, 2023

In order to enable work around mutable VIPs we are extending the Infrastructure CR to hold VIPs in the spec and not only in the status.

In order to be able to perform proper validations (which currently only happen in o/installer as VIPs are immutable) we are storing machine networks alongside.

Implements: OPNET-351
Contributes-to: OPNET-80
Contributes-to: OPNET-340

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 20, 2023
@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 20, 2023

@mkowalski: This pull request references OPNET-351 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

In order to enable work around mutable VIPs we are extending the Infrastructure CR to hold VIPs in the spec and not only in the status.

In order to be able to perform proper validations (which currently only happen in o/installer as VIPs are immutable) we are storing machine networks alongside.

Implements: OPNET-351
Contributes-to: OPNET-80
Contributes-to: OPNET-340

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

1 similar comment
@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 20, 2023

@mkowalski: This pull request references OPNET-351 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

In order to enable work around mutable VIPs we are extending the Infrastructure CR to hold VIPs in the spec and not only in the status.

In order to be able to perform proper validations (which currently only happen in o/installer as VIPs are immutable) we are storing machine networks alongside.

Implements: OPNET-351
Contributes-to: OPNET-80
Contributes-to: OPNET-340

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 20, 2023

Hello @mkowalski! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 20, 2023
@mkowalski mkowalski force-pushed the OPNET-351 branch 2 times, most recently from 58a18cd to 8f95a94 Compare September 20, 2023 14:29
@mkowalski
Copy link
Contributor Author

/cc @cybertron
Baremetal

/cc @MaysaMacedo
OpenStack

/cc @rvanderp3
VSphere

@mkowalski
Copy link
Contributor Author

/retest

Prow is choking

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC these fields are copying from status fields? What happens if there's a diff between spec and status, you need a controller to reconcile the difference, validate and accept the change, what is going to do that?

config/v1/types_infrastructure.go Outdated Show resolved Hide resolved
config/v1/types_infrastructure.go Outdated Show resolved Hide resolved
config/v1/types_infrastructure.go Outdated Show resolved Hide resolved
config/v1/types_infrastructure.go Outdated Show resolved Hide resolved
config/v1/types_infrastructure.go Outdated Show resolved Hide resolved
@mkowalski mkowalski force-pushed the OPNET-351 branch 2 times, most recently from 2b0d5c4 to bbf3a91 Compare October 16, 2023 16:55
@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 16, 2023
@mkowalski mkowalski force-pushed the OPNET-351 branch 3 times, most recently from 58b2953 to 657f306 Compare October 17, 2023 08:01
maxItems: 2
type: array
x-kubernetes-list-type: set
machineNetworks:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is machineNetworks required once apiServerInternalIPs and ingressIPs are set? If yes, should that be documented/verified?

Copy link
Contributor Author

@mkowalski mkowalski Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it is not because we need to support upgraded clusters. As the only source of this information is install-config.yaml to which Cluster Network Operator has no access, it can be only o/installer who sets this field during installation time.

So for newly installed 4.15 clusters we will populate this field and for clusters that originated in 4.14 or older this field will remain empty.

Also I know about at least one platform which users do not know the concept of Machine Network even though they use API/Ingress VIPs. I do not know how it started, but at some point someone started setting a catch-all CIDR as machine network for them so for a while we had clusters with something like 0.0.0.0/0 as machine network. This never failed because no component in OCP really used this subnet, but it's to give you the idea of how people misuse or ignore this parameter.

@mkowalski
Copy link
Contributor Author

Hey folks, can we get approve+lgtm on this unless there are more changes requested? That would unblock the works in o/installer and CNO

@MaysaMacedo
Copy link
Contributor

/lgtm
For the OpenStack part

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 24, 2023
@mkowalski mkowalski force-pushed the OPNET-351 branch 2 times, most recently from 3a8e0d3 to 18e6f05 Compare October 30, 2023 16:59
@mkowalski
Copy link
Contributor Author

/retest-required

@mkowalski
Copy link
Contributor Author

/test e2e-aws-ovn

Not my failure

level=info msg=Creating infrastructure resources...
level=error
level=error msg=Error: no matching EC2 Subnet found 

@mkowalski
Copy link
Contributor Author

/test e2e-aws-ovn

Failure of unrelated test

@@ -1176,6 +1291,11 @@ type VSpherePlatformStatus struct {
// +openshift:enable:FeatureSets=CustomNoUpgrade;TechPreviewNoUpgrade
// +optional
LoadBalancer *VSpherePlatformLoadBalancer `json:"loadBalancer,omitempty"`

// machineNetworks are IP networks used to connect all the OpenShift cluster nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many networks are supported? We should add a limit to the length of this list and explain that limit in the godoc

Can we add a comment inline that explains that the values in the list should be either of ipv4 or ipv6 CIDR notation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no limit on machine networks supported today. There are no rules as for the order of IP stacks in the list of machine networks today. This one field already in install-config.yaml is a bit of wild west, mainly for the reason that it's not used anywhere for real

We have a comment explaining notation added to the Spec. In Status I am not doing this because it's read only and no one should modify it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. so while I appreciate it's currently wild west, what do you think users are actually using? Do you think we could justify limiting this to 32 networks? This new API will only affect new clusters anyway right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with limiting to 32 right now but we need to be aware there may be incoming request to increase this. Let me explain the use case...

We have this thingy called "remote worker nodes". While in the most basic setup it means you have masters in 1st subnet and workers in 2nd (separate) subnet, I can easily imagine a customer deploying cluster with 100 workers where each of them lives in its own subnet. I believe it's only a matter of time, but again, today limit of 32 should be okay and in the worst case we receive a bugzilla to increase it.

This new API will only affect new clusters anyway right?

Yes, this is correct. The new MachineNetwork fields will remain empty for already existing clusters (even those that are upgraded to 4.15) because we don't have today any means to backfill it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna wager we won't get that request any time soon, but you may tell me you told me so if we do. It would be relaxing the API validation so is a compatible change given that we control the consumers of the field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it will be a very easy change to do shall we ever need it

Comment on lines 1238 to 1239
// machineNetworks are IP networks used to connect all the OpenShift cluster
// nodes. Each network is provided in the CIDR format, e.g. "192.0.2.0/24".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should explain that this supports ipv6 as well.

Is there a length limit that we can put on this list please?

Copy link
Contributor Author

@mkowalski mkowalski Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed a comment to

	// machineNetworks are IP networks used to connect all the OpenShift cluster
	// nodes. Each network is provided in the CIDR format and should be IPv4 or IPv6,
	// for example "10.0.0.0/8" or "fd00::/8".

We can not add length limit for the reasons explained above (i.e. the same field in install-config.yaml has no limits and in general in OCP we allow as many machine networks as you want)

config/v1/types_infrastructure.go Show resolved Hide resolved
config/v1/types_infrastructure.go Show resolved Hide resolved
config/v1/types_infrastructure.go Outdated Show resolved Hide resolved
@mkowalski mkowalski force-pushed the OPNET-351 branch 2 times, most recently from 6cdc2e1 to 76bd105 Compare November 14, 2023 14:25
In order to enable work around mutable VIPs we are extending the
Infrastructure CR to hold VIPs in the spec and not only in the status.

In order to be able to perform proper validations (which currently only
happen in o/installer as VIPs are immutable) we are storing machine
networks alongside.

Implements: OPNET-351
Contributes-to: OPNET-80
Contributes-to: OPNET-340
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 20, 2023
@JoelSpeed
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 20, 2023
Copy link
Contributor

openshift-ci bot commented Nov 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, MaysaMacedo, mkowalski

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 20, 2023
Copy link
Contributor

openshift-ci bot commented Nov 20, 2023

@mkowalski: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit f872698 into openshift:master Nov 20, 2023
16 checks passed
@mkowalski mkowalski deleted the OPNET-351 branch November 20, 2023 18:40
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-cluster-config-api-container-v4.15.0-202311202349.p0.gf872698.assembly.stream for distgit ose-cluster-config-api.
All builds following this will include this PR.

mkowalski added a commit to mkowalski/openshift-installer that referenced this pull request Nov 21, 2023
This commit bumps openshift/api dependency in order to be able to use
work done in openshift/api#1593.

We want to be able to use new fields in the Infrastructure CR that have
been added by the PR mentioned above.
mkowalski added a commit to mkowalski/openshift-installer that referenced this pull request Nov 21, 2023
This commit bumps openshift/api dependency in order to be able to use
work done in openshift/api#1593.

It bumps openshift/client-go in order to be able to interact with new
fields implemented in the PR mentioned above.
mkowalski added a commit to mkowalski/openshift-installer that referenced this pull request Nov 24, 2023
This commit bumps openshift/api dependency in order to be able to use
work done in openshift/api#1593.

It bumps openshift/client-go in order to be able to interact with new
fields implemented in the PR mentioned above.
mkowalski added a commit to mkowalski/openshift-installer that referenced this pull request Nov 27, 2023
This commit bumps openshift/api dependency in order to be able to use
work done in openshift/api#1593.

It bumps openshift/client-go in order to be able to interact with new
fields implemented in the PR mentioned above.
mkowalski added a commit to mkowalski/openshift-installer that referenced this pull request Nov 28, 2023
This commit bumps openshift/api dependency in order to be able to use
work done in openshift/api#1593.

It bumps openshift/client-go in order to be able to interact with new
fields implemented in the PR mentioned above.
mkowalski added a commit to mkowalski/openshift-installer that referenced this pull request Nov 28, 2023
In openshift/api#1593 OpenShift API got new
fields related to the network configuration.

PR bumps openshift/client-go in order to be able to interact with new
fields implemented in the PR mentioned above.
@mkowalski
Copy link
Contributor Author

/revert

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants