OPNET-512: config/v1/types_infrastructure: change set to atomic for networks #1873

Merged 1 commit into openshift:master on Apr 29, 2024

Contributor

@mkowalski mkowalski commented Apr 25, 2024

To handle ownership of network addresses more cleanly, we are switching
the list type from set to atomic. As a result, the whole list is managed
as a single unit with one owner, instead of each entry on the list
having a different owner.

We are also adding CEL validation to the fields in PlatformStatus for
consistency. Because this field has already been validated by o/installer,
this should not affect CRs whose PlatformStatus is already populated.

Contributor

openshift-ci bot commented Apr 25, 2024

Hello @mkowalski! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 25, 2024
@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 25, 2024
// +optional
IngressIPs []IP `json:"ingressIPs"`

// machineNetworks are IP networks used to connect all the OpenShift cluster
// nodes. Each network is provided in the CIDR format and should be IPv4 or IPv6,
// for example "10.0.0.0/8" or "fd00::/8".
- // +listType=set
+ // +listType=atomic
// +kubebuilder:validation:MaxItems=32
// +optional
MachineNetworks []CIDR `json:"machineNetworks"`
Contributor

We need a CEL rule to keep set semantics (no duplicate entries) here and on every other MachineNetworks field.

Contributor Author

Right, adding

// +kubebuilder:validation:XValidation:rule="self.all(x, self.exists_one(y, x == y))"

and testing. Can you confirm it's okay to have CEL only in Spec and leave it out of Status as only the former is supposed to accept user input?
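
For readers less familiar with CEL, the uniqueness rule above can be read as a nested loop: every element x must be equal to exactly one element y of the same list. The Go sketch below only illustrates those semantics. It is not code from this PR, and the helper name allUnique is hypothetical.

```go
package main

import "fmt"

// allUnique mirrors the CEL rule self.all(x, self.exists_one(y, x == y)):
// for every element x, exactly one element y of the list equals x.
// Comparison is plain string equality, so entries are compared as written.
func allUnique(items []string) bool {
	for _, x := range items {
		matches := 0
		for _, y := range items {
			if x == y {
				matches++
			}
		}
		// x always matches itself once; any further match is a duplicate.
		if matches != 1 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allUnique([]string{"10.0.0.0/8", "fd00::/8"}))   // true
	fmt.Println(allUnique([]string{"10.0.0.0/8", "10.0.0.0/8"})) // false
}
```

With this rule in place an atomic list still rejects duplicates, which is the part of the set behavior the reviewer asked to keep.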

Contributor

I'd prefer to have it in both: although only spec is supposed to be user-facing, users can still edit status.

Contributor Author

Fair enough. Given that the API and Ingress VIPs have CEL rules only in Spec, do you want me to also add them to Status in this PR?

Contributor

If we are confident that it isn't going to break anything, then yes.

The non-breaking way to add it is ratcheting: prefix the rule with self == oldSelf ||
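
As an illustration of the ratcheting idea above, the following Go sketch mimics how a rule of the form self == oldSelf || <rule> behaves on update: an unchanged value is accepted even if it would fail the new rule, while any changed value must pass it. The helper names (validateUpdate, hasNoDuplicates, sameList) are hypothetical, and the real evaluation is done by the API server's CEL engine, not by code like this.

```go
package main

import "fmt"

// sameList reports whether two lists have the same entries in the same order.
func sameList(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// hasNoDuplicates stands in for the new, stricter validation rule.
func hasNoDuplicates(items []string) bool {
	seen := make(map[string]struct{}, len(items))
	for _, item := range items {
		if _, dup := seen[item]; dup {
			return false
		}
		seen[item] = struct{}{}
	}
	return true
}

// validateUpdate models "self == oldSelf || <rule>": values that were
// already stored keep validating as long as they are not modified.
func validateUpdate(oldSelf, self []string) bool {
	return sameList(oldSelf, self) || hasNoDuplicates(self)
}

func main() {
	legacy := []string{"10.0.0.0/8", "10.0.0.0/8"} // stored before the rule existed

	fmt.Println(validateUpdate(legacy, legacy))                           // true: unchanged
	fmt.Println(validateUpdate(legacy, []string{"10.0.0.0/8"}))           // true: changed and valid
	fmt.Println(validateUpdate(legacy, append(legacy, "10.0.0.0/8")))     // false: changed and invalid
}
```

The trade-off is that pre-existing invalid data is tolerated until someone touches the field, which is what makes the change non-breaking.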

Contributor

> I think "confident" is a stretch because in the past this field was basically copy-pasted from install-config.yaml and all the validations were happening in o/installer. I believe it wouldn't break anything, but it's more faith than certainty.

Was the installer doing any validation?

> Yes, we have https://github.com/openshift/cluster-network-operator/blob/master/pkg/controller/infrastructureconfig/validations.go#L40-L50, which validates that if you provide more than 1 VIP, you need to have multiple IP families. So we will not allow setting 2 IPs that are both IPv4-only or both IPv6-only.

At what stage does this occur? Does it cause the cluster to degrade or anything?

Contributor Author

> Was the installer doing any validation?

Yes, o/installer should prevent you from adding 2 VIPs of the same IP stack to install-config.yaml.

> At what stage does this occur? Does it cause the cluster to degrade or anything?

This happens in a controller inside CNO that reconciles the Infrastructure CR. If you edit Infra with "illegal" values, the output of oc get co will show that the Network Operator is degraded, with a message along the lines of "2 VIPs of the same IP stack are not allowed".
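
The check described above could look roughly like the sketch below. This is not the CNO implementation linked earlier in the thread; validateVIPs is a hypothetical helper that only shows the "no two VIPs from the same IP family" idea using the standard net/netip package, and the IPv4 address in main is an example value.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateVIPs rejects a list of VIPs in which two addresses belong to
// the same IP family. A single VIP, or one IPv4 plus one IPv6, passes.
func validateVIPs(vips []string) error {
	seenV4, seenV6 := false, false
	for _, v := range vips {
		addr, err := netip.ParseAddr(v)
		if err != nil {
			return fmt.Errorf("invalid VIP %q: %w", v, err)
		}
		// Unmap treats IPv4-mapped IPv6 addresses as IPv4.
		if addr.Unmap().Is4() {
			if seenV4 {
				return fmt.Errorf("more than one IPv4 VIP: %s", v)
			}
			seenV4 = true
		} else {
			if seenV6 {
				return fmt.Errorf("more than one IPv6 VIP: %s", v)
			}
			seenV6 = true
		}
	}
	return nil
}

func main() {
	dualStack := []string{"192.168.111.5", "fd2e:6f44:5dd8:c956::5"}
	sameStack := []string{"fd2e:6f44:5dd8:c956::5", "fd2e:6f44:5dd8:c956::51"}

	fmt.Println(validateVIPs(dualStack) == nil) // true
	fmt.Println(validateVIPs(sameStack) == nil) // false
}
```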

Contributor

Ok, and when were these APIs introduced as GA? I think we are safe to copy the validation to the status, provided the installer has already validated this and the operator goes degraded if the status gets edited.

If the status gets edited to an invalid value, does anything put it back?

Contributor Author

@mkowalski mkowalski Apr 26, 2024

> when were these APIs introduced as GA

#1152 (merged in August 2022) + openshift/enhancements#1048 (looks like 4.11)

> If the status gets edited to invalid, does it get put back by anything?

Yes. In the master branch of CNO, if you modify Status without modifying Spec, Status gets reverted because the operator reconciles Status to match Spec. So for this scenario:

  1. start from a valid Spec and Status
  2. update with an invalid Status (without modifying Spec)

you end up with no changes applied, because Spec is authoritative over Status. To get an invalid value into Status you would have to modify both Spec and Status, but Spec is already validated via CEL, and that will stop you.

I hope this explanation is enough to support the statement "we have no way of introducing illegal values into Status".
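
The Spec-over-Status behavior described above can be sketched as follows. This is a simplified model, not the CNO reconciler: vipConfig, sameList and reconcileStatus are hypothetical names, and the real Infrastructure CR has many more fields.

```go
package main

import "fmt"

// vipConfig is a simplified, hypothetical stand-in for the VIP fields
// that exist in both PlatformSpec and PlatformStatus.
type vipConfig struct {
	APIServerInternalIPs []string
	IngressIPs           []string
}

// sameList reports whether two lists have the same entries in the same order.
func sameList(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// reconcileStatus overwrites status with the values from spec and
// reports whether anything had to be changed. Spec is authoritative,
// so a manual edit of status alone is reverted on the next reconcile.
func reconcileStatus(spec vipConfig, status *vipConfig) bool {
	changed := false
	if !sameList(status.APIServerInternalIPs, spec.APIServerInternalIPs) {
		status.APIServerInternalIPs = append([]string(nil), spec.APIServerInternalIPs...)
		changed = true
	}
	if !sameList(status.IngressIPs, spec.IngressIPs) {
		status.IngressIPs = append([]string(nil), spec.IngressIPs...)
		changed = true
	}
	return changed
}

func main() {
	spec := vipConfig{
		APIServerInternalIPs: []string{"fd2e:6f44:5dd8:c956::5"},
		IngressIPs:           []string{"fd2e:6f44:5dd8:c956::4"},
	}
	// Status was edited by hand to hold two VIPs of the same IP stack.
	status := vipConfig{
		APIServerInternalIPs: []string{"fd2e:6f44:5dd8:c956::5", "fd2e:6f44:5dd8:c956::51"},
		IngressIPs:           []string{"fd2e:6f44:5dd8:c956::4"},
	}

	fmt.Println(reconcileStatus(spec, &status)) // true: the manual edit was reverted
	fmt.Println(status.APIServerInternalIPs)    // [fd2e:6f44:5dd8:c956::5]
}
```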

Contributor

@JoelSpeed JoelSpeed Apr 29, 2024

Ack, SGTM, let's copy over the validation from spec to status and enforce it in the same way there too.

Edit: I see you've done that before I commented

@mkowalski mkowalski changed the title config/v1/types_infrastructure: change set to atomic for networks OPNET-512: config/v1/types_infrastructure: change set to atomic for networks Apr 25, 2024
@openshift-ci-robot

openshift-ci-robot commented Apr 25, 2024

@mkowalski: This pull request references OPNET-512 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

In order to handle ownership of network addresses in a more elegant way, we are switching list type from set to atomic. Thanks to this the whole list will be managed as one, instead of different owners for every entry on the list.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 25, 2024
@mkowalski mkowalski force-pushed the api-update-for-cno branch 2 times, most recently from e9599f3 to 454f5bf Compare April 25, 2024 14:25
@mkowalski
Contributor Author

/cc @cybertron

@openshift-ci openshift-ci bot requested a review from cybertron April 25, 2024 15:07
@mkowalski
Contributor Author

/test e2e-metal-ipi

Contributor

openshift-ci bot commented Apr 25, 2024

@mkowalski: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test build
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-hypershift
  • /test e2e-aws-ovn-techpreview
  • /test e2e-aws-serial
  • /test e2e-aws-serial-techpreview
  • /test e2e-upgrade
  • /test e2e-upgrade-minor
  • /test images
  • /test integration
  • /test unit
  • /test verify
  • /test verify-client-go
  • /test verify-crd-schema
  • /test verify-deps

The following commands are available to trigger optional jobs:

  • /test e2e-azure
  • /test e2e-gcp

Use /test all to run all jobs.

In response to this:

/test e2e-metal-ipi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mkowalski
Contributor Author

The current version of the PR has been tested using the following procedure:

  • ask clusterbot to build a release from this PR
  • spawn a dev-scripts cluster with a custom OPENSHIFT_RELEASE_IMAGE value pointing at what clusterbot produced (e.g. registry.build03.ci.openshift.org/ci-ln-jc2mt8b/release:latest)
  • edit the Infra CR using oc edit infrastructure cluster, changing the API VIP in PlatformSpec from fd2e:6f44:5dd8:c956::5 to e.g. fd2e:6f44:5dd8:c956::51, and changing the Ingress VIP in the same way
  • use oc get co to confirm that the Network Operator is not degraded
  • use oc get infrastructure cluster --show-managed-fields -o yaml to confirm that PlatformStatus has been updated correctly

@mkowalski
Contributor Author

/retest-required

@openshift-ci-robot

openshift-ci-robot commented Apr 26, 2024

@mkowalski: This pull request references OPNET-512 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 26, 2024
@mkowalski
Contributor Author

/retest-required

@JoelSpeed
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2024
Contributor

openshift-ci bot commented Apr 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, mkowalski

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2024
Contributor

openshift-ci bot commented Apr 29, 2024

@mkowalski: all tests passed!

@openshift-merge-bot openshift-merge-bot bot merged commit ac9356b into openshift:master Apr 29, 2024
18 checks passed
@openshift-bot

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-cluster-config-api-container-v4.16.0-202404291547.p0.gac9356b.assembly.stream.el9 for distgit ose-cluster-config-api.
All builds following this will include this PR.
