From 68a359276f5ab214bde4d1221ab1eac6c59ba836 Mon Sep 17 00:00:00 2001 From: Kevin Delgado Date: Thu, 9 Jun 2022 18:00:47 +0000 Subject: [PATCH 01/22] update kep.yaml --- keps/sig-api-machinery/1027-api-unions/kep.yaml | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/keps/sig-api-machinery/1027-api-unions/kep.yaml b/keps/sig-api-machinery/1027-api-unions/kep.yaml index 6f74bebec83..86597fa8f4e 100644 --- a/keps/sig-api-machinery/1027-api-unions/kep.yaml +++ b/keps/sig-api-machinery/1027-api-unions/kep.yaml @@ -2,6 +2,7 @@ title: Union types kep-number: 1027 authors: - "@apelisse" + - "@kevindelgado" owning-sig: sig-api-machinery participating-sigs: reviewers: @@ -13,7 +14,7 @@ approvers: - "@lavalamp" editor: TBD creation-date: 2019-03-25 -last-updated: 2019-03-25 +last-updated: 2022-06-09 status: implementable see-also: - "/keps/sig-api-machinery/0006-apply.md" @@ -21,5 +22,5 @@ replaces: - "https://docs.google.com/document/d/1lrV-P25ZTWukixE9ZWyvchfFR0NE2eCHlObiCUgNQGQ" superseded-by: -latest-milestone: "1.15" +latest-milestone: "1.25" stage: "alpha" From cd96e98a4edef66ca9ca33c3e47a4803d10feca0 Mon Sep 17 00:00:00 2001 From: Kevin Delgado Date: Thu, 9 Jun 2022 18:18:23 +0000 Subject: [PATCH 02/22] update kep readme to match current template --- .../1027-api-unions/README.md | 530 +++++++++++++++++- 1 file changed, 522 insertions(+), 8 deletions(-) diff --git a/keps/sig-api-machinery/1027-api-unions/README.md b/keps/sig-api-machinery/1027-api-unions/README.md index 68d82ac53af..9736f7395fd 100644 --- a/keps/sig-api-machinery/1027-api-unions/README.md +++ b/keps/sig-api-machinery/1027-api-unions/README.md @@ -1,4 +1,4 @@ -# Union types +# KEP-1027: Union types ## Table of Contents @@ -9,6 +9,12 @@ - [Goals](#goals) - [Non-Goals](#non-goals) - [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) - [Go tags](#go-tags) - [OpenAPI](#openapi) - [Discriminator](#discriminator) @@ -18,22 +24,54 @@ - [Backward compatibilities properties](#backward-compatibilities-properties) - [Validation](#validation) - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) - [Graduation Criteria](#graduation-criteria) - [Beta -> GA Graduation](#beta---ga-graduation) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) - [Future Work](#future-work) - [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) ## Release Signoff Checklist -- [x] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR) -- [x] KEP approvers have set the KEP status to `implementable` -- [x] Design details are appropriately documented -- [x] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input -- [x] Graduation criteria is in place -- [x] "Implementation History" section is up-to-date for milestone +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] -- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website ## Summary @@ -102,6 +140,44 @@ proposing the following changes. Note that this proposes unions to be "at most one of". Whether exactly one is supported or not should be implemented by the validation logic. +### User Stories (Optional) + + + +#### Story 1 + +#### Story 2 + +### Notes/Constraints/Caveats (Optional) + + + +### Risks and Mitigations + + + +## Design Details + ### Go tags We're proposing a new type of tags for go types (in-tree types, and also @@ -276,6 +352,66 @@ There are mostly 3 aspects to this plan: - [x] Functionality as part of server-side apply: How human and robot interactions work: https://github.com/kubernetes-sigs/structured-merge-diff/blob/master/typed/union_test.go - [x] Integration in kubernetes: https://github.com/kubernetes/kubernetes/pull/77370/files#diff-4ac5831d494b1b52c7c7be81e552a458 +[ ] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + + + +##### Unit tests + + + + + +- ``: `` - `` + +##### Integration tests + + + +- : + +##### e2e tests + + + +- : + ### Graduation Criteria Since this project is a sub-project of Server-side apply, it will be introduced @@ -288,6 +424,367 @@ criteria below. - Core-types all implement the semantics properly - Stable and bug-free for two releases +### Upgrade / Downgrade Strategy + + + +### Version Skew Strategy + + + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [ ] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: + - Components depending on the feature gate: +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). + +###### Does enabling the feature change any default behavior? + + + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? + + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + ## Future Work Since the proposal only normalizes the object after the patch has been applied, @@ -313,3 +810,20 @@ Here are the major milestones for this KEP: - conversion between schema and openapi definition are in k8s.io/kube-openapi - core types have been modified in k8s.io/kubernetes - Feature is ready to be released in Beta in kubernetes 1.15 + +## Drawbacks + + +* Stutter with discriminator +* Inconsistent for existing types here + +## Alternatives +* Non-Discrminated + + From 5100aa8438e1f887c620bfa64b1ad32078a46358 Mon Sep 17 00:00:00 2001 From: Kevin Delgado Date: Thu, 9 Jun 2022 21:21:36 +0000 Subject: [PATCH 03/22] Update unions KEP for v1.25 --- .../1027-api-unions/README.md | 245 ++++++++++-------- 1 file changed, 134 insertions(+), 111 deletions(-) diff --git a/keps/sig-api-machinery/1027-api-unions/README.md b/keps/sig-api-machinery/1027-api-unions/README.md index 9736f7395fd..f756070b1fb 100644 --- a/keps/sig-api-machinery/1027-api-unions/README.md +++ b/keps/sig-api-machinery/1027-api-unions/README.md @@ -125,12 +125,14 @@ become possible. ### Goals The goal is to enable a union or "oneof" semantics in Kubernetes types, both for -in-tree types and for CRDs. +in-tree types and for CRDs. Validation of these unions should be centralized +rather than being written on a per resource basis in validation functions. ### Non-Goals -We're not planning to use this KEP to release the feature, but mostly as a way -to document what we're doing. +A priority will be minimum disruption to existing types rather than ensuring all +existing types conform to a single union paradigm (i.e. unions without a +discriminator should continue to not need a discriminator) ## Proposal @@ -147,15 +149,27 @@ Detail the things that people will be able to do if this KEP is implemented. Include as much detail as possible so that people can understand the "how" of the system. The goal here is to make this feel real for users without getting bogged down. + +TODO: ask antoine if these stories are complete/sufficient --> + #### Story 1 +As a CRD owner, I should be able to author my CRD such that the openapi schema +indicates which fields are members of this oneOf, so that when users crate +custom resources, the oneOf fields will be automatically validated, without me +having to write custom validation logic. + #### Story 2 +As a maintainer of existing builtin APIs, I should be able to add new fields to +a union and have it behave as expected. + ### Notes/Constraints/Caveats (Optional) ### Risks and Mitigations +- We need to ensure we do not break existing union types. This can be done by + not forcing existing unions to conform to the newly proposed union semantics. + Integration testing with older types should give us the confidence to be sure + we have done so. +- There is a lot of risk for errors when there exists skew between clients and + server. In the section on normalization, we discuss mitigating these risks. - ``: `` - `` +--> +Core functionality will be extensively unit tested in the SMD typed package +(union_test.go). + +Parts of the kubernetes endpoints handlers package that are modified to call +into the SMD code will also be unit tested as appropriate. + + + + + ##### Integration tests +We will have extensive integration testing of the union code in the +`test/integration/apiserver` package. + +We will be testing along the dimensions of: +* Which fields of the union get modified (none, existing fields, newly updated + fields) +* Type of union (discriminated vs non-discriminated) +* Whether the client is aware of all the fields +* Whether the server is aware of all fields +* Whether the union is optional or required + +A fully documented test matrix exists in a [google +spreadsheet](https://docs.google.com/spreadsheets/d/1dIOOqgrgvMI9b2412eVuRSydEaOxhYOoqk6bUjZOY9c/edit?resourcekey=0-wlOfJTC_EX-qpU680STHMA#gid=3601413) along with a +[guide +doc](https://docs.google.com/document/d/1Wruosjo0ELLl1yxauzpsUjgH2fK9KdgXDmOdJ5sG7Kg/edit?resourcekey=0-8Pwzx6EvsFR7VQoXzCTY4Q) on how to read and understand the test matrix. - : +--> ##### e2e tests @@ -408,21 +443,21 @@ For Beta and GA, add links to added tests together with links to k8s-triage for https://storage.googleapis.com/k8s-triage/index.html We expect no non-infra related flakes in the last month as a GA graduation criteria. ---> - : +--> + We are considering adding kubectl e2e tests to mimic kubectl users performing + various operations on objects with union fields. ### Graduation Criteria -Since this project is a sub-project of Server-side apply, it will be introduced -directly as Beta, and will graduate to GA in a later release, according to the -criteria below. +#### Alpha Graduation -#### Beta -> GA Graduation - -- CRD support has been proven successful -- Core-types all implement the semantics properly -- Stable and bug-free for two releases +- CRDs can be created with union fields and are properly validated when + created/updated. +- Existing unions that have discriminators, are properly tagged and validated via + union validation +- Existing unions that don't have discriminators do not break when upgraded. ### Upgrade / Downgrade Strategy @@ -785,42 +820,30 @@ For each of them, fill in the following information by copying the below templat ###### What steps should be taken if SLOs are not being met to determine the problem? -## Future Work - -Since the proposal only normalizes the object after the patch has been applied, -it is hard to fail if the patch is invalid. There are scenarios where the patch -is invalid but it results in an unpredictable object. For example, if a patch -sends both a discriminator and a field that is not the discriminated field, it -will either clear the value sent if the discriminator changes, or it will change -the value of the sent discriminator. - -Validating patches is not a problem that we want to tackle now, but we can -validate "Apply" objects to make sure that they do not define such broken -semantics. - ## Implementation History - -Here are the major milestones for this KEP: -- Initial discussion happened a year before the creation of this kep: - https://docs.google.com/document/d/1lrV-P25ZTWukixE9ZWyvchfFR0NE2eCHlObiCUgNQGQ/edit#heading=h.w5eqnf1f76x5 -- Points made in the initial document have been improved and put into this kep, - which has approved by sig-api-machinery tech-leads -- KEP has been implemented: - - logic mostly lives in sigs.k8s.io/structured-merge-diff - - conversion between schema and openapi definition are in k8s.io/kube-openapi - - core types have been modified in k8s.io/kubernetes -- Feature is ready to be released in Beta in kubernetes 1.15 +- Unions implemented, but disabled in SMD. ## Drawbacks -* Stutter with discriminator -* Inconsistent for existing types here +An issue that one might have with requiring a discriminator is that it might +seem redundant to have to set a field _and_ set another field indicating to the +server to use the set field. The reasons for doing so are discussed above in the +normalization section. + +One other drawback is that our approach does not standardize all existing unions +into a single format. We don't see a way to do so without drastically changing +existing APIs and breaking backwards compatibility ## Alternatives * Non-Discrminated +The primary alternative discussed is to not have a discriminator for new union +types. As discussed in the normalization section, requiring a discriminator +allows the server to better understand the intentions of clients that do not +have knowledge of all the fields in a union if newer verisons of the server add +new fields to the union. #### Story 1 -As a CRD owner, I should be able to author my CRD such that the openapi schema -indicates which fields are members of this oneOf, so that when users crate -custom resources, the oneOf fields will be automatically validated, without me -having to write custom validation logic. +As a CRD owner, I can use simple semantics (such as go tags), to express the +desired validation of a oneOf (at most one or exactly one field may be set), and +the API server will perform this validation automatically. #### Story 2 +As a client, I can read, modify, and update the union fields of an object, even +if I am not aware of all of the possible fields, and the server will properly +interpret my intent. + +#### Story 3 + As a maintainer of existing builtin APIs, I should be able to add new fields to a union and have it behave as expected. ### Notes/Constraints/Caveats (Optional) +At first, Only new union types will follow the prescribed guinance, so no upgrades/downgrades are possible for types +that don't exist yet. + ### Version Skew Strategy +See test matrix and commentary about discriminators. It clearly documents how +the server will use the discriminator to understand the client's intention even +if the client is not aware of all union fields because of version skew. + ## Production Readiness Review Questionnaire -- [ ] Feature gate (also fill in values in `kep.yaml`) - - Feature gate name: - - Components depending on the feature gate: -- [ ] Other - - Describe the mechanism: - - Will enabling / disabling the feature require downtime of the control - plane? - - Will enabling / disabling the feature require downtime or reprovisioning - of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). +- [x] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: APIUnions + - Components depending on the feature gate: kube-apiserver + +Request handlers in the api server will call into union validation and +normalization function from the structured-merge-diff repo when feature is +enabled. + ###### Does enabling the feature change any default behavior? +No + +Yes, requests will simply skip union validation and normalization. + ###### What happens if we reenable the feature if it was previously rolled back? +Requests will resume perform union validation and normalization + ###### Are there any tests for feature enablement/disablement? @@ -658,6 +663,7 @@ previous answers based on experience in the field. ###### How can an operator determine if the feature is in use by workloads? +Simply creating and updating objects with union fields. - [ ] Events - Event Reason: @@ -682,9 +691,12 @@ Recall that end users cannot usually observe component logs or access metrics. - Other field: - [ ] Other (treat as last resort) - Details: +--> ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? +N/A + - [ ] Metrics - Metric name: @@ -712,9 +724,11 @@ Pick one more of these and delete the rest. - Components exposing the metric: - [ ] Other (treat as last resort) - Details: +--> ###### Are there any missing metrics that would be useful to have to improve observability of this feature? +N/A ###### Does this feature depend on any specific services running in the cluster? +N/A -At first, Only new union types will follow the prescribed guinance, so no upgrades/downgrades are possible for types -that don't exist yet. +Turning the flag on for alpha just enables different runtime codepaths (i.e. +performing the unified union validation and normalization) + +Any schema markers (added by CRD authors or propagated from tags on built-in +types) will appear in the schema, but not do anything if the flag is off. ### Version Skew Strategy @@ -520,6 +548,13 @@ See test matrix and commentary about discriminators. It clearly documents how the server will use the discriminator to understand the client's intention even if the client is not aware of all union fields because of version skew. +Skew with alpha flag on/off shouldn't make much of a difference. +* Objects created with the union semantics, but applied to a cluster with the + alpha flag off will simply not perform union validation and normalization. +* Objects created without union semantics will simply not trigger union + validation and normalization (regardless of whether the server has the alpha + enabled or disabled). + ## Production Readiness Review Questionnaire +We will have integration tests demonstrating how CRs with persisted invalid data +will need to be corrected when the feature is re-enabled (and requires more +strict union validation). + ### Rollout, Upgrade and Rollback Planning