-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-2885 Field Validation 1.27 GA graduation #3725
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,5 @@ alpha: | |
approver: "@deads2k" | ||
beta: | ||
approver: "@deads2k" | ||
stable: | ||
approver: "@deads2k" |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -105,6 +105,8 @@ tags, and then generate with `hack/update-toc.sh`. | |||||
- [Graduation Criteria](#graduation-criteria) | ||||||
- [Alpha](#alpha) | ||||||
- [Beta](#beta) | ||||||
- [GA](#ga) | ||||||
- [Version Skew Strategy](#version-skew-strategy) | ||||||
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) | ||||||
- [Feature Enablement and Rollback](#feature-enablement-and-rollback) | ||||||
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) | ||||||
|
@@ -313,11 +315,9 @@ client-side validation, albeit one that is error-prone and not officially | |||||
supported). | ||||||
|
||||||
Long-term, we want to favor using out-of-tree solutions for client-side | ||||||
validation, though this idea is still in its infancy. | ||||||
|
||||||
The [kubeval](https://www.kubeval.com/) project is an example of an out-of-tree solution that does this, and | ||||||
we will look into expanding its support of open API to v3, and investigate | ||||||
whether it makes sense as a permanent solution to client-side validation. | ||||||
validation. The [kubeconform](https://github.com/yannh/kubeconform) project is an example of an out-of-tree solution that does this, and | ||||||
we recommend using this or similar tools to validate manifests offline going | ||||||
forward. | ||||||
|
||||||
##### Aligning json and yaml errors | ||||||
|
||||||
|
@@ -615,6 +615,11 @@ It tests the cross product of all valid permutations along the dimensions of: | |||||
With field validation on by default in beta, we will modify | ||||||
[test/e2e/kubectl/kubectl.go](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/kubectl/kubectl.go) to ensure that kubectl defaults to using server side field validation and detects unknown/duplicate fields as expected. | ||||||
|
||||||
[GA] | ||||||
We will introduce field validation specific e2e/conformance tests to submit | ||||||
requests directly against the API server for both built-in and custom resources | ||||||
to test that duplicate and unknown fields are appropriately detected. | ||||||
|
||||||
### Graduation Criteria | ||||||
<!-- | ||||||
**Note:** *Not required until targeted at a release.* | ||||||
|
@@ -655,29 +660,20 @@ Below are some examples to consider, in addition to the aforementioned [maturity | |||||
- [x] field validation integration tests check for exact match of strict errors | ||||||
- [x] In tree NestedObjectDecoders no longer short circuit on strict decoding | ||||||
errors [#107545](https://github.com/kubernetes/kubernetes/issues/107545) | ||||||
- [ ] Unknown/Duplicate fields are properly detected in the metadata at both the | ||||||
- [x] Unknown/Duplicate fields are properly detected in the metadata at both the | ||||||
root level and within embedded objects | ||||||
[#109215](https://github.com/kubernetes/kubernetes/issues/109215),[#109316](https://github.com/kubernetes/kubernetes/pull/109316), and | ||||||
[#109494](https://github.com/kubernetes/kubernetes/pull/109494) | ||||||
|
||||||
|
||||||
<!-- | ||||||
#### GA | ||||||
|
||||||
- N examples of real-world usage | ||||||
- N installs | ||||||
- More rigorous forms of testing—e.g., downgrade tests and scalability tests | ||||||
- Allowing time for feedback | ||||||
|
||||||
**Note:** Generally we also wait at least two releases between beta and | ||||||
GA/stable, because there's no opportunity for user feedback, or even bug reports, | ||||||
in back-to-back releases. | ||||||
|
||||||
**For non-optional features moving to GA, the graduation criteria must include | ||||||
[conformance tests].** | ||||||
|
||||||
[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md | ||||||
- [x] Wait two releases (1.25 and 1.26) with field validation enabled by default | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's good. I think in practice it's still very early to say anything about 1.26, but it's probably in the process of being rolled-out in different places. In the meantime, we must have some data for 1.25. Do we have metrics about how this is going? Can we know if it's working properly? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As a data point, we (GKE) are seeing 1.25 API servers use less CPU and memory overall in the fleet than 1.24 API servers did at the same point in the 1.24 rollout. That at least means that any added cost of this feature was offset by other improvements in 1.25 in our use. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've summarized your comment inline. Let me know if there's a better place to put it or if we want/are able to provide more concrete metrics, to further support the claim I'm making that existing releases are not seeing issues in the wild. |
||||||
and receive minimal bug reports pertaining to the feature. | ||||||
- [] More rigorous e2e/conformance testing added to invoke field validation directly | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: maybe drop the "more rigorous" bit, since the new test will not actually exercise anything that is currently untested, it will just do it directly and in conformance
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||||||
against the API server | ||||||
|
||||||
<!-- | ||||||
#### Deprecation | ||||||
|
||||||
- Announce deprecation and support policy of the existing flag | ||||||
|
@@ -696,20 +692,22 @@ enhancement: | |||||
cluster required to make on upgrade, in order to maintain previous behavior? | ||||||
- What changes (in invocations, configurations, API use, etc.) is an existing | ||||||
cluster required to make on upgrade, in order to make use of the enhancement? | ||||||
--> | ||||||
|
||||||
### Version Skew Strategy | ||||||
|
||||||
If applicable, how will the component handle version skew with other | ||||||
components? What are the guarantees? Make sure this is in the test plan. | ||||||
Following the graduation of server-side field validation to GA, there will be a | ||||||
period of time when a newer client (which expects server-side field validation locked in GA) will send | ||||||
requests to an older server that could have server-side field validation | ||||||
disabled. Until version skew makes it incompatible for a client to hit a server | ||||||
with field validation disabled, kubectl will need to continue to have | ||||||
client-side validation available. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The implications in this sentence seem incorrect.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thanks, updated it. |
||||||
|
||||||
Consider the following in developing a version skew strategy for this | ||||||
enhancement: | ||||||
- Does this enhancement involve coordinating behavior in the control plane and | ||||||
in the kubelet? How does an n-2 kubelet without this feature available behave | ||||||
when this feature is used? | ||||||
- Will any other components on the node change? For example, changes to CSI, | ||||||
CRI or CNI may require updating that component before the kubelet. | ||||||
--> | ||||||
After this skew (2-3 releases), client-side | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why 2-3? Officially, kubectl is only supposed to support 1 version skew. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if the cost is low, I'd be open to keeping the client-side validation in place until 1.26 goes out of support, even if it's not strictly required There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. though... server-side validation has been on by default since 1.25, and I can't find evidence anyone has disabled it, so dropping client-side validation in 1.28 would be 1-version-skew-compatible with GA in 1.27 and practically would mean all OSS-supported versions as of August 2023 (1.25+) would have server-side validation enabled by default. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, as I mentioned below, there are two things:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
August 2023 1.28 is estimated to release in August 2023, at which point 1.25+ will be still in support :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've attempted to summarize this here. Let me know if it should be worded differently or if I should add anything about potentitally keeping it until 1.26 goes out of support |
||||||
validation will be removed from kubectl entirely. (See section on [Out-of-Tree | ||||||
Alternatives to Client Side | ||||||
Validation](#out-of-tree-alternatives-to-client-side-validation) for | ||||||
alternatives to client-side validation) | ||||||
|
||||||
## Production Readiness Review Questionnaire | ||||||
|
||||||
|
@@ -749,6 +747,7 @@ Pick one of these and delete the rest. | |||||
- [x] Feature gate (also fill in values in `kep.yaml`) | ||||||
- Feature gate name: ServerSideFieldValidation | ||||||
- Components depending on the feature gate: kube-apiserver | ||||||
- Note: feature gate removed upon graduation to GA (1.27) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecation
we lock to true and give 2 releases for folks to update their invocations to drop explicitly setting the gate |
||||||
- [x] Other | ||||||
- Describe the mechanism: query parameter | ||||||
- Will enabling / disabling the feature require downtime of the control | ||||||
|
@@ -851,8 +850,9 @@ Recall that end users cannot usually observe component logs or access metrics. | |||||
|
||||||
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? | ||||||
|
||||||
Users should expect to see an increase in request latency and memory usage | ||||||
(~20-25% increase) for requests that desire server side schema validation. | ||||||
Users should expect to see an increase in request latency (~5% increase) and memory usage | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Side note, can you remind me where the benchmarks are for that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The benchmarks are here and the analysis of it is here kubernetes/kubernetes#107848 (comment) Updated the kep to more accurately reflect the findings of the benchmarking |
||||||
(~10% increase) for requests that desire server side schema validation (json and | ||||||
yaml only, no change for protobuf). | ||||||
|
||||||
Cluster operators can also use cpu usage monitoring to determine whether | ||||||
increased usage of server-side schema validation dramatically increases CPU usage after feature enablement. | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,26 +18,26 @@ see-also: | |
replaces: | ||
|
||
# The target maturity stage in the current dev cycle for this KEP. | ||
stage: beta | ||
stage: stable | ||
|
||
# The most recent milestone for which work toward delivery of this KEP has been | ||
# done. This can be the current (upcoming) milestone, if it is being actively | ||
# worked on. | ||
latest-milestone: "v1.25" | ||
latest-milestone: "v1.27" | ||
|
||
# The milestone at which this feature was, or is targeted to be, at each stage. | ||
milestone: | ||
alpha: "v1.23" | ||
beta: "v1.25" | ||
stable: "v1.26" | ||
stable: "v1.27" | ||
|
||
# The following PRR answers are required at alpha release | ||
# List the feature gate name and the components for which it must be enabled | ||
feature-gates: | ||
- name: ServerSideFieldValidation | ||
components: | ||
- kube-apiserver | ||
disable-supported: true | ||
disable-supported: false | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this meant "could the gate be disabled during feature development", not "can the gate be disabled when locked to true at GA" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gotcha, I've reverted this for now. It looks like based on this one example we remove the feature gate entirely once it's GA. This one did it three releases after going GA, but I can't tell if that wait is required, or if we can just remove the gate in the release that it goes GA. Do you know? If so, I imagine we can just keep the gate in the kep.yaml file but adda comment that the gate was removed in release 1.x like I've done here? |
||
|
||
# The following PRR answers are required at beta release | ||
metrics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair, we might want to be a little more involved with that project. We've done a lot of work with regards to performance and openapi v3 that could benefit that project. I guess what I'm saying is that your sentence makes it sound like we just have to use kubeconform and that's it, but there is actually quite some work to do there to have parity with what we have today, but I think there is also a lot of potential to do a much better job than what we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworded to hopefully capture what you are saying.