Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-2885 Field Validation 1.27 GA graduation #3725

Merged
merged 3 commits into from
Jan 24, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-api-machinery/2885.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,5 @@ alpha:
approver: "@deads2k"
beta:
approver: "@deads2k"
stable:
approver: "@deads2k"
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,8 @@ tags, and then generate with `hack/update-toc.sh`.
- [Graduation Criteria](#graduation-criteria)
- [Alpha](#alpha)
- [Beta](#beta)
- [GA](#ga)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
Expand Down Expand Up @@ -313,11 +315,9 @@ client-side validation, albeit one that is error-prone and not officially
supported).

Long-term, we want to favor using out-of-tree solutions for client-side
validation, though this idea is still in its infancy.

The [kubeval](https://www.kubeval.com/) project is an example of an out-of-tree solution that does this, and
we will look into expanding its support of open API to v3, and investigate
whether it makes sense as a permanent solution to client-side validation.
validation. The [kubeconform](https://github.com/yannh/kubeconform) project is an example of an out-of-tree solution that does this, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, we might want to be a little more involved with that project. We've done a lot of work with regards to performance and openapi v3 that could benefit that project. I guess what I'm saying is that your sentence makes it sound like we just have to use kubeconform and that's it, but there is actually quite some work to do there to have parity with what we have today, but I think there is also a lot of potential to do a much better job than what we have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded to hopefully capture what you are saying.

we recommend using this or similar tools to validate manifests offline going
forward.

##### Aligning json and yaml errors

Expand Down Expand Up @@ -615,6 +615,11 @@ It tests the cross product of all valid permutations along the dimensions of:
With field validation on by default in beta, we will modify
[test/e2e/kubectl/kubectl.go](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/kubectl/kubectl.go) to ensure that kubectl defaults to using server side field validation and detects unknown/duplicate fields as expected.

[GA]
We will introduce field validation specific e2e/conformance tests to submit
requests directly against the API server for both built-in and custom resources
to test that duplicate and unknown fields are appropriately detected.

### Graduation Criteria
<!--
**Note:** *Not required until targeted at a release.*
Expand Down Expand Up @@ -655,29 +660,20 @@ Below are some examples to consider, in addition to the aforementioned [maturity
- [x] field validation integration tests check for exact match of strict errors
- [x] In tree NestedObjectDecoders no longer short circuit on strict decoding
errors [#107545](https://github.com/kubernetes/kubernetes/issues/107545)
- [ ] Unknown/Duplicate fields are properly detected in the metadata at both the
- [x] Unknown/Duplicate fields are properly detected in the metadata at both the
root level and within embedded objects
[#109215](https://github.com/kubernetes/kubernetes/issues/109215),[#109316](https://github.com/kubernetes/kubernetes/pull/109316), and
[#109494](https://github.com/kubernetes/kubernetes/pull/109494)


<!--
#### GA

- N examples of real-world usage
- N installs
- More rigorous forms of testing—e.g., downgrade tests and scalability tests
- Allowing time for feedback

**Note:** Generally we also wait at least two releases between beta and
GA/stable, because there's no opportunity for user feedback, or even bug reports,
in back-to-back releases.

**For non-optional features moving to GA, the graduation criteria must include
[conformance tests].**

[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
- [x] Wait two releases (1.25 and 1.26) with field validation enabled by default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good. I think in practice it's still very early to say anything about 1.26, but it's probably in the process of being rolled-out in different places. In the meantime, we must have some data for 1.25. Do we have metrics about how this is going? Can we know if it's working properly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a data point, we (GKE) are seeing 1.25 API servers use less CPU and memory overall in the fleet than 1.24 API servers did at the same point in the 1.24 rollout. That at least means that any added cost of this feature was offset by other improvements in 1.25 in our use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've summarized your comment inline. Let me know if there's a better place to put it or if we want/are able to provide more concrete metrics, to further support the claim I'm making that existing releases are not seeing issues in the wild.

and receive minimal bug reports pertaining to the feature.
- [] More rigorous e2e/conformance testing added to invoke field validation directly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe drop the "more rigorous" bit, since the new test will not actually exercise anything that is currently untested, it will just do it directly and in conformance

Suggested change
- [] More rigorous e2e/conformance testing added to invoke field validation directly
- [] Add conformance testing to invoke field validation directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

against the API server

<!--
#### Deprecation

- Announce deprecation and support policy of the existing flag
Expand All @@ -696,20 +692,22 @@ enhancement:
cluster required to make on upgrade, in order to maintain previous behavior?
- What changes (in invocations, configurations, API use, etc.) is an existing
cluster required to make on upgrade, in order to make use of the enhancement?
-->

### Version Skew Strategy

If applicable, how will the component handle version skew with other
components? What are the guarantees? Make sure this is in the test plan.
Following the graduation of server-side field validation to GA, there will be a
period of time when a newer client (which expects server-side field validation locked in GA) will send
requests to an older server that could have server-side field validation
disabled. Until version skew makes it incompatible for a client to hit a server
with field validation disabled, kubectl will need to continue to have
client-side validation available.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implications in this sentence seem incorrect.

Until version skew makes it incompatible for a client to hit a server
with field validation disabled, kubectl will need to continue to have
client-side validation available.

  1. Until version skew makes it incompatible for a client to hit a server with field validation disabled, kubectl MUST continue to check if the feature is enabled server-side.
  2. As long as the feature can be disabled server-side, kubectl MUST (SHOULD?) continue to have client-side validation
  3. When the feature can't be disabled server-side, kubectl SHOULD (MAY?) remove client-side validation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, updated it.


Consider the following in developing a version skew strategy for this
enhancement:
- Does this enhancement involve coordinating behavior in the control plane and
in the kubelet? How does an n-2 kubelet without this feature available behave
when this feature is used?
- Will any other components on the node change? For example, changes to CSI,
CRI or CNI may require updating that component before the kubelet.
-->
After this skew (2-3 releases), client-side
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 2-3? Officially, kubectl is only supposed to support 1 version skew.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the cost is low, I'd be open to keeping the client-side validation in place until 1.26 goes out of support, even if it's not strictly required

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though... server-side validation has been on by default since 1.25, and I can't find evidence anyone has disabled it, so dropping client-side validation in 1.28 would be 1-version-skew-compatible with GA in 1.27 and practically would mean all OSS-supported versions as of August 2023 (1.25+) would have server-side validation enabled by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

August 20232022

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as I mentioned below, there are two things:

  1. do we stop checking if server-side strict validation is enabled on the server (I'd like that to stop as early as possible because it forces a pull on the openapi every single time).
  2. do we stop support client-side validation at all? That is probably up to sig-cli to decide whether they are happy with the alternatives. That still requires a pull on openapi v2 but would only happen when someone specifically asks for it (which should be much more occasional).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all OSS-supported versions as of August 2023 (1.25+)

August 20232022

August 20232022 :)

1.28 is estimated to release in August 2023, at which point 1.25+ will be still in support :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've attempted to summarize this here. Let me know if it should be worded differently or if I should add anything about potentitally keeping it until 1.26 goes out of support

validation will be removed from kubectl entirely. (See section on [Out-of-Tree
Alternatives to Client Side
Validation](#out-of-tree-alternatives-to-client-side-validation) for
alternatives to client-side validation)

## Production Readiness Review Questionnaire

Expand Down Expand Up @@ -749,6 +747,7 @@ Pick one of these and delete the rest.
- [x] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name: ServerSideFieldValidation
- Components depending on the feature gate: kube-apiserver
- Note: feature gate removed upon graduation to GA (1.27)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Note: feature gate removed upon graduation to GA (1.27)
- Note: feature gate locked to true upon graduation to GA (1.27) and removed two releases later (1.29)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecation

Rule #8: Feature gates must be deprecated when the corresponding feature they control transitions a lifecycle stage as follows. Feature gates must function for no less than:

  • Beta feature to GA: 6 months or 2 releases (whichever is longer)
    ...

we lock to true and give 2 releases for folks to update their invocations to drop explicitly setting the gate

- [x] Other
- Describe the mechanism: query parameter
- Will enabling / disabling the feature require downtime of the control
Expand Down Expand Up @@ -851,8 +850,9 @@ Recall that end users cannot usually observe component logs or access metrics.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

Users should expect to see an increase in request latency and memory usage
(~20-25% increase) for requests that desire server side schema validation.
Users should expect to see an increase in request latency (~5% increase) and memory usage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, can you remind me where the benchmarks are for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmarks are here
https://github.com/kubernetes/kubernetes/blob/master/test/integration/apiserver/field_validation_test.go#L2936-L2996

and the analysis of it is here kubernetes/kubernetes#107848 (comment)

Updated the kep to more accurately reflect the findings of the benchmarking

(~10% increase) for requests that desire server side schema validation (json and
yaml only, no change for protobuf).

Cluster operators can also use cpu usage monitoring to determine whether
increased usage of server-side schema validation dramatically increases CPU usage after feature enablement.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,26 +18,26 @@ see-also:
replaces:

# The target maturity stage in the current dev cycle for this KEP.
stage: beta
stage: stable

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.25"
latest-milestone: "v1.27"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.23"
beta: "v1.25"
stable: "v1.26"
stable: "v1.27"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
- name: ServerSideFieldValidation
components:
- kube-apiserver
disable-supported: true
disable-supported: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this meant "could the gate be disabled during feature development", not "can the gate be disabled when locked to true at GA"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I've reverted this for now.

It looks like based on this one example we remove the feature gate entirely once it's GA. This one did it three releases after going GA, but I can't tell if that wait is required, or if we can just remove the gate in the release that it goes GA. Do you know?

If so, I imagine we can just keep the gate in the kep.yaml file but adda comment that the gate was removed in release 1.x like I've done here?


# The following PRR answers are required at beta release
metrics: