Validation Tightening is not Generally Possible #64841
a methodical way to do ratcheting validation would be ideal:
Yes, that's the mechanism I had in mind when I wrote the two bullet points in the original post. (Also, thanks for the extensive list of cross references.)
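For concreteness, a minimal sketch of what such a ratcheting check could look like (the Object, CheckFunc, and ValidateRatcheted names are hypothetical illustrations, not existing Kubernetes code):

```go
// Package validation sketches a hypothetical ratcheting mechanism; none
// of these names exist in Kubernetes today.
package validation

import "fmt"

// Object stands in for an arbitrary API object; purely illustrative.
type Object struct {
	Spec string
}

// CheckFunc is a hypothetical tightened validation rule.
type CheckFunc func(obj *Object) error

// ValidateRatcheted enforces check on create (old == nil), and on update
// only when the old object already satisfied it. Pre-existing violations
// are grandfathered in so the stored object can still be updated and deleted.
func ValidateRatcheted(obj, old *Object, check CheckFunc) error {
	if old != nil && check(old) != nil {
		// The stored object already violates the tightened rule; skip
		// enforcement, otherwise the object becomes un-updatable.
		return nil
	}
	if err := check(obj); err != nil {
		return fmt.Errorf("tightened validation failed: %w", err)
	}
	return nil
}
```

The trade-off, discussed below, is that grandfathered objects can persist indefinitely, which is why this needs to be methodical rather than ad hoc.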
/cc @mbohlool
/assign
WIP in #64907
xref #64532
Sorry for not noticing this earlier. Tightening validation can break clients. I don't see much of a way around that. Special-casing updates protects the apiserver but not clients and users. It seems like the only recourse would be to downgrade to the prior release until the clients can be fixed. And downgrade is known to not work in the general case. Maybe we need an advisory mode that also applies to create, so that users can debug problems before they irreversibly hose their clusters?
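A rough sketch of what such an advisory mode could look like, assuming failures are surfaced as warnings rather than rejecting the request (the names and shapes here are hypothetical, not an existing apiserver feature):

```go
// Advisory mode sketch: run tightened checks but return warnings instead
// of rejecting the request, so users can find offending objects before
// the rule is enforced.
package validation

import "fmt"

// Check is a hypothetical tightened validation rule over some object.
type Check func(obj interface{}) error

// AdvisoryWarnings runs every pending check and collects human-readable
// warnings that could be logged or returned alongside the response,
// without failing the create/update itself.
func AdvisoryWarnings(obj interface{}, checks []Check) []string {
	var warnings []string
	for _, check := range checks {
		if err := check(obj); err != nil {
			warnings = append(warnings, fmt.Sprintf(
				"object violates a rule that will be enforced in a future release: %v", err))
		}
	}
	return warnings
}
```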
Changing any validation rules always has the potential of breaking some client, since it changes the assumptions about part of the API, similar to adding new enum values. Validation rules on spec fields can neither be relaxed nor strengthened. Strengthening cannot be permitted because any requests that previously worked must continue to work. Weakening validation has the potential to break other consumers and generators of the API resource. Status fields whose writers are under our control (e.g., written by non-pluggable controllers), may potentially tighten validation, since that would cause a subset of previously valid values to be observable by clients.
The end of that should be: "to no longer be observable by clients" |
The list of cross references was useful. We should document how to handle common cases, such as duplicate list entries. It looks like maybe a few of them wouldn't have worked, but with asynchronous failures rather than synchronous ones. However, it's possible for cases we didn't intend to be supported, such as multiple instances of the same environment variable, to be working for someone.
Other thoughts: We also need to demand validation tests that fail for every field that doesn't accept all values of the base type. This doesn't handle cases where contributors introduce new validation checks outside the ratcheting mechanism. What are we trying to defend against? The cluster admin will want to know about any potential problems. We need to introduce exported variables that surface affected resource types.
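One possible reading of "exported variables", sketched with Go's standard expvar package; the map name and helper here are hypothetical, not existing apiserver code:

```go
// Sketch: count, per resource type, the objects that were admitted only
// because a tightened rule was not enforced against them.
package validationmetrics

import "expvar"

// grandfathered is published at /debug/vars under
// "validation_grandfathered_objects", keyed by resource type.
var grandfathered = expvar.NewMap("validation_grandfathered_objects")

// RecordGrandfathered is called whenever an object is admitted despite
// violating a tightened rule, so the cluster admin can see which
// resource types need attention before an upgrade.
func RecordGrandfathered(resource string) {
	grandfathered.Add(resource, 1)
}
```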
What are we protecting clients and users from? My intent with these was to only address validation gaps that were allowing in fatally flawed objects.
Is it a great loss to block a client that is submitting deployments/pods that cannot succeed? Is the concern that such a client would not back off and would perpetually attempt to create the objects? Doesn't that risk already exist with dynamic reasons for denial (quota, authorization, webhook admission)?
What does that mean for an object which, if created, cannot work properly and will inevitably fail asynchronously (sometimes taking a long time to do so, as in the case of the duplicate PVC volume refs)? The fixes in the linked PR were intended to resolve problems of that variety. The only one I could see being questionable is the duplicate envvar fix. That seemed valuable to prevent because it is known to cause data loss when using apply to try to remove the extra envvar on an update, but I can pull that one out if we'd rather try to think of a different way to address it (since a pod with that duplication can still run successfully).
Added to the SIG Arch backlog: https://github.com/kubernetes-sigs/architecture-tracking/projects/3
@liggitt I agree with respect to requests that were fatally flawed. However, we haven't always been correct when trying to identify such cases in the past.
Example:
It looks like unknown emptyDir medium values succeed:
If that's the case, I agree it shouldn't be tightened. I was going off of #52936 (comment) |
Sorry, that's not the right check. So, maybe not: kubernetes/pkg/volume/empty_dir/empty_dir.go, line 217 at commit c4f3102.
But, anyway, my point is that sometimes these can unintentionally succeed.
Yeah, point taken. I put a hold on the PR until we can think through whether this approach is a net positive. If we end up moving forward with it, we'll need to be really clear about the types of scenarios where it is an appropriate tool to use.
@liggitt One approach could be to write tests that demonstrate asynchronous failure and commit them before enabling new validation. I also think we need to start requiring tests for issues such as duplicate entries in associative lists ASAP.
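A sketch of such a test, using a simplified EnvVar stand-in rather than the real API type:

```go
// A sketch (in a _test.go file) of the kind of regression test suggested
// above: duplicate keys in an associative list must be detected.
package validation

import "testing"

// EnvVar is a simplified stand-in for the real API type.
type EnvVar struct {
	Name  string
	Value string
}

// duplicateNames returns each name that appears more than once.
func duplicateNames(envs []EnvVar) []string {
	seen := map[string]bool{}
	var dups []string
	for _, e := range envs {
		if seen[e.Name] {
			dups = append(dups, e.Name)
		}
		seen[e.Name] = true
	}
	return dups
}

func TestDuplicateEnvVarsDetected(t *testing.T) {
	envs := []EnvVar{{Name: "FOO", Value: "1"}, {Name: "FOO", Value: "2"}}
	if dups := duplicateNames(envs); len(dups) != 1 || dups[0] != "FOO" {
		t.Fatalf("expected duplicate FOO to be detected, got %v", dups)
	}
}
```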
Along those lines, I actually don't think the integer memory requirement should be included. I can successfully start a pod with fractional memory requirements, which just get rounded when necessary. |
Would it be possible to implement a tighter validation as a validating webhook that the cluster admin can decide to enable or disable? There could even be a way (annotation?) to ignore specific objects known to be broken.
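A minimal sketch of that idea, with simplified request/response shapes instead of the real admission types, and a hypothetical annotation name:

```go
// Sketch of an opt-in strict-validation webhook that skips objects an
// admin has explicitly grandfathered via an annotation.
package webhook

// Request carries the annotations of the object under review; the
// object itself is elided for brevity.
type Request struct {
	Annotations map[string]string
}

// Response mirrors the allow/deny decision of an admission webhook.
type Response struct {
	Allowed bool
	Message string
}

// optOutAnnotation lets an admin exempt a specific known-broken object.
const optOutAnnotation = "example.com/skip-strict-validation" // hypothetical name

// Validate enforces the tightened check unless the object opted out.
func Validate(req Request, check func(Request) error) Response {
	if req.Annotations[optOutAnnotation] == "true" {
		// Known-broken object, explicitly grandfathered by the admin.
		return Response{Allowed: true}
	}
	if err := check(req); err != nil {
		return Response{Allowed: false, Message: err.Error()}
	}
	return Response{Allowed: true}
}
```

Because the webhook lives outside the apiserver, the cluster admin controls when (and whether) the tightening takes effect.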
Good idea.
Yes. Validation in general tends to get little attention in new API PRs.
There's another way validation changes can break clients: by permitting formerly disallowed field contents, e.g. a new character in a name. Clients that depended on the former limited expressiveness could be broken. This is similar to the problem that obtains when one adds an enum value. Detecting this in a fully general way is going to be hard to impossible, because the error happens after the client has done a read. The error may actually be very subtle and not cause a crash (imagine allowing …)
Correct. The questions in #63362 (comment) are the types of questions we need to ask in cases like that.
At least with enums, we have the option (when defining the field for the first time) to specify what a client should do if it encounters an unknown enum value. |
Clients which ignore suggestions like that will still exist, so it doesn't seem like a solution to me.
Yeah, it feels more like it's up to the client to decide what to do with unknown enum values. At least for enum values they can expect that MAYBE it's going to change in the future and plan accordingly. For value validation, there isn't much they can do.
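For illustration, a defensively written client might handle unknown enum values like this (the Medium type and its values are illustrative, not a real API type):

```go
// A defensively written client, per the discussion above: unknown enum
// values get an explicit default branch instead of being assumed
// impossible.
package client

import "log"

type Medium string

const (
	MediumDefault Medium = ""
	MediumMemory  Medium = "Memory"
)

// handleMedium degrades gracefully when the server sends a value this
// client was not compiled to know about.
func handleMedium(m Medium) {
	switch m {
	case MediumDefault:
		// handle the default-backed case
	case MediumMemory:
		// handle the memory-backed case
	default:
		// A value added by a newer server: log and fall back to safe
		// behavior rather than crashing or silently misinterpreting it.
		log.Printf("unknown medium %q, treating conservatively", m)
	}
}
```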
/wg apply |
I have made this comment on a few PRs so maybe it's time to document and/or build a mechanism.
You cannot generally make validation tighter, because any existing objects that violate the new rule will no longer be able to be updated and, in many cases, deleted (because deletions involving finalizers require a sequence of updates).
This is because, to validate updates, we validate BOTH the transition AND the final state of the object. It's this last part that is problematic.
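In simplified form (a sketch of the pattern, not the actual apiserver strategy code):

```go
// A simplified sketch of how update validation composes both checks;
// the real apiserver strategy interfaces differ in detail.
package validation

type Object struct{ Spec string }

// validateObject checks the final state of the object. This is the part
// that makes tightening unsafe: stored objects re-enter it on every update.
func validateObject(obj *Object) []error {
	return nil // rule bodies elided
}

// validateTransition checks old -> new constraints (e.g. immutable fields).
func validateTransition(obj, old *Object) []error {
	return nil // rule bodies elided
}

// ValidateUpdate runs on every update: both the transition rules and the
// full final-state validation of the new object.
func ValidateUpdate(obj, old *Object) []error {
	errs := validateObject(obj)
	return append(errs, validateTransition(obj, old)...)
}
```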
It should be possible to address this by adding a new class of validation function that ONLY gets called on new objects, but then we have these problems:
Even at version boundaries it's hard to tighten validation, since objects have to be round-trippable and there are still old objects in the system.
So, new API authors: start out with restrictive validation; it's easier to relax than to tighten.
Existing API authors: prepare to write defensive clients.
/kind bug
/sig api-machinery