-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRD's openAPIV3Schema field is ignored #2044
Comments
I think this is working as intended unfortunately. Pre 1.15, the feature is in alpha and requires a feature gate. Check out the schema field description in 1.14:
I tested by starting a 1.14 cluster with the feature gate ( Unfortunately in GKE there's no 1.15 support yet, and you cannot enable alpha feature gates in production clusters. Question:Do we think there is value in maintaining the v1alpha1 versions now? I think the initial motivation was to explore actually implementing multiple versions. BUT the operator still can't actually process v1alpha1 and v1beta1 CRs without a webhook conversion, which we haven't implemented and which we can't enable on our main testing platform at the moment. We basically support v1alpha1 in name only. Until 1.15 is more widely supported, I think it makes sense to only support a single CRD version. I would be in favor of removing v1alpha1 and cutting a new release. In the future when both we are GA and when webhook conversions are enabled in more clusters I think we could revisit supporting multiple versions (though hopefully we got the specs mostly right and that's not necessary). If we remove the v1alpha1 spec and regenerate CRDs, we can have openapi validation again and we can all rejoice. |
Oh, so |
Trying to summarize what's in the API spec:
Our current documentation says:
I think we should have instead the following choices:
Since almost nobody uses 1.16 yet, I think it's fine to keep only the first choice. What I want to verify is that using the top-level validation field while having some resources in |
I tested patching our generated CRD to move I don't think we need to ship the trivial CRD anymore? Which means we don't ship v1alpha1 validation with our v1beta1, but what's the point anyway? We still need to list |
I think there might be a bug in controller-gen: kubernetes-sigs/controller-tools#349 |
Trying to summarize our options from some out of band discussion, if I missed something or didn't describe it accurately please correct me. Leave as isCurrently our CRD includes both v1alpha1 and v1beta1 versions, but validation for neither. Users can submit whatever they want. The api server assumes all versions are interchangeable. If a beta operator receives an object that it cannot parse -- such as a valid v1alpha1 object with the old secure settings format -- it will choke and not proceed with parsing anything. As far as I know that is the only field that causes that though. It is debuggable by looking at the operator logs, but not easily. Leaving it as is allows people to have v1alpha1 resources and v1beta1 resources in the same cluster. The intent was for the operator to not operate on v1alpha1, and just skip it when it parses the annotation with the old controller version. If they are using secure settings though, the operator breaks in a hard to debug way. If they are not, then it is working as intended and the operator skips the older resources. Leaving it as is also means it is easy for people to get into this in other ways though -- there is no validation for anything, so you can do something silly like Keep v1alpha1, update CRD to use v1beta1 validation for everythingIf Seb's PR is approved, we could regenerate CRDs so that we do have v1beta1 validation on admission. This means that existing v1alpha1 objects can stay, but they cannot be updated unless they are updated in a fashion to match the v1beta1 spec. So say if someone has a separate namespace for 0.9 and v1alpha1 clusters, the operator (and the user) will not be able to update the object any longer. But it will still exist as long as it can without operator intervention. We still have the same issue where if a beta operator encounters an older v1alpha1 object that it cannot parse (say it was admitted before validation was enabled), it will choke on it. This is pretty close to what our desired behavior was -- in most cases, existing objects will be ignored by the operator and new objects will work as expected. In some cases an older object can cause the operator to fail hard though. If someone tries to say, copy a v1alpha1 spec and update it to v1beta1 and forgets to update the secureSettings field, it will be caught at at admission time though. So we only have to worry about already persisted objects, which mitigates the issue somewhat. Remove v1alpha1We could remove v1alpha1 from the CRD. This means that it is clear what the behavior is -- anything submitted with an older version will not be admitted, either because it is an unknown API version or it does not match validation. We also only need to ship one CRD rather than multiple flavors. To upgrade, users would need to delete existing v1alpha1 CRs before applying the new CRD. In k8s, all objects are supposed to be round trippable. E.g., if I'm listening for v2 MyResource, the API server can give me a v2 version of MyResource even if it is stored as v1 and if the user submitted it as v1 (and vice versa). Pre-conversion webhook, this was only correct for CRDs if the versions were different in name only (e.g. v1 spec == v2 spec), a no-op conversion. More detail in k8s docs here. Our versions are actually different, but the API server is treating it as a no-op conversion and sending us v1alpha1 specs when our operator requests v1beta1. We cannot enable webhook conversions in at least GKE right now, so no-op conversion is the only option available to us everywhere. So this gives us the "most correct" behavior, but is an upgrade path we were trying to avoid. Potential twistsRegardless of what we choose above, we could also remove v1alpha1 from the next release (as we deprecated it in this release), so we have the most "correct" behavior. But it does not necessarily need to be a rushed release just for that change. I also think regardless of what we do we need to update the documentation with troubleshooting and expected behavior in different k8s versions. RecommendationPersonally I think I am in favor of regenerating CRDs once Seb's PR is merged, which gets us v1beta1 validation. We can update the docs for the secure settings case to make it clear what is happening. It should only happen on existing resources -- anything newly created that will cause the operator to choke will be rejected -- so I think that is an okay trade off. In the next larger release we should remove the v1alpha1 as existing users have had a release to upgrade their CRs. If the Kubebuilder folks reject Seb's PR I may revise my opinion depending on their reasoning. |
We discussed this on Zoom and agreed on removing v1alpha1 (@anyasabo's approach) since that what we want to do for the next release anyway. |
To reproduce:
config/crds
kubectl get crd elasticsearches.elasticsearch.k8s.elastic.co -o json
The openAPIV3Schema field from the CRD is dropped in the stored CRD (but present in the kubectl annotation).
Looking at https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#create-a-customresourcedefinition, it looks like we're using the
v1
schema of the CRD where the openAPIV3Schema field is under a particular version, but the overall CRD is defined using thev1beta1
version.As a result, we don't have any validation performed on our Elasticsearch resource.
Which is probably the cause for #2039.
The text was updated successfully, but these errors were encountered: