
Feature Request: Allow openapi references in CRD validation schema #54579

Closed
mtaufen opened this issue Oct 25, 2017 · 49 comments
Assignees
Labels
area/api Indicates an issue on api area. area/custom-resources kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/multicluster Categorizes an issue or PR as relevant to SIG Multicluster.

Comments

@mtaufen
Contributor

mtaufen commented Oct 25, 2017

Offline chat with @mbohlool: CRD validation schema doesn't currently support references. You can get around this by specifying all of the types inline, but it's extraordinarily verbose (and prone to typos).
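A minimal sketch of what the inline workaround looks like, assuming a hypothetical Widget CRD and the apiextensions.k8s.io/v1beta1 syntax of the time (references such as $ref are not accepted in the validation schema):

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com          # hypothetical CRD for illustration
spec:
  group: example.com
  version: v1
  names:
    kind: Widget
    plural: widgets
  validation:
    openAPIV3Schema:
      properties:
        spec:
          properties:
            template:
              # Every field of PodTemplateSpec has to be copied in by hand;
              # only a tiny fragment is shown here.
              properties:
                spec:
                  properties:
                    containers:
                      type: array
                      items:
                        properties:
                          name:
                            type: string
                          image:
                            type: string
```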

@mtaufen mtaufen added area/api Indicates an issue on api area. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Oct 25, 2017
@nikhita
Member

nikhita commented Oct 26, 2017

/area custom-resources
/cc @sttts @deads2k @enisoc @colemickens

@sttts
Contributor

sttts commented Oct 26, 2017

We were chatting about this idea before. Basically, we need a reference namespace for our API groups, something like apps.k8s.io/v1.StatefulSetSpec (whatever the actual syntax is).

One note about this: the apiextensions apiserver is not coupled to kube-apiserver right now and shouldn't be in the future. How does the apiextensions-apiserver know about the specs of the types of other groups? The aggregator is the only one that knows how to resolve such a reference correctly, or at least it knows which downstream server to ask.

In other words, getting the information flow right is not so trivial.
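Purely to illustrate the idea, such a reference might look something like the following (hypothetical syntax; nothing like this is accepted by the apiextensions API today):

```yaml
validation:
  openAPIV3Schema:
    properties:
      spec:
        properties:
          statefulSet:
            # Hypothetical cross-group reference; not supported.
            $ref: "apps.k8s.io/v1.StatefulSetSpec"
```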

@mtaufen
Contributor Author

mtaufen commented Oct 26, 2017 via email

@enisoc
Member

enisoc commented Oct 26, 2017

Could we have apiextensions-apiserver download OpenAPI from kube-apiserver, like kubectl does?

@mbohlool
Contributor

mbohlool commented Oct 26, 2017

/cc @cheftako @mml @lavalamp

@sttts
Contributor

sttts commented Oct 26, 2017

Could we have apiextensions-apiserver download OpenAPI from kube-apiserver, like kubectl does?

Yes, something like that which loosely couples those two.

@lavalamp
Member

I am concerned about giving ourselves really hard problems when we do version updates that drop a previously deprecated field.

@sttts
Contributor

sttts commented Oct 27, 2017

I am concerned about giving ourselves really hard problems when we do version updates that drop a previously deprecated field.

We cannot express in our CRD JSON Schema that certain fields do not exist (following the kube API conventions). So a CR cannot become invalid because a deprecated field was dropped. But as we store raw JSON for CRs, those removed fields would continue to exist in old CRs.

@lavalamp
Member

A CRD embeds v1.Pod. (Time passes.) We deprecate v1.Pod in favor of v2.Pod. (Much later...) We remove v1.Pod. Suddenly the entire system loses the ability to interpret the CRD that embeds the v1.Pod object.

@sttts
Contributor

sttts commented Oct 27, 2017

At least it won't validate it anymore. Deletion is not a problem.

@sttts
Contributor

sttts commented Oct 27, 2017

Which also means that we need a migration early enough; more precisely, the CRD author has to take care of this. The same holds for native types: if you miss the point in time to migrate, the objects are lost.

@nikhita
Member

nikhita commented Oct 29, 2017

/cc @sdminonne

@sdminonne
Contributor

sdminonne commented Oct 30, 2017

@lavalamp, personally I agree with @sttts, my understanding is that a coupling problem does exist but it's totally on the CRD creation/handling side. A CR/CRD is explicitly created with a reference to a native object. If the object does not exist anymore, it's a user error.

@nikhita thanks
@clamoriniere1A FYI

@lavalamp
Member

lavalamp commented Nov 7, 2017

It's not about new objects, I agree those will be fine. It is about stored objects.

@sdminonne
Contributor

OK, I didn't get it. Thanks for the clarification.

@nikhita
Member

nikhita commented Jan 17, 2018

/kind feature

@erictune
Member

erictune commented Mar 1, 2018

We should not support this.

There should not be references to schemas that cross two separately released components (the core and the software that has the CRD). This will result in unexpected addition or deletion (think rollback of server version) of fields in the CRD schema. It also makes it hard for the controller to determine if objects it previously created are what it intended to create, since they change unexpectedly.

Most CRDs I've seen do not typically embed a pod spec. They only have select fields in them.
When they do embed a pod spec, it is usually optional and not the common case.
I think it is better to give up validation in uncommon cases than to couple unrelated systems.

@markmandel
Contributor

Most CRDs I've seen do not typically embed a pod spec.

Apologies, but I'm working on a project right now that embeds a PodTemplateSpec. From my experience, this is not that uncommon an occurrence. Sounds like something we might need more data on.

Not saying that refs are the absolute best answer, but from my perspective there are definitely projects that will need this type of functionality.

@enisoc
Member

enisoc commented Mar 3, 2018

Just to add another aspect to consider:

So far the only proposal for supporting server-side apply (and/or strategic merge patch) on resources served through CRD is to put the necessary schema info in OpenAPI. Without the ability to directly pull in the schema for a PodTemplateSpec, you would have to recursively copy all those schemas into your own if you want server-side apply to work correctly for your resource.

Maybe we could consider some tooling to expand references inline, which would then be checked in with the CRD manifest. That wouldn't be my ideal user experience, although it would get a little better if we at least had the ability to break down a schema into definitions that can be referenced from within the same schema.
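A sketch of the same-schema definitions idea (again hypothetical: the apiextensions API does not accept definitions or $ref today):

```yaml
validation:
  openAPIV3Schema:
    definitions:                 # hypothetical; not accepted today
      port:
        type: object
        properties:
          name:
            type: string
          containerPort:
            type: integer
    properties:
      spec:
        properties:
          ports:
            type: array
            items:
              $ref: "#/definitions/port"
```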

@ant31
Member

ant31 commented Mar 5, 2018

Maybe we could consider some tooling to expand references inline

In case it helps:

I'm a maintainer of prometheus-operator, whose CRDs use many of the Kubernetes API specifications.
To add CRD validation, we developed a small OpenAPI generator to inline all references.
Output example: https://github.com/coreos/prometheus-operator/blob/master/example/prometheus-operator-crd/prometheus.crd.yaml

To use it:

crd.Spec.validation: GetCustomResourceValidation(config.SpecDefinitionName, config.GetOpenAPIDefinitions)

That wouldn't be my ideal user experience

It's not a terrible one

@enisoc
Member

enisoc commented Mar 5, 2018

@ant31 That's awesome!

+cc @pwittrock Have you considered this approach for kube-builder?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 22, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 21, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nick4fake

I'm sorry for sending a notification to everyone involved in this thread, but I simply wanted to note another aspect: same-file referencing.
What do you think about allowing references, at least to definitions in the same file?

@rjeberhard

While migrating from CRD v1beta1 to v1, this issue comes back up... We used the same technique as described above by the maintainer of the Prometheus operator to generate and inline the schema for the built-in types that we reference; however, this makes for very large schema sections. This was acceptable when all versions had to share a single, backward compatible schema, but since CRD v1 wants the schema for each version, even if they are the same, this quickly leads to a CRD that is too big to create.

Has there been any momentum on this discussion? Or, have I misunderstood that CRD v1 can still allow for a single definition of a shared schema?

Note: x-kubernetes-embedded-resource isn't useful because few of the types we need to reference are standalone resources, e.g. we want to allow customers to specify additional volumes for the pods our operator will create.
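For context, a sketch of the kind of hand-maintained, trimmed fragment this means in practice: a partial inline copy of only the core/v1 Volume fields an operator might care about (the additionalVolumes field name is illustrative):

```yaml
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        additionalVolumes:
          type: array
          items:
            type: object
            required: ["name"]
            properties:
              name:
                type: string
              configMap:
                type: object
                properties:
                  name:
                    type: string
              secret:
                type: object
                properties:
                  secretName:
                    type: string
```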

@erictune
Member

I acknowledge this is an issue and will look into it.

@rjeberhard
Can you point at an example of your types.go files that end up being too large?

@rjeberhard

Sorry for the delay. Our operator isn't written in Go, but instead in Java using the standard client. Also, I discovered that Kubernetes will accept the large CRD if I use kubectl create or replace rather than apply. Therefore, I'm not blocked. Finally, all of the standard types we reference are for the use case where a customer needs to specify additional content for the pods our operator will create, e.g. resources or additional volumes. While it would be simpler for me to be able to reference standard types, I'm going to explore if we could change to having the customer specify a pod template.

@lavalamp
Member

Server-side apply doesn't use an annotation and wouldn't hit this limit.

@rajasaur

rajasaur commented Jun 7, 2020

How would this work for recursive references? We have a case where a property references a type that was previously defined in the same document.
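Recursive types are the worst case for the inline-everything workaround, since expansion never terminates. A common fallback, sketched below with an illustrative rule/subRules field, is to cut validation off at the recursion point with x-kubernetes-preserve-unknown-fields:

```yaml
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        rule:
          type: object
          properties:
            operator:
              type: string
            # A rule may contain nested rules. The schema cannot reference
            # itself and inlining would never terminate, so validation
            # stops here and unknown fields are preserved.
            subRules:
              type: array
              items:
                type: object
                x-kubernetes-preserve-unknown-fields: true
```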

@lavalamp
Member

lavalamp commented Jun 8, 2020

References to things already included in the type definition seem safe. @sttts or @jpbetz might have thoughts?

juliogreff added a commit to DataDog/extendeddaemonset that referenced this issue Jun 11, 2020
It would be nice for a user to know that an EDS resource they're trying
to create is invalid. There are already several validations in the CRDs'
definition, but it's still missing some, like in
.spec.template.metadata. When a resource with invalid metadata (like
annotations or labels where the value is an int instead of a string,
which is common) is created, it will break the informer since it's no
longer able to de-serialize it into the Go structs, throwing the
following error:

```
E0520 14:25:07.400582       1 reflector.go:123]
sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204:
Failed to list *v1alpha1.ExtendedDaemonSet:
v1alpha1.ExtendedDaemonSetList.Items: []v1alpha1.ExtendedDaemonSet:
v1alpha1.ExtendedDaemonSet.Spec:
v1alpha1.ExtendedDaemonSetSpec.Template: v1.PodTemplateSpec.ObjectMeta:
v1.ObjectMeta.CreationTimestamp: Annotations: ReadString: expects " or
n, but found 1, error found in #10 byte of ...|rollout":1},"creatio|...,
bigger context
...|e-eu1.staging.dog","wheelofmisery/force-rollout":1},"creationTimestamp":null,"labels":{"app":"datado|...
```

To avoid that, this commit expands the validation to also check the
metadata more thoroughly. The actual validation inlined here comes from
K8s OpenAPI specs for a PodSpec. Currently there is no way to reference
that, so unfortunately we have to inline it (see
kubernetes/kubernetes#54579).

Now, when a user tries to create an invalid resource, they'll be stopped
with an error message:

```
$ kubectl apply -f invalid-foo-eds.yml
The ExtendedDaemonSet "foo" is invalid:
spec.template.metadata.annotations.rollout: Invalid value: "integer":
spec.template.metadata.annotations.rollout in body must be of type
string: "integer"
```
@drpmma

drpmma commented May 12, 2021

References to things included in the type definition already seem safe. @sttts or @jpbetz might have thoughts?

Is there any update? Same issue here.

@james-mchugh

What's the recommended solution for this in 2024? I have a CRD that uses a podTemplateSpec. I attempted to pull the OpenAPI spec for the core API, inline all references, and then use its podTemplateSpec definition in my CRD, but that did not work. I got various validation errors when trying to apply the CRD. I've been trying to resolve the errors one-by-one, but I'm hoping there is a better way to do this.

@rjeberhard

@james-mchugh, just responding to let you know that I've not found a better solution. We're using the Java client, so we generate a large CRD whose schema matches the standard types by walking those types and then accounting for special cases like IntOrString.

It's a pain...

@AlexanderYastrebov
Contributor

We used the same technique as described above by the maintainer of the Prometheus operator to generate and inline the schema for the built-in types that we reference; however, this makes for very large schema sections.

We trimmed descriptions to stay below annotation size limits, see zalando-incubator/stackset-controller#498
