
Large CRDs go over size limits (e.g. those with embedded podspecs) #82292

Open · DirectXMan12 opened this issue Sep 3, 2019 · 16 comments

Labels: area/custom-resources, kind/feature, lifecycle/frozen, priority/important-longterm, sig/api-machinery

@DirectXMan12
Contributor

DirectXMan12 commented Sep 3, 2019

Over in KubeBuilder land, we've started to see issues around very large validation blocks going over size limits. Thus far, it's mainly just been the client-side-apply-induced annotation limit, but I worry that when we start getting multiple versions we might go over the 1M limit. For instance, see kubernetes-sigs/kubebuilder#962, which has a single-version 700k CRD, due to the large validation schema.

So far, we've mostly been able to solve the issues by partially or fully truncating the field descriptions, but this seems suboptimal, since you're basically saying "you don't get API docs now".

From what I've seen so far, the issues are usually hit with things like PodSpec (e.g. we hit the client-side-apply annotation limit with our conversion of CronJob in our tutorial when we introduced a new version).

Worst comes to worst, we can add more pruning in controller-tools/kubebuilder, but I was wondering if folks had better ideas, or whether this merits more discussion upstream.

Refs (#62872) could help alleviate this a bit in the case of multiple podspecs, but don't solve the problem entirely (unless we get cross-object refs, which have previously been rejected).

Increasing the object size limit would avoid the 1M issue we haven't hit yet, but won't solve the client-side-apply annotation limit issue. Practically, even though SSA will get here eventually, folks are going to be using pre-SSA kubectls for a while, I expect.
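
As a concrete sketch of where that annotation limit bites (the CRD name foos.example.com is illustrative): client-side kubectl apply records the entire previous manifest in an annotation, and all of an object's annotations together must fit in 262144 bytes, far below the overall object size limit.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: foos.example.com
  annotations:
    # Written by client-side `kubectl apply`: a JSON copy of the whole CRD,
    # validation schema included. A few hundred KB of schema blows the
    # 262144-byte total-annotation cap on the next apply, long before the
    # object itself is too large to store.
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiextensions.k8s.io/v1","kind":"CustomResourceDefinition",...}
spec:
  versions:
  - name: v1
    schema:
      openAPIV3Schema: {}  # imagine ~700k of flattened schema here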

TL;DR: Pod spec validation makes CRDs large, any suggestions?

/sig api-machinery

@DirectXMan12 DirectXMan12 added the kind/feature label Sep 3, 2019
@k8s-ci-robot k8s-ci-robot added needs-sig and sig/api-machinery and removed needs-sig labels Sep 3, 2019
@DirectXMan12
Contributor Author

/sig api-machinery

@liggitt liggitt added this to Proposed features in Custom Resource Definitions Sep 3, 2019
@liggitt liggitt added the priority/important-longterm label Sep 3, 2019
@yue9944882
Member

/cc

@yue9944882
Member

@sttts we could probably lessen the CRD size by supporting $ref declarations in the OpenAPI schema.
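
For illustration, a sketch of what such $ref support might look like (hypothetical syntax; CRD schemas reject $ref today, and the resolution behavior is exactly what would need designing):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
# ...
spec:
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              template:
                # Hypothetical: resolved by the API server against its
                # built-in definitions instead of being expanded inline,
                # so the stored CRD stays small.
                $ref: "io.k8s.api.core.v1.PodTemplateSpec"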

@fedebongio
Contributor

/assign @liggitt

@yliaog
Contributor

yliaog commented Sep 5, 2019

/cc

@jpbetz
Contributor

jpbetz commented Sep 24, 2019

/assign

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Dec 23, 2019
@DirectXMan12
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Jan 7, 2020
@liggitt liggitt added the lifecycle/frozen label Feb 4, 2020
@erictune
Member

The $ref option seems like it would help with the problem of including pod templates (while also changing the behavior of the CRD when the core Kubernetes version is upgraded).

It doesn't seem like it would help with the problem where descriptions are simply too long, but where you want to include the rest of the schema.

Since a $ref replaces all the properties at its level, you can't just replace the value of a description field with a URL (which would be convenient). $refs could be used at every level instead, but then the user's authoring experience in Go IDL is compromised, and the API server is obliged to fetch all the $ref links.
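
For instance (in OpenAPI v3, as in the underlying JSON Schema draft, sibling keys next to a $ref are ignored), this hypothetical doc-link usage would not work:

replicas:
  # The $ref would replace everything at this level, discarding the
  # sibling type and description rather than augmenting them.
  $ref: "https://example.com/docs/foo#replicas"
  type: integer
  description: How many replicas.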

As a hack, verbose descriptions could, by convention, be replaced with short links, like this:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
# ...
spec:
  versions:
  - name: v1beta1
    schema:
      openAPIV3Schema:
        properties:
          # ...
          spec:
            properties:
              replicas:
                description: |- 
                  How many replicas.  See http://bitly.com/w34kjkj. 

But this isn't ideal for tools that need to further process the CRD, like docs generators or the API server itself when it converts schemas to /openapi/v2: when should they inline the linked-to content, and when should it be re-fetched, for example?

@DirectXMan12
Contributor Author

> It doesn't seem like it would help with the problem where descriptions are simply too long

Generally, even with long descriptions, you don't start to hit the limit till you include other types, from what I've seen. Mainly, most folks haven't hit limits till they've involved core types and flattening (which increases duplication since re-used types aren't deduped, even inside the same CRD).
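
As a sketch of that duplication (the field subset is illustrative; a real Container schema runs to hundreds of lines): since the stored schema can't use $ref, every use of a shared type is expanded inline, verbatim:

properties:
  containers:
    items:
      type: object
      properties:
        name: {type: string, description: "Name of the container."}
        image: {type: string, description: "Container image name."}
  initContainers:
    items:
      type: object
      # The exact same Container schema again, in full, rather than deduped.
      properties:
        name: {type: string, description: "Name of the container."}
        image: {type: string, description: "Container image name."}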

@erictune
Member

Here is some data on the sizes of CRDs in the wild.

                            Size (B)
count(distinct group_kind)      5605
mean                            7050
min                              224
50%                              973
90%                            20765
95%                            22292
99%                            61044
99.9%                         178471
max                           219724

This data is from October of last year and covers 5606 different group/kinds.

@erictune
Member

At the time I gathered that data, one of the largest CRDs was prometheus.monitoring.coreos.com, at a size of 199408 bytes. Today it has grown to 346389 bytes.

@erictune
Member

Some other large CRDs are listed below; all use pod templates or parts of pod templates:

  • seldondeployments.machinelearning.seldon.io at 357214 bytes
    • Uses several kinds of object templates, including HPA and pod.
  • cephosds.ceph.elemental.net at 372244 bytes
    • Uses podTemplate; the pod template accounts for most of the CRD's size (351887 of 372244 bytes).
  • Prometheus (see previous comment)
    • Does not use an explicit pod template, but does include many parts of it.
    • Does use a volumeClaimTemplate.

@erictune
Member

Here is a possible workaround, which would be much simpler than introducing references, and could tide users over until SSA is widely available.

It would be a comment marker that controller-gen would recognize, recursively stripping the descriptions generated beneath the marked field. So you would do something like:

import corev1 "k8s.io/api/core/v1"

// FooSpec defines resource Foo.
type FooSpec struct {
  // An important knob with a very lengthy description
  // because users won't be familiar with this property.
  ImportantKnob string `json:"importantKnob"`

  // Volumes to mount on the Foo's pods.
  // Subfields do not have descriptions.
  // For those please refer to https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#volume-v1-core
  // The v1.17 volume specification is supported.
  //
  // +kubebuilder:message:compactify=true
  Volumes []corev1.Volume `json:"volumes,omitempty"`
}
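
And a sketch of what the generated schema might look like with that (hypothetical) marker applied: the field's own description survives, while descriptions nested beneath it are stripped:

volumes:
  description: |-
    Volumes to mount on the Foo's pods. Subfields do not have descriptions.
    For those please refer to https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#volume-v1-core
    The v1.17 volume specification is supported.
  type: array
  items:
    type: object
    properties:
      name:
        type: string   # description stripped by compactify
      hostPath:
        type: object   # stripped recursively through nested types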

@erictune
Member

Here is a size distribution for CRDs that don't include pod templates or volume templates.
(Number of CRDs=5519)

      Size (B)
mean     5791
min       224
50%       913
60%      1454
70%      3007
80%     18907
90%     20589
95%     21713
99%     27617
99.9%   72960
max    171823

Here is the largest of those (171823B in my dataset; it has since grown to 270142B):
https://github.com/cilium/cilium/blob/master/examples/crds/ciliumnetworkpolicies.yaml

@nyarly

nyarly commented Mar 26, 2021

The Strimzi Kafkas CRD, as of 0.22.1 (https://github.com/strimzi/strimzi-kafka-operator/releases), is 1,231,093 bytes. They're also, IIUC, generated from Java, so Go-comment-based solutions won't help.

@liggitt liggitt removed their assignment May 20, 2021