kube-apiserver memory consumption during CRD creation #101755

Closed
alexarefev opened this issue May 6, 2021 · 15 comments · Fixed by #106181
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@alexarefev

alexarefev commented May 6, 2021

What happened:

Creating a CustomResourceDefinition that has multiple versions causes high kube-apiserver memory consumption.
Several such creations within a short period of time may cause kube-apiserver to restart.

What you expected to happen:

kube-apiserver shouldn't consume an unreasonably large amount of memory while processing objects such as CRDs.

How to reproduce it (as minimally and precisely as possible):

Apply the following YAML and watch kube-apiserver memory consumption:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.4.0
  creationTimestamp: null
  name: applications.deployment.nrm.netcracker.com
spec:
  group: deployment.nrm.netcracker.com
  names:
    kind: Application
    listKind: ApplicationList
    plural: applications
    singular: application
  scope: Namespaced
  versions:
  - name: hub
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        name:
                          type: string
                        type:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              milestones:
                properties:
                  activationTimestamp:
                    format: date-time
                    type: string
                type: object
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: false
    subresources:
      status: {}
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              notServed:
                type: string
            type: object
        type: object
    served: false
    storage: false
  - name: v1alpha10
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        name:
                          type: string
                        type:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              milestones:
                properties:
                  activationTimestamp:
                    format: date-time
                    type: string
                type: object
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
  - name: v1alpha2
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        appService:
                          type: string
                        database:
                          type: string
                        endpoint:
                          type: string
                        networkService:
                          type: string
                        queue:
                          type: string
                        storage:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: false
    subresources:
      status: {}
  - name: v1alpha3
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        appService:
                          type: string
                        database:
                          type: string
                        endpoint:
                          type: string
                        networkService:
                          type: string
                        queue:
                          type: string
                        storage:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: false
    subresources:
      status: {}
  - name: v1alpha4
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        appService:
                          type: string
                        database:
                          type: string
                        endpoint:
                          type: string
                        networkService:
                          type: string
                        queue:
                          type: string
                        storage:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: false
    subresources:
      status: {}
  - name: v1alpha5
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        appService:
                          type: string
                        database:
                          type: string
                        endpoint:
                          type: string
                        networkService:
                          type: string
                        queue:
                          type: string
                        storage:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: false
    subresources:
      status: {}
  - name: v1alpha6
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        appService:
                          type: string
                        database:
                          type: string
                        endpoint:
                          type: string
                        networkService:
                          type: string
                        queue:
                          type: string
                        storage:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: false
    subresources:
      status: {}
  - name: v1alpha7
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        name:
                          type: string
                        type:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: false
    subresources:
      status: {}
  - name: v1alpha8
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        name:
                          type: string
                        type:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: false
    subresources:
      status: {}
  - name: v1alpha9
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              activeVersion:
                type: string
            type: object
          status:
            properties:
              conditions:
                items:
                  properties:
                    forResource:
                      properties:
                        name:
                          type: string
                        type:
                          type: string
                      type: object
                    lastProbeTime:
                      format: date-time
                      type: string
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              phase:
                type: string
            type: object
        type: object
    served: true
    storage: false
    subresources:
      status: {}
status:
  acceptedNames:
    kind: ""
    plural: ""
  conditions: []
  storedVersions: []

Anything else we need to know?:

/sig api-machinery

Environment:

  • Kubernetes version: 1.20.2
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): CentOS Linux release 7.6.1810
  • Kernel (e.g. uname -a): 3.10.0-957.27.2.el7.x86_64
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@alexarefev alexarefev added the kind/bug Categorizes issue or PR as related to a bug. label May 6, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 6, 2021
@alexarefev
Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 6, 2021
@249043822
Member

Can you enable profiling via web interface host:port/debug/pprof/heap to give more details about this issue?
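
For readers who want to capture the heap profile requested above, here is a minimal sketch in Go. It assumes that `kubectl proxy` is running locally on its default address 127.0.0.1:8001, that profiling is enabled on the apiserver, and that the caller's credentials permit access to the path. The helper names `heapProfileURL` and `saveHeapProfile` and the output file name `heap.pprof` are hypothetical, chosen only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// heapProfileURL joins a base address with the heap profile path
// mentioned in the comment above (/debug/pprof/heap).
func heapProfileURL(base string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/debug/pprof/heap"
	return u.String(), nil
}

// saveHeapProfile downloads the heap profile and copies it to w,
// returning the number of bytes written.
func saveHeapProfile(base string, w io.Writer) (int64, error) {
	target, err := heapProfileURL(base)
	if err != nil {
		return 0, err
	}
	resp, err := http.Get(target)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.Copy(w, resp.Body)
}

func main() {
	// Assumption: `kubectl proxy` is listening on its default address.
	base := "http://127.0.0.1:8001"

	var buf bytes.Buffer
	n, err := saveHeapProfile(base, &buf)
	if err != nil {
		fmt.Fprintln(os.Stderr, "download failed:", err)
		return
	}
	if err := os.WriteFile("heap.pprof", buf.Bytes(), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		return
	}
	fmt.Printf("wrote %d bytes to heap.pprof\n", n)
}
```

The saved file can then be inspected with `go tool pprof heap.pprof`. Capturing one profile before applying the CRD and one shortly after makes the allocation spike easier to isolate.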

@249043822
Member

I have tested your CRDs. The apiserver's memory usage may rise for a while after the CRD is created, but it drops again after some time. What is your apiserver memory limit? I think you should first give the apiserver more memory and then test again.

@alexarefev
Author

> Can you enable profiling via web interface host:port/debug/pprof/heap to give more details about this issue?

Yes, I can give you more details via profiler.

> I have tested your CRDs. The apiserver's memory usage may rise for a while after the CRD is created, but it drops again after some time. What is your apiserver memory limit? I think you should first give the apiserver more memory and then test again.

Increasing memory is a reasonable workaround, but it requires changes at the IaaS tier. Also, the extra memory may be needed only while CRDs are being created.

@caesarxuchao
Member

cc @apelisse @jpbetz @leilajal
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 6, 2021
@lavalamp
Member

lavalamp commented May 7, 2021

Almost certainly a problem constructing the OpenAPI schema; not clear if it's in the extensions-apiserver or the aggregator's merging, however. My money is on the former.

@koryaga

koryaga commented May 13, 2021

Hi guys, am I understanding correctly that this issue is being assessed by the Kubernetes community?

@apelisse
Member

Yeah, I investigated that. Most of the memory allocation comes from k8s.io/kubernetes/vendor/k8s.io/kube-openapi/pkg/handler.(*OpenAPIService).UpdateSpec. It allocates a large amount of memory, and it's not clear why, nor how we could fix it.

@alexarefev
Author

Hi guys! Please let us know if you have any new information about the issue. Thanks in advance.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 30, 2021
@Jefftree
Member

/remove-lifecycle stale

My suspicions were around the openapi aggregation as well, but after digging into it a bit deeper, the bulk of the memory allocation comes from k8s.io/kubernetes/vendor/k8s.io/kube-openapi/pkg/handler.(*OpenAPIService).UpdateSpec, as apelisse mentioned, and this is a step after the aggregation. Within the UpdateSpec function, k8s.io/kubernetes/vendor/github.com/json-iterator/go.(*frozenConfig).Marshal takes up most of the memory. The amount of memory used during Marshal is orders of magnitude larger than the final marshaled object, although it's possible that's the expected behavior. After a quick spike, memory usage does return to normal fairly quickly, though.
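
The gap between memory allocated during Marshal and the size of the marshaled output can be measured with a rough sketch like the one below. It is not the apiserver code path: it swaps in the standard library's `encoding/json` for json-iterator and uses a synthetic nested map in place of a real OpenAPI spec, so the numbers it prints only illustrate the measurement technique. The helpers `buildSpec` and `marshalCost` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
)

// buildSpec returns a synthetic nested document standing in for an
// OpenAPI spec: one object schema per version, each with many string fields.
func buildSpec(versions, fields int) map[string]interface{} {
	defs := make(map[string]interface{}, versions)
	for v := 0; v < versions; v++ {
		props := make(map[string]interface{}, fields)
		for f := 0; f < fields; f++ {
			props[fmt.Sprintf("field%d", f)] = map[string]interface{}{"type": "string"}
		}
		defs[fmt.Sprintf("v1alpha%d", v)] = map[string]interface{}{
			"type":       "object",
			"properties": props,
		}
	}
	return map[string]interface{}{"definitions": defs}
}

// marshalCost marshals v and reports the size of the output together with
// the cumulative heap bytes allocated while Marshal was running.
func marshalCost(v interface{}) (outputBytes int, allocatedBytes uint64, err error) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	out, err := json.Marshal(v)
	runtime.ReadMemStats(&after)
	if err != nil {
		return 0, 0, err
	}
	// TotalAlloc only grows, so the difference is the bytes allocated
	// between the two readings.
	return len(out), after.TotalAlloc - before.TotalAlloc, nil
}

func main() {
	spec := buildSpec(10, 200)
	size, alloc, err := marshalCost(spec)
	if err != nil {
		panic(err)
	}
	fmt.Printf("marshaled size: %d bytes\n", size)
	fmt.Printf("allocated during Marshal: %d bytes\n", alloc)
}
```

`TotalAlloc` counts every heap allocation, including short-lived intermediates, which is why it suits a transient spike like the one described here better than a before/after comparison of live heap size.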

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 30, 2021
@Jefftree
Member

kubernetes/kube-openapi#251 has been opened to mitigate this

@roycaihw
Member

#101755 (comment) Do we know if it's UpdateSpec in the apiextensions-apiserver or the aggregation layer, or both?

@apelisse
Member

I think we measured it in the aggregation layer, @Jefftree to confirm. @DangerOnTheRanger, could you look into it and make sure that your fix solves it?

@Jefftree
Member

Since the apiextensions-apiserver spec is a subset of the aggregated spec and the memory consumption is proportional to the size of the spec, the aggregation layer strictly uses more memory than the apiextensions-apiserver. On a cluster with a couple of sample CRDs, the bulk of the memory was consumed in the aggregator. But if the number or size of the CRDs becomes large enough, the UpdateSpec in the apiextensions-apiserver would also end up consuming a sizable amount of memory.

@DangerOnTheRanger's PR targets the genericapiserver imported by all apiservers, so I think we should see benefits in both the aggregator and apiextensions?

ffromani added a commit to k8stopologyawareschedwg/resource-topology-exporter that referenced this issue Jul 25, 2022
Add metrics to report the traffic towards CRDs
generated by the RTEs.
Even though issues like
kubernetes/kubernetes#105932
kubernetes/kubernetes#101755
should be solved, it's both cheap and useful to provide these
metrics on RTE side, since we already have all the infrastructure
in place.

Signed-off-by: Francesco Romani <fromani@redhat.com>
10 participants