Watch request for CRs costs about 10-15x more memory in k8s-apiserver than in-tree resource watches #124680
Comments
/sig apimachinery
@nagygergo: The label(s) `sig/apimachinery` cannot be applied, because the repository doesn't have them. In response to this: /sig apimachinery
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig api-machinery
/triage accepted
@benluddy any idea how much of this could be optimized away by CBOR?
I'd expect the steady-state memory to depend on the size of the Go objects in the watch cache. Regardless of the object storage encoding, the decoded Unstructured objects will be larger than their in-tree counterparts, mainly because of all the maps with duplicate copies of the string keys. Maybe there is an opportunity to do efficient string interning using the information in the CR schemas?
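For illustration, a minimal sketch of what schema-driven key interning could look like. This is a hypothetical helper, not an existing apimachinery API: the idea is that a CRD schema gives a fixed vocabulary of property names, so every decoded object can share one canonical copy of each key string.

```go
package interning

// keyInterner holds one canonical copy of each property name taken from a
// CRD schema. Decoded objects can then share these copies instead of each
// holding its own allocation of the same key strings.
type keyInterner struct {
	canonical map[string]string
}

func newKeyInterner(schemaKeys []string) *keyInterner {
	c := make(map[string]string, len(schemaKeys))
	for _, k := range schemaKeys {
		c[k] = k
	}
	return &keyInterner{canonical: c}
}

// internKeys returns a copy of a decoded object whose map keys reference
// the canonical strings, letting the freshly decoded duplicate keys be
// garbage collected. Nested maps are handled recursively.
func (ki *keyInterner) internKeys(obj map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(obj))
	for k, v := range obj {
		if c, ok := ki.canonical[k]; ok {
			k = c
		}
		if nested, ok := v.(map[string]interface{}); ok {
			v = ki.internKeys(nested)
		}
		out[k] = v
	}
	return out
}
```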
+1 - I don't think CBOR will change much here. The memory usage is large because of the inefficient in-memory representation of CRDs. So for now, CRDs are by design less efficient than built-in resources; with CBOR we're attacking the CPU part first, though.
What happened?
I was running some load testing related to Flux. When creating 10,000 Kustomization custom resources (each about 1KiB), the k8s apiserver consumes about 1GiB of memory. When testing with 100,000 and 300,000 resources, the k8s apiserver's memory usage scales linearly.
(Screenshot: apiserver memory usage with 10k kustomizations.)
When doing the same thing for 1KiB configmaps, creating 10,000 resources, the k8s apiserver consumes about 100 MiB of memory.
Memory pprof for 10k kustomizations:
Memory pprof for 10k configmaps:
The memory usage stays the same for as long as the resources exist. After looking into what might cause this, it seems that kube-controller-manager sets up a watch for the kustomizations/configmaps resources.
This is needed because the garbage collector that runs in kube-controller-manager has to walk the ownership reference graph, and it wants to do that from a cache:
pkg/controller/garbagecollector/garbagecollector.go, line 253 (at commit a9eded0)
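For context, a rough sketch of how a controller ends up holding a long-lived watch and a full in-memory mirror of a custom resource, using client-go's dynamic informer factory. This is not the actual garbage collector wiring, just an assumption-laden minimal equivalent; the GVR matches the Kustomization CRD from this report:

```go
package main

import (
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The garbage collector does the equivalent of this for every deletable
	// resource it discovers, so each CRD with many instances keeps a watch
	// open and a full copy of those objects in the informer's store.
	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 10*time.Minute)
	gvr := schema.GroupVersionResource{
		Group:    "kustomize.toolkit.fluxcd.io",
		Version:  "v1",
		Resource: "kustomizations",
	}
	informer := factory.ForResource(gvr).Informer()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	_ = informer // the informer's local store now mirrors every Kustomization
}
```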
What did you expect to happen?
I would have expected similar memory usage for in-tree and custom resources.
Also, the current garbage collector seems to force the k8s apiserver to cache the full contents of etcd. Is that the intended design?
How can we reproduce it (as minimally and precisely as possible)?
1. Create a cluster:
```
kind create cluster
```
2. Add the kustomize CRD:
```
curl -L https://raw.githubusercontent.com/fluxcd/kustomize-controller/main/config/crd/bases/kustomize.toolkit.fluxcd.io_kustomizations.yaml | kubectl apply -f -
```
3. Create 10k Kustomization resources (each about 1KiB); see the sketch of this step below.
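A sketch of the load-generation step using client-go's dynamic client. The spec values below (interval, path, sourceRef) are placeholders rather than the exact manifest used in the test:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	gvr := schema.GroupVersionResource{
		Group:    "kustomize.toolkit.fluxcd.io",
		Version:  "v1",
		Resource: "kustomizations",
	}
	for i := 0; i < 10000; i++ {
		obj := &unstructured.Unstructured{Object: map[string]interface{}{
			"apiVersion": "kustomize.toolkit.fluxcd.io/v1",
			"kind":       "Kustomization",
			"metadata": map[string]interface{}{
				"name":      fmt.Sprintf("load-test-%d", i),
				"namespace": "default",
			},
			// Placeholder spec; the original test used objects of roughly
			// 1KiB, so pad with extra fields to match if desired.
			"spec": map[string]interface{}{
				"interval": "10m",
				"path":     "./",
				"prune":    true,
				"sourceRef": map[string]interface{}{
					"kind": "GitRepository",
					"name": "load-test",
				},
			},
		}}
		if _, err := client.Resource(gvr).Namespace("default").Create(context.TODO(), obj, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}
}
```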
Anything else we need to know?
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)