
[SIG-scale] ClusterProfiler virt-api memory usage scales linearly with number of kubevirt pods #6478

Closed
knopt opened this issue Sep 29, 2021 · 3 comments · Fixed by #6511

@knopt
Contributor

knopt commented Sep 29, 2021

Is this a BUG REPORT or FEATURE REQUEST?:

/kind enhancement

What happened:

ClusterProfiler, a feature introduced in #6211, causes significant memory usage in virt-api even when the cluster is not under any load (no virt-launcher pods). Memory usage grows linearly with the size of the cluster, which can lead to OOM kills of virt-api pods. When I was experimenting on a large-scale cluster, OOMs happened after running the following commands in a short timespan:

$ ./cluster-profiler --cmd start
$ ./cluster-profiler --cmd dump

With the current implementation, upon a dump request the virt-api pod gathers results in memory from all kubevirt pods (namely: virt-api, virt-operator, virt-controller, virt-handler and virt-launcher) and then returns the result to the kubevirt client.
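
For illustration, here is a minimal Go sketch of that aggregation pattern. The type and helper names are hypothetical, not the actual KubeVirt code; the point is that every pod's pprof blobs end up in one object that stays resident until the response is sent, so memory grows with pod count:

```go
package main

import "fmt"

// clusterResults mirrors the idea behind v1.ClusterProfilerResults: one map
// entry per kubevirt pod, each entry holding every pprof blob for that pod.
// These names are illustrative, not the actual KubeVirt types.
type clusterResults struct {
	// pod name -> profile name (heap, allocs, cpu, ...) -> raw pprof bytes
	ComponentResults map[string]map[string][]byte
}

// dumpAll gathers every pod's profiles into one in-memory object before
// returning it, so peak memory grows linearly with the number of pods.
func dumpAll(pods []string, fetch func(pod string) (map[string][]byte, error)) (*clusterResults, error) {
	out := &clusterResults{ComponentResults: map[string]map[string][]byte{}}
	for _, pod := range pods {
		profiles, err := fetch(pod)
		if err != nil {
			return nil, err
		}
		out.ComponentResults[pod] = profiles // ~9 MB per pod stays resident until the response is sent
	}
	return out, nil
}

func main() {
	// Fake fetcher standing in for the per-pod profiler endpoints.
	fetch := func(pod string) (map[string][]byte, error) {
		return map[string][]byte{
			"heap":   make([]byte, 3<<20), // ~3 MB, matching the sizes observed below
			"allocs": make([]byte, 3<<20),
		}, nil
	}
	res, err := dumpAll([]string{"virt-api-0", "virt-handler-0"}, fetch)
	if err != nil {
		panic(err)
	}
	fmt.Printf("collected profiles for %d pods\n", len(res.ComponentResults))
}
```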

When I experimented on a smaller cluster (28 kubevirt pods), the size of the object holding the results (v1.ClusterProfilerResults) was ~250 MB. That works out to ~9 MB/pod, and it does not take into account collecting a long-running profile, whose size may grow over time. ~9 MB/pod seems like a reasonable value, as the heap and allocs profiles take around ~3 MB each (I gathered these profiler results by an alternative means).

Assuming that one might want to run the profiler on a cluster of 500 nodes, each node running 4 VMs (hence 4 virt-launcher pods), there are ~2500 kubevirt pods to collect profiles from. At ~9 MB per pod, this might result in 20 GB+ of memory usage.

What you expected to happen:

I expect to be able to gather ClusterProfiler results with a reasonable amount of memory, regardless of the size of the cluster.

How to reproduce it (as minimally and precisely as possible):

Deploy a cluster and try ClusterProfiler with different numbers of kubevirt pods running. You will observe that the memory usage of the virt-api pods grows linearly with the number of pods.

Anything else we need to know?:

My initial thought on a solution is that the struct v1.ClusterProfilerResults has to go away, as for a large-scale cluster it simply won't fit into memory, neither in virt-api nor in the client that initiates the dump request.

An alternative solution would be to adjust the cluster-profiler tool so it can ask virt-api for the profiler results of a single kubevirt pod at a time, one by one (see the sketch below).
Another approach is to modify the logic of the dump request: instead of virt-api gathering all results in memory, each kubevirt pod could dump its results into its local /profile-data volume. A client could then fetch the results one by one, e.g. by executing kubectl cp (or similar).
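
As a rough sketch of the first alternative, assuming a hypothetical per-pod dump endpoint (fetchPodProfiles is a stand-in, not an existing API), the client could write each pod's results to disk before moving on, so only one pod's profiles are ever held in memory at once:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fetchPodProfiles stands in for a hypothetical per-pod dump request against
// virt-api; the real endpoint and signature would be defined by the eventual PR.
func fetchPodProfiles(pod string) (map[string][]byte, error) {
	return map[string][]byte{"heap.pprof": []byte("...")}, nil
}

// dumpOneByOne fetches and persists one pod's results at a time, so only a
// single pod's profiles (~9 MB) are ever held in memory by the client.
func dumpOneByOne(pods []string, outDir string) error {
	for _, pod := range pods {
		profiles, err := fetchPodProfiles(pod)
		if err != nil {
			return fmt.Errorf("fetching profiles for %s: %w", pod, err)
		}
		dir := filepath.Join(outDir, pod)
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return err
		}
		for name, data := range profiles {
			// Write each profile straight to disk; nothing accumulates across pods.
			if err := os.WriteFile(filepath.Join(dir, name), data, 0o644); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	if err := dumpOneByOne([]string{"virt-handler-abcde", "virt-controller-fghij"}, "cluster-profiler-results"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```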

I'm happy to propose a PR once we decide on a solution to this problem.

@knopt
Contributor Author

knopt commented Sep 29, 2021

/cc @davidvossel

@davidvossel
Member

Assuming that one might want to run the profiler on a cluster of 500 nodes, each node running 4 VMs (hence 4 virt-launcher pods), there are ~2500 kubevirt pods to collect profiles from. At ~9 MB per pod, this might result in 20 GB+ of memory usage.

Only the cluster control plane (virt-controller, virt-api, virt-operator) and the node-level control plane (virt-handler) currently report back profiles. So for 500 nodes, that's probably about 506 profiles... which is still a whole lot of data.

Part of the reason the profiler is designed to aggregate all the profiles in virt-api is that it solves the problem of how to extract the profiles from the cluster. If we can talk to the Kubernetes api-server, then ingress is already solved for us, so retrieving all this data by proxying through the api-server is convenient (since it's just debug data and not a production workflow).

My recommendation here is to start reducing the size of the dumps by adding selectors to the API that starts/dumps the profile data. For example, node selectors could be used to pick only the nodes running the KubeVirt control plane and perhaps a couple of virt-handlers. That would reduce the amount of data being collected to exactly what you're interested in.

Another selection mechanism could be the type of profiling data you wish to retrieve. For example, maybe you're only interested in the cpu profile and not everything else. The API could be extended to do that as well.
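
A rough sketch of what such a filtered start/dump request could look like; the struct and field names here are purely illustrative assumptions, not an existing KubeVirt API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// profilerRequest narrows both which pods are profiled and which profile
// types are returned, so a dump only contains the data you asked for.
type profilerRequest struct {
	// Standard Kubernetes label selector, e.g. limiting the dump to the
	// cluster control plane plus a couple of hand-picked virt-handlers.
	LabelSelector string `json:"labelSelector,omitempty"`
	// Subset of profile types to collect, e.g. only the cpu profile.
	ProfileTypes []string `json:"profileTypes,omitempty"`
}

func main() {
	req := profilerRequest{
		LabelSelector: "kubevirt.io in (virt-api, virt-controller, virt-operator)",
		ProfileTypes:  []string{"cpu"},
	}
	b, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(b))
}
```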

If all this isn't feasible, the last approach I'd consider is using kubectl cp to extract all of the aggregated data from a pod that collects it within the cluster and dumps it to disk.

@knopt
Contributor Author

knopt commented Oct 4, 2021

/assign @knopt
