
[SIG-scale] ClusterProfiler virt-api memory usage scales linearly with number of kubevirt pods #6478

Closed
knopt opened this issue Sep 29, 2021 · 3 comments · Fixed by #6511

@knopt
Contributor

knopt commented Sep 29, 2021

Is this a BUG REPORT or FEATURE REQUEST?:

/kind enhancement

What happened:

ClusterProfiler, a feature introduced in #6211, causes significant memory usage in virt-api even when the cluster is not under any load (no virt-launcher pods). Memory usage grows linearly with the size of the cluster, which can lead to OOM kills of virt-api pods. When I was experimenting on a large-scale cluster, OOMs happened after running the following commands in a short timespan:

$ ./cluster-profiler --cmd start
$ ./cluster-profiler --cmd dump

With the current implementation, upon a dump request the virt-api pod gathers results in memory from all kubevirt pods (namely: virt-api, virt-operator, virt-controller, virt-handler and virt-launcher) and then returns the result to the kubevirt client.
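
For illustration, here is a minimal Go sketch of that aggregation pattern. The type and helper names are hypothetical, not the actual KubeVirt code; the point is that every pod's pprof blobs end up in one object that stays resident until the response is sent, so memory grows with pod count:

```go
package main

import "fmt"

// clusterResults mirrors the idea behind v1.ClusterProfilerResults: one map
// entry per kubevirt pod, each entry holding every pprof blob for that pod.
// These names are illustrative, not the actual KubeVirt types.
type clusterResults struct {
	// pod name -> profile name (heap, allocs, cpu, ...) -> raw pprof bytes
	ComponentResults map[string]map[string][]byte
}

// dumpAll gathers every pod's profiles into one in-memory object before
// returning it, so peak memory grows linearly with the number of pods.
func dumpAll(pods []string, fetch func(pod string) (map[string][]byte, error)) (*clusterResults, error) {
	out := &clusterResults{ComponentResults: map[string]map[string][]byte{}}
	for _, pod := range pods {
		profiles, err := fetch(pod)
		if err != nil {
			return nil, err
		}
		out.ComponentResults[pod] = profiles // ~9 MB per pod stays resident until the response is sent
	}
	return out, nil
}

func main() {
	// Fake fetcher standing in for the per-pod profiler endpoints.
	fetch := func(pod string) (map[string][]byte, error) {
		return map[string][]byte{
			"heap":   make([]byte, 3<<20), // ~3 MB, matching the sizes observed below
			"allocs": make([]byte, 3<<20),
		}, nil
	}
	res, err := dumpAll([]string{"virt-api-0", "virt-handler-0"}, fetch)
	if err != nil {
		panic(err)
	}
	fmt.Printf("collected profiles for %d pods\n", len(res.ComponentResults))
}
```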

When I experimented on a smaller cluster (28 kubevirt pods), the size of the object holding the results (v1.ClusterProfilerResults) was ~250 MB. That works out to ~9 MB/pod, and it does not take into account collecting a long-running profile, whose size may grow over time. ~9 MB/pod seems like a reasonable value, as the heap and allocs profiles take around ~3 MB each (I gathered these profiler results by an alternative means).

Assuming that one might want to run the profiler on a cluster of 500 nodes, each node running 4 VMs (hence 4 virt-launcher pods), there are ~2500 kubevirt pods to collect profiles from. At ~9 MB per pod, this might result in 20 GB+ of memory usage.

What you expected to happen:

I expect to be able to gather ClusterProfiler results with a reasonable amount of memory, regardless of the size of the cluster.

How to reproduce it (as minimally and precisely as possible):

Deploy a cluster and try ClusterProfiler with different numbers of kubevirt pods running. You will observe that the memory usage of the virt-api pods grows linearly with the number of pods.

Anything else we need to know?:

My initial thought on a solution is that the struct v1.ClusterProfilerResults has to go away, as for a large-scale cluster it simply won't fit into memory, neither in virt-api nor in the client that initiates the dump request.

An alternative solution would be to adjust the cluster-profiler tool so it can ask virt-api for the profiler results of a single kubevirt pod at a time, one by one (see the sketch below).
Another approach is to modify the logic of the dump request: instead of virt-api gathering all results in memory, each kubevirt pod could dump its results into its local /profile-data volume. A client could then fetch the results one by one, e.g. by executing kubectl cp (or similar).
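
As a rough sketch of the first alternative, assuming a hypothetical per-pod dump endpoint (fetchPodProfiles is a stand-in, not an existing API), the client could write each pod's results to disk before moving on, so only one pod's profiles are ever held in memory at once:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fetchPodProfiles stands in for a hypothetical per-pod dump request against
// virt-api; the real endpoint and signature would be defined by the eventual PR.
func fetchPodProfiles(pod string) (map[string][]byte, error) {
	return map[string][]byte{"heap.pprof": []byte("...")}, nil
}

// dumpOneByOne fetches and persists one pod's results at a time, so only a
// single pod's profiles (~9 MB) are ever held in memory by the client.
func dumpOneByOne(pods []string, outDir string) error {
	for _, pod := range pods {
		profiles, err := fetchPodProfiles(pod)
		if err != nil {
			return fmt.Errorf("fetching profiles for %s: %w", pod, err)
		}
		dir := filepath.Join(outDir, pod)
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return err
		}
		for name, data := range profiles {
			// Write each profile straight to disk; nothing accumulates across pods.
			if err := os.WriteFile(filepath.Join(dir, name), data, 0o644); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	if err := dumpOneByOne([]string{"virt-handler-abcde", "virt-controller-fghij"}, "cluster-profiler-results"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```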

I'm happy to propose a PR once we decide on a solution to this problem.

@knopt
Contributor Author

knopt commented Sep 29, 2021

/cc @davidvossel

@davidvossel
Member

Assuming that one might want to run the profiler on a cluster of 500 nodes, each node running 4 VMs (hence 4 virt-launcher pods), there are ~2500 kubevirt pods to collect profiles from. At ~9 MB per pod, this might result in 20 GB+ of memory usage.

Only the cluster control plane (virt-controller, virt-api, virt-operator) and the node-level control plane (virt-handler) currently report back profiles. So for 500 nodes, that's probably about 506 profiles... which is still a whole lot of data.

Part of the reason the profiler is designed to aggregate all the profiles in virt-api is that it solves the problem of how to extract the profiles from the cluster. If we can talk to the Kubernetes api-server, then ingress is already solved for us, so retrieving all this data by proxying through the api-server is convenient (since it's just debug data and not a production workflow).

My recommendation here is to start reducing the size of the dumps by adding selectors to the API that starts/dumps the profile data. For example, node selectors could be used to pick only the nodes running the KubeVirt control plane and perhaps a couple of virt-handlers. That would reduce the amount of data being collected to exactly what you're interested in.

Another selection mechanism could be the type of profiling data you wish to retrieve. For example, maybe you're only interested in the cpu profile and not everything else. The API could be extended to do that as well.
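
A rough sketch of what such a filtered start/dump request could look like; the struct and field names here are purely illustrative assumptions, not an existing KubeVirt API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// profilerRequest narrows both which pods are profiled and which profile
// types are returned, so a dump only contains the data you asked for.
type profilerRequest struct {
	// Standard Kubernetes label selector, e.g. limiting the dump to the
	// cluster control plane plus a couple of hand-picked virt-handlers.
	LabelSelector string `json:"labelSelector,omitempty"`
	// Subset of profile types to collect, e.g. only the cpu profile.
	ProfileTypes []string `json:"profileTypes,omitempty"`
}

func main() {
	req := profilerRequest{
		LabelSelector: "kubevirt.io in (virt-api, virt-controller, virt-operator)",
		ProfileTypes:  []string{"cpu"},
	}
	b, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(b))
}
```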

If all this isn't feasible, the last approach I'd consider is using kubectl cp to extract all of the aggregated data from a pod that collects it within the cluster and dumps it to disk.

@knopt
Contributor Author

knopt commented Oct 4, 2021

/assign @knopt
