Is this a BUG REPORT or FEATURE REQUEST?:
/kind enhancement

What happened:
ClusterProfiler, a feature introduced in #6211, causes significant memory usage in virt-api, even if the cluster is not under any load (no virt-launcher pods). Memory usage grows linearly with the size of the cluster, which can lead to OOM kills for virt-api pods. When I was experimenting on a large-scale cluster, OOMs happened after running the following commands in a short timespan:
With the current implementation, upon a dump request, the virt-api pod gathers results in memory from all KubeVirt pods (namely: virt-api, virt-operator, virt-controller, virt-handler, and virt-launcher) and then returns the combined result to the KubeVirt client.
When I experimented on a smaller cluster (28 KubeVirt pods), the size of the object holding the results (v1.ClusterProfilerResults) was ~250 MB. That is ~9 MB per pod, and it does not take into account collecting long-running profiles, whose size may grow over time. ~9 MB per pod seems like a reasonable value, as the heap and allocs profiles take around ~3 MB each (I gathered these profiler results through an alternative route).
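For context, a simplified sketch of why the aggregate grows linearly with cluster size: the result object maps every component pod to its complete raw pprof payloads, so all of them sit in virt-api's memory at once (field names approximate the real type, trimmed for illustration):

```go
// Simplified sketch of the aggregated result shape (an approximation
// of the real v1 types, trimmed for illustration). Each entry holds
// the complete raw pprof payloads for one pod, so the aggregate
// footprint is roughly (number of pods) x (profile size per pod).
type ProfilerResult struct {
	// profile name -> raw pprof bytes, e.g. "heap", "allocs", "cpu"
	PprofData map[string][]byte
}

type ClusterProfilerResults struct {
	// component pod -> its profiles, e.g. "virt-handler-abcde"
	ComponentResults map[string]ProfilerResult
}
```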
Assuming one might want to run the profiler on a cluster of 500 nodes, each node running 4 VMs (hence 4 virt-launcher pods), there are ~2500 KubeVirt pods to collect profiles from. At ~9 MB per pod, that works out to 20 GB+ of memory usage.
What you expected to happen:
I expect to be able to gather ClusterProfiler results with a reasonable amount of memory, regardless of the size of the cluster.
How to reproduce it (as minimally and precisely as possible):
Deploy a cluster and try ClusterProfiler with different numbers of KubeVirt pods running. You will observe that the memory usage of the virt-api pods grows linearly with the number of pods.
Anything else we need to know?:
My initial thought on a solution is that the struct v1.ClusterProfilerResults has to be removed, since for a large-scale cluster it simply won't fit into memory, neither in virt-api nor in the client that initiates the dump request.
An alternative solution would be to adjust the cluster-profiler tool so it can ask virt-api for the profiler results of a single KubeVirt pod at a time, one by one.
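A minimal sketch of what that could look like from the client side, assuming a hypothetical per-pod dump call (ProfilerClient, DumpProfilerResult, and dumpPerPod are illustrative names, not existing API):

```go
package clusterprofiler

import (
	"fmt"
	"os"
	"path/filepath"
)

// ProfilerClient is a hypothetical per-pod dump API; nothing like
// DumpProfilerResult exists in the current client surface.
type ProfilerClient interface {
	// DumpProfilerResult returns the raw pprof payloads for one pod.
	DumpProfilerResult(pod string) (map[string][]byte, error)
}

// dumpPerPod fetches and persists profiler results one pod at a time,
// so neither virt-api nor the client holds more than one pod's data
// (~9 MB) in memory at any point.
func dumpPerPod(client ProfilerClient, pods []string, outDir string) error {
	for _, pod := range pods {
		profiles, err := client.DumpProfilerResult(pod)
		if err != nil {
			return fmt.Errorf("dumping profiles for %s: %w", pod, err)
		}
		for name, data := range profiles {
			path := filepath.Join(outDir, pod, name+".pprof")
			if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
				return err
			}
			if err := os.WriteFile(path, data, 0o644); err != nil {
				return err
			}
		}
		// profiles becomes unreachable here, so the previous pod's
		// data can be reclaimed before the next one is fetched.
	}
	return nil
}
```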
Another approach is to modify the logic of the dump request: instead of virt-api gathering all results in memory, each KubeVirt pod could dump its results into its local /profile-data volume. A client could then fetch the results one by one by executing kubectl cp (or similar).
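For the pod-side half of that idea, a minimal sketch using the standard runtime/pprof package (the /profile-data path comes from the proposal above; dumpLocalProfiles is an illustrative helper, not existing code):

```go
package clusterprofiler

import (
	"os"
	"path/filepath"
	"runtime/pprof"
)

// dumpLocalProfiles writes the named pprof profiles into the pod's
// local volume (e.g. /profile-data), so a client can fetch them later
// with `kubectl cp` instead of virt-api holding them all in memory.
func dumpLocalProfiles(dir string) error {
	for _, name := range []string{"heap", "allocs", "goroutine"} {
		f, err := os.Create(filepath.Join(dir, name+".pprof"))
		if err != nil {
			return err
		}
		// WriteTo with debug=0 emits the binary pprof format that
		// `go tool pprof` consumes.
		if err := pprof.Lookup(name).WriteTo(f, 0); err != nil {
			f.Close()
			return err
		}
		if err := f.Close(); err != nil {
			return err
		}
	}
	return nil
}
```

A client would then pull each pod's directory, e.g. `kubectl cp <namespace>/<pod>:/profile-data ./profiles/<pod>`, handling one pod at a time.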
I'm happy to propose a PR once we decide on a solution to this problem.
> Assuming one might want to run the profiler on a cluster of 500 nodes, each node running 4 VMs (hence 4 virt-launcher pods), there are ~2500 KubeVirt pods to collect profiles from. At ~9 MB per pod, that works out to 20 GB+ of memory usage.
Only the cluster control plane (virt-controller, virt-api, virt-operator) and the node-level control plane (virt-handler) currently report back profiles. So for 500 nodes, that's probably about 506 profiles... which is still a whole lot of data.
Part of the reason the profiler is designed to aggregate all the profiles in virt-api is that it solves the problem of how to extract the profiles from the cluster. If we can talk to the Kubernetes API server, then ingress is already solved for us. So retrieving all this data by proxying through the API server is convenient (since it's just debug data and not a production workflow).
My recommendation here is to start reducing the size of the dumps by adding selectors to the API that starts/dumps the profile data. For example, node selectors could be used that pick only the nodes running the KubeVirt control plane and perhaps a couple of virt-handlers. That would reduce the amount of data being collected to exactly what you're interested in.
Another selection mechanism could be the type of profiling data you wish to retrieve. For example, maybe you're only interested in the cpu profile and not everything else. The API could be extended to do that as well.
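To make the suggestion concrete, a hedged sketch of what such an extended request could look like; the struct and field names here are hypothetical, not part of the existing API:

```go
// ClusterProfilerStartRequest is a hypothetical extension of the
// start/dump API; none of these fields exist today.
type ClusterProfilerStartRequest struct {
	// NodeSelector restricts collection to pods scheduled on matching
	// nodes, e.g. only control-plane nodes plus a couple of
	// virt-handlers.
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
	// Profiles limits which pprof profiles are gathered,
	// e.g. []string{"cpu"} instead of everything.
	Profiles []string `json:"profiles,omitempty"`
}
```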
If all of this isn't feasible, the last approach I'd consider is utilizing kubectl cp to extract all of this aggregated data from a pod that collects it within the cluster and dumps it to disk.