
[SURE-7356] Rancher pods memory increasing over time #44090

Closed
aruiz14 opened this issue Jan 18, 2024 · 1 comment · Fixed by rancher/wrangler#361

aruiz14 commented Jan 18, 2024

Internal reference: SURE-7356

Rancher Server Setup

  • Rancher version: 2.7.9
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): AKS 1.26
  • Number of nodes: 3
  • Proxy/Cert Details: not relevant

Describe the bug
Rancher pod memory usage grows steadily over time (about 1 GB per day).
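
A quick way to spot-check this from outside the pod, assuming metrics-server (or the Monitoring chart) is installed and the standard app=rancher label is present on the Rancher deployment:

# Snapshot of current memory usage per Rancher pod; compare readings a day apart
kubectl -n cattle-system top pods -l app=rancher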

To Reproduce
Unknown.

Additional context
Rancher at 10 GB of heap:
[image: heap profile screenshot]

About 60% of the heap is held by the cache.

A CPU profile of the same pod shows 30% of CPU time spent on garbage collection:
[image: CPU profile screenshot]
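
For context, profiles like the ones above can be captured from a running Rancher pod. This is a sketch, assuming Rancher's pprof endpoint listens on localhost:6060 inside the pod and curl is available in the image; the pod name placeholder is hypothetical:

# Heap profile
kubectl -n cattle-system exec <rancher-pod> -- curl -s http://localhost:6060/debug/pprof/heap -o /tmp/heap
kubectl cp cattle-system/<rancher-pod>:/tmp/heap ./heap
go tool pprof -top ./heap

# 30-second CPU profile (this is where the ~30% GC time shows up)
kubectl -n cattle-system exec <rancher-pod> -- curl -s 'http://localhost:6060/debug/pprof/profile?seconds=30' -o /tmp/cpu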

MSpencer87 (Contributor) commented:

Verified on Rancher 2.7.11 (debug) on RKE2 v1.26.9+rke2r1.

The previously observed increase in memory usage over time on Rancher pods is no longer present.

Steps to reproduce

  1. Bring up HA Rancher on 2.7.9, install Monitoring (Local->Apps->Charts)
  2. Monitor metrics via Prometheus (see the query sketch after these steps):
    • go_memstats_heap_inuse_bytes{container="rancher"}
    • container_memory_working_set_bytes{container="rancher"}
  3. Create downstream cluster, install kubectl, save kubeconfig
  4. Import downstream into Rancher, confirm success & active status
  5. Create 725 secrets (~800 KB each) on the downstream cluster via kubeconfig, using the helper below:
createRandomSecret() {
	name=$1
	tmpdir=$(mktemp -d)

	# Generate 8 random 80 KiB files (~640 KiB raw, closer to ~850 KiB once
	# base64-encoded inside the Secret object)
	args=""
	for i in $(seq 1 8); do
		dd if=/dev/urandom of="${tmpdir}/tmp-$i" bs=80k count=1 2>/dev/null
		args="${args} --from-file=key$i=${tmpdir}/tmp-$i"
	done

	# Create the secret on the downstream cluster; $args is intentionally
	# unquoted so each --from-file flag is passed as a separate argument
	KUBECONFIG=/path/to/downstream-kubeconfig.yaml \
		kubectl -n cattle-system create secret generic "$name" $args

	rm -rf "${tmpdir}"
}
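
The helper can be driven in a loop to create the full batch of secrets (the name prefix is illustrative):

for i in $(seq 1 725); do
	createRandomSecret "memtest-$i"
done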
  6. Unregister cluster from Rancher, wait for rancher-cleanup to complete
  7. Re-register/import downstream cluster into Rancher, confirm success & active status
  8. Repeat steps 5-7 a few times while monitoring metrics; observe memory increasing with each cycle:
    • Create 725 secrets (~800 KB each) on downstream via kubeconfig
    • Unregister cluster from Rancher, wait for rancher-cleanup to complete
    • Re-register/import downstream cluster into Rancher, confirm success & active status
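
A sketch of pulling the two metrics from step 2 through the Prometheus HTTP API; the service name and namespace below assume the rancher-monitoring chart defaults and may differ per install:

kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-prometheus 9090 &

# Go heap in use, per Rancher pod
curl -sG 'http://localhost:9090/api/v1/query' \
	--data-urlencode 'query=go_memstats_heap_inuse_bytes{container="rancher"}'

# Container working set, per Rancher pod
curl -sG 'http://localhost:9090/api/v1/query' \
	--data-urlencode 'query=container_memory_working_set_bytes{container="rancher"}'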

Steps to validate

  1. Bring up HA Rancher on debug image, install Monitoring (Local->Apps->Charts)
  2. Monitor metrics via Prometheus:
    • go_memstats_heap_inuse_bytes{container="rancher"}
    • container_memory_working_set_bytes{container="rancher"}
  3. Create downstream cluster, install kubectl, save kubeconfig
  4. Set environment variable CATTLE_AGENT_IMAGE to rancher/rancher-agent:v2.7.9 for the downstream agent (see the sketch after these steps)
  5. Import downstream into Rancher, confirm success & active status
  6. Create 725 secrets (~800 KB each) on downstream via kubeconfig (see the helper function above)
  7. Unregister cluster from Rancher, wait for rancher-cleanup to complete
  8. Re-register/import downstream cluster into Rancher, confirm success & active status
  9. Repeat steps 6-8 a few times while monitoring metrics; observe that the memory increase is no longer present:
    • Create 725 secrets (~800 KB each) on downstream via kubeconfig
    • Unregister cluster from Rancher, wait for rancher-cleanup to complete
    • Re-register/import downstream cluster into Rancher, confirm success & active status
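
One way to apply the CATTLE_AGENT_IMAGE setting from step 4; this assumes it is consumed as an environment variable on the Rancher server deployment:

# Pin the agent image used when importing the downstream cluster (assumed env var)
kubectl -n cattle-system set env deployment/rancher CATTLE_AGENT_IMAGE=rancher/rancher-agent:v2.7.9

# Wait for the restarted pods before importing the downstream cluster
kubectl -n cattle-system rollout status deployment/rancher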

Left: 2.7.9, Right: 2.7.11 (debug)
[images: side-by-side memory usage graphs]
