
[SURE-7356] Rancher pods memory increasing over time #44090

Closed
aruiz14 opened this issue Jan 18, 2024 · 1 comment · Fixed by rancher/wrangler#361

aruiz14 commented Jan 18, 2024

Internal reference: SURE-7356

Rancher Server Setup

  • Rancher version: 2.7.9
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): AKS 1.26
  • Number of nodes: 3
  • Proxy/Cert Details: not relevant

Describe the bug
Rancher pod memory usage grows steadily over time (about 1 GB per day).
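
A quick way to spot-check this from outside the pod, assuming metrics-server (or the Monitoring chart) is installed and the standard app=rancher label is present on the Rancher deployment:

# Snapshot of current memory usage per Rancher pod; compare readings a day apart
kubectl -n cattle-system top pods -l app=rancher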

To Reproduce
Unknown.

Additional context
Rancher at 10 GB of heap:
[image: heap profile screenshot]

About 60% of the heap is held by the cache.

A CPU profile of the same pod shows 30% of CPU time spent on garbage collection:
[image: CPU profile screenshot]
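
For context, profiles like the ones above can be captured from a running Rancher pod. This is a sketch, assuming Rancher's pprof endpoint listens on localhost:6060 inside the pod and curl is available in the image; the pod name placeholder is hypothetical:

# Heap profile
kubectl -n cattle-system exec <rancher-pod> -- curl -s http://localhost:6060/debug/pprof/heap -o /tmp/heap
kubectl cp cattle-system/<rancher-pod>:/tmp/heap ./heap
go tool pprof -top ./heap

# 30-second CPU profile (this is where the ~30% GC time shows up)
kubectl -n cattle-system exec <rancher-pod> -- curl -s 'http://localhost:6060/debug/pprof/profile?seconds=30' -o /tmp/cpu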

MSpencer87 (Contributor) commented:

Verified on Rancher 2.7.11 (debug) on RKE2 v1.26.9+rke2r1.

The previously observed increase in memory usage over time on Rancher pods is no longer present.

Steps to reproduce

  1. Bring up HA Rancher on 2.7.9, install Monitoring (Local->Apps->Charts)
  2. Monitor metrics via Prometheus (see the query sketch after these steps):
    • go_memstats_heap_inuse_bytes{container="rancher"}
    • container_memory_working_set_bytes{container="rancher"}
  3. Create downstream cluster, install kubectl, save kubeconfig
  4. Import downstream into Rancher, confirm success & active status
  5. Create 725 secrets (~800 KB each) on the downstream cluster via kubeconfig, using the helper below:
createRandomSecret() {
	name=$1
	tmpdir=$(mktemp -d)

	# Generate 8 random 80 KiB files (~640 KiB raw, closer to ~850 KiB once
	# base64-encoded inside the Secret object)
	args=""
	for i in $(seq 1 8); do
		dd if=/dev/urandom of="${tmpdir}/tmp-$i" bs=80k count=1 2>/dev/null
		args="${args} --from-file=key$i=${tmpdir}/tmp-$i"
	done

	# Create the secret on the downstream cluster; $args is intentionally
	# unquoted so each --from-file flag is passed as a separate argument
	KUBECONFIG=/path/to/downstream-kubeconfig.yaml \
		kubectl -n cattle-system create secret generic "$name" $args

	rm -rf "${tmpdir}"
}
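
The helper can be driven in a loop to create the full batch of secrets (the name prefix is illustrative):

for i in $(seq 1 725); do
	createRandomSecret "memtest-$i"
done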
  6. Unregister cluster from Rancher, wait for rancher-cleanup to complete
  7. Re-register/import downstream cluster into Rancher, confirm success & active status
  8. Repeat steps 5-7 a few times while monitoring metrics; observe memory increasing with each cycle:
    • Create 725 secrets (~800 KB each) on downstream via kubeconfig
    • Unregister cluster from Rancher, wait for rancher-cleanup to complete
    • Re-register/import downstream cluster into Rancher, confirm success & active status
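
A sketch of pulling the two metrics from step 2 through the Prometheus HTTP API; the service name and namespace below assume the rancher-monitoring chart defaults and may differ per install:

kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-prometheus 9090 &

# Go heap in use, per Rancher pod
curl -sG 'http://localhost:9090/api/v1/query' \
	--data-urlencode 'query=go_memstats_heap_inuse_bytes{container="rancher"}'

# Container working set, per Rancher pod
curl -sG 'http://localhost:9090/api/v1/query' \
	--data-urlencode 'query=container_memory_working_set_bytes{container="rancher"}'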

Steps to validate

  1. Bring up HA Rancher on debug image, install Monitoring (Local->Apps->Charts)
  2. Monitor metrics via Prometheus:
    • go_memstats_heap_inuse_bytes{container="rancher"}
    • container_memory_working_set_bytes{container="rancher"}
  3. Create downstream cluster, install kubectl, save kubeconfig
  4. Set environment variable CATTLE_AGENT_IMAGE to rancher/rancher-agent:v2.7.9 for the downstream agent (see the sketch after these steps)
  5. Import downstream into Rancher, confirm success & active status
  6. Create 725 secrets (~800 KB each) on downstream via kubeconfig (see the helper function above)
  7. Unregister cluster from Rancher, wait for rancher-cleanup to complete
  8. Re-register/import downstream cluster into Rancher, confirm success & active status
  9. Repeat steps 6-8 a few times while monitoring metrics; observe that the memory increase is no longer present:
    • Create 725 secrets (~800 KB each) on downstream via kubeconfig
    • Unregister cluster from Rancher, wait for rancher-cleanup to complete
    • Re-register/import downstream cluster into Rancher, confirm success & active status
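
One way to apply the CATTLE_AGENT_IMAGE setting from step 4; this assumes it is consumed as an environment variable on the Rancher server deployment:

# Pin the agent image used when importing the downstream cluster (assumed env var)
kubectl -n cattle-system set env deployment/rancher CATTLE_AGENT_IMAGE=rancher/rancher-agent:v2.7.9

# Wait for the restarted pods before importing the downstream cluster
kubectl -n cattle-system rollout status deployment/rancher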

Left: 2.7.9, Right: 2.7.11 (debug)
[images: side-by-side memory usage graphs]
