Troubleshooting memory usage for caches #57

Draft
wants to merge 2 commits into master

@aruiz14 aruiz14 commented Jan 18, 2024

This PR includes some changes to ease troubleshooting memory issues in rancher/rancher.

  1. Splits the cache Reflector List() call in the heap profile into 3 separate flames, depending on the context/client that uses it: the management context, the user context in the local cluster (which may hold a subset of the objects in the management context cache), and downstream contexts (aggregated).
  2. Adds Prometheus metrics that count the number of started informers in each cache, as well as the number of items stored, by kind and context. Each context is identified by upstream vs. downstream, source, and cluster name, if available.

Examples:

# HELP cache_started_count Number of started caches per factory
# TYPE cache_started_count gauge
cache_started_count{context="mgmt_context_authserver"} 8
cache_started_count{context="mgmt_context_core-at-ListenAndServe"} 1
cache_started_count{context="mgmt_context_multicluster_manager"} 60
cache_started_count{context="user_context_multicluster_manager_local"} 12
# HELP cache_store_count Number of items in the cache store
# TYPE cache_store_count gauge
...
cache_store_count{context="user_context_multicluster_manager_c-m-jwkvxjkf",kind="Secret./v1"} 22
cache_store_count{context="user_context_multicluster_manager_c-m-jwkvxjkf",kind="ServiceAccount./v1"} 51
cache_store_count{context="user_context_multicluster_manager_local",kind="APIService.apiregistration.k8s.io/v1"} 43
cache_store_count{context="user_context_multicluster_manager_local",kind="ClusterRole.rbac.authorization.k8s.io/v1"} 141
...
