K3S emitting duplicated metrics in all endpoints (Api server, kubelet, kube-proxy, kube-scheduler, etc) #67
Procedure for obtaining the raw metrics exposed by K3S

The procedure described in SUSE/doc-caasp#166 (comment) can be used to manually query the HTTPS metrics endpoints.
However, that procedure creates a service account without enough privileges to query the kubelet metrics endpoints directly. The following manifests create a service account, token secret, ClusterRole and ClusterRoleBinding with the required permissions:

---
apiVersion: v1
kind: ServiceAccount
metadata:
name: monitoring
namespace: kube-system
secrets:
- name: monitoring-secret-token
---
apiVersion: v1
kind: Secret
metadata:
name: monitoring-secret-token
namespace: kube-system
annotations:
kubernetes.io/service-account.name: monitoring
type: kubernetes.io/service-account-token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: monitoring-clusterrole
namespace: kube-system
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/metrics
- pods
verbs: ["get", "list"]
- nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: monitoring-clusterrole-binding
namespace: kube-system
roleRef:
kind: ClusterRole
name: monitoring-clusterrole
apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
name: monitoring
namespace: kube-system

The following script can be used to automatically extract the metrics from the kubelet, kube-proxy, apiserver, kube-scheduler and kube-controller-manager endpoints and compare the results:

#!/bin/bash
# Get token
TOKEN=$(kubectl -n kube-system get secrets monitoring-secret-token -ojsonpath='{.data.token}' | base64 -d)
APISERVER=$(kubectl config view | grep server | cut -f 2- -d ":" | tr -d " ")
# Get apiserver
curl -ks $APISERVER/metrics --header "Authorization: Bearer $TOKEN" | grep -v "# " > apiserver.txt
# Get list of nodes of k3s cluster from api server and iterate over it
for i in `kubectl get nodes -o json | jq -r '.items[].status.addresses[0].address'`; do
echo "Getting metrics from node $i"
curl -ks https://$i:10250/metrics --header "Authorization: Bearer $TOKEN" | grep -v "# " > kubelet_$i.txt
curl -ks https://$i:10250/metrics/cadvisor --header "Authorization: Bearer $TOKEN" | grep -v "# " > kubelet_cadvisor_$i.txt
curl -ks http://$i:10249/metrics | grep -v "# " > kubeproxy_$i.txt
done
# Get kube-controller and kube-scheduler
for i in `kubectl get nodes -o json | jq -r '.items[] | select(.metadata.labels."node-role.kubernetes.io/master" != null) | .status.addresses[0].address'`; do
echo "Getting metrics from master node $i"
curl -ks https://$i:10259/metrics --header "Authorization: Bearer $TOKEN" | grep -v "# " > kube-scheduler_$i.txt
curl -ks https://$i:10257/metrics --header "Authorization: Bearer $TOKEN" | grep -v "# " > kube-controller_$i.txt
done

Analyzing the results

Executing the previous script produces the following files, which contain the metrics extracted from each of the exposed ports on each node of the cluster: apiserver.txt, kubelet_<node>.txt, kubelet_cadvisor_<node>.txt, kubeproxy_<node>.txt, kube-scheduler_<master>.txt and kube-controller_<master>.txt.
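A quick way to compare the results is to diff the sets of metric names exposed by two endpoints. The following sketch assumes the files produced by the script above; the node IP is a placeholder to adjust:

# Compare metric names between the apiserver endpoint and one kubelet endpoint.
# Assumes apiserver.txt and kubelet_<node-ip>.txt were generated by the script above.
NODE_IP=10.0.0.11   # placeholder: replace with one of your node IPs

metric_names() {
  # Keep only the metric name (drop labels and values), one per line, de-duplicated.
  sed 's/[{ ].*//' "$1" | sort -u
}

echo "Metric names exposed by BOTH the apiserver and the kubelet endpoints:"
comm -12 <(metric_names apiserver.txt) <(metric_names "kubelet_${NODE_IP}.txt")

echo "Metric names exposed ONLY by the apiserver endpoint:"
comm -23 <(metric_names apiserver.txt) <(metric_names "kubelet_${NODE_IP}.txt")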
Conclusion

To get all K3s metrics it is enough to collect the metrics from the kubelet endpoints at TCP 10250 (/metrics, /metrics/cadvisor, /metrics/resource and /metrics/probes); the rest of the endpoints only add duplicates.
Possible solution

Enable only the monitoring of the kubelet endpoints. Changes to be implemented:
Final solution: set the job label to "kubelet" for all the metrics scraped from the K3s components through the kubelet port. Selecting a different name such as "k3s" (the initially proposed solution) would force updating all the default kube-prometheus-stack dashboards that use kubelet metrics (container metrics), since those dashboards filter the metrics with job="kubelet".
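The following spot-check is only a sketch to illustrate why the job name matters (the Prometheus URL is a placeholder, not part of the original setup): the default dashboards query container metrics filtered with job="kubelet", so after relabelling those series must still carry that label:

# Placeholder URL: point PROM_URL at the Prometheus instance of the cluster.
PROM_URL=http://localhost:9090
# A non-zero result means the container (cadvisor) metrics carry job="kubelet",
# so the default kube-prometheus-stack dashboards keep working unmodified.
curl -s "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=count(container_memory_working_set_bytes{job="kubelet"})' | jq .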
@ricsanfre First, this repo and the accompanying website are awesome. Thanks for your efforts. Regarding this issue, I want to let you know that I've solved it in a slightly different manner that ensures that the chart still creates the rules and dashboards for every component. So instead of disabling all the components in the Helm chart, I actually keep them enabled but instruct all of them except the kubelet to drop every metric at scrape time. For example, this is how I defined the API server section:

kubeApiServer:
serviceMonitor:
metricRelabelings:
- action: drop
regex: .*
sourceLabels:
- __name__

I have a similar snippet for kubeControllerManager and kubeScheduler. With this, the chart is still creating the rules and dashboards without ingesting duplicate metrics; only the metrics from the kubelet endpoints are ingested. Now the rules and dashboards created by the chart refer to a job that needs to be replaced with kubelet. In the Argo CD Application shared below, that replacement is done with a sed command when the chart is rendered. The end result is that the default rules and dashboards keep working without ingesting duplicate metrics.
The only drawback is that, although Prometheus doesn't ingest duplicate metrics, it still ends up scraping multiple endpoints and dropping the metrics from them, which of course means relatively higher CPU and memory usage.
One idea that just occurred to me to address the drawback is to set the scrape interval of those ServiceMonitors to a very large value (the interval: 1d you can see in the values below).
@sherif-fanous, thank you so much for sharing your ideas. Would it be possible to share your values.yaml?
Here are the relevant sections of my values.yaml. Keep in mind this is a k3s single-node cluster running on TrueNAS Scale, so you might have a slightly different setup than mine, especially regarding the endpoints addresses.

kubeApiServer:
serviceMonitor:
interval: 1d
metricRelabelings:
- action: drop
regex: .*
sourceLabels:
- __name__
kubeControllerManager:
endpoints:
- 192.168.4.59
serviceMonitor:
https: true
insecureSkipVerify: true
interval: 1d
metricRelabelings:
- action: drop
regex: .*
sourceLabels:
- __name__
kubeEtcd:
enabled: false
kubelet:
serviceMonitor:
metricRelabelings:
- action: drop
regex: apiserver_request_duration_seconds_bucket;(0.15|0.2|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2|3|3.5|4|4.5|6|7|8|9|15|25|40|50)
sourceLabels:
- __name__
- le
kubeProxy:
enabled: false
kubeScheduler:
endpoints:
- 192.168.4.59
serviceMonitor:
https: true
insecureSkipVerify: true
interval: 1d
metricRelabelings:
- action: drop
regex: .*
sourceLabels:
- __name__

The Argo CD Application I use to deploy the chart:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
annotations:
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
argocd.argoproj.io/sync-wave: '32'
finalizers:
- resources-finalizer.argocd.argoproj.io
name: kube-prometheus-stack
namespace: argo-cd
spec:
destination:
namespace: kube-prometheus-stack
server: https://kubernetes.default.svc
project: default
source:
chart: kube-prometheus-stack
repoURL: https://prometheus-community.github.io/helm-charts
targetRevision: 58.2.1
sources:
- chart: kube-prometheus-stack
plugin:
name: config-management-plugin-template
parameters:
- name: generate-command
string: >-
sed -E -i 's/job="(apiserver|kube-scheduler|kube-controller-manager)"/job="kubelet"/g' ./ifanous/home-lab/kubernetes/apps/kube-prometheus-stack/template/truenas-mini-x-plus/all.yaml && sed -E -i 's/job=\\"(apiserver|kube-scheduler|kube-controller-manager)\\"/job=\\"kubelet\\"/g' ./ifanous/home-lab/kubernetes/apps/kube-prometheus-stack/template/truenas-mini-x-plus/all.yaml && sed -E -i 's/sum\(up\{cluster=\\"\$cluster\\", job=\\"kubelet\\"\}\)/sum\(up\{cluster=\\"\$cluster\\",job=\\"kubelet\\", metrics_path=\\"\/metrics\\"\}\)/g' ./ifanous/home-lab/kubernetes/apps/kube-prometheus-stack/template/truenas-mini-x-plus/all.yaml && cat ./ifanous/home-lab/kubernetes/apps/kube-prometheus-stack/template/truenas-mini-x-plus/all.yaml
- name: init-command
string: >-
mkdir -p ./ifanous/home-lab/kubernetes/apps/kube-prometheus-stack/template/truenas-mini-x-plus/ && helm template . --create-namespace --namespace prometheus-stack --values ./ifanous/home-lab/kubernetes/apps/kube-prometheus-stack/helm/values/base/helm-kube-prometheus-stack-values.yaml --values ./ifanous/home-lab/kubernetes/apps/kube-prometheus-stack/helm/values/overlays/truenas-mini-x-plus/helm-kube-prometheus-stack-values.yaml >
./ifanous/home-lab/kubernetes/apps/kube-prometheus-stack/template/truenas-mini-x-plus/all.yaml
repoURL: https://prometheus-community.github.io/helm-charts
targetRevision: 58.2.1
- path: kubernetes/apps/kube-prometheus-stack/kustomize/overlays/truenas-mini-x-plus
repoURL: git@github.com:ifanous/home-lab.git
targetRevision: HEAD
- ref: root
repoURL: git@github.com:ifanous/home-lab.git
syncPolicy:
automated:
prune: true
selfHeal: true
retry:
backoff:
duration: 5s
factor: 2
maxDuration: 3m
limit: 5
syncOptions:
- CreateNamespace=true
- ServerSideApply=true

P.S. My repo is private so you won't be able to access it, but everything you need is in this thread; just replace every reference to my repo with yours. You also need to set up Argo CD to use a CMP (Config Management Plugin). At a high level, here's what I'm doing in my Argo CD values.yaml configs:
cmp:
create: true
plugins:
config-management-plugin-template:
generate:
args:
- |
echo "Starting generate phase for application $ARGOCD_APP_NAME" 1>&2;
echo "Executing $PARAM_GENERATE_COMMAND" 1>&2;
eval $PARAM_GENERATE_COMMAND;
echo "Successfully completed generate phase for application $ARGOCD_APP_NAME" 1>&2;
command: [/bin/sh, -c]
init:
args:
- |
echo "Starting init phase for application $ARGOCD_APP_NAME" 1>&2;
echo "Starting a partial treeless clone of repo ifanous/home-lab.git" 1>&2; mkdir ifanous 1>&2; cd ifanous 1>&2; git clone -n --depth=1 --filter=tree:0 https://$IFANOUS_HOME_LAB_HTTPS_USERNAME:$IFANOUS_HOME_LAB_HTTPS_PASSWORD@github.com/ifanous/home-lab.git 1>&2; cd home-lab/ 1>&2; git sparse-checkout set --no-cone $ARGOCD_APP_NAME 1>&2; git checkout 1>&2;
echo "Successfully completed a partial treeless clone of repo ifanous/home-lab.git" 1>&2;
echo "Executing $PARAM_INIT_COMMAND" 1>&2;
cd ../../ 1>&2; eval $PARAM_INIT_COMMAND;
echo "Successfully completed init phase for application $ARGOCD_APP_NAME" 1>&2;
command: ["/bin/sh", "-c"]
repoServer:
extraContainers:
- args:
- '--logformat=json'
- '--loglevel=debug'
command:
- /var/run/argocd/argocd-cmp-server
env:
- name: IFANOUS_HOME_LAB_HTTPS_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: argocd-repo-creds-ifanous-home-lab-https
- name: IFANOUS_HOME_LAB_HTTPS_USERNAME
valueFrom:
secretKeyRef:
key: username
name: argocd-repo-creds-ifanous-home-lab-https
image: alpine/k8s:1.29.2
name: config-management-plugin-template
resources:
limits:
memory: 512Mi
requests:
memory: 64Mi
securityContext:
runAsNonRoot: true
runAsUser: 999
volumeMounts:
- mountPath: /var/run/argocd
name: var-files
- mountPath: /home/argocd/cmp-server/plugins
name: plugins
- mountPath: /home/argocd/cmp-server/config/plugin.yaml
name: argocd-cmp-cm
subPath: config-management-plugin-template.yaml
- mountPath: /tmp
name: cmp-tmp
Thank you very much!
Bug Description
Kubernetes Documentation - System Metrics details which Kubernetes components expose metrics in Prometheus format:
These components are:

- kube-controller-manager (/metrics endpoint at TCP 10257)
- kube-proxy (/metrics endpoint at TCP 10249)
- apiserver (/metrics at Kubernetes API port)
- kube-scheduler (/metrics endpoint at TCP 10259)
- kubelet (/metrics, /metrics/cadvisor, /metrics/resource and /metrics/probes endpoints at TCP 10250)

The K3S distribution has a special behavior, since on each node only one process is deployed (k3s-server running on master nodes or k3s-agent running on worker nodes), with all k8s components sharing the same memory. K3s is emitting the same metrics, from all k8s components deployed in the node, at all /metrics endpoints available: api-server, kubelet (TCP 10250), kube-proxy (TCP 10249), kube-scheduler (TCP 10259) and kube-controller-manager (TCP 10257). Thus, collecting from all ports produces duplicated metrics.

The additional kubelet metrics (the /metrics/cadvisor, /metrics/resource and /metrics/probes endpoints) are only available at TCP 10250.

Enabling the scraping of all the different metrics TCP ports (Kubernetes components) causes the ingestion of duplicated metrics. Duplicated metrics in Prometheus need to be removed in order to reduce memory and CPU consumption.
Context Information
As stated in issue #22, there was a known issue in K3S: k3s-io/k3s#2262, where duplicated metrics are emitted by the three components (kube-proxy, kube-scheduler and kube-controller-manager).
The solution proposed by Rancher Monitoring (k3s-io/k3s#2262) was to avoid scraping duplicated metrics by activating the service monitoring of only one of the components (i.e. kube-proxy).
That solution was implemented (see #22 (comment)) and it solved the main issue (out-of-memory).
The endpoints currently being scraped by Prometheus are:
Duplicated metrics
After a deeper analysis of the metrics scraped by Prometheus, it is clear that K3S is emitting duplicated metrics at all endpoints.
Example 1: API-server metrics emitted by the kube-proxy, kubelet and api-server endpoints running on the master server
Example 2: kubelet metrics emitted by the kube-proxy, kubelet and api-server endpoints
Example 3: kube-proxy metrics:
kubeproxy_sync_proxy_rules_duration_seconds_bucket{le="0.001"}
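One way to observe this duplication manually (a sketch reusing the TOKEN from the extraction procedure shared earlier in this thread; <node-ip> is a placeholder) is to request the same kube-proxy series from two different ports:

# The same kube-proxy series is returned by the kube-proxy port (10249, plain HTTP)
# and by the kubelet port (10250, HTTPS + bearer token).
NODE=<node-ip>   # placeholder: one of the cluster node addresses
curl -s http://$NODE:10249/metrics \
  | grep 'kubeproxy_sync_proxy_rules_duration_seconds_bucket{le="0.001"}'
curl -ks https://$NODE:10250/metrics --header "Authorization: Bearer $TOKEN" \
  | grep 'kubeproxy_sync_proxy_rules_duration_seconds_bucket{le="0.001"}'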