
Prometheus missing kubelet metrics on AKS #6846

Closed
toddnelson-sap opened this issue Jan 13, 2020 · 9 comments · Fixed by #7171
@toddnelson-sap toddnelson-sap commented Jan 13, 2020

Description
After installing Kyma 1.9, Prometheus is up and running and available through the "Stats & Metrics" link in the Kyma console UI. Browsing the Grafana / Prometheus UI to the "Kubernetes / Pods" dashboard shows the expected page. When you query for a pod, however, the "current" metrics for the pod are missing, including Memory, CPU, and Network.

Cloud Provider: Azure
Kyma version: 1.9
kubectl version:
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"1da9875156ba0ad48e7d09a5d00e41489507f592", GitTreeState:"clean", BuildDate:"2019-11-14T05:19:20Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Expected result

Current Pod metrics for Memory, CPU, and Network are visible in the Grafana / Prometheus user interface.

Actual result

Current Pod metrics for Memory, CPU, and Network are missing from the Grafana / Prometheus user interface.

Steps to reproduce

  • Log into Kyma 1.9 console
  • https://console.dev2.agogos.ml
  • Click the "Stats & Metrics" link in the main Kyma menu
  • opens https://grafana.dev2.agogos.ml/
  • Click "Home", "Kubernetes / Pods"
  • Query for a pod:
  • datasource: Prometheus
  • Namespace: kyma-system
  • Pod: almost any pod will do, for example: logging-promtrail-9sn7t
  • The UI shows Requested and Limit settings for Memory and CPU, but is missing Current metrics for Memory, CPU and Network
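
To rule out a pure dashboard problem, the series behind those panels can be queried in Prometheus directly. A minimal sketch, assuming the Prometheus service is reachable as monitoring-prometheus on port 9090 (both are assumptions for this setup) and that the pod dashboards are built on the usual cAdvisor series such as container_memory_working_set_bytes:

# Port-forward Prometheus locally (service name is an assumption)
kubectl -n kyma-system port-forward svc/monitoring-prometheus 9090:9090

# In a second shell: an empty result set confirms the cAdvisor series are missing from Prometheus itself
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=container_memory_working_set_bytes{namespace="kyma-system"}' \
  | jq '.data.result | length'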

Troubleshooting

There's an error reported from Prometheus about a missing config file. I don't know if this is related to the above issue or not.

kubectl describe pod -n kyma-system prometheus-monitoring-prometheus-0

      Started:   Thu, 09 Jan 2020 17:37:26 -0700
    Last State:  Terminated
      Reason:    Error
      Message:    caller=main.go:657 msg="Starting TSDB ..."
level=info ts=2020-01-10T00:37:20.080Z caller=web.go:450 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-01-10T00:37:20.085Z caller=head.go:514 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-01-10T00:37:20.085Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-01-10T00:37:20.086Z caller=main.go:672 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-01-10T00:37:20.086Z caller=main.go:673 msg="TSDB started"
level=info ts=2020-01-10T00:37:20.086Z caller=main.go:743 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2020-01-10T00:37:20.086Z caller=main.go:526 msg="Stopping scrape discovery manager..."
level=info ts=2020-01-10T00:37:20.086Z caller=main.go:540 msg="Stopping notify discovery manager..."
level=info ts=2020-01-10T00:37:20.087Z caller=main.go:522 msg="Scrape discovery manager stopped"
level=info ts=2020-01-10T00:37:20.087Z caller=main.go:562 msg="Stopping scrape manager..."
level=info ts=2020-01-10T00:37:20.087Z caller=manager.go:814 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-01-10T00:37:20.087Z caller=main.go:556 msg="Scrape manager stopped"
level=info ts=2020-01-10T00:37:20.087Z caller=manager.go:820 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-01-10T00:37:20.087Z caller=main.go:536 msg="Notify discovery manager stopped"
level=info ts=2020-01-10T00:37:20.096Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
level=info ts=2020-01-10T00:37:20.096Z caller=main.go:727 msg="Notifier manager stopped"
level=error ts=2020-01-10T00:37:20.096Z caller=main.go:736 err="error loading config from \"/etc/prometheus/config_out/prometheus.env.yaml\": couldn't load configuration (--config.file=\"/etc/prometheus/config_out/prometheus.env.yaml\"): open /etc/prometheus/config_out/prometheus.env.yaml: no such file or directory"
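
Regarding the missing config file: in Prometheus Operator based setups, /etc/prometheus/config_out/prometheus.env.yaml is normally rendered by the config-reloader sidecar, so that is a reasonable place to look. A hedged sketch, assuming the standard container names prometheus and prometheus-config-reloader:

# Check whether the rendered configuration exists inside the Prometheus container
kubectl -n kyma-system exec prometheus-monitoring-prometheus-0 -c prometheus -- ls -l /etc/prometheus/config_out/

# The sidecar writes that file; its logs usually explain why rendering failed
kubectl -n kyma-system logs prometheus-monitoring-prometheus-0 -c prometheus-config-reloader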
@toddnelson-sap toddnelson-sap commented Jan 13, 2020

(Screenshot attached: Screen Shot 2020-01-13 at 10 42 52 AM)

@pbochynski pbochynski added this to the Backlog_Huskies milestone Jan 14, 2020
@pbochynski pbochynski added the bug label Jan 14, 2020
@a-thaler a-thaler changed the title Prometheus Missing Metrics on Kyma 1.9 Prometheus missing kubelet metrics on AKS Jan 14, 2020
@hisarbalik hisarbalik commented Jan 14, 2020

The current monitoring deployment can't scrape metrics from the kubelet on AKS. We are testing a patch to solve the problem on AKS deployments.

@a-thaler a-thaler commented Jan 14, 2020

The consequence is that exactly these kinds of metrics, such as CPU and memory, are missing. All other metrics should work fine. The problem is specific to AKS.
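
One way to confirm that only the kubelet scrape job is affected is to inspect the target health Prometheus reports. A minimal sketch, reusing the port-forwarded Prometheus from the earlier sketch and assuming the kubelet targets carry job="kubelet" (which follows from the ServiceMonitor's jobLabel, but is still an assumption):

# List kubelet scrape targets with their health and last scrape error (reported later in this thread as an HTTP 401)
curl -s 'http://localhost:9090/api/v1/targets' \
  | jq '.data.activeTargets[] | select(.labels.job == "kubelet") | {instance: .labels.instance, health: .health, lastError: .lastError}'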

@codmleep codmleep commented Jan 16, 2020

@a-thaler will this fix be included in a 1.9 patch or with the 1.10 release?

@hisarbalik hisarbalik commented Jan 23, 2020

The fix will be delivered with the next release. As a workaround, please use the following patch for the kubelet ServiceMonitor monitoring-kubelet.

To edit the ServiceMonitor instance, execute the following command:
kubectl edit servicemonitors.monitoring.coreos.com -n kyma-system monitoring-kubelet

The instance configuration should look like the one below:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: monitoring-kubelet
    chart: monitoring-8.3.3
    heritage: Tiller
    release: monitoring
  name: monitoring-kubelet
  namespace: kyma-system
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    port: http-metrics
    scheme: http
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    path: /metrics/cadvisor
    port: http-metrics
    scheme: http
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet
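
The same change can also be applied non-interactively. A sketch using kubectl patch with a merge patch; because a merge patch replaces the whole endpoints list, both endpoints have to be included:

kubectl patch servicemonitors.monitoring.coreos.com monitoring-kubelet -n kyma-system \
  --type merge \
  -p '{"spec":{"endpoints":[
    {"bearerTokenFile":"/var/run/secrets/kubernetes.io/serviceaccount/token","honorLabels":true,"port":"http-metrics","scheme":"http"},
    {"bearerTokenFile":"/var/run/secrets/kubernetes.io/serviceaccount/token","honorLabels":true,"path":"/metrics/cadvisor","port":"http-metrics","scheme":"http"}
  ]}}'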
@toddnelson-sap toddnelson-sap commented Jan 24, 2020

This patch worked successfully on our Kyma 1.9 instance on AKS. After applying it I see Prometheus memory and CPU pod metrics in Grafana, yay! Thanks so much for your help!

@suleymanakbas91 suleymanakbas91 commented Jan 28, 2020

Monitoring integration tests are failing on the nightly AKS cluster because of this issue.

@hisarbalik hisarbalik commented Feb 6, 2020

The kubelet configuration varies depending on the managed Kubernetes offering. This issue occurs on Microsoft AKS, where Prometheus can't scrape metrics from the kubelet and gets an HTTP 401 error.

The following patch solves this issue on Microsoft AKS deployments.

Step 1: Delete the existing ServiceMonitor instance with the following command:

kubectl delete servicemonitors.monitoring.coreos.com monitoring-kubelet -n kyma-system

Step 2: Apply the new ServiceMonitor for AKS with the following command:

cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: monitoring-kubelet
    chart: monitoring-8.3.3
    heritage: Tiller
    release: monitoring
  name: monitoring-kubelet
  namespace: kyma-system
spec:
  endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      honorLabels: true
      port: http-metrics
      scheme: http
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      honorLabels: true
      path: /metrics/cadvisor
      port: http-metrics
      scheme: http
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet
EOF
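
After the new ServiceMonitor is applied, it can take a scrape interval or two before data shows up again. A quick hedged check, reusing the port-forward from the earlier sketch, is to confirm the cAdvisor series are back:

# A non-empty value here means the kubelet/cAdvisor targets are being scraped again
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=count(container_cpu_usage_seconds_total)' \
  | jq '.data.result[0].value[1]'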
@a-thaler a-thaler commented Feb 7, 2020

As discussed, please check the consequences of having "chart: monitoring-8.3.3" statically assigned. Do we need to update the values when updating the chart?
