io_kubernetes_pod_name Missing in new cluster #2232

Closed
ericuldall opened this Issue Nov 28, 2016 · 14 comments

ericuldall commented Nov 28, 2016

What did you do?
Installed Prometheus on a new GKE cluster

What did you expect to see?
Same behavior as the first cluster I installed it on

What did you see instead? Under which circumstances?
The io_kubernetes_pod_name label is nowhere to be found

Environment
New env is GKE 1.4
Old env is GKE 1.3

  • System information:

    Runs from prom/prometheus Docker image

  • Prometheus version:

    GKE 1.4 is running prom 1.4
    GKE 1.3 is running prom 1.0.1

  • Prometheus configuration file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'kubernetes-node-exporter'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_role]
        action: replace
        target_label: kubernetes_role
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-service-endpoints'
      scheme: https
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      scheme: https
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      scheme: https
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
  • Query:
sort_desc(sum(container_memory_usage_bytes{image!=""}) by (io_kubernetes_container_name, image))

The query is not returning io_kubernetes_container_name on the new 1.4 cluster. Any idea why?

brancz commented Nov 29, 2016

The label io_kubernetes_container_name was renamed to container_name in Kubernetes (more specifically, in kubelet/cAdvisor) version 1.4.x.
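
Applied to the query from the issue, that rename means, assuming nothing else changed:

sort_desc(sum(container_memory_usage_bytes{image!=""}) by (container_name, image))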

ericuldall commented Nov 29, 2016

That worked! Do you know if there's a document describing what was renamed in the new version(s)?

ericuldall commented Nov 29, 2016

Also, I'm trying to get metrics for kubelet_running_container_count and it seems I can't access the container_name from there:

kubelet_running_container_count{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="n1-standard-2",beta_kubernetes_io_os="linux",cloud_google_com_gke_nodepool="default-pool",failure_domain_beta_kubernetes_io_region="us-central1",failure_domain_beta_kubernetes_io_zone="us-central1-b",instance="gke-sites-stage-default-pool-e51db3eb-bf6c",job="kubernetes-nodes",kubernetes_io_hostname="gke-sites-stage-default-pool-e51db3eb-bf6c"}

Is there something I can add to my scrape job to associate that data, or perhaps a way to join via query?

brancz commented Nov 30, 2016

IIRC, metrics from cAdvisor were namespaced by the kubelet, and that namespacing has since been removed; unfortunately, there is no explicit documentation of this breaking change in any of the Kubernetes/cAdvisor changelogs. The kubelet_running_container_count metric still seems to be present upstream, but many of its key/value labels have been stripped, since every new label combination creates a new time series, which put unnecessary load on Prometheus. The information you are querying for, however, can be exposed via kube-state-metrics. Maybe that helps you?

brancz commented Nov 30, 2016

Aside from that, I recommend having a look at the new sample Kubernetes config, as there was a small fix recently. You also don't need a separate job configuration for the node-exporter: you can skip that section and just add a Service in Kubernetes that selects the node-exporter Pods, and the endpoints discovery will pick them up automatically.

ericuldall commented Dec 1, 2016

Would I still need to run the node-exporter daemon set?

brancz commented Dec 1, 2016

Yes: run the node-exporter as a DaemonSet, plus a Service that selects the Pods from that DaemonSet; it will then be discovered by the endpoints discovery.
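
A minimal sketch of such a Service, assuming the DaemonSet's Pods carry an app: node-exporter label (the Service name and label are illustrative; the annotations map to the relabel rules in the kubernetes-service-endpoints job above):

apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  annotations:
    prometheus.io/scrape: 'true'  # matched by the prometheus_io_scrape keep rule
    prometheus.io/scheme: 'http'  # node-exporter serves plain HTTP; overrides the job's https default
spec:
  clusterIP: None                 # headless; Endpoints point straight at the Pods
  selector:
    app: node-exporter            # assumed label on the DaemonSet's Pods
  ports:
  - name: metrics
    port: 9100                    # node-exporter's default port, as in the config above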

ericuldall commented Dec 2, 2016

Is there any way to scrape HPA or Deployment metrics currently?

brancz commented Dec 5, 2016

As there are essentially unlimited ways Pods can be created, we chose to only support discovering Pods directly. But you can map every use case onto the existing discovery mechanisms. The easiest way is to use endpoints discovery, as you now do with the DaemonSet: in the case of Deployments, you would create a Service for each Deployment, so that an Endpoints object is kept up to date (headless Services/Endpoints can also be used to discover targets outside of your Kubernetes cluster). If Services are not an option for you, you can instead discover all Pods using pod discovery and use a label on the Pods as their job label, to group them (see the sketch below). Hope this all makes sense; let me know if you need more information.
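
For the pod-discovery variant, a minimal sketch of such a scrape config, assuming the Pods carry an app label to group by (job name and label are illustrative):

- job_name: 'kubernetes-pods-grouped'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]  # assumed grouping label on the Pods
    action: replace
    target_label: job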

ericuldall commented Dec 6, 2016

This helps my basic understanding.

I think there are some very valuable metrics not being tracked in the case of deployments. For instance, I may be able to see how many pods are currently running, but what I might not see is that my deployment is trying to schedule more pods.

NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE

prometheus           1         1         1            0           13d

If I'm simply looking at a graph of pods that exist, I may not notice anything is off, but having access to metrics for desired, current, up-to-date, and available pods gives me a much better understanding of what my application is trying to do.

If I have an application that has 200 desired pods but only 100 available, that might mean there is an issue scheduling pods (maybe the host machines are out of CPU/memory), and I can take action on metrics like that. Not sure if the service setup you mention exposes such metrics.

brancz commented Dec 6, 2016

You can get all the information you are looking for regarding deployments using kube-state-metrics.
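
For instance, once kube-state-metrics is scraped, the desired-vs-available comparison above maps to queries along these lines (metric names as exposed by kube-state-metrics; verify them against the version you deploy):

kube_deployment_spec_replicas{deployment="prometheus"}
kube_deployment_status_replicas_available{deployment="prometheus"}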

ericuldall commented Dec 6, 2016

We installed the kube-state-metrics Deployment and Service, but we're not seeing the metrics come back from Prometheus.

Here's the configmap section for the service-endpoints:

- job_name: 'kubernetes-service-endpoints'
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_service_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

Does that seem proper?

ericuldall commented Dec 6, 2016

Sorry, we found the issue was with scheme: https, which is not supported by kube-state-metrics. I think this is just what we've been missing! Thanks for all your help.

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
