Insufficient logging of SD problems #3810
Comments
For now, I have isolated this server and can introspect it more. I'll need to restart it before the end of the week.
Also note that a SIGHUP (visible in the logs) did not un-stick this. I don't know if the DNS SD issues are related; it could be that it's stuck overall.
Sooo, the problem seems to have been partially related to DNS. I restarted a DNS cache and the Kubernetes discovery has now recovered (logging nothing about either the problems or the discovery, which is really a problem in itself). I'm still seeing DNS resolution errors that don't say which name could not be resolved, which makes them not very helpful. I'll update the issue title and description accordingly.
matthiasr commented Feb 7, 2018 (edited)
See below for the original issue that I was trying to investigate. The situation was confused by the lack of logging about problems talking to Kubernetes, and by the DNS error messages not saying which DNS names could not be resolved.
What did you do?
Have some DNS issues that also affect the Kubernetes discovery.
What did you expect to see?
Prometheus logs issues with the Kubernetes discovery. Prometheus includes enough information in its logs to identify the job and (for DNS SD) the address that failed.
What did you see instead? Under which circumstances?
No messages about Kubernetes discovery at all. Repeated messages about some issue with DNS but with no further details.
==================
What did you do?
Use Prometheus 2.1 with Kubernetes Discovery. Deploy so that pods get cycled around.
What did you expect to see?
Prometheus updates scrape targets.
What did you see instead? Under which circumstances?
At least one Prometheus server stopped updating targets completely, apparently across multiple jobs.
Environment
System information:
Linux 4.4.10+soundcloud x86_64
Prometheus version:
config & logs
Prometheus configuration file:
---
global:
  scrape_interval: 30s
  scrape_timeout: 30s
  evaluation_interval: 30s
  external_labels:
    cluster: k2
    owner: prodeng
    zone: db
rule_files:
- "/data1/prometheus/rules/prometheus_base.rules.yml"
- "/data1/prometheus/rules/k8s.rules.yml"
- "/data1/prometheus/rules/k8s_user_alerts.rules.yml"
- "/data1/prometheus/rules/node.rules.yml"
- "/data1/prometheus/rules/k8s-log-archive.rules.yml"
- "/data1/prometheus/rules/log-archive.rules.yml"
- "/data1/prometheus/rules/kafka.rules.yml"
scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
  scrape_interval: 15s
- relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    regex: ".+;(?:[0-9]+,?)+|;"
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_label_system
    regex: ".+;k8s|;"
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
    regex: "(https?)"
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
    regex: "(.+)"
  - source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
    regex: "(.+?)(?::\\d+)?;(\\d+)"
    replacement: "$1:$2"
  - target_label: cluster
    replacement: k2
  - source_labels:
    - __meta_kubernetes_pod_label_system
    target_label: system
  - source_labels:
    - __meta_kubernetes_pod_label_env
    target_label: env
  - source_labels:
    - __meta_kubernetes_pod_label_component
    target_label: component
  - source_labels:
    - __meta_kubernetes_pod_label_version
    target_label: version
  - source_labels:
    - __meta_kubernetes_pod_label_track
    target_label: track
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: instance
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: drop
    source_labels:
    - __meta_kubernetes_pod_label_component
    regex: ingress|kube-state-metrics
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_label_system
    - __meta_kubernetes_pod_label_component
    target_label: job
    regex: "([^;]+);([^;]+)"
    replacement: "$1-$2"
  job_name: k8s-cluster-components
  kubernetes_sd_configs:
  - api_server: https://
    role: pod
    tls_config:
      ca_file: "/data1/prometheus/k8s-certificates/k2/ca.crt"
      cert_file: "/data1/prometheus/k8s-certificates/k2/client.crt"
      key_file: "/data1/prometheus/k8s-certificates/k2/client.key"
- honor_labels: true
  metric_relabel_configs:
  - action: replace
    source_labels:
    - pod
    target_label: pod_name
  - action: replace
    source_labels:
    - container
    target_label: container_name
  - action: replace
    source_labels:
    - label_system
    target_label: system
  - action: replace
    source_labels:
    - label_env
    target_label: env
  - action: replace
    source_labels:
    - label_component
    target_label: component
  - action: replace
    source_labels:
    - label_version
    target_label: version
  - action: replace
    source_labels:
    - label_track
    target_label: track
  job_name: k8s-kube-state-metrics
  kubernetes_sd_configs:
  - api_server: https://
    role: pod
    tls_config:
      ca_file: "/data1/prometheus/k8s-certificates/k2/ca.crt"
      cert_file: "/data1/prometheus/k8s-certificates/k2/client.crt"
      key_file: "/data1/prometheus/k8s-certificates/k2/client.key"
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    regex: ".+;(?:[0-9]+,?)+|;"
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_label_system
    regex: ".+;k8s|;"
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_label_component
    regex: ".+;kube-state-metrics|;"
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
    regex: "(https?)"
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
    regex: "(.+)"
  - source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
    regex: "(.+?)(?::\\d+)?;(\\d+)"
    replacement: "$1:$2"
  - target_label: cluster
    replacement: k2
  - source_labels:
    - __meta_kubernetes_pod_label_system
    target_label: system
  - source_labels:
    - __meta_kubernetes_pod_label_env
    target_label: env
  - source_labels:
    - __meta_kubernetes_pod_label_component
    target_label: component
  - source_labels:
    - __meta_kubernetes_pod_label_version
    target_label: version
  - source_labels:
    - __meta_kubernetes_pod_label_track
    target_label: track
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: instance
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
- relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  job_name: apiserver-node
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- scheme: https
  relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  tls_config:
    ca_file: "/data1/prometheus/k8s-certificates/k2/ca.crt"
    cert_file: "/data1/prometheus/k8s-certificates/k2/client.crt"
    key_file: "/data1/prometheus/k8s-certificates/k2/client.key"
    insecure_skip_verify: true
  job_name: apiserver
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  job_name: kube-scheduler
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  job_name: kube-controller-manager
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- scheme: https
  relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  tls_config:
    ca_file: "/data1/prometheus/k8s-certificates/k2/etcd-ca.crt"
    cert_file: "/data1/prometheus/k8s-certificates/k2/etcd-client.crt"
    key_file: "/data1/prometheus/k8s-certificates/k2/etcd-client.key"
    insecure_skip_verify: true
  job_name: etcd
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  job_name: kubelet
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  job_name: kubelet-node
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  job_name: k8s-logs-kafka
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  job_name: k8s-logs-kafka-node
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- relabel_configs:
  - action: replace
    source_labels:
    - __address__
    target_label: node
    regex: "([\\w.-]*?):\\d+"
    replacement: "$1"
  job_name: k8s-logs-zookeeper
  dns_sd_configs:
  - names:
    -
    refresh_interval: 10s
- job_name: k8s-logs-consumer-lag
  kubernetes_sd_configs:
  - api_server: https://
    role: pod
    tls_config:
      ca_file: "/data1/prometheus/k8s-certificates/k2/ca.crt"
      cert_file: "/data1/prometheus/k8s-certificates/k2/client.crt"
      key_file: "/data1/prometheus/k8s-certificates/k2/client.key"
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    regex: ".+;(?:[0-9]+,?)+|;"
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_label_system
    regex: ".+;kafka-notifier|;"
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_label_env
    regex: ".+;k8s-logs|;"
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
    regex: "(https?)"
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
    regex: "(.+)"
  - source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
    regex: "(.+?)(?::\\d+)?;(\\d+)"
    replacement: "$1:$2"
  - target_label: cluster
    replacement: k2
  - source_labels:
    - __meta_kubernetes_pod_label_system
    target_label: system
  - source_labels:
    - __meta_kubernetes_pod_label_env
    target_label: env
  - source_labels:
    - __meta_kubernetes_pod_label_component
    target_label: component
  - source_labels:
    - __meta_kubernetes_pod_label_version
    target_label: version
  - source_labels:
    - __meta_kubernetes_pod_label_track
    target_label: track
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: instance
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
- metric_relabel_configs:
  - action: drop
    source_labels:
    - __name__
    regex: kafka2hdfs_kafka_.*
  job_name: k8s-logs-log-archive-kafka2hdfs
  kubernetes_sd_configs:
  - api_server: https://
    role: pod
    tls_config:
      ca_file: "/data1/prometheus/k8s-certificates/k2/ca.crt"
      cert_file: "/data1/prometheus/k8s-certificates/k2/client.crt"
      key_file: "/data1/prometheus/k8s-certificates/k2/client.key"
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    regex: ".+;(?:[0-9]+,?)+|;"
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_label_component
    regex: ".+;log-?archive.*|;"
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
    regex: "(https?)"
  - source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
    regex: "(.+)"
  - source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
    regex: "(.+?)(?::\\d+)?;(\\d+)"
    replacement: "$1:$2"
  - target_label: cluster
    replacement: k2
  - source_labels:
    - __meta_kubernetes_pod_label_system
    target_label: system
  - source_labels:
    - __meta_kubernetes_pod_label_env
    target_label: env
  - source_labels:
    - __meta_kubernetes_pod_label_component
    target_label: component
  - source_labels:
    - __meta_kubernetes_pod_label_version
    target_label: version
  - source_labels:
    - __meta_kubernetes_pod_label_track
    target_label: track
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: instance
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
- job_name: docker-registry
  static_configs:
  - targets:
    - :9100
alerting:
  alert_relabel_configs:
  - source_labels:
    - severity
    regex: critical|warning
    target_label: severity
    replacement: info
  alertmanagers:
  - dns_sd_configs:
    - names:
      -