kubernetes-cluster issue with 0.20.0 (malformed HTTP response) #1752

Closed
sbadia opened this Issue Jun 17, 2016 · 12 comments

@sbadia commented Jun 17, 2016

What did you do?

Just upgraded from Prometheus 0.19.3 to 0.20.0:

```diff
--- a/etc/prometheus.yml
+++ b/etc/prometheus.yml
@@ -20,7 +20,7 @@ scrape_configs:
   # metrics_path defaults to '/metrics'
   # scheme defaults to 'http'.

-  target_groups:
+  static_configs:
     - targets: ['localhost:9090']

 - job_name: 'kubernetes-cluster'
```

I only changed target_groups => static_configs in my prometheus.yml configuration file.

What did you expect to see?

I expected kubernetes-sd/kubernetes-cluster to keep working, but with release 0.20.0 it doesn't work anymore (or I missed a config option…).

What did you see instead? Under which circumstances?

In the web UI, under Targets > kubernetes-cluster, my node endpoints fail with this error: malformed HTTP response.

  • Endpoint: http://10.1.2.2:10250/metrics
  • Error: Get http://10.1.2.2:10250/metrics: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

Maybe linked to 206bcfc?

Environment

  • System information:

Linux 3.16.0-4-amd64 x86_64

  • Prometheus version:

```
prometheus, version 0.20.0 (branch: HEAD, revision: aeab25c)
  build user:       root@914cf42a3e15
  build date:       20160617-08:22:59
  go version:       go1.6.1
```

  • Prometheus configuration file:
```
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
    - "/opt/prometheus/etc/rules/alerts"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'

  # Override the global default and scrape targets from this job every 25 seconds.
  scrape_interval: 25s
  scrape_timeout: 10s

  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.

  static_configs:
    - targets: ['localhost:9090']

- job_name: 'kubernetes-cluster'

  # This TLS & bearer token file config is used to connect to the actual scrape
  # endpoints for cluster components. This is separate to discovery auth
  # configuration (`in_cluster` below) because discovery & scraping are two
  # separate concerns in Prometheus.
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

  relabel_configs:
  - source_labels: [__meta_kubernetes_role]
    action: keep
    regex: (?:apiserver|node)
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role

# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
# to set this to `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately.
- job_name: 'kubernetes-service-endpoints'

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

  relabel_configs:
  - source_labels: [__meta_kubernetes_role, __meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: endpoint;true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__meta_kubernetes_service_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

- job_name: 'kubernetes-service-endpoints-2'

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

  relabel_configs:
  - source_labels: [__meta_kubernetes_role, __meta_kubernetes_service_annotation_prometheus_io_scrape2]
    action: keep
    regex: endpoint;true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme2]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path2]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port2]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_role2]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__meta_kubernetes_service_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

- job_name: 'kubernetes-pods'

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

  relabel_configs:
  - source_labels: [__meta_kubernetes_role, __meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: pod;true
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(?:\d+);(\d+)
    replacement: ${1}:${2}
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_pod_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

- job_name: 'blackbox-exporter'
  metrics_path: /probe

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

  relabel_configs:
  - source_labels: [__meta_kubernetes_role, __meta_kubernetes_service_annotation_prometheus_io_blackbox]
    action: keep
    regex: endpoint;true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_blackbox_module]
    action: replace
    target_label: __param_module
    regex: (.*)
    replacement: ${1}
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_blackbox_target]
    action: replace
    target_label: __param_target
    regex: (.*)
    replacement: ${1}
  - source_labels: [__address__]
    action: replace
    target_label: __address__
    replacement: '127.0.0.1:9115'
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_service_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

- job_name: 'consul'

  consul_sd_configs:
  - server:   'consul:8500'

  relabel_configs:
  - source_labels: ['__meta_consul_node']
    regex:         '(.*)'
    target_label:  'hostname'
    replacement:   '$1'
  - source_labels: ['__meta_consul_tags']
    regex:         ',,'
    action:        drop
```

@jimmidyson (Member) commented Jun 17, 2016

Looks like it's trying to scrape an HTTPS endpoint via HTTP (that leading "\x15" byte is a TLS alert record). You'll need to set the scheme in the config to https, I think.
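For reference, a minimal sketch of what that could look like at the scrape-config level (the paths are the in-cluster service-account defaults already used above; treat this as illustrative rather than the canonical example config):

```yaml
- job_name: 'kubernetes-cluster'
  # Scrape all discovered targets over TLS instead of plain HTTP.
  scheme: https
  tls_config:
    # CA bundle that Kubernetes mounts into every pod.
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
```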

@sbadia (Author) commented Jun 17, 2016

OK, indeed, thanks. I finally have a working setup using this diff; is it intended? (changes between 0.19.3 => 0.20.0)

```diff
--- i/etc/prometheus.yml
+++ w/etc/prometheus.yml
@@ -30,6 +30,7 @@ scrape_configs:
   # configuration (`in_cluster` below) because discovery & scraping are two
   # separate concerns in Prometheus.
   tls_config:
+    insecure_skip_verify: true
     ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
   bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

@@ -42,6 +43,10 @@ scrape_configs:
   - source_labels: [__meta_kubernetes_role]
     action: keep
     regex: (?:apiserver|node)
+  - source_labels: [__scheme__]
+    action: replace
+    replacement: 'https'
+    target_label: __scheme__
   - action: labelmap
     regex: __meta_kubernetes_node_label_(.+)
   - source_labels: [__meta_kubernetes_role]
```
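
As an aside (not from the thread): a relabel rule with a fixed replacement string doesn't need source_labels at all, because the default regex (.*) matches the empty concatenated source value. So the __scheme__ rule above could arguably be reduced to:

```yaml
- action: replace
  target_label: __scheme__
  replacement: https   # fixed value, so no source_labels are needed
```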

@jimmidyson (Member) commented Jun 17, 2016

Is that a diff from the example config? If so, then dropping insecure_skip_verify is intentional, but the relabelling of __scheme__ should probably have remained.

@sbadia (Author) commented Jun 20, 2016

Hmm, OK. No, it's a diff against the configuration from my initial post: #1752 (comment)

@mgoodness commented Jun 24, 2016

I'm running into this as well. Shouldn't Prometheus be scraping the Kubelet metrics on port 10255? That is the default read-only port, and the one Heapster uses, so (at least in my case) it is exposed on all worker nodes. Port 10250, meanwhile, is the default Kubelet "control" port and is (or should be) only externally accessible by controller nodes.

@fabxc (Member) commented Jun 24, 2016

Can confirm, also ran into the port issue. Not sure whether it relates to the port changes we did, but it seems like a regression in general.

@jimmidyson (Member) commented Jun 24, 2016

You can of course change the port via relabelling. As I said above, the relabelling of the scheme to https shouldn't have been removed, which is likely the cause of this issue. The read-only port is not enabled by default, and secure environments don't enable it, scraping over the secured kubelet port instead.
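For reference, changing the port via relabelling could look something like the sketch below (it assumes node addresses are discovered as host:10250, as in this thread):

```yaml
relabel_configs:
- source_labels: [__address__]
  action: replace
  # Swap the discovered secure kubelet port for the read-only port.
  regex: (.+):10250
  replacement: ${1}:10255
  target_label: __address__
```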

@mgoodness commented Jun 24, 2016

I have confirmed that, using the example configuration in this repo, v0.19.3 scrapes port 10255 (the http, read-only port), while v0.20.0 scrapes port 10250 (which is https). Heapster's default configuration is also to scrape 10255.

If Prometheus wants to scrape 10250, then the scheme must be relabeled to https and insecure_skip_verify must be enabled. Looking at the history, neither of those seems to have been required until this version.

@jimmidyson (Member) commented Jun 24, 2016

There has been discussion about removing the read-only port for a while (kubernetes/kubernetes#12968), as there is potential for info leakage given that it has no encryption, auth, etc. Port 10255 is an arbitrary port number that cannot be discovered. Adding an option for it gives no functional benefit over relabelling.

As for the TLS bit, the problem is that there are too many variations in cluster setup. In secure environments, the cert of the CA used to sign both the API server and kubelet certs is distributed to all pods automatically, so the connection to the kubelet over the discovered port can be validated. That should remain the default IMO; connecting to the read-only port or disabling TLS verification should be opt-in, with knowledge of the potential (albeit probably minimal) risks.
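To make the two postures concrete, a sketch of the corresponding tls_config (the paths are the standard in-cluster service-account mounts):

```yaml
# Default, secure posture: validate the kubelet cert against the
# cluster CA that Kubernetes distributes to every pod.
tls_config:
  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# Opt-in posture, with knowledge of the (albeit probably minimal) risk:
# skip verification when the kubelet certs can't be validated by that CA.
# tls_config:
#   insecure_skip_verify: true
```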

@fabxc (Member) commented Jun 24, 2016

That all makes sense to me. Thanks for the explanation.

@vbasavani commented Feb 3, 2018

```
global:
  scrape_interval: 15s
  scrape_timeout: 15s
  evaluation_interval: 15s
scrape_configs:
- job_name: prometheus
  scrape_interval: 15s
  scrape_timeout: 15s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - ip-10-100-1-34:9090
- job_name: kubernetes-pods
  scrape_interval: 15s
  scrape_timeout: 15s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: https://api.internal.example.com
    role: pod
    basic_auth:
      username: admin
      password: <secret>
    tls_config:
      ca_file: /opt/prometheus-2.1.0.linux-amd64/ca.crt
      server_name: kubernetes
      insecure_skip_verify: false
    namespaces:
      names: []
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_namespace, __meta_kubernetes_pod_label_name]
    separator: /
    regex: (.*)
    target_label: job
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_node_name]
    separator: ;
    regex: (.*)
    target_label: node
    replacement: $1
    action: replace
- job_name: kubernetes-nodes
  scrape_interval: 15s
  scrape_timeout: 15s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: https://api.internal.example.com
    role: node
    basic_auth:
      username: admin
      password: <secret>
    namespaces:
      names: []
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
```

I am getting the below error.
```
Get https://x.x.x.x:10250/metrics: x509: cannot validate certificate for x.x.x.x because it doesn't contain any IP SANs
```
Can anybody help me resolve this issue?
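Not an answer from the thread, but the usual workaround for this class of error: the kubelet certificate is typically issued for the node's name rather than its IP, so rewrite the scrape address from the discovered IP to the node name. A sketch (__meta_kubernetes_node_name is the standard label for role: node; it assumes node names resolve from the Prometheus host):

```yaml
relabel_configs:
# Scrape by node name so the certificate's DNS SAN can be validated.
- source_labels: [__meta_kubernetes_node_name]
  action: replace
  regex: (.+)
  replacement: ${1}:10250
  target_label: __address__
```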

@lock (bot) commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
