
scrape target not honoring https scheme #5108

Open
ev1lm0nk3y opened this Issue Jan 17, 2019 · 10 comments

ev1lm0nk3y commented Jan 17, 2019

Bug Report

What did you do?
Attempt to scrape a Kubernetes API endpoint using scheme https and a bearer_token.

What did you expect to see?
A successful scrape over https, like https://<api_endpoint_ip_address>:443/..., not http://<api_endpoint_ip_address>:443/...

What did you see instead? Under which circumstances?
A scrape error for the airflow-metrics target, while the federated_prometheus target has no problems.

Get http://<api_endpoint_ip_address>:443/api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics/: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

curl commands to validate the issue
Works:

curl -k https://<api_endpoint_ip_address>:443/api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics/ -H "Authorization: Bearer ${TOKEN}"

Doesn't Work:

$ curl -k http://<api_endpoint_ip_address>:443/api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics/ -H "Authorization: Bearer ${TOKEN}"

curl: (56) Recv failure: Connection reset by peer
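For what it's worth, the bytes in the scrape error are the start of a TLS record: the API server answers the plaintext http:// request with a TLS alert, which is also why the http:// curl fails. A quick sketch decoding the record header (illustrative Go, not Prometheus code):

```go
package main

import "fmt"

// recordType names the first byte of a TLS record (the ContentType field).
// The scrape error's payload "\x15\x03\x01\x00\x02\x02" begins with 0x15,
// i.e. the server replied to the plaintext HTTP request with a TLS alert.
func recordType(b byte) string {
	switch b {
	case 0x14:
		return "change_cipher_spec"
	case 0x15:
		return "alert"
	case 0x16:
		return "handshake"
	case 0x17:
		return "application_data"
	default:
		return "unknown"
	}
}

func main() {
	resp := []byte{0x15, 0x03, 0x01, 0x00, 0x02, 0x02}
	// Record header: content type, major/minor version, then a 2-byte length.
	fmt.Printf("type=%s version=%d.%d length=%d\n",
		recordType(resp[0]), resp[1], resp[2], int(resp[3])<<8|int(resp[4]))
	// prints: type=alert version=3.1 length=2
}
```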

Environment

  • System information:
    for local testing Darwin 17.7.0 x86_64
    usually I'm running prometheus in a GKE cluster running node and master versions 1.11.5

  • Scrape Target GKE master version:
    airflow cluster: 1.9.7
    federated GKE cluster: 1.11.5

  • Prometheus version:
    v2.4.0 on the GKE cluster
    v2.6.1 locally

  • Prometheus configuration file:
    This is a sample configuration that I'm running locally that shows the 2 different scrape targets with similar configurations. The same error is seen both locally and in GKE.

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
scrape_configs:
- job_name: airflow-metrics
  honor_labels: true
  metrics_path: /api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics
  scheme: https
  params:
    match[]:
      - '{__name__=~"airflow.*"}'
  static_configs:
  - targets:
    - <api_endpoint_ip_address>
  bearer_token: <blah>
  tls_config:
    insecure_skip_verify: true
- job_name: federated_prometheus
  honor_labels: true
  static_configs:
    - targets:
      - <gke_api_endpoint_ip>
  scrape_interval: 60s
  scrape_timeout: 60s
  params:
    match[]:
      - '{namespace="default"}'
  metrics_path: /api/v1/namespaces/default/services/prometheus:9090/proxy/federate
  scheme: https
  bearer_token: <blah>
  tls_config:
    insecure_skip_verify: true
  • Logs:
12:06 $ prometheus --log.level=debug 
level=info ts=2019-01-17T17:06:50.679835Z caller=main.go:243 msg="Starting Prometheus" version="(version=2.6.1, branch=non-git, revision=non-git)"
level=info ts=2019-01-17T17:06:50.68023Z caller=main.go:244 build_context="(go=go1.11.4, user=brew@HighSierra-2.local, date=20190116-23:28:29)"
level=info ts=2019-01-17T17:06:50.680255Z caller=main.go:245 host_details=(darwin)
level=info ts=2019-01-17T17:06:50.680276Z caller=main.go:246 fd_limits="(soft=256, hard=unlimited)"
level=info ts=2019-01-17T17:06:50.680294Z caller=main.go:247 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-01-17T17:06:50.680886Z caller=main.go:561 msg="Starting TSDB ..."
level=info ts=2019-01-17T17:06:50.681758Z caller=web.go:429 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-01-17T17:06:51.003115Z caller=main.go:571 msg="TSDB started"
level=info ts=2019-01-17T17:06:51.003168Z caller=main.go:631 msg="Loading configuration file" filename=prometheus.yml
level=debug ts=2019-01-17T17:06:51.00729Z caller=manager.go:213 component="discovery manager scrape" msg="Starting provider" provider=string/0 subs=[airflow-metrics]
level=debug ts=2019-01-17T17:06:51.00772Z caller=manager.go:213 component="discovery manager scrape" msg="Starting provider" provider=string/1 subs=[federated_prometheus]
level=info ts=2019-01-17T17:06:51.007758Z caller=main.go:657 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2019-01-17T17:06:51.007773Z caller=main.go:530 msg="Server is ready to receive web requests."
level=debug ts=2019-01-17T17:06:51.007784Z caller=manager.go:231 component="discovery manager scrape" msg="discoverer channel closed" provider=string/0
level=debug ts=2019-01-17T17:06:51.007835Z caller=manager.go:231 component="discovery manager scrape" msg="discoverer channel closed" provider=string/1
level=debug ts=2019-01-17T17:07:06.926837Z caller=scrape.go:825 component="scrape manager" scrape_pool=airflow-metrics target="https://<api_endpoint_ip_address>:443/api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics?match%5B%5D=%7B__name__%3D~%22airflow.%2A%22%7D" msg="Scrape failed" err="Get http://<api_endpoint_ip_address>:443/api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics/?match%5B%5D=%7B__name__%3D~%22airflow.%2A%22%7D: net/http: HTTP/1.x transport connection broken: malformed HTTP response \"\\x15\\x03\\x01\\x00\\x02\\x02\""
^Clevel=warn ts=2019-01-17T17:07:16.21424Z caller=main.go:405 msg="Received SIGTERM, exiting gracefully..."
simonpasquier (Member) commented Jan 18, 2019

Hmm, this is strange. Can you share a screenshot of the Service Discovery page with all targets as well as the Targets page?

ev1lm0nk3y (Author) commented Jan 18, 2019

Targets: (screenshot attached, Jan 18 2019)

Service Discovery: (screenshot attached, Jan 18 2019)

These are from my local prometheus deployment.

ev1lm0nk3y (Author) commented Jan 18, 2019

While I think there is still a problem, I believe I have resolved the immediate issue. The metrics_path I was using caused the target to return a 301, because the path needed a trailing /. And because the scrape target is actually the Kubernetes API endpoint, not the service itself, the 301 from the service came back as an http:// redirect. Here's the verbose curl output:

$ curl -v -k https://<api_endpoint_ip_address>:443/api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics -H "Authorization: Bearer $TOKEN"
*   Trying <api_endpoint_ip_address>...
* TCP_NODELAY set
* Connected to <api_endpoint_ip_address> (<api_endpoint_ip_address>) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=<api_endpoint_ip_address>
*  start date: Sep 18 12:37:35 2018 GMT
*  expire date: Sep 17 12:37:35 2023 GMT
*  issuer: CN=1d0c9ada-8bf7-408b-92fb-a4f715e32eb5
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fee73800400)
> GET /api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics HTTP/2
> Host: <api_endpoint_ip_address>
> User-Agent: curl/7.54.0
> Accept: */*
> Authorization: Bearer  blahblahblah
> 
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 301 
< audit-id: b8d54cfb-8290-405f-86a4-fd33884e5f7d
< content-type: text/html; charset=utf-8
< date: Fri, 18 Jan 2019 18:45:38 GMT
< location: http://<api_endpoint_ip_address>/api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics/
< server: gunicorn/19.9.0
< content-length: 279
< 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
* Connection #0 to host <api_endpoint_ip_address> left intact
<p>You should be redirected automatically to target URL: <a href="http://<api_endpoint_ip_address>/admin/metrics/">http://<api_endpoint_ip_address>/admin/metrics/</a>.  If not click the link.

So this explains why if I have a scrape config with

metrics_path: /api/v1/namespaces/default/services/<service>:8080/proxy/admin/metrics/

I don't get the http:// redirect, and the endpoint suddenly works.

I guess this just turns into a "please generate greater debug logging" request.
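The failure mode here — an https scrape whose 301 Location downgrades to plain http — could at least be detected and logged explicitly. A sketch of such a check (the helper name is mine, not Prometheus code):

```go
package main

import (
	"fmt"
	"net/url"
)

// downgradesScheme reports whether a redirect Location would move an
// https request down to plain http, which is what happened in this issue.
func downgradesScheme(orig, location string) bool {
	o, err1 := url.Parse(orig)
	l, err2 := url.Parse(location)
	if err1 != nil || err2 != nil {
		return false
	}
	return o.Scheme == "https" && l.Scheme == "http"
}

func main() {
	// Placeholder host and service names, mirroring the redirect above.
	from := "https://apiserver:443/api/v1/namespaces/default/services/svc:8080/proxy/admin/metrics"
	to := "http://apiserver/api/v1/namespaces/default/services/svc:8080/proxy/admin/metrics/"
	fmt.Println(downgradesScheme(from, to)) // prints "true"
}
```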

simonpasquier (Member) commented Jan 21, 2019

Interesting finding!

travisgroth commented Jan 22, 2019

Possible suggestion: disable redirect following, or error on 301/302 unless following is explicitly requested via configuration, like curl does.

I'm not sure how easy this is to achieve in the http client, but that would make corner cases like this fairly obvious.

brian-brazil (Member) commented Jan 22, 2019

It's easy to disable it, but it'd count as a breaking change (even though this is something to avoid).

roidelapluie (Contributor) commented Feb 18, 2019

I agree to put this on the list for Prometheus 3.0.

brian-brazil (Member) commented Feb 18, 2019

Hmm, could we handle the redirect better? There's basically the same issue over in the blackbox exporter too.

roidelapluie (Contributor) commented Mar 9, 2019

What do you mean?

An HTTPS target that redirects to HTTP might be valid.

brian-brazil (Member) commented Mar 9, 2019

It'd be valid, but inefficient. We should scrape the required target directly.
