[v0.19.1] K8S SD failing if the cluster is in OpenStack #1680

Closed
Krylon360 opened this Issue May 27, 2016 · 10 comments


Krylon360 commented May 27, 2016

OpenStack uses two IP addresses: a private (management) IP and a public (floating) IP.
With 0.19.1, Prometheus only detects the private IP, so its probes fail when hitting the node port and the cAdvisor port.

[screenshot: 2016-05-27, 11:47 AM]

If I hit that same path using the public (floating) IP that's tied to that endpoint, the /metrics page pulls up just fine.

So, is this a Prometheus issue or a Kubernetes issue? I'm going through both repos to see where the problem might be coming from. This worked fine with the 0.19.0 container and started failing with 0.19.1.


fabxc commented May 27, 2016

Can you paste your configuration? In general, nothing should have changed for service-endpoint discovery in k8s since 0.18.0, and even less so since 0.19.0 – see the diff between 0.19.0 and 0.19.1 here.


matthiasr commented May 27, 2016


Krylon360 commented Jun 6, 2016

With OpenStack networking, the IP isn't public per se (e.g. 8.8.8.8). The private IP is on the management network, used for communication within the cluster, and a separate network is configured for external/public addressing. A port binding is then created that maps the private IP to the public (floating) IP. SSH and all other services that use the network use the public IP.

I'm pretty sure our config could use an update; I'll snip just the Kubernetes section. We are running Prometheus, Blackbox Exporter, Node Exporter, StatsD Exporter, Alertmanager, and Grafana outside of the Kubernetes cluster on a dedicated server, hence the bearer token setting being flagged.

If GitHub Markdown mangles the formatting of the config below, here is a gist of the same config.
prometheus.yaml:

```yaml

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'service-kubernetes'

    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /bearer_token

    kubernetes_sd_configs:
    - api_servers:
      - ${API_SERVER}
      in_cluster: false
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /bearer_token

    relabel_configs:
    - source_labels: [__meta_kubernetes_role]
      action: keep
      regex: (?:apiserver|node)
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
      replacement: '$1'
    - source_labels: [__meta_kubernetes_role]
      regex: '(.*)'
      target_label: kubernetes_role
      replacement: '$1'

  - job_name: 'node'

    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /bearer_token

    kubernetes_sd_configs:
    - api_servers:
      - ${API_SERVER}
      in_cluster: false
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /bearer_token

    relabel_configs:
    - source_labels: [__meta_kubernetes_role, __meta_kubernetes_service_name]
      action: keep
      regex: endpoint;nodeexporter

  - job_name: 'kubernetes-service-endpoints'

    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /bearer_token

    kubernetes_sd_configs:
    - api_servers:
      - ${API_SERVER}
      in_cluster: false
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /bearer_token

    # TODO add additional relabeling
    relabel_configs:
    - source_labels: [__meta_kubernetes_role, __meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: endpoint;true
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      target_label: __address__
      regex: (.+)(?::\d+);(\d+)
      replacement: $1:$2
```


jimmidyson commented Jun 6, 2016

Here is where I think the problem is. This currently uses the internal node IP if available, which shows that it expects Prometheus to be run in-cluster. Not sure of the best fix for this: make it configurable in SD, or emit a target for each node IP and filter by relabelling?
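
A rough, untested sketch of what the "one target per node IP, filter by relabelling" option could look like. Note that the `__meta_kubernetes_node_address_type` label below is purely hypothetical (no such label exists); it only illustrates the idea of keeping the external/floating address and dropping the rest:

```yaml
relabel_configs:
  # Hypothetical: if SD emitted one target per node address, keep only
  # the external/floating addresses and drop the internal ones.
  - source_labels: [__meta_kubernetes_node_address_type]  # hypothetical meta label
    action: keep
    regex: ExternalIP
```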


brian-brazil commented Jun 6, 2016

The additional IPs should be available as labels for relabelling.


jimmidyson commented Jun 6, 2016

Ah OK, so a single target with labels for each node IP – great idea.


jimmidyson commented Jun 7, 2016

#1712 makes all node IPs reported by the Kubernetes API server available for relabelling via meta labels __meta_kubernetes_node_address_<type>, so you should be able to use __meta_kubernetes_node_address_external_ip in your relabel config. It still defaults to using the internal IP.


jimmidyson commented Jun 7, 2016

Just an update: after review, the meta label will be __meta_kubernetes_node_address_ExternalIP rather than what I said above.
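
For reference, a minimal, untested sketch of how that meta label could be used to scrape nodes on their external/floating IP from outside the cluster. The node-role filter mirrors the config pasted above; the `:9100` node_exporter port is an assumption, so substitute whatever port you actually scrape:

```yaml
relabel_configs:
  # Keep only node targets (same role filter as the config earlier in this thread).
  - source_labels: [__meta_kubernetes_role]
    action: keep
    regex: node
  # Rewrite the scrape address to the node's external/floating IP.
  # :9100 (node_exporter) is an assumption; adjust to the port you scrape.
  - source_labels: [__meta_kubernetes_node_address_ExternalIP]
    regex: (.+)
    target_label: __address__
    replacement: $1:9100
```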


Krylon360 commented Jun 7, 2016

Awesome! Thanks everyone for the quick turnaround!

Bryce Walter


lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
