Prometheus server high CPU usage for Label sorting #3944

Closed
mxork opened this issue Mar 9, 2018 · 3 comments

mxork commented Mar 9, 2018

We use Prometheus to monitor a Kubernetes cluster: ~100 node_exporters, ~300 targets in total. A CPU profile captured with go tool pprof --gif $server/debug/pprof/profile > prof.gif shows high CPU usage in Label sorting.

prof.gif

We're running Prometheus as the backend of a fairly by-the-book Kubernetes custom metrics API server, using k8s-prometheus-adapter.

The code (https://github.com/prometheus/prometheus/blob/master/pkg/labels/labels.go) seems to have two sources of slowness: sorting goes through sort.Interface, so every comparison and swap is an indirect method call, and label names with a possibly long shared prefix are compared from the start over and over again. Using labels to represent paths in a hierarchy seems like a reasonable use case and shouldn't lead to poor performance. If anyone has a one-liner to generate the list of labels being operated on, I can dump them.

If my diagnosis is correct, there are low-hanging fixes. The first is to use go generate to produce a sort routine specialized to the label type, so that it does not go through the interface (several generators are floating around). This is the simplest, but it does not address the shared-prefix problem. The second is to implement a trie sort, which does.
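The trie sort option could look roughly like the following sketch, again with a simplified Label type and a hypothetical trieSort helper (not Prometheus code). Names are inserted byte by byte into a trie and an in-order walk emits them sorted, so a prefix shared by many names is traversed once per name during insertion instead of being rescanned in every pairwise comparison.

```go
package main

import "fmt"

// Label is a simplified stand-in for the real label type.
type Label struct {
	Name, Value string
}

// trieNode is a hypothetical byte-level trie node. A fixed 256-entry child
// array means visiting children by index yields byte order, which matches
// Go's string ordering.
type trieNode struct {
	children [256]*trieNode
	labels   []Label // labels whose name ends exactly at this node
}

// trieSort sorts ls by Name in place by building a trie and walking it.
func trieSort(ls []Label) {
	root := &trieNode{}
	for _, l := range ls {
		n := root
		for i := 0; i < len(l.Name); i++ {
			c := l.Name[i]
			if n.children[c] == nil {
				n.children[c] = &trieNode{}
			}
			n = n.children[c]
		}
		n.labels = append(n.labels, l)
	}

	// The trie holds copies of every label, so it is safe to overwrite ls.
	out := ls[:0]
	var walk func(n *trieNode)
	walk = func(n *trieNode) {
		// A name ending here sorts before any longer name extending it.
		out = append(out, n.labels...)
		for c := range n.children {
			if child := n.children[c]; child != nil {
				walk(child)
			}
		}
	}
	walk(root)
}

func main() {
	ls := []Label{
		{"node_role_worker", "1"},
		{"node_role", "x"},
		{"node_region", "eu"},
		{"instance", "a:9100"},
	}
	trieSort(ls)
	for _, l := range ls {
		fmt.Println(l.Name) // instance, node_region, node_role, node_role_worker (one per line)
	}
}
```

The 256-entry child arrays keep the walk in byte order but cost about 2 KiB per node on 64-bit platforms, so for the small label sets of a single series this is likely slower than a plain comparison sort; a production version would more likely be an in-place MSD radix sort over the slice.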

System information:

CentOS 7.4
Linux 3.10.0-693.11.6.el7.x86_64 x86_64

Prometheus version:

2.1.0

Prometheus configuration file:

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
rule_files:
- /etc/config/rules
- /etc/config/alerts
scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-apiservers
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: default;kubernetes;https
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_service_name
    - __meta_kubernetes_endpoint_port_name
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-nodes
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - replacement: kubernetes.default.svc:443
    target_label: __address__
  - regex: (.+)
    replacement: /api/v1/nodes/${1}/proxy/metrics
    source_labels:
    - __meta_kubernetes_node_name
    target_label: __metrics_path__
  scheme: https
  tls_config:
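The keep and labelmap rules in this config can be sketched in Go as follows. keep and labelmap are hypothetical helpers that mimic the documented semantics of those relabel actions (fully anchored regexes, source label values joined with the default ";" separator, default replacement "$1"); they are not the code Prometheus itself runs. The labelmap rule copies every Kubernetes node label onto each target of the kubernetes-nodes job, and target labels end up on every series scraped from that target.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anchored compiles a relabel regex so it must match the whole string,
// mirroring how relabel regexes are anchored.
func anchored(expr string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + expr + ")$")
}

// keep mimics `action: keep`: join the source label values with ";" and
// keep the target only if the regex matches the joined string.
func keep(lbls map[string]string, sourceLabels []string, expr string) bool {
	vals := make([]string, len(sourceLabels))
	for i, name := range sourceLabels {
		vals[i] = lbls[name] // a missing label contributes ""
	}
	return anchored(expr).MatchString(strings.Join(vals, ";"))
}

// labelmap mimics `action: labelmap` with the default replacement "$1":
// every label whose name matches is copied under the rewritten name.
func labelmap(lbls map[string]string, expr string) map[string]string {
	re := anchored(expr)
	out := make(map[string]string, len(lbls))
	for k, v := range lbls {
		out[k] = v
	}
	for k, v := range lbls {
		if re.MatchString(k) {
			out[re.ReplaceAllString(k, "$1")] = v
		}
	}
	return out
}

func main() {
	ep := map[string]string{
		"__meta_kubernetes_namespace":          "default",
		"__meta_kubernetes_service_name":       "kubernetes",
		"__meta_kubernetes_endpoint_port_name": "https",
	}
	src := []string{
		"__meta_kubernetes_namespace",
		"__meta_kubernetes_service_name",
		"__meta_kubernetes_endpoint_port_name",
	}
	fmt.Println(keep(ep, src, "default;kubernetes;https")) // true

	node := map[string]string{"__meta_kubernetes_node_label_zone": "eu-1a"}
	mapped := labelmap(node, "__meta_kubernetes_node_label_(.+)")
	fmt.Println(mapped["zone"]) // eu-1a
}
```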

brian-brazil commented Mar 9, 2018

Can you try with a newer version of Prometheus? You're probably hitting a service discovery (SD) issue that has since been fixed.

mxork commented Mar 9, 2018

Just bumped to 2.2.0. The sorting node doesn't appear in the profile at all now.

Thanks for the quick response! Glad sorting wasn't actually a bottleneck.

mxork closed this Mar 9, 2018

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019