
Kubernetes - excessive requests to apiserver - 1.3.0 #2282

Closed
rrichardson opened this Issue Dec 14, 2016 · 5 comments

rrichardson commented Dec 14, 2016

What did you do?

Installed Prometheus 1.3.0 with the new 1.3.0-style config on an HA, 3-master Kubernetes 1.4.6 cluster.

What did you expect to see?

The requests to the API server drop. 

What did you see instead? Under which circumstances?

Every 10 seconds (the scrape interval, I assume), Prometheus spams the apiserver with a burst of requests, which causes the apiserver to do a considerable amount of work and spin past 90% CPU.

(local prometheus)                                         (apiserver)
ip-172-100-182-149.ec2.internal:35632 => ip-172-100-155-94.ec2.internal:https  3.18Kb  3.61Kb  3.25Kb <=  310Kb   235Kb   196Kb

Environment

Kubernetes 1.4.6 on debian jessie (kops 1.4)
  • System information:
Linux 4.4.26-k8s x86_64
  • Prometheus version:
prometheus, version 1.3.0 (branch: master, revision: 18254a172b1e981ed593442b2259bd63617d6aca)
  build user:       root@d363f050a0e0
  build date:       20161101-17:06:27
  go version:       go1.7.3
  • Alertmanager version:

  • Prometheus configuration file:

  prometheus.yml: |
    rule_files:
        - '/etc/prometheus/alert.rules'

    global:
      scrape_interval: 10s
      scrape_timeout: 10s
      evaluation_interval: 10s

    scrape_configs:

      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L37
      - job_name: 'kubernetes-nodes'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
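          # Rewrite the kubelet's secure port (10250) to its read-only metrics port (10255).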
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:10255'
            target_label: __address__

      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L79
      - job_name: 'kubernetes-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: (.+)(?::\d+);(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name

      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L119
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name

      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L156
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: (.+):(?:\d+);(\d+)
            replacement: ${1}:${2}
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

  • Alertmanager configuration file:
insert configuration here (if relevant to the issue)
  • Logs:
time="2016-12-14T14:37:19Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:549" 
time="2016-12-14T14:37:26Z" level=info msg="Done checkpointing in-memory metrics and chunks in 6.279866183s." source="persistence.go:573" 
time="2016-12-14T14:37:49Z" level=info msg="Completed maintenance sweep through 2 archived fingerprints in 20.000336802s." source="storage.go:1223" 
time="2016-12-14T14:38:19Z" level=info msg="Completed maintenance sweep through 2 archived fingerprints in 20.000350998s." source="storage.go:1223" 
time="2016-12-14T14:38:49Z" level=info msg="Completed maintenance sweep through 2 archived fingerprints in 20.000438983s." source="storage.go:1223" 
time="2016-12-14T14:39:19Z" level=info msg="Completed maintenance sweep through 2 archived fingerprints in 20.000326916s." source="storage.go:1223" 
time="2016-12-14T14:39:50Z" level=info msg="Completed maintenance sweep through 2 archived fingerprints in 20.000609466s." source="storage.go:1223" 
time="2016-12-14T14:40:20Z" level=info msg="Completed maintenance sweep through 2 archived fingerprints in 20.275844038s." source="storage.go:1223" 
time="2016-12-14T14:40:50Z" level=info msg="Completed maintenance sweep through 2 archived fingerprints in 20.000485026s." source="storage.go:1223" 
time="2016-12-14T14:41:21Z" level=info msg="Completed maintenance sweep through 2 archived fingerprints in 20.038851221s." source="storage.go:1223" 
time="2016-12-14T14:41:54Z" level=info msg="Completed maintenance sweep through 2 archived fingerprints in 20.000343502s." source="storage.go:1223" 

rrichardson commented Dec 14, 2016

I'm not sure if this is normal/expected. One area where my config deviates from the example k8s config is that it has no section for scraping the apiserver. Should it?
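
For reference, the apiserver scrape job in the upstream example config looks roughly like the sketch below (the exact relabeling may differ from the revision linked in my config comments):

    - job_name: 'kubernetes-apiservers'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        # Keep only the https endpoint of the `kubernetes` service in the `default` namespace.
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https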

brancz commented Dec 14, 2016

Even if you are not scraping the apiserver, Prometheus still makes requests to it for Kubernetes SD purposes. However, requests for that reason should only occur when the discovery is initiated and then once every resyncPeriod, which is every 10 minutes.

To figure out whether it comes from the discovery part or from the actual scraping, can you tell us whether you are scraping the apiserver, i.e. which targets do you see in the web UI under /targets, @rrichardson? I am assuming you are not scraping it, as you would have either created a section in your config for it or created a custom Service in your cluster for discovery purposes.

This is pretty hard for us to debug, but it is an excellent example of why we should get started with instrumenting Prometheus with the OpenTracing API (discussed here).
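
If you can reach the apiserver's /metrics endpoint from a pod, its own request counters can also show which client is responsible. A rough sketch, assuming the in-cluster default service address and token path and a 1.4-era metric name (apiserver_request_count with a client label; names and labels vary across Kubernetes versions):

    # Ask the apiserver which clients are issuing requests; filter for Prometheus's user agent.
    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    curl -sk -H "Authorization: Bearer ${TOKEN}" \
      https://kubernetes.default.svc/metrics \
      | grep apiserver_request_count \
      | grep -i prometheus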

rrichardson commented Dec 15, 2016

Thanks for your response. I have verified in /targets that the node is being scraped, but not the apiserver itself. The request burst from Prometheus to the apiserver is periodic, but it is more like every 10 seconds than every 10 minutes.

OpenTracing is next on my list to install and configure in my cluster; I'll work on that now. :)

I'll do some more digging (if it were not an https endpoint, I'd tcpdump it).
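
That said, even without decrypting, the timing and size of the bursts should be visible on the wire. A rough sketch of what I have in mind (the interface name and apiserver port are assumptions for this environment):

    # Watch packet timing toward the apiserver without decrypting anything.
    sudo tcpdump -i eth0 -nn -q -ttt 'dst host ip-172-100-155-94.ec2.internal and dst port 443'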

fabxc commented Aug 1, 2017

This issue is quite old and we never found a reason why such repeated requests could be happening. There have also been no other reports and we've gone through several updates of the client library, potentially fixing this as a side effect.

Closing until it gets observed again.

@fabxc fabxc closed this Aug 1, 2017

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
