Scrape delay when using remote write to influxdb with huge samples appended #4215

Open
jojohappy opened this Issue Jun 5, 2018 · 3 comments

jojohappy commented Jun 5, 2018

Bug Report

What did you do?
I want to export data to InfluxDB for long-term storage, so I use remote_write in Prometheus.

What did you expect to see?
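[2 images not preserved: the "Scrape delay" and "Samples Appended per second" graphs described in the comments below]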

What did you see instead? Under which circumstances?
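[2 images not preserved: the "Scrape delay" and "Samples Appended per second" graphs described in the comments below]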

Environment

  • System information:
/prometheus $ uname -srm
Linux 4.4.0-104-generic x86_64
  • Prometheus version:
/prometheus $ prometheus --version
prometheus, version 2.2.1 (branch: HEAD, revision: bc6058c81272a8d938c05e75607371284236aadc)
  build user:       root@149e5b3f0829
  build date:       20180314-14:15:45
  go version:       go1.10
  • Prometheus configuration file:

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  scrape_timeout: 10s
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  external_labels:
      monitor: 'kube-remote'

rule_files:
  - "/etc/prom/rules.d/kubernetes.rules"

scrape_configs:
- job_name: 'prometheus'
  static_configs:
    - targets: ['localhost:9090']

- job_name: 'kubernetes-apiservers'

  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https

- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node

  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics

- job_name: 'kubernetes-cadvisor'
  scheme: https

  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: node

  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

- job_name: 'kubernetes-service-endpoints'

  kubernetes_sd_configs:
  - role: endpoints

  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
  - action: drop
    source_labels: ["__meta_kubernetes_service_label_k8s_app"]
    regex: "node-exporter"
  - action: drop
    source_labels: ["__meta_kubernetes_service_label_k8s_app"]
    regex: "^kube-state-metrics.+"

- job_name: 'kubernetes-pods'

  kubernetes_sd_configs:
  - role: pod

  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  - action: drop
    source_labels: ["__meta_kubernetes_pod_label_app"]
    regex: "frostmourne"
  - action: drop
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_namespace]
    regex: data

- job_name: 'node-exporter'

  scheme: http

  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
        - devops

  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: node_name
  - source_labels: [__meta_kubernetes_pod_host_ip]
    action: replace
    target_label: instance
  - action: "keep"
    source_labels: ["__meta_kubernetes_service_label_k8s_app"]
    regex: "node-exporter"
  metric_relabel_configs:
  - source_labels: [node_name]
    action: replace
    target_label: node

- job_name: 'kube-state-metrics'

  scheme: http

  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
        - kube-system

  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
  - action: "keep"
    source_labels: ["__meta_kubernetes_service_label_k8s_app"]
    regex: "^kube-state-metrics.+"
  metric_relabel_configs:
  - source_labels: [pod]
    action: replace
    target_label: pod_name
  - source_labels: [container]
    action: replace
    target_label: container_name
- job_name: 'frostmourne'
  scrape_interval: 1s
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
        - devops

  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  - action: "keep"
    source_labels: ["__meta_kubernetes_pod_label_app"]
    regex: "frostmourne"

- job_name: 'pit_kafka'
  consul_sd_configs:
  - server: 'consul.int.xiaohongshu.com:8500'
    services:
    - pit_kafka
  relabel_configs:
  - source_labels: [__meta_consul_node]
    action: replace
    target_label: node_name
  - source_labels: [__meta_consul_service]
    action: replace
    target_label: pit_kafka_service

remote_write:
- url: http://<ip>:9000/api/v1/prom/write?db=prometheus
  remote_timeout: 30s
  queue_config:
    capacity: 300000
    max_shards: 30
    max_samples_per_send: 3000
    batch_send_deadline: 5s
    max_retries: 10
    min_backoff: 100ms
    max_backoff: 1000ms

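
For a sense of scale, the queue_config values in the configuration above can be turned into rough upper bounds. The Python sketch below is only back-of-the-envelope arithmetic. It assumes that capacity is counted per shard, as the Prometheus configuration documentation of that era describes it, and that all max_shards shards are active at once. The QueueConfig class and its method names are hypothetical and are not part of Prometheus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueConfig:
    """Hypothetical holder for three remote_write queue_config values."""

    capacity: int              # samples buffered per shard (assumed per-shard)
    max_shards: int            # upper bound on parallel shards
    max_samples_per_send: int  # samples per remote write request

    def max_buffered_samples(self) -> int:
        # Worst case: every shard exists and every shard's buffer is full.
        return self.capacity * self.max_shards

    def max_samples_in_flight(self) -> int:
        # Worst case: every shard is sending one full batch at the same time.
        return self.max_samples_per_send * self.max_shards

    def sends_to_drain_full_shard(self) -> int:
        # Ceiling division: requests needed to empty one completely full shard.
        return -(-self.capacity // self.max_samples_per_send)


# Values copied from the remote_write section of the configuration above.
cfg = QueueConfig(capacity=300_000, max_shards=30, max_samples_per_send=3_000)

print(cfg.max_buffered_samples())       # 9000000
print(cfg.max_samples_in_flight())      # 90000
print(cfg.sends_to_drain_full_shard())  # 100
```

Under these assumptions the settings allow up to 9,000,000 samples to sit in the remote write buffers. That figure is only a bound derived from the configuration, not a measurement from this setup.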

brian-brazil commented Jun 13, 2018

Could you explain what the graphs you're sharing are, and in particular what metrics back them?

jojohappy commented Jun 14, 2018

The first graph, "Scrape delay (counts with 1m scrape interval)", shows how much longer than configured the gap between two consecutive scrapes is, using the expression prometheus_target_interval_length_seconds{instance="$instance",quantile="0.99"} - $scrape_interval. In my case the scrape_interval is 15s.

The second graph, "Samples Appended per second", shows the rate at which samples are appended to the TSDB head, using the expression sum(rate(prometheus_tsdb_head_samples_appended_total{instance="$instance"}[1m])) by (instance).

All of these metrics are exposed by Prometheus itself.

When I enable remote_write in Prometheus, the scrape interval increases and the rate of head sample appends drops. Without remote_write, both look fine.
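
The two expressions above contain $instance and $scrape_interval, which appear to be dashboard template variables. As a minimal sketch, the Python below shows one way to fill them in, build a URL for the Prometheus HTTP API instant-query endpoint (/api/v1/query), and read the first value from a response body. The function names, the base URL and the sample response are illustrative assumptions, and no network request is made.

```python
import json
from urllib.parse import urlencode


def scrape_delay_query(instance: str, scrape_interval_seconds: float) -> str:
    """p99 observed scrape interval minus the configured interval, in seconds."""
    return (
        "prometheus_target_interval_length_seconds"
        f'{{instance="{instance}",quantile="0.99"}} - {scrape_interval_seconds:g}'
    )


def samples_appended_query(instance: str) -> str:
    """Per-second rate of samples appended to the TSDB head over 1 minute."""
    return (
        "sum(rate(prometheus_tsdb_head_samples_appended_total"
        f'{{instance="{instance}"}}[1m])) by (instance)'
    )


def query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL; base_url is a placeholder for a real server."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def first_value(response_body: str) -> float:
    """Extract the first sample value from an instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if not result:
        raise ValueError("query returned no series")
    # Each vector element carries value = [unix_timestamp, "value as string"].
    return float(result[0]["value"][1])


if __name__ == "__main__":
    # 15 matches the scrape_interval of 15s mentioned above.
    print(query_url("http://localhost:9090", scrape_delay_query("localhost:9090", 15)))
    print(query_url("http://localhost:9090", samples_appended_query("localhost:9090")))
```

A positive result from the first expression means the 99th-percentile gap between scrapes is longer than the configured interval, which is the delay the first graph plots.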

jojohappy commented Jun 14, 2018

I have submitted a PR to address this problem, but I still need to do more testing to verify it.
