
prometheus not supporting option to disable crash recovery #3634

Closed
vaibhavpatil123 opened this Issue Dec 29, 2017 · 2 comments

vaibhavpatil123 commented Dec 29, 2017

I have a Prometheus server deployed on OpenShift. We are monitoring different pods, kubelets, etc. Ingestion rate is ~10K samples/second.
Polling/scraping interval: 10s
Memory assigned: 8GiB
CPU: 6 cores
Disk type: NFS/HDD
storage.local.target-heap-size: 5.7GiB
retention period: 1h
With the above config, Prometheus frequently enters rushed mode (with urgency score 1), resulting in dropped scrapes. When we try to stop it with SIGTERM it takes too long (the OpenShift shutdown grace period is 15m) and it still does not shut down gracefully. The unclean shutdown then triggers crash recovery of the data, which delays the next start by 15-20m.

Can you advise whether I can disable crash recovery, so that I can work with some loss of data?

P.S. I am selectively sending metrics to InfluxDB using prometheus-sop.
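(Not an answer to the crash-recovery question itself, but since the unclean shutdown is what forces crash recovery, one workaround is giving the pod more time to checkpoint before it is killed. A minimal sketch of the relevant standard Kubernetes pod-spec field; the deployment/container names and image tag here are placeholders, and the 15m grace period mentioned above would be raised, e.g.:)

# Sketch: extend the termination grace period so Prometheus can
# shut down cleanly before receiving SIGKILL.
# terminationGracePeriodSeconds is a standard Kubernetes pod-spec field;
# the container name and image are hypothetical examples.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 3600   # 1h instead of 15m
      containers:
      - name: prometheus
        image: prom/prometheus:v1.8.2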

Details of env:

deployed on Openshift 3.5

Prometheus version: v1.8.2

Prometheus config.yml:

global:
  scrape_interval: 1m
  scrape_timeout: 10s
  evaluation_interval: 1m
scrape_configs:
- job_name: kubernetes-nodes
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
    action: replace
- job_name: kubernetes-pods
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: pod
    namespaces:
      names:
      - XYZ
      - ABC
      - DEF
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace
remote_write:
- url: http://prometheus-sop.ft-aramse:9096
  remote_timeout: 30s
  queue_config:
    capacity: 100000
    max_shards: 1000
    max_samples_per_send: 100
    batch_send_deadline: 5s
    max_retries: 10
    min_backoff: 30ms
    max_backoff: 100ms
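(For context on the kubernetes-pods job: its first relabel rule keeps only pods carrying the standard prometheus.io/scrape annotation, and the path/port annotations rewrite the scrape path and address. A hypothetical pod would opt in like this; the port and path values are examples:)

# Hypothetical pod metadata matched by the relabel rules in the
# kubernetes-pods job above; annotation keys follow the common
# prometheus.io/* convention assumed by that job's relabel_configs.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"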
brian-brazil commented Dec 29, 2017

It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
