prometheus not supporting option to disable crash recovery #3634
brian-brazil commented Dec 29, 2017:
It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

brian-brazil closed this Dec 29, 2017.
lock bot commented Mar 23, 2019:
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
lock bot locked and limited conversation to collaborators Mar 23, 2019.
vaibhavpatil123 commented Dec 29, 2017:
I have a Prometheus server deployed on OpenShift. We are monitoring various pods, kubelets, etc. The ingestion rate is ~10K samples/second.
Polling/scraping interval: 10s
Memory assigned: 8 GiB
CPU: 6 cores
Disk type: NFS/HDD
storage.local.target-heap-size: 5.7 GiB
Retention period: 1h (flag mapping sketched below)
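A minimal sketch of how these settings would be passed as Prometheus 1.8's single-dash flags in an OpenShift/Kubernetes container spec; the image tag, data path, and the byte value for 5.7 GiB are illustrative assumptions, not taken from this issue:

containers:
- name: prometheus
  image: prom/prometheus:v1.8.2       # assumed image; any 1.8.x build takes the same flags
  args:
  - -storage.local.path=/prometheus   # assumed data volume mount
  - -storage.local.retention=1h
  - -storage.local.target-heap-size=6120328396  # ~5.7 GiB; this flag takes bytes

Note that these storage.local flags exist only in the 1.x series; the 2.x storage engine replaced them.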
With the above config, Prometheus frequently enters rushed mode (with urgency score 1), and as a result scrapes less data. When we try to stop it (using SIGTERM) it takes too long (the OpenShift shutdown grace period is 15m) and still does not shut down gracefully. This forces crash recovery of the data, which again delays startup by 15-20m.
Can you advise whether I can disable crash recovery of the data, so that I can work with some loss of data?
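Not an answer from the thread, but one way to let Prometheus 1.x checkpoint and exit cleanly on SIGTERM (and so avoid crash recovery on the next start) is to raise the pod's termination grace period; a sketch, assuming the deployment's pod spec can be edited:

spec:
  terminationGracePeriodSeconds: 1800   # assumed value: 30m instead of the 15m mentioned above
  containers:
  - name: prometheus
    image: prom/prometheus:v1.8.2       # assumed image, as in the sketch above

Whether this is enough headroom depends on how much unpersisted data has to be checkpointed at shutdown.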
P.S. I am selectively sending metrics to InfluxDB using prometheus-sop.
Details of the environment:
Deployed on OpenShift 3.5
Prometheus version: v1.8.2
Prometheus config.yml:
global:
  scrape_interval: 1m
  scrape_timeout: 10s
  evaluation_interval: 1m
scrape_configs:
- scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
    action: replace
- scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names: []
  relabel_configs:
  - separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace
remote_write:
- remote_timeout: 30s
  queue_config:
    capacity: 100000
    max_shards: 1000
    max_samples_per_send: 100
    batch_send_deadline: 5s
    max_retries: 10
    min_backoff: 30ms
    max_backoff: 100ms
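For context on the P.S.: the remote_write entry above shows no url, and shipping samples to InfluxDB normally goes through a remote-storage adapter. A sketch for illustration only; the host, port, and path here are assumptions (the example remote_storage_adapter in the Prometheus repo listens on :9201 and serves /write by default):

remote_write:
- url: http://remote-storage-adapter.example:9201/write   # hypothetical adapter endpoint
  remote_timeout: 30s

Samples queued for remote write are buffered in memory, so a slow or unreachable endpoint adds to the heap pressure that drives rushed mode.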