Prometheus 2.0: remote storage being full caused Prometheus to behave incorrectly #3796
Comments
juliusv added the component/remote storage label Feb 5, 2018
Hi, could you update to the latest Prometheus and see if that fixes the issue?
This should be fixed by the new WAL-based remote_write code in 2.8.
tomwilkie closed this Mar 4, 2019
tangyong commented Feb 5, 2018 • edited
What did you do?
We monitor with Prometheus, using Grafana, Consul, and remote storage. The CrateDB disk filled up, and after we cleaned it we found that Prometheus behaved incorrectly.
The setup is grafana <- prometheus -> cratedb adapter -> cratedb, and Prometheus uses Consul service discovery.
What did you expect to see?
We think that, while the CrateDB disk was full and after we cleaned it,
What did you see instead? Under which circumstances?
Instead, we saw the following:
... msg="Error refreshing service" ...
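For readers unfamiliar with this error: the full log line further down shows a Consul blocking query (`wait=30000ms`) failing with `Client.Timeout exceeded while awaiting headers`, which is what Go's HTTP client reports when no response headers arrive before the client-side timeout. The following is a minimal Python sketch of that general mechanism only. It is not Prometheus's code and not a diagnosis of this issue; `SlowHandler`, `fetch`, `run_demo`, the path `/v1/catalog/service/example`, and all delays are hypothetical.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server that holds a request open."""

    delay_seconds = 0.5  # assumed delay before any response header is sent

    def do_GET(self):
        time.sleep(self.delay_seconds)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"[]")
        except OSError:
            # The client may already have given up and closed the socket.
            pass

    def log_message(self, *args):
        # Silence the default per-request logging.
        pass


def fetch(url, client_timeout):
    """Return ("ok", body) or ("error", description) for one GET request."""
    try:
        with urllib.request.urlopen(url, timeout=client_timeout) as resp:
            return "ok", resp.read()
    except OSError as exc:  # covers socket timeouts and URLError
        return "error", repr(exc)


def run_demo():
    """Query the slow server with an impatient and a patient client."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/v1/catalog/service/example"
    try:
        impatient = fetch(url, client_timeout=0.1)  # shorter than the delay
        patient = fetch(url, client_timeout=2.0)    # longer than the delay
    finally:
        server.shutdown()
        server.server_close()
    return impatient, patient
```

Calling `run_demo()` returns one result per client: the impatient client gives up before the headers arrive, while the patient one receives the body.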
Environment
System information:
Prometheus version:
Version: 2.0.0
Revision: 0a74f98
Branch: HEAD
BuildUser: root@615b82cb36b6
BuildDate: 20171108-07:11:59
GoVersion: go1.9.2
version: 0.12.0
global:
scrape_interval: 15s
scrape_timeout: 10s
evaluation_interval: 15s
external_labels:
prometheusName: NJXZ-P1
alerting:
alertmanagers:
scheme: http
timeout: 10s
rule_files:
scrape_configs:
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
consul_sd_configs:
tag_separator: ','
scheme: http
services:
relabel_configs:
separator: ;
regex: (.*)
target_label: job
replacement: $1
action: replace
separator: ;
regex: (.*)
target_label: instance
replacement: $1
action: replace
separator: ;
regex: ',(.*),(.*),(.*),(.*),(.*),'
target_label: appId
replacement: $1
action: replace
separator: ;
regex: ',(.*),(.*),(.*),(.*),(.*),'
target_label: ldc
replacement: $2
action: replace
separator: ;
regex: ',(.*),(.*),(.*),(.*),(.*),'
target_label: env
replacement: $3
action: replace
separator: ;
regex: ',(.*),(.*),(.*),(.*),(.*),'
target_label: ip
replacement: $4
action: replace
separator: ;
regex: ',(.*),(.*),(.*),(.*),(.*),'
target_label: software
replacement: $5
action: replace
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
static_configs:
labels:
appId: CTDSA
env: SIT
ip: xx.yy.zz.kk
ldc: NJXZ
software: JBOSSSERVER
labels:
appId: CTDSA
env: SIT
ip: xx.yy.zz.kk
ldc: NJXZ
software: JBOSSSERVER
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
static_configs:
labels:
appId: CTDSA
env: SIT
ip: xx.yy.zz.kk
ldc: NJXZ
software: pushgateway
labels:
appId: CTDSA
env: SIT
ip: xx.yy.zz.kk
ldc: NJXZ
software: pushgateway
I believe Alertmanager is not relevant to this issue, so please allow me to leave it out.
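To make the five tag-based relabel rules in the configuration easier to follow, here is a small Python sketch of what they appear to do. It rests on several assumptions: the source label is not shown in the listing, so the sketch assumes it is `__meta_consul_tags` (which Prometheus builds by joining the service tags with `tag_separator` and wrapping the result in the separator); it assumes each capture group is `(.*)`; and the tag order and sample values are taken from the static labels above purely for illustration. `relabel_from_tags` is a hypothetical helper, not part of Prometheus.

```python
import re

# Regex assumed from the relabel_configs, with every capture group as (.*).
TAG_REGEX = r",(.*),(.*),(.*),(.*),(.*),"

# (target_label, capture group) pairs, matching replacement $1 through $5.
RULES = [("appId", 1), ("ldc", 2), ("env", 3), ("ip", 4), ("software", 5)]


def relabel_from_tags(tags):
    """Return the labels extracted from a tag string, or {} if it does not match."""
    # Prometheus anchors relabel regexes at both ends; fullmatch mimics that.
    match = re.fullmatch(TAG_REGEX, tags)
    if match is None:
        return {}
    return {label: match.group(group) for label, group in RULES}


# Hypothetical tag string: five tags joined by "," and wrapped in ",".
example_labels = relabel_from_tags(",CTDSA,NJXZ,SIT,xx.yy.zz.kk,JBOSSSERVER,")
```

Because the groups are greedy, a service with more than five tags would still match, but the extra tags would be absorbed into the first group, so the rules only behave as intended when every service carries exactly five tags in this order.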
level=info ts=2018-02-05T02:09:14.673576977Z caller=queue_manager.go:338 component=remote msg="Currently resharding, skipping."
level=warn ts=2018-02-05T02:09:15.828729838Z caller=queue_manager.go:225 component=remote msg="Remote storage queue full, discarding sample. Multiple subsequent messages of this kind may be suppressed."
level=info ts=2018-02-05T02:09:22.132293894Z caller=queue_manager.go:253 component=remote msg="Stopping remote storage..."
level=error ts=2018-02-05T02:09:35.918461623Z caller=consul.go:283 component="target manager" discovery=consul msg="Error refreshing service" service=promether-exporter err="Get http://10.27.136.227:9996/v1/catalog/service/promether-exporter?index=116543&wait=30000ms: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
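The "Remote storage queue full, discarding sample" warning above suggests a bounded in-memory queue that drops new samples rather than blocking when the remote end cannot keep up. The following Python sketch illustrates that general pattern under stated assumptions; it is a simplification, not the actual queue_manager.go logic (which is written in Go and shards its queues), and `BoundedSampleQueue` and `simulate_outage` are hypothetical names.

```python
import queue


class BoundedSampleQueue:
    """Hypothetical in-memory queue that drops samples instead of blocking."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def append(self, sample):
        """Enqueue a sample; return False and count it as dropped if full."""
        try:
            self._queue.put_nowait(sample)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self):
        """Remove and return everything currently queued."""
        drained = []
        while True:
            try:
                drained.append(self._queue.get_nowait())
            except queue.Empty:
                return drained


def simulate_outage(capacity, samples_during_outage):
    """Append samples while nothing is sent, then report (kept, dropped)."""
    q = BoundedSampleQueue(capacity)
    for sample in range(samples_during_outage):
        q.append(sample)
    return len(q.drain()), q.dropped
```

With this design, samples that arrive while the queue is full are lost for good, which is consistent with the later comment that a WAL-based remote_write in 2.8 should address the problem.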