Prometheus crash when remote_storage_adapter comes up after failure (panic: runtime error: makeslice: len out of range) #2969
Comments
Thanks for reporting! I'll take a look.
This was referenced Jul 19, 2017
tomwilkie added the component/remote storage, kind/bug, and priority/P1 labels on Jul 25, 2017
fabxc closed this in #2973 on Jul 28, 2017
tomwilkie referenced this issue on Sep 14, 2017: Prevent number of remote write shards from going negative. #3170 (merged)
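The title of #3170 suggests the fix clamps the computed shard count before resharding; a minimal sketch of that kind of guard (hypothetical names, not the actual patch) could be:

package remote // illustrative package name only

// clampShards sketches the guard implied by #3170's title: keep the
// desired shard count within [1, maxShards] so a negative value can
// never reach the slice allocation in newShards. The function and
// parameter names here are assumptions, not Prometheus code.
func clampShards(desired, maxShards int) int {
	if desired < 1 {
		return 1 // never reshard to zero or a negative number of shards
	}
	if desired > maxShards {
		return maxShards
	}
	return desired
}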
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
lock bot locked and limited conversation to collaborators on Mar 23, 2019
bkupidura commented Jul 19, 2017 (edited)
What did you do?
We are using the remote_storage_adapter implementation (https://github.com/prometheus/prometheus/tree/master/documentation/examples/remote_storage/remote_storage_adapter) to push metrics to InfluxDB.
Sometimes when remote_storage_adapter comes back after a failure (docker service scale remote_storage_adapter=0 && sleep 300 && docker service scale remote_storage_adapter=1), the Prometheus server crashes.
Logs
time="2017-07-18T15:14:20Z" level=warning msg="Error sending 100 samples to remote storage: Post http://remote_storage_adapter:9201/write: dial tcp: lookup remote_storage_adapter on 127.0.0.11:53: no such host" source="queue_manager.go:500"
time="2017-07-18T15:14:26Z" level=info msg="Remote storage resharding from 118 to 47 shards." source="queue_manager.go:351"
time="2017-07-18T15:14:36Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:14:46Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:14:56Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:15:06Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:15:16Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:15:26Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:15:36Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:15:46Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:15:46Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:633"
time="2017-07-18T15:15:48Z" level=info msg="Done checkpointing in-memory metrics and chunks in 1.235390018s." source="persistence.go:665"
time="2017-07-18T15:15:56Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:16:06Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:16:16Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:16:26Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:16:36Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:16:46Z" level=info msg="Remote storage resharding from 47 to 2 shards." source="queue_manager.go:351"
time="2017-07-18T15:16:56Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:17:06Z" level=info msg="Currently resharding, skipping." source="queue_manager.go:354"
time="2017-07-18T15:17:16Z" level=info msg="Remote storage resharding from 2 to -1 shards." source="queue_manager.go:351"
panic: runtime error: makeslice: len out of range
goroutine 193 [running]:
github.com/prometheus/prometheus/storage/remote.(*QueueManager).newShards(0xc42028a700, 0xffffffffffffffff, 0x1)
/go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:396 +0x40
github.com/prometheus/prometheus/storage/remote.(*QueueManager).reshard(0xc42028a700, 0xffffffffffffffff)
/go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:375 +0xcf
github.com/prometheus/prometheus/storage/remote.(*QueueManager).reshardLoop(0xc42028a700)
/go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:364 +0x105
created by github.com/prometheus/prometheus/storage/remote.(*QueueManager).Start
/go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:265 +0x85
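For illustration, the runtime error in the trace is exactly what Go raises when make is handed a negative length, which matches the "resharding from 2 to -1 shards" log line above. A tiny standalone program (not Prometheus code) reproduces the same panic:

package main

import "fmt"

func main() {
	n := -1 // the shard count reported in the log: "resharding from 2 to -1 shards"
	fmt.Println("allocating", n, "shard queues")
	// make with a negative length panics at runtime with:
	// panic: runtime error: makeslice: len out of range
	queues := make([]chan int, n)
	_ = queues
}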
Environment
System: Ubuntu 16.04
Prometheus is running on top of docker-swarm.
Prometheus cmd line:
/opt/prometheus/prometheus -config.file /srv/prometheus/prometheus.yml -web.listen-address 0.0.0.0:9090 -storage.local.engine persisted -storage.local.retention 360h -storage.local.target-heap-size 3221225472 -storage.local.num-fingerprint-mutexes 4096 -storage.local.path /data/data/1/
Prometheus version:
prometheus, version 1.6.3 (branch: master, revision: c580b60)
build user: root@a6410e65f5c7
build date: 20170522-09:15:06
go version: go1.8.1
Prometheus config