Prometheus 2.1 abnormally shutdown with SIGBUS #3781
Comments
Were you running anything that could alter the files in the data directory?
Nope. Another range query.
Okay, so it's not mmap reading out of bounds. This is likely a hardware fault then.
dfredell commented Feb 7, 2018
I have been getting the same kind of errors (I think), and I'm running on a similar platform.
Environment: Prometheus 2.1 was running in a Joyent private cloud Docker container on SmartOS. Prometheus is running via ContainerPilot, with consul-template updating the config and reloading via SIGHUP. Dockerfile
System information: bash-4.4# uname -a
Prometheus version: prometheus, version 2.1.0 (branch: HEAD, revision: 85f23d8)
Log
If you want the full log I can get that too. I was not poking Prometheus or Docker at the time of the failures. I have 6 separate instances of Prometheus running in an identical way, and only one is dying frequently like this. Last night, when nobody was working, Prometheus kept getting SIGBUS as it tried to start; this went on for about an hour (roughly 20 attempts) before it finally started. Prometheus had been running for 30 hours before that. I had SIGBUS errors two days ago and tried giving Docker more memory, which evidently didn't help.
Have you tried running that Prometheus on a different machine? This is smelling like a Joyent issue, given both of you are using that platform.
dfredell commented Feb 8, 2018
I haven't tried a different machine, and I don't think I really can: my scrape targets are in a private Triton network.
dfredell commented Feb 12, 2018
I tried downgrading to Prometheus 2.0.0 and that didn't totally help: one stack is happy, but a different one is now getting SIGBUS errors every few hours. I also noticed that Prometheus was taking 1072% of the memory.
Following the Go issue golang/go#21586, I made a Prometheus build on go1.10rc2 to see if that fixes my issue.
dfredell commented Feb 13, 2018
I did try starting the Docker container on a Prometheus build with go1.10rc2, and it seems to have the same issue. It started, and Prometheus would just sit there: the web page wasn't loading, and the process was taking up more and more memory, as if Prometheus was loading all the old metric data into memory. I had 6G in the /data directory; when I deleted the directory, Prometheus booted right up, no hesitation.
simonpasquier added the kind/more-info-needed label Aug 7, 2018
Do you still see the issue if you use the latest Prometheus version?
dfredell commented Aug 7, 2018
@simonpasquier
@dfredell thanks!
simonpasquier closed this Aug 8, 2018
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
roengram commented Feb 1, 2018
What did you do?
Federation test on Prometheus 2.1
What did you expect to see?
Prometheus running stably
What did you see instead? Under which circumstances?
Prometheus shutdown with SIGBUS
Environment
Prometheus 2.1 was running on a Joyent container (64GB RAM, 800GB SSD)
System information:
Linux 3.13.0 x86_64
Prometheus version:
prometheus, version 2.1.0 (branch: HEAD, revision: 85f23d8)
build user: root@6e784304d3ff
build date: 20180119-12:01:23
go version: go1.9.2
Alertmanager version:
N/A
Prometheus configuration file:
N/A
Full log: https://drive.google.com/open?id=1NMifcYiNZXz8aQlkLRs6K3TOr3lv7APw
Detail
Prometheus was scraping 1000 metrics from 1000 targets at a 40-second interval. When I continuously invoked the federation endpoint with match[]={label1=\"v1\"}, scrapes started to fail, and then 1~2 minutes later Prometheus died with the above log.