Bug: inaccurate low disk space warning during inode exhaustion, newly autodiscovered hosts not collected #2864
Comments
It would also be useful if you could provide the logs.
Also, check the number of free inodes. Large installations
sometimes run out of inodes.
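A quick way to check free inodes from a script is sketched below. It is a minimal illustration and not part of Prometheus: the helper name `inode_usage` is hypothetical, and it relies on `os.statvfs`, which is only available on Unix-like systems. On the command line, `df -i` reports the same counters.

```python
import os


def inode_usage(path="/"):
    """Return (total, free, used_percent) inode counts for the filesystem holding `path`."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    # Some filesystems report 0 total inodes; avoid dividing by zero in that case.
    used_pct = 0.0 if total == 0 else 100.0 * (total - free) / total
    return total, free, used_pct


if __name__ == "__main__":
    total, free, used_pct = inode_usage("/")
    print(f"inodes: total={total} free={free} used={used_pct:.1f}%")
```

If the used percentage is close to 100 while `df` still shows free space, inode exhaustion is the likely cause of the write failures.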
On Tue 20 Jun, 2017, 7:03 PM e271828- wrote:
After noticing that newly added hosts weren't reflected in the stats, we
saw Prometheus was nearly pegging two CPUs and throwing many error messages
in the logs around low disk space and writes failing. Around 875MB was
still free according to df.
Handling of low disk space conditions should be better (why were newly
autodiscovered hosts not being reported?) and there seems to be a bug in
the TSDB writer. Unfortunately we don't have full forensics, but if it
recurs we'll snapshot the host.
@gouthamve inode exhaustion is possible. We'll check for that if it happens again. I see that a 12-hour-old Prometheus install monitoring 2,000 hosts is using 194,455 files. That seems excessive.
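For reference, a file count like the one above can be obtained with a short script. This is only a sketch: `count_files` is a hypothetical helper, and the directory passed to it is whatever storage path your install uses.

```python
import os


def count_files(root):
    """Count non-directory entries below `root` (symlinked directories are not followed)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


if __name__ == "__main__":
    # "." is a placeholder; point this at the storage directory in question.
    print(count_files("."))
```

Each counted file occupies one inode, and the directories that hold them occupy more, so the result is a lower bound on inode consumption for that tree.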
e271828- changed the title from "Bug(s): false low disk space warning, newly autodiscovered hosts not being collected with space low" to "Bug(s): inaccurate low disk space warning (maybe inode exhaustion?); newly autodiscovered hosts not collected when space low" on Jun 20, 2017
e271828- changed the title from "Bug(s): inaccurate low disk space warning (maybe inode exhaustion?); newly autodiscovered hosts not collected when space low" to "Bug: inaccurate low disk space warning during inode exhaustion, newly autodiscovered hosts not collected" on Jun 20, 2017
We've recreated the scenario and confirmed it was inode exhaustion. At minimum the log messages should reflect that. Old data should also just be purged in the disk-full state rather than dropping new host data on the ground.
The error the OS returns on inode exhaustion is the same as the one for a full disk. The behavior you saw follows from the structure of the storage: we create one file per time series (so there are as many files as there are time series) and append new data to that file. We cannot drop old data to make room for new data, because both belong to the same file and we have run out of inodes. You could fix this by formatting the disk with more inodes. This issue has been fixed in
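Since both conditions surface as the same errno (`ENOSPC`), a writer that wants a more accurate log message has to query the filesystem itself. The sketch below illustrates the idea in Python. It is not how Prometheus handles the error: `classify_enospc` and `append_sample` are hypothetical names, and `os.statvfs` is Unix-only.

```python
import errno
import os


def classify_enospc(free_blocks, free_inodes, total_inodes):
    """Guess why a write failed with ENOSPC, given filesystem counters."""
    # Filesystems that report 0 total inodes are skipped for the inode check.
    if total_inodes > 0 and free_inodes == 0:
        return "inode exhaustion"
    if free_blocks == 0:
        return "out of disk blocks"
    return "unknown (possibly a quota or reserved-space limit)"


def append_sample(path, data):
    """Append bytes to `path`; on ENOSPC, re-raise with a more specific message."""
    try:
        with open(path, "ab") as f:
            f.write(data)
    except OSError as e:
        if e.errno != errno.ENOSPC:
            raise
        st = os.statvfs(os.path.dirname(os.path.abspath(path)))
        cause = classify_enospc(st.f_bavail, st.f_favail, st.f_files)
        raise OSError(e.errno, f"{os.strerror(e.errno)} ({cause})") from e
```

With a check like this, the log line for the situation in this issue would mention inode exhaustion instead of a generic "no space left on device".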
Sounds like there's not much more we can do here.
brian-brazil closed this on Jul 6, 2017
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
e271828- commented Jun 20, 2017 (edited)
After noticing that newly added hosts weren't reflected in the stats, we saw Prometheus was nearly pegging two CPUs (normal is 10-60%) and throwing many error messages in the logs around low disk space and writes failing. Around 875MB was still free according to df.
Handling of low disk space conditions should be better (why were newly autodiscovered hosts not being reported?) and there seems to be a bug in the TSDB writer. Unfortunately we don't have full forensics, but if it recurs we'll snapshot the host.