clean_tombstones not releasing the extra disk space used after it fails #4200

albertvaka commented on May 28, 2018 • edited

Bug Report

Apparently `clean_tombstones` allocates additional space during the cleanup, and I guess it doesn't actually reduce the total space used until it finishes. The problem comes when the cleanup fails.

I made a first call to `/api/v2/admin/tsdb/clean_tombstones` because the disk was filling up, and since the call itself needed some extra space, it failed with the following error: […]

The extra space used, however, was not freed afterwards. My disk was now 100% full, so I resized it to twice the size and ran `clean_tombstones` again. This time it failed with a different (and more worrying) error: […]

I don't know how to recover from this one, and I now have about 3x the disk usage I had when I started, because `clean_tombstones` does not release the extra space it used when it fails. What can I do?
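For reference, a call like the one above is an HTTP POST against the TSDB admin API, which must be enabled with `--web.enable-admin-api`. Here is a minimal, hypothetical sketch in Go; the address localhost:9090 and the empty request body are assumptions, not from the report:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumed: a local Prometheus on the default port, started with
	// --web.enable-admin-api so the admin endpoints are reachable.
	resp, err := http.Post(
		"http://localhost:9090/api/v2/admin/tsdb/clean_tombstones",
		"application/json", nil, // no request body needed
	)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```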
Comments
brian-brazil added the component/local storage label on May 28, 2018
@albertvaka I will try to replicate and find the culprit.
@albertvaka When deleting series, the tsdb backend first reads a given block into memory, removes the series, and writes the block back to disk. At that point it needs some free space to write the new block before it can delete the old one. As a workaround, you can use the tsdb scan tool I implemented recently: it scans for overlapping blocks and removes all but the biggest one (which should include all your existing time series) before triggering the deletion. I have attached the tool so you can try it. Still WIP, so use at your own risk: `tsdb scan /path/to/data`
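To make the space requirement concrete, here is a minimal sketch of that rewrite-then-delete pattern. It is an illustration, not the actual tsdb code; `rewriteBlock`, the `write` callback, and the directory handling are all made up:

```go
package sketch

import "os"

// rewriteBlock illustrates the pattern described above: the rewritten block
// must be fully written to disk before the old block is removed, so both
// copies occupy space at the same time.
func rewriteBlock(oldDir, newDir string, write func(dst string) error) error {
	tmp := newDir + ".tmp"
	if err := write(tmp); err != nil {
		// Without this cleanup, the partially written block keeps
		// occupying disk space after a failure, which is the behaviour
		// reported in this issue.
		os.RemoveAll(tmp)
		return err
	}
	if err := os.Rename(tmp, newDir); err != nil {
		os.RemoveAll(tmp)
		return err
	}
	// Only once the new block is safely in place can the old block's
	// space be reclaimed.
	return os.RemoveAll(oldDir)
}
```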
Thanks! However, I think we should make sure that everything is cleaned up after an error, so this doesn't happen again (it seems like it can be a pretty common case).
Yes, I am looking into this right now.
From what I understand, the deletion of old blocks happens here: https://github.com/prometheus/tsdb/blob/master/db.go#L865. Cleaning up the new blocks before this line should fix it: https://github.com/prometheus/tsdb/blob/master/db.go#L853. (Someone has to verify; I may be wrong.)
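To sketch that suggestion in code (again hypothetical, not the actual prometheus/tsdb implementation; `cleanTombstones`, `rewrite`, and `newBlockDirs` are invented names): track the directories of freshly written blocks and remove them if any later step fails.

```go
package sketch

import (
	"os"
	"path/filepath"
)

// cleanTombstones sketches the proposed fix: remember which block
// directories this run created, and delete them again on any error, so a
// failed run does not leave extra data on disk.
func cleanTombstones(dataDir string, blocks []string, rewrite func(block string) (string, error)) (err error) {
	var newBlockDirs []string
	defer func() {
		if err == nil {
			return
		}
		// On failure, remove everything this run wrote.
		for _, dir := range newBlockDirs {
			os.RemoveAll(filepath.Join(dataDir, dir))
		}
	}()

	for _, b := range blocks {
		newDir, rerr := rewrite(b)
		if rerr != nil {
			return rerr // triggers the deferred cleanup above
		}
		newBlockDirs = append(newBlockDirs, newDir)
	}
	return nil
}
```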
Just checked again and I think this needs to be handled in the […]
The code suggests that on error the compaction should clean up after itself, so this partial write shouldn't really happen. @albertvaka, any chance you could send all folders whose meta.json includes any timestamps between 1525068000000 and 1525262400000, so I can check them?
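For anyone wanting to locate such folders, a small sketch like the following could walk the data directory and check each block's meta.json against that window. The minTime/maxTime field names follow the tsdb block metadata format; the rest (function name, overlap test treating both bounds as inclusive) is an assumption for illustration:

```go
package sketch

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// blockMeta holds the two meta.json fields needed for the overlap check.
type blockMeta struct {
	MinTime int64 `json:"minTime"`
	MaxTime int64 `json:"maxTime"`
}

// findOverlapping prints every block directory whose time range overlaps
// [mint, maxt] in milliseconds, e.g. 1525068000000 and 1525262400000.
func findOverlapping(dataDir string, mint, maxt int64) error {
	entries, err := os.ReadDir(dataDir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		raw, err := os.ReadFile(filepath.Join(dataDir, e.Name(), "meta.json"))
		if err != nil {
			continue // not a block directory, skip it
		}
		var m blockMeta
		if err := json.Unmarshal(raw, &m); err != nil {
			continue
		}
		if m.MinTime <= maxt && m.MaxTime >= mint {
			fmt.Println(e.Name())
		}
	}
	return nil
}
```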
We ended up deleting the data altogether, so I can't try your scan tool or provide the corrupt files, sorry :(
@krasi-georgiev Consider that currently there are […]. So the overlap problem might be […]. @gouthamve any comments? PS: maybe we can take this discussion to […]
@albertvaka No worries. After a short chat on IRC with @codesome, he found the issue and a PR will follow soon.
codesome referenced this issue on May 31, 2018
Merged: Cleanup new blocks on 'CleanTombstones' failure #341
Happy to say that the fix is now merged and will be included in the next release.
krasi-georgiev closed this on Jun 6, 2018
krasi-georgiev referenced this issue on Aug 24, 2018
Closed: After unsuccessful Tombstones deletion Prometheus won't start anymore #3782
lock bot commented on Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.