Exceeding max size of 4GiB #3190
Comments
That looks like some insane load; can you share more numbers, e.g. how many hosts and services you are monitoring? If possible, can you also provide the number of series, which can be found from the following metrics: AFAICS it can be 2 things: you have
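(The specific metrics were cut off above. For reference, and as an illustration rather than the exact list the maintainer meant, Prometheus 2.x exposes its own TSDB head statistics, which can be queried like this:)

```promql
# Number of series currently in the in-memory head block
prometheus_tsdb_head_series

# Rate of newly created series; a high value indicates series churn
rate(prometheus_tsdb_head_series_created_total[5m])
```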
I have deleted the old data; when restarting I got the error "Opening storage failed: head truncate failed: truncating at 1505788200000 not aligned". I monitor only 10 hosts, but these hosts have about 250000 series in total. I have started a new Prometheus process with a clean data path; after one hour there are no new errors in the log. I will keep monitoring my Prometheus host and report any new errors I find. Thanks a lot.
Hmm, 700G in 12hr for 250K series is very high. Are the series constantly changing or are they mostly static?
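(A back-of-envelope check of the numbers in this thread shows why this is "very high", assuming the roughly 1–2 bytes per sample often cited for Prometheus 2.x with stable series:)

```python
# Numbers from this thread: 250000 samples/minute, ~700 GiB after 12 hours
samples = 250_000 * 60 * 12      # total samples ingested over 12 hours
disk_bytes = 700 * 1024**3       # reported on-disk size (~700 GiB)
bytes_per_sample = disk_bytes / samples
print(round(bytes_per_sample))   # -> 4176 bytes/sample, vs the ~1-2 expected
```

Three orders of magnitude above the expected per-sample cost points at series churn (new index entries per series) rather than sample volume.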
constantly changing. |
Oh damn, if I understand it right:
This is a Prometheus anti-pattern and you should look into scraping the hosts instead of pushing data. I am also not sure one Prometheus server can handle the load of 40K servers. Can you get the values of:
about 3-4 hours |
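(For the record, the pull model suggested above is configured via `scrape_configs`; a minimal sketch, where the job name, targets, and port are placeholders:)

```yaml
scrape_configs:
  - job_name: 'node'            # hypothetical job name
    scrape_interval: 15s
    static_configs:
      - targets: ['host-1:9100', 'host-2:9100']  # placeholder targets
```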
Upgraded to rc.0, still got the error "exceeding max size of 4GiB".
Did you close it again unintentionally? Could you share your logs?
I closed the issue.
Got "exceeding max size of 4GiB" until my 1.4T disk was full. What caused my index to grow larger than 4GiB? And what should I do to fix this?
A high number of series can indeed cause this. We just did not anticipate it happening in practical setups so soon. If the issue reappears we might just extend our code to allow for larger file sizes. We should also handle failed compactions more gracefully by cleaning up the
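(One plausible reading, my assumption rather than anything the maintainers state here, is that the 4GiB ceiling corresponds to a 32-bit offset or size field in the index file format:)

```python
# 4 GiB is exactly the range addressable by an unsigned 32-bit value,
# which would explain a hard "max size of 4GiB" limit on one file
max_uint32_values = 2**32
four_gib = 4 * 1024**3
print(max_uint32_values == four_gib)   # -> True
```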
Hoping for the new update.
I also encountered the same problem, but the data I monitor is not that much. In the last 30 min the storage space (2TB) is being used up quickly. The default parameter values are basically in use at the moment. Does "storage.tsdb.min-block-duration" need to be set shorter? Prometheus logs:
I compared my two Prometheus servers: although Prom-A receives fewer metrics per second, it has many more dynamic time series.
Once the index exceeds 4GB, the meta.json file is not generated. The growing number of dynamically created time series drives up the size of the index file.
Index files grow to 2GB within 3 minutes.
Same here on 2.0.0:
du data
Runtime parameters
Dropping min/max block duration to 30m/1d respectively seems to have helped for now |
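(For anyone else trying this workaround: on Prometheus 2.0 the flags in question would be set like the following, with the 30m/1d values taken from the comment above; a sketch, not an official recommendation:)

```shell
./prometheus \
  --storage.tsdb.min-block-duration=30m \
  --storage.tsdb.max-block-duration=1d
```

Smaller block durations mean more frequent, smaller compactions, which keeps individual index files further below the 4GiB ceiling at the cost of more blocks on disk.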
I've tried it too. It's very effective. THX. @deejay1 |
I am also facing this issue on my federated cluster. Logs from the server:
Tons of directories ending in .tmp, 59G in total.
Inside 01C3CFE5NFXQ2QXQPSWD45BZKA.tmp:
lots of 512M files inside chunks
Is this issue different from #3487?
Is it safe to delete those .tmp directories? What is the impact of removing a *.tmp directory?
I had a quick look at these recently, and it seems they are created while compaction is running. Unless Prometheus has crashed, they shouldn't exist after the compaction is complete.
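(A sketch of the cleanup, assuming Prometheus is stopped so no compaction is in flight; the directory names here are made up to simulate the layout described above:)

```shell
# Hypothetical layout: a data dir with one finished block and one
# leftover compaction .tmp directory
mkdir -p data/01C3BLOCKKEEPME/chunks
mkdir -p data/01C3CFE5NFXQ2QXQPSWD45BZKA.tmp/chunks

# Remove only the top-level *.tmp directories; finished blocks
# (those with a meta.json) are left untouched
find data -maxdepth 1 -type d -name '*.tmp' -exec rm -r {} +
ls data
```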
Closed in #3705
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
What did you do?
Pulled metrics from my metrics transfer service. The transfer translates our old monitoring metrics into the Prometheus format.
We have 250000 metrics/minute.
After 12 hours of running, the data dir had grown to about 700GB.
What should I do to solve this failure?
What did you expect to see?
A successful compaction.
What did you see instead? Under which circumstances?
Got a lot of "compaction failed" errors.
Environment
Linux 2.6.32-696.3.2.el6.v6.2.x86_64 x86_64
prometheus, version 2.0.0-beta.4 (branch: HEAD, revision: 1b80f63)
build user: root@c5b3f10ffc51
build date: 20170914-11:22:50
go version: go1.8.3